Life is short, I'm learning Python!
I've been thinking about looking for new opportunities lately, and a lot of the job descriptions I've seen ask for some Python and shell scripting, so I've been studying them in my spare time. I'm still just a beginner, but I can already write a small crawler or two, hehe.
Let me recommend the site I used to teach myself: 廖雪峰's tutorial at https://www.liaoxuefeng.com/wiki/1016959663602400. It explains things very simply, and good things are meant to be shared. My first language is Java, but after learning this bit of Python I really do believe the saying "Life is short, I use Python!"
Most programmers are lazy, and Python will make you lazier: so much has already been packaged up that you can pull in one library and use it straight away. So easy! In this post I'll share a little image-scraping program I wrote myself. The code is rough and the naming is nothing like my Java habits, so please go easy on me.
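As a taste of how little code that takes, here is a minimal, self-contained sketch of the parsing step the full scraper below relies on. The HTML fragment and its values are made up to mirror the structure of the listing page; only the `item-img` class comes from the real site:

```python
from bs4 import BeautifulSoup  # pip install beautifulsoup4

# A tiny hand-written HTML fragment shaped like one entry of the listing page
html = '''
<a class="item-img" href="http://www.shuaia.net/a/1.html">
  <img src="/pic/1.jpg" alt="sample-title">
</a>
'''

soup = BeautifulSoup(html, 'html.parser')
for item in soup.find_all(class_='item-img'):
    # Build the same "title=detail-url" pairing the full script uses
    print(item.img.get('alt') + '=' + item.get('href'))
    # → sample-title=http://www.shuaia.net/a/1.html
```

That is the whole extraction step: one import, one parse, one loop.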
```python
# -*- coding: UTF-8 -*-
import os
import random
import time

import requests
from bs4 import BeautifulSoup
from urllib.request import urlretrieve

"""
Demo: crawl images from
http://www.shuaia.net/
"""

headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
                  "(KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36"
}
params = {"tagname": "美女"}  # the tag whose image sets we crawl


def get_pageurl(j, target_urls):
    """Fetch listing page j and append 'title=detail-url' pairs to target_urls."""
    url = "http://www.shuaia.net/e/tags/index.php?page=%d&line=25&tempid=3" % j
    response = requests.get(url=url, headers=headers, params=params)
    if response.status_code != 200:
        return None
    print(response.url)
    response.encoding = 'utf-8'
    soup = BeautifulSoup(response.text, 'lxml')
    for item in soup.find_all(class_='item-img'):
        target_urls.append(item.img.get('alt') + '=' + item.get('href'))
    return target_urls


if __name__ == '__main__':
    j = 0  # page counter; initialize it outside the loop or it resets every pass
    while True:
        target_urls = get_pageurl(j, [])
        if target_urls is None:
            break  # a non-200 response means there are no more listing pages
        print(target_urls)
        j = j + 1
        for item in target_urls:
            # split on the first '=' only, in case the URL itself contains one
            fileName, fileUrl = item.split("=", 1)
            print(fileName)
            file_name = fileName + ".jpg"
            if not os.path.isdir(fileName):
                os.makedirs(fileName)  # one folder per image set
            print("downloading >>> " + fileName)
            response_img = requests.get(fileUrl, headers=headers)
            response_img.encoding = 'utf-8'
            img_html = BeautifulSoup(response_img.text, 'lxml')
            html_find = img_html.find('div', class_='wr-single-content-list')
            img_url = 'http://www.shuaia.net' + html_find.img.get('src')
            urlretrieve(url=img_url, filename=fileName + '/' + file_name)
            print(img_url)
            time.sleep(random.randint(0, 5))  # random pause so we don't hammer the site
            # the remaining pages of a set are named ..._2.html, ..._3.html, etc.
            fileUrl = fileUrl[:-5]  # strip the trailing ".html"
            i = 1
            while True:
                crl_file_url = fileUrl + '_' + str(i + 1) + '.html'
                crl_response = requests.get(crl_file_url, headers=headers)
                if crl_response.status_code != 200:
                    break  # no more pages in this set
                crl_response.encoding = 'utf-8'
                crl_img_html = BeautifulSoup(crl_response.text, 'lxml')
                crl_find = crl_img_html.find('div', class_='wr-single-content-list')
                crl_img_url = 'http://www.shuaia.net' + crl_find.img.get('src')
                urlretrieve(url=crl_img_url,
                            filename=fileName + '/' + fileName + str(i + 1) + ".jpg")
                i = i + 1
                time.sleep(random.randint(0, 5))
```
Finally
That's everything 文艺翅膀 has recently collected and organized on "Life is short, I use Python: scraping images". For more on the topic, search the other articles on 靠谱客.
This content was contributed by netizens or collected from around the web for learning and reference; copyright remains with the original authors.