
Commit 3e33985

🎉 Update to the Zhihu spider

1 parent a069253 commit 3e33985

File tree

32 files changed: +27 -3 lines changed

README.md

Lines changed: 10 additions & 0 deletions
@@ -15,6 +15,16 @@ some python spiders with BeautifulSoup & requests
 - Fetch the short reviews for a single movie [comment.py](bs4/douban/douban_comment.py)
 - Fetch all full-size posters for a single movie [photosR.py](bs4/douban/douban_photosR.py)
 
+### [Zhihu](bs4/zhihu)
+- Fetch every image from the answers under a single question and save them in a folder named after the question [quesImg.py](bs4/zhihu/zhihu_quesImg.py) Example: [Question: "What do the most beautiful loli pictures in your collection look like?"](bs4/zhihu/你收藏的最美的萝莉图片是怎样的?)
+- Fetch a Zhihu user's profile information [userInfo.py](bs4/zhihu/zhihu_userInfo.py)
+
+### [Stock trading data](bs4/stock)
+- Fetch a stock's data for a given year and quarter and write it to CSV [stock2csv.py](bs4/stock/stock2csv.py)
+- Fetch all of a stock's trading data and write it to TXT [stock2txt.py](bs4/stock/stock2txt.py)
+- Fetch all of a stock's trading data and write it to a CSV named after its ticker code [stock2csvALL.py](bs4/stock/stock2csvALL.py)
+- Fetch all of a stock's trading data and write it to a CSV named after its ticker code and name [stock2csvALLWithName.py](bs4/stock/stock2csvALLWithName.py)
+
 ##scarpy
 
 ### [wiki](scarpy/wikiSpider)

by-bs4/stock/README.md

Lines changed: 3 additions & 0 deletions
@@ -10,7 +10,10 @@
 
 End year for scraping: 2016
 
+For example: all of PetroChina's trading data, up to 2016-11-16 (data/601857.csv)
+
 Don't worry about a quarter in some year having no data: the spider won't raise an error or write empty rows; it simply skips that quarter.
+
 Since the CSV files will later be processed with pandas or another library, filenames containing Chinese characters could be a hassle, so the fetch-everything spider comes in two variants: ticker code only, and ticker code + Chinese name.
 
 - Fetch all of a stock's trading data and write it to a CSV named after its ticker code [stock2csvALL.py](stock2csvALL.py)
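The skip-empty-quarter guarantee above amounts to a guard before anything is written. A minimal sketch of that flow (Python 3 for brevity; the quote-page URL, the `fetch_quarter` helper, and the table selector are hypothetical stand-ins, not the repo's actual code):

```python
# Sketch of the per-quarter loop: fetch, check for emptiness, then write.
import csv

import requests
from bs4 import BeautifulSoup

def fetch_quarter(code, year, season):
    """Return the quarter's trade rows, or [] if the quarter has no data."""
    # Hypothetical history-page URL; the real script's data source may differ.
    url = ('http://example.com/history/%s?year=%d&season=%d'
           % (code, year, season))
    soup = BeautifulSoup(requests.get(url).text, 'lxml')
    rows = soup.select('table tr')[1:]  # skip the header row (assumed layout)
    return [[td.get_text(strip=True) for td in row.select('td')] for row in rows]

def stock_to_csv(code, start_year, end_year=2016):
    # The CSV is named after the ticker code only, keeping Chinese
    # characters out of the filename for pandas-friendly paths.
    with open(code + '.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        for year in range(start_year, end_year + 1):
            for season in (1, 2, 3, 4):
                rows = fetch_quarter(code, year, season)
                if not rows:   # empty quarter: no error, nothing written,
                    continue   # the loop simply moves on
                writer.writerows(rows)

stock_to_csv('601857', 2007)  # PetroChina, as in the example above
```

The ticker-plus-name variant would only change the filename passed to `open()`.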

by-bs4/zhihu/4zhihuImgs.py renamed to by-bs4/zhihu/zhihu_quesImg.py

Lines changed: 14 additions & 3 deletions
@@ -1,9 +1,15 @@
+# coding:utf-8
 import requests
 import urllib
 from bs4 import BeautifulSoup
 import time
+import os
 
-url = 'https://www.zhihu.com/question/47371654'
+
+quesNumStr = str(input("请输入问题数字:"))
+
+
+url = 'https://www.zhihu.com/question/'+quesNumStr
 
 headers = {
     'User-Agent':'', # your user-Agent here
@@ -13,17 +19,22 @@
 data = requests.get(url, headers=headers)
 soup = BeautifulSoup(data.text, 'lxml')
 imgs = soup.select('div.zm-editable-content > img')
+title = soup.select('#zh-question-title > h2 > span')[0].get_text()
 
 img_link = []
-folder_path = 'E://python/4weeks/jiandan/zhihuimg/'
+folder_path = './'+title+'/'
+if os.path.exists(folder_path) == False:
+    # create the image folder
+    os.mkdir(folder_path)
 
 for i in imgs:
     img_link.append(i.get('data-actualsrc'))
     # print i.get('data-actualsrc')
 
 # if folder_path == False:
 #     open(folder_path,'wr')
-print len(img_link)
+print title
+print str(len(img_link))+'张图片'
 
 for index,item in enumerate(img_link):
     urllib.urlretrieve(item, folder_path + str(index)+'.jpg')
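The renamed script is Python 2 (bare `print` statements, `urllib.urlretrieve`). For reference, a minimal Python 3 sketch of the same flow, reusing the selectors from the diff above (they assume Zhihu's 2016-era markup and may no longer match the live site):

```python
# coding: utf-8
# Python 3 sketch of zhihu_quesImg.py's flow; the CSS selectors and the
# data-actualsrc attribute come from the original script and are assumptions
# about Zhihu's markup at the time.
import os
import urllib.request

import requests
from bs4 import BeautifulSoup

ques_num = input("请输入问题数字:")  # "Enter the question number:"
url = 'https://www.zhihu.com/question/' + ques_num

headers = {'User-Agent': ''}  # your User-Agent here
soup = BeautifulSoup(requests.get(url, headers=headers).text, 'lxml')

# Folder named after the question title; note the title may contain
# characters your OS forbids in paths (e.g. '?' on Windows).
title = soup.select('#zh-question-title > h2 > span')[0].get_text().strip()
folder_path = './' + title + '/'
os.makedirs(folder_path, exist_ok=True)  # create the image folder if missing

img_links = [img.get('data-actualsrc')
             for img in soup.select('div.zm-editable-content > img')]

print(title)
print('%d 张图片' % len(img_links))  # "%d images"

for index, link in enumerate(img_links):
    urllib.request.urlretrieve(link, folder_path + str(index) + '.jpg')
```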
