
Mxiaoyu/BackupSohu

BackupSohu

Back up the page at "http://m.sohu.com" every 60 seconds using Python.
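Implemented with urllib2 + BeautifulSoup. The Backup class fetches and parses the page through two methods: download_page, which requests the page and returns its source; and para_page, which parses that source, downloads the images, JS and CSS files referenced by src and href attributes, and rewrites those attributes to point at the downloaded copies.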
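The two methods of the Backup class can be sketched roughly as follows. This is not the project's code: it is a Python 3 sketch that swaps urllib2 for urllib.request and BeautifulSoup for the standard-library html.parser. The tag/attribute table (ASSET_ATTRS), the assets/ naming scheme and the fetch callback are hypothetical choices, made so that the parsing and path-rewriting step can be exercised without network access.

```python
import html
import posixpath
import urllib.request
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical table: which attribute of which tag points at a static asset.
ASSET_ATTRS = {"img": "src", "script": "src", "link": "href"}


class _TagCollector(HTMLParser):
    """Records every asset-bearing start tag together with its source position."""

    def __init__(self):
        super().__init__()
        self.hits = []  # (lineno, offset, raw_tag_text, tag, attrs)

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        if wanted and any(name == wanted and value for name, value in attrs):
            line, col = self.getpos()  # position of the "<" that opens this tag
            self.hits.append((line, col, self.get_starttag_text(), tag, attrs))


def _build_tag(tag, attrs, self_closing):
    # Re-serialize a start tag; boolean attributes (value None) keep just their name.
    parts = [tag] + [name if value is None else '%s="%s"' % (name, html.escape(value))
                     for name, value in attrs]
    return "<" + " ".join(parts) + (" />" if self_closing else ">")


class Backup:
    """Hypothetical Python 3 sketch of the Backup class described above."""

    def download_page(self, url):
        # Fetch the page and return its source as text.
        with urllib.request.urlopen(url, timeout=10) as resp:
            charset = resp.headers.get_content_charset() or "utf-8"
            return resp.read().decode(charset, errors="replace")

    def para_page(self, page, base_url, fetch=None):
        """Rewrite asset URLs to local paths; return (new_html, {local_path: url}).

        Hypothetical hook: if fetch is given, it is called as
        fetch(absolute_url, local_path) so the caller performs the download."""
        collector = _TagCollector()
        collector.feed(page)
        collector.close()

        # Absolute offset of each line start (HTMLParser counts lines by "\n").
        line_starts = [0] + [i + 1 for i, ch in enumerate(page) if ch == "\n"]
        assets, pieces, last = {}, [], 0
        for line, col, raw, tag, attrs in collector.hits:
            start = line_starts[line - 1] + col
            new_attrs = []
            for name, value in attrs:
                if name == ASSET_ATTRS[tag] and value:
                    absolute = urljoin(base_url, value)
                    base = posixpath.basename(urlparse(absolute).path) or "asset"
                    local = "assets/%d_%s" % (len(assets), base)
                    assets[local] = absolute
                    if fetch is not None:
                        fetch(absolute, local)
                    value = local
                new_attrs.append((name, value))
            # Copy untouched text, then splice in the rewritten tag.
            pieces.append(page[last:start])
            pieces.append(_build_tag(tag, new_attrs, raw.rstrip().endswith("/>")))
            last = start + len(raw)
        pieces.append(page[last:])
        return "".join(pieces), assets
```

Splicing rewritten tags back into the original string, rather than re-serializing the whole tree, leaves scripts, comments and whitespace byte-for-byte unchanged.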

Usage:

python main.py -d 60 -u http://m.sohu.com -o /tmp/backup
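The example command implies a driver loop that takes a snapshot every -d seconds. A minimal sketch of such a loop follows, assuming -d is the interval in seconds, -u the page URL and -o the output directory (inferred from the command, not confirmed by the repository). The long option names, the timestamped sub-directory layout and the max_runs parameter are hypothetical, and this snapshot saves only the raw HTML rather than the rewritten page with its assets.

```python
import argparse
import os
import time
import urllib.request


def parse_args(argv=None):
    # Flag meanings are inferred from the example command (assumption).
    parser = argparse.ArgumentParser(description="Periodically back up a web page.")
    parser.add_argument("-d", "--interval", type=int, default=60,
                        help="seconds between snapshots")
    parser.add_argument("-u", "--url", default="http://m.sohu.com")
    parser.add_argument("-o", "--output", default="/tmp/backup",
                        help="directory receiving one sub-directory per snapshot")
    return parser.parse_args(argv)


def snapshot(url, out_dir):
    """Save the raw page to out_dir/<timestamp>/index.html and return that path."""
    target = os.path.join(out_dir, time.strftime("%Y%m%d%H%M%S"))
    os.makedirs(target, exist_ok=True)
    with urllib.request.urlopen(url, timeout=10) as resp:
        data = resp.read()
    path = os.path.join(target, "index.html")
    with open(path, "wb") as f:
        f.write(data)
    return path


def run(args, take=snapshot, max_runs=None):
    """Call take(url, output) every args.interval seconds (max_runs is for testing)."""
    runs = 0
    while max_runs is None or runs < max_runs:
        started = time.monotonic()
        try:
            take(args.url, args.output)
        except OSError as exc:  # network errors should not stop the schedule
            print("backup failed:", exc)
        runs += 1
        # Sleep for the remainder of the interval so start times do not drift.
        time.sleep(max(0.0, args.interval - (time.monotonic() - started)))
```

A script entry point would call run(parse_args()). Subtracting the elapsed download time before sleeping keeps snapshots starting roughly every N seconds instead of every N seconds plus the fetch time.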

To be improved:

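  • Advertisement images are hidden on the page, and no suitable way to capture them has been found yet.
  • The advertisement at the very bottom of the page is embedded in an iframe, so it cannot be crawled either.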
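For the iframe item, one possible direction (not something this project implements) is to treat each iframe as a separate document: collect its src and fetch it with the same download step as the main page. The sketch below, with the hypothetical names IframeFinder and find_iframes, only locates the frame URLs using html.parser. It cannot help when the iframe is inserted by JavaScript after the page loads, because urllib-style fetching does not execute scripts.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class IframeFinder(HTMLParser):
    """Hypothetical helper: collects absolute iframe src URLs from static HTML."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.frames = []

    def handle_starttag(self, tag, attrs):
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:  # skip iframes without a (non-empty) src attribute
                self.frames.append(urljoin(self.base_url, src))


def find_iframes(page, base_url):
    # Return the iframe URLs in document order, resolved against base_url.
    finder = IframeFinder(base_url)
    finder.feed(page)
    finder.close()
    return finder.frames
```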
