Content collection: crawls books from each content provider (CP) into the 植宇 (Zhiyu) content platform.


Collection scripts

cd /home/www/wangdu_spider

7lou collection

  • Full collection, with deduplication: scrapy crawl 7lou
  • Partial collection, without deduplication: scrapy crawl zbone -a bid=xx,xx,xxx

趣阅 (Quyue) collection

  • Full collection, with deduplication: scrapy crawl shuangduxs
  • Partial collection, without deduplication: scrapy crawl sdone -a bid=xxx,xxx,xxx
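
The partial-collection commands take a comma-separated bid argument. Scrapy passes each `-a name=value` pair to the spider as a single string, so the spider has to split the value itself. A minimal sketch of that step follows; `parse_bids` is a hypothetical helper for illustration, not code from this repository.

```python
from typing import List


def parse_bids(bid: str) -> List[str]:
    """Split a comma-separated bid argument into a list of book IDs."""
    # "-a bid=101,102" arrives as the single string "101,102".
    # Strip stray whitespace and drop empty pieces such as a trailing comma.
    return [part.strip() for part in bid.split(",") if part.strip()]


print(parse_bids("101, 102,103"))
# ['101', '102', '103']
```

Inside a spider, the returned list could then drive one request per book ID. How zbone and sdone actually handle the argument is not shown in this README.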

百川 (Baichuan) collection, zy_baichuan:

Spider directory: content_spider/spiders
Collect command: scrapy crawl baichuanzw
Update command: scrapy crawl baichuanzwupdate
Update completion status: scrapy crawl baichuanzwbookstatusinfo
Overwrite command: scrapy crawl baichuanzwfix -a bid=bid1,bid2

wangyou (忘忧):

Spider directory: content_spider/spiders/wangyou
Collect command: scrapy crawl wangyou
Update command: scrapy crawl wangyouupdate
Update completion status: scrapy crawl wangyoubookinfo
Overwrite command: scrapy crawl wangyoufix -a bid=bid1,bid2

feiyuyuedu (飞鱼阅读):

Spider directory: content_spider/spiders/feiyuyuedu
Collect command: scrapy crawl feiyuyuedu
Update command: scrapy crawl feiyuyueduupdate
Update completion status: scrapy crawl feiyuyuedubookinfo
Overwrite command: scrapy crawl feiyuyuedufix -a bid=bid1,bid2

liuyue (六月):

Spider directory: content_spider/spiders/liuyue
Collect command: scrapy crawl liuyue
Update command: scrapy crawl liuyueupdate
Update completion status: scrapy crawl liuyuebookinfo
Overwrite command: scrapy crawl liuyuefix -a bid=bid1,bid2

judian (据点):

Spider directory: content_spider/spiders/judian
Collect command: scrapy crawl judian
Update command: scrapy crawl judianupdate
Update completion status: scrapy crawl judianbookinfo
Overwrite command: scrapy crawl judianfix -a bid=bid1,bid2

futian (伏天):

Spider directory: content_spider/spiders/futian
Collect command: scrapy crawl futian
Update command: scrapy crawl futianupdate
Update completion status: scrapy crawl futianbookinfo
Overwrite command: scrapy crawl futianfix -a bid=bid1,bid2

haoyue (豪阅):

Spider directory: content_spider/spiders/haoyue
Collect command: scrapy crawl haoyue
Update command: scrapy crawl haoyueupdate
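
Most of the per-source commands above share a naming pattern: the spider name is the source name plus an action suffix. The sketch below builds the `scrapy crawl` command from that pattern. `build_command` and `ACTION_SUFFIX` are hypothetical names introduced here, and the pattern is inferred from the listed commands, not from the spider code.

```python
from typing import List, Optional

# Suffix appended to the source name for each action, as seen in the
# wangyou, feiyuyuedu, liuyue, judian and futian commands.
# Exceptions in this README: baichuanzw uses "bookstatusinfo" for the
# status spider, and haoyue lists only the collect and update commands.
ACTION_SUFFIX = {"collect": "", "update": "update", "status": "bookinfo", "fix": "fix"}


def build_command(source: str, action: str, bids: Optional[List[str]] = None) -> List[str]:
    """Return the argv list for one scrapy crawl invocation."""
    if action not in ACTION_SUFFIX:
        raise ValueError(f"unknown action: {action}")
    cmd = ["scrapy", "crawl", source + ACTION_SUFFIX[action]]
    if action == "fix":
        # The overwrite spiders take the book IDs as one comma-separated value.
        if not bids:
            raise ValueError("the fix action needs at least one bid")
        cmd += ["-a", "bid=" + ",".join(bids)]
    return cmd


print(" ".join(build_command("liuyue", "fix", ["bid1", "bid2"])))
# scrapy crawl liuyuefix -a bid=bid1,bid2
```

The returned list can be passed to `subprocess.run(cmd, cwd="/home/www/wangdu_spider")` so the command runs from the project directory named at the top of this README.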