Content collection: crawls books from each content provider (CP) into the Zhiyu (植宇) content platform.

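Crawl scripts

Run all commands from the project root:

cd /project/www/zhiyu_content_spider

Each source lists some or all of four spiders: a crawl command that collects the source's books, an update command, a completion-status command that refreshes whether each book is finished, and an overwrite command that re-crawls only the book IDs passed as a comma-separated list via -a bid=bid1,bid2.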

kanshu (看书网), zy_kanshu:

Directory: content_spider/spiders

Crawl command: scrapy crawl kanshuzw
Update command: scrapy crawl kanshuzwupdate
Overwrite command: scrapy crawl kanshuzwfix -a bid=bid1,bid2
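Scrapy passes each -a name=value option to the spider as a string keyword argument, so a fix spider receives bid as the single string "bid1,bid2". The sketch below shows one way such a spider could split it into book IDs. parse_bids is a hypothetical helper, not code taken from this repository; the actual fix spiders may parse the argument differently.

```python
from typing import List, Optional


def parse_bids(bid: Optional[str]) -> List[str]:
    """Split the value of "-a bid=bid1,bid2" into a list of book IDs.

    Hypothetical helper: the repository's *fix spiders may parse the
    argument differently.
    """
    if not bid:
        return []
    # Trim whitespace and skip empty items, e.g. from a trailing comma.
    return [part.strip() for part in bid.split(",") if part.strip()]


# How a fix spider could use it (sketch, assumes a scrapy.Spider subclass):
#
#     def __init__(self, bid=None, *args, **kwargs):
#         super().__init__(*args, **kwargs)
#         self.bids = parse_bids(bid)
```

Returning an empty list when bid is missing lets a spider detect "no IDs given" without special-casing None.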

baichuan (百川), zy_baichuan:

Directory: content_spider/spiders

Crawl command: scrapy crawl baichuanzw
Update command: scrapy crawl baichuanzwupdate
Completion-status command: scrapy crawl baichuanzwbookstatusinfo
Overwrite command: scrapy crawl baichuanzwfix -a bid=bid1,bid2
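Because the completion-status spider is a separate command, a scheduled job would typically run it after the update spider. A minimal sketch, assuming the job runs on the same host and path as above and that scrapy is on PATH; update_then_refresh_status is a hypothetical helper, not part of this project.

```python
import subprocess
from typing import List

# Assumption: the project root containing scrapy.cfg, as in the cd command above.
PROJECT_DIR = "/project/www/zhiyu_content_spider"


def update_then_refresh_status(update_spider: str, status_spider: str,
                               dry_run: bool = False) -> List[List[str]]:
    """Run the update spider, then the completion-status spider, in order.

    Hypothetical helper. With check=True, subprocess.run raises
    CalledProcessError if a crawl process exits with a non-zero status,
    so the status refresh is skipped when that happens. With
    dry_run=True nothing is executed; the command lists are only returned.
    """
    commands = [
        ["scrapy", "crawl", update_spider],
        ["scrapy", "crawl", status_spider],
    ]
    if not dry_run:
        for argv in commands:
            subprocess.run(argv, cwd=PROJECT_DIR, check=True)
    return commands
```

For example, update_then_refresh_status("baichuanzwupdate", "baichuanzwbookstatusinfo") runs the two Baichuan commands listed above in that order; passing dry_run=True only returns the command lists.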

wangyou (忘忧):

Directory: content_spider/spiders/wangyou

Crawl command: scrapy crawl wangyou
Update command: scrapy crawl wangyouupdate
Completion-status command: scrapy crawl wangyoubookinfo
Overwrite command: scrapy crawl wangyoufix -a bid=bid1,bid2
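From wangyou onward, most sources follow one naming pattern: <source>, <source>update, <source>bookinfo and <source>fix. The sketch below encodes that observed pattern to build command lines. spider_names and crawl_argv are hypothetical helpers, and they do not cover the exceptions: kanshuzw has no completion-status spider, baichuanzw uses baichuanzwbookstatusinfo, and haoyue only lists crawl and update spiders.

```python
from typing import Dict, List, Optional


def spider_names(source: str) -> Dict[str, str]:
    """Spider names under the pattern most sources in this README follow.

    Hypothetical helper; exceptions such as kanshuzw, baichuanzw and
    haoyue must be handled separately.
    """
    return {
        "crawl": source,
        "update": source + "update",
        "status": source + "bookinfo",
        "fix": source + "fix",
    }


def crawl_argv(spider: str, bids: Optional[List[str]] = None) -> List[str]:
    """Build the argv for "scrapy crawl", adding "-a bid=..." when IDs are given."""
    argv = ["scrapy", "crawl", spider]
    if bids:
        argv += ["-a", "bid=" + ",".join(bids)]
    return argv
```

For example, crawl_argv(spider_names("wangyou")["fix"], ["bid1", "bid2"]) returns ["scrapy", "crawl", "wangyoufix", "-a", "bid=bid1,bid2"], the list form of the overwrite command above.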

feiyuyuedu (飞鱼阅读):

Directory: content_spider/spiders/feiyuyuedu

Crawl command: scrapy crawl feiyuyuedu
Update command: scrapy crawl feiyuyueduupdate
Completion-status command: scrapy crawl feiyuyuedubookinfo
Overwrite command: scrapy crawl feiyuyuedufix -a bid=bid1,bid2

liuyue (六月):

Directory: content_spider/spiders/liuyue

Crawl command: scrapy crawl liuyue
Update command: scrapy crawl liuyueupdate
Completion-status command: scrapy crawl liuyuebookinfo
Overwrite command: scrapy crawl liuyuefix -a bid=bid1,bid2

judian (据点):

Directory: content_spider/spiders/judian

Crawl command: scrapy crawl judian
Update command: scrapy crawl judianupdate
Completion-status command: scrapy crawl judianbookinfo
Overwrite command: scrapy crawl judianfix -a bid=bid1,bid2

futian (伏天):

Directory: content_spider/spiders/futian

Crawl command: scrapy crawl futian
Update command: scrapy crawl futianupdate
Completion-status command: scrapy crawl futianbookinfo
Overwrite command: scrapy crawl futianfix -a bid=bid1,bid2

haoyue (豪阅):

Directory: content_spider/spiders/haoyue

Crawl command: scrapy crawl haoyue
Update command: scrapy crawl haoyueupdate

aiyouhuyu (哎呦互娱):

Directory: content_spider/spiders/aiyouhuyu

Crawl command: scrapy crawl aiyouhuyu
Update command: scrapy crawl aiyouhuyuupdate
Completion-status command: scrapy crawl aiyouhuyubookinfo
Overwrite command: scrapy crawl aiyouhuyufix -a bid=bid1,bid2

yuyuedu (娱阅读):

Directory: content_spider/spiders/yuyuedu

Crawl command: scrapy crawl yuyuedu
Update command: scrapy crawl yuyueduupdate
Completion-status command: scrapy crawl yuyuedubookinfo
Overwrite command: scrapy crawl yuyuedufix -a bid=bid1,bid2

banquanmao:

Directory: content_spider/spiders/banquanmao

Crawl command: scrapy crawl banquanmao
Update command: scrapy crawl banquanmaoupdate
Completion-status command: scrapy crawl banquanmaobookinfo
Overwrite command: scrapy crawl banquanmaofix -a bid=bid1,bid2

wandu (万读):

Directory: content_spider/spiders/wandu

Crawl command: scrapy crawl wandu
Update command: scrapy crawl wanduupdate
Completion-status command: scrapy crawl wandubookinfo
Overwrite command: scrapy crawl wandufix -a bid=bid1,bid2

xiwen (溪文):

Directory: content_spider/spiders/xiwen

Crawl command: scrapy crawl xiwen
Update command: scrapy crawl xiwenupdate
Completion-status command: scrapy crawl xiwenbookinfo
Overwrite command: scrapy crawl xiwenfix -a bid=bid1,bid2

nanjingluochen (南京落尘):

Directory: content_spider/spiders/nanjingluochen

Crawl command: scrapy crawl nanjingluochen
Update command: scrapy crawl nanjingluochenupdate
Completion-status command: scrapy crawl nanjingluochenbookinfo
Overwrite command: scrapy crawl nanjingluochenfix -a bid=bid1,bid2