No live data for your graduation project? Python gets it done in one step!

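Preface

It's thesis season again, and a lot of followers have messaged me asking whether I could scrape some data for their papers. As it happens, a young woman I know was planning to buy a kitten, so I searched around the web, found the site below, and rolled up my sleeves to take on this daunting task. If you enjoy web scraping or need data scraped, feel free to message me.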

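Page analysis

Let's visit this address: http://HdhCmsTestmaomijiaoyi测试数据/index.php?/chanpinliebiao_pinzhong_38.html

The page shows a list of cats, but inspecting the request with F12 (the browser's developer tools) shows that the server returns an HTML page rather than the JSON we usually work with. To get a cat's details, such as price, phone number, age, breed, and view count, we also have to open the detail page linked from each list entry.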
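A quick way to confirm what F12 shows is to test whether a response body parses as JSON. The sketch below uses only the standard library; the helper name `looks_like_json` and the sample strings are illustrative assumptions, not part of the original script.

```python
import json


def looks_like_json(body: str) -> bool:
    """Return True if the response body parses as a JSON object or array."""
    try:
        parsed = json.loads(body)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return isinstance(parsed, (dict, list))


# Made-up sample bodies, for illustration only
html_body = "<html><body><div class='content'><a href='/x.html'>cat</a></div></body></html>"
json_body = '{"items": [{"title": "cat"}]}'

print(looks_like_json(html_body))  # False
print(looks_like_json(json_body))  # True
```

When the check returns False, as it would for this site's list page, the body has to go through an HTML parser such as parsel instead of `json.loads`.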

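What we need to do

- Parse the returned list page
- Extract the region data
- Request each cat's detail page
- Parse the returned detail page
- Save the data to a CSV file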
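The steps above can be sketched as one small pipeline. This is only an outline under assumptions: the function name `crawl` and its four parameters are hypothetical, and the stand-in callables at the bottom replace real network requests and HTML parsing so the sketch runs offline.

```python
from typing import Callable, Dict, Iterable, List, Tuple


def crawl(
    list_urls: Iterable[str],
    fetch: Callable[[str], str],
    parse_list: Callable[[str], List[Tuple[str, str]]],
    parse_detail: Callable[[str], Dict[str, str]],
) -> List[Dict[str, str]]:
    """Run the steps in order and return one row per listing."""
    rows = []
    for list_url in list_urls:
        list_html = fetch(list_url)
        # Parse the list page into (detail_url, region) pairs
        for detail_url, region in parse_list(list_html):
            # Request the detail page
            detail_html = fetch(detail_url)
            # Parse it, then attach the region and the detail URL
            row = parse_detail(detail_html)
            row["region"] = region
            row["url"] = detail_url
            rows.append(row)
    return rows  # saving to CSV would consume these rows


# Stand-in data and callables so the sketch runs without network access
pages = {"list-1": "a|Beijing", "a": "price=100"}
rows = crawl(
    ["list-1"],
    fetch=lambda url: pages[url],
    parse_list=lambda html: [tuple(html.split("|"))],
    parse_detail=lambda html: dict([html.split("=")]),
)
print(rows)  # [{'price': '100', 'region': 'Beijing', 'url': 'a'}]
```

Passing the fetch and parse functions in as arguments keeps each step testable on its own; in a real run, `fetch` would wrap `requests.get(...).text` and the two parsers would use parsel selectors.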

CSV file

Running the program saves the scraped records to a CSV file.

Code implementation

1. Import dependencies

import requests  # send HTTP requests; pip install requests
import parsel  # HTML page parser; pip install parsel
import csv  # write the CSV file

2. Get the list of cats

i = 1  # page number; in the full script this snippet sits inside a loop over list pages
url = "http://HdhCmsTestmaomijiaoyi测试数据/index.php?/chanpinliebiao_pinzhong_37_" + str(i) + "--24.html"
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/90.0.4430.72 Safari/537.36'
}
data = requests.get(url=url, headers=headers).text
selector = parsel.Selector(data)
# Relative links to each listing's detail page
urls = selector.css('div .content:nth-child(1) a::attr(href)').getall()
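The snippet above builds the URL for a single page number `i`. A minimal sketch of generating the URLs for a range of pages follows; the host and path pattern are copied from the snippet, while the helper names `list_page_urls` and `absolute` and the page range used in the example are assumptions for illustration.

```python
from urllib.parse import urljoin

# Host and path pattern copied from the snippet above
BASE = "http://HdhCmsTestmaomijiaoyi测试数据"
LIST_PATH = "/index.php?/chanpinliebiao_pinzhong_37_{page}--24.html"


def list_page_urls(first_page: int, last_page: int) -> list:
    """Build the list-page URL for every page from first_page to last_page inclusive."""
    return [BASE + LIST_PATH.format(page=p) for p in range(first_page, last_page + 1)]


def absolute(href: str) -> str:
    """Turn a relative href taken from the list page into a full URL."""
    return urljoin(BASE + "/", href)


# Hypothetical page range; the real number of pages is not given in the article
for page_url in list_page_urls(1, 3):
    print(page_url)
```

Each generated URL can then be fetched and parsed exactly as in the single-page snippet, and `absolute` applied to every href before requesting the detail page.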

3. Fetch each cat's details from the list

# regionAndURL holds (detail path, region) pairs taken from the list page;
# the code that builds it is not shown in the original article.
for s in regionAndURL:
    url = "http://HdhCmsTestmaomijiaoyi测试数据" + s[0]
    address = s[1]
    data = requests.get(url=url, headers=headers).text
    selector = parsel.Selector(data)
    # default='' keeps .strip() / .replace() from failing when a selector matches nothing
    title = selector.css('.detail_text .title::text').get(default='').strip()  # title
    price = selector.css('.info1 span:nth-child(2)::text').get(default='').strip()  # price
    viewsNum = selector.css('.info1 span:nth-child(4)::text').get()  # view count
    commitment = selector.css('.info1 div:nth-child(2) span::text').get(default='').replace("卖家承诺: ", "")  # seller commitment
    onlineOnly = selector.css('.info2 div:nth-child(1) .red::text').get()  # number on sale
    variety = selector.css('.info2 div:nth-child(3) .red::text').get()  # breed
    prevention = selector.css('.info2 div:nth-child(4) .red::text').get()  # vaccination
    contactPerson = selector.css('.user_info div:nth-child(1) .c333::text').get()  # contact name
    phone = selector.css('.user_info div:nth-child(2) .c333::text').get()  # phone
    shipping = selector.css('.user_info div:nth-child(3) .c333::text').get(default='').strip()  # shipping fee
    purebred = selector.css('.item_neirong div:nth-child(1) .c333::text').get(default='').strip()  # purebred or not
    quantityForSale = selector.css('.item_neirong div:nth-child(3) .c333::text').get(default='').strip()  # quantity for sale
    catSex = selector.css('.item_neirong div:nth-child(4) .c333::text').get(default='').strip()  # sex
    catAge = selector.css('div.xinxi_neirong .item:nth-child(2) div:nth-child(2) .c333::text').get(default='').strip()  # age
    dewormingSituation = selector.css(
        'div.xinxi_neirong .item:nth-child(2) div:nth-child(3) .c333::text').get(default='').strip()  # deworming status
    canWatchCatsInVideo = selector.css(
        'div.xinxi_neirong .item:nth-child(2) div:nth-child(4) .c333::text').get(default='').strip()  # video viewing available
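parsel's `.get()` returns `None` when a selector matches nothing, so chaining `.strip()` directly onto it can raise `AttributeError` on a listing that lacks a field. One possible way to centralize the cleanup is a small helper like the one below; the name `clean` and the sample values are assumptions for illustration, not part of the original script.

```python
from typing import Optional


def clean(text: Optional[str], prefix: str = "", default: str = "") -> str:
    """Strip whitespace and an optional leading label; return default when text is None."""
    if text is None:
        return default
    text = text.strip()
    if prefix and text.startswith(prefix):
        text = text[len(prefix):].strip()
    return text


# Made-up sample values, for illustration only
print(clean("  1200 \n"))                        # 1200
print(repr(clean(None)))                         # ''
print(clean("卖家承诺: 包健康", prefix="卖家承诺:"))  # 包健康
```

With this helper, a field could be read as `clean(selector.css('.info1 span:nth-child(2)::text').get())`, and one missing element no longer stops the whole run.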

4. Save the data to a CSV file

f = open('喵咪.csv', mode='a', encoding='utf-8', newline='')

csvHeader = csv.DictWriter(f,
                           fieldnames=['Region', 'Title', 'Price', 'Views', 'Seller commitment', 'Number on sale', 'Breed', 'Vaccination', 'Contact name', 'Phone',
                                       'Shipping', 'Purebred', 'Quantity for sale', 'Sex', 'Age', 'Deworming', 'Video viewing available', 'Detail URL'])
# Write the header row (the duplicate region column from the original list is removed)
csvHeader.writeheader()

# Inside the loop from step 3, build one row per listing and write it
dis = {
    'Region': address,
    'Title': title,
    'Price': price,
    'Views': viewsNum,
    'Seller commitment': commitment,
    'Number on sale': onlineOnly,
    'Breed': variety,
    'Vaccination': prevention,
    'Contact name': contactPerson,
    'Phone': phone,
    'Shipping': shipping,
    'Purebred': purebred,
    'Quantity for sale': quantityForSale,
    'Sex': catSex,
    'Age': catAge,
    'Deworming': dewormingSituation,
    'Video viewing available': canWatchCatsInVideo,
    'Detail URL': url
}
csvHeader.writerow(dis)
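Because the file is opened in append mode and `writeheader()` is called on every run, restarting the script adds another header row to the same file. A minimal sketch of one way around this, assuming a shortened illustrative column set and a hypothetical helper named `append_rows`, writes the header only when the file is new or empty and lets a `with` block close the file:

```python
import csv
import os

# Shortened, illustrative column set; the real script has many more fields
FIELDNAMES = ["region", "title", "price", "url"]


def append_rows(path: str, rows: list) -> None:
    """Append rows to a CSV file, writing the header only if the file is new or empty."""
    need_header = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, mode="a", encoding="utf-8", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDNAMES)
        if need_header:
            writer.writeheader()
        writer.writerows(rows)
```

Calling `append_rows("cats.csv", [row])` once per listing, or once per page with a list of rows, keeps a single header at the top of the file across repeated runs.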

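Bonus

That's the end of this article. If you like web scraping, message me, or search for the official account 【大数据老哥】 and reply 【喵咪】 to get the source code. If you need data scraped, you can follow me too. Let's keep at it together.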

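Disclaimer: this article comes from the internet and does not represent the position of 【好得很程序员自学网】. Please credit the source when reposting: http://www.haodehen.cn/did127239

Updated: 2022-11-28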