Scraping Leslie Cheung's (张国荣) 8 most popular songs with Python: 60,000 comments that will leave you in tears

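Yesterday was April 1.

Every year on this day,

some people rack their brains for pranks to play,

while others reflect that April would be lovely, if only you were still here.

Some have even used AI to recreate you.

But that, in the end, is not you.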


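You have been gone for 16 years. The teenagers who once sneaked a listen to your songs at their desks while the teacher wasn't looking have probably long since become husbands and wives.

Even so, every April many people follow the endless echo you left the world to remember you and to leave you messages, knowing full well that no reply will ever come.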

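This time, we chose to remember Gor Gor (哥哥) through technology.

We scraped your eight most-commented songs on NetEase Cloud Music (网易云音乐).

In order, they are: 《沉默是金》, 《春夏秋冬》, 《倩女幽魂》, 《当爱已成往事》, 《我》, 《风继续吹》, 《玻璃之情》, and 《风再起时》.

Across a total of 64,540 comments, the most frequent phrases are “生日快乐” (happy birthday), “哥哥” (Gor Gor), “加油” (keep going), “你若尚在场” (if you were still here), “新年快乐” (happy New Year), and “哥哥，生日快乐” (happy birthday, Gor Gor).

The word cloud contains very few mentions of “4 月 1 日” (April 1) or “愚人节” (April Fools' Day). That is not because few people comment on that day, but because it is simply not a day on which one can wish you “快乐” (happiness).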
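The article does not show how the phrase counts or the word cloud were produced. As a rough illustration only, the sketch below counts how often each whole comment text occurs, using the standard library. The function name `top_phrases` and the sample list are hypothetical, and a real word cloud would usually segment the text into words first (for example with a library such as jieba) rather than count whole comments.

```python
from collections import Counter


def top_phrases(comments, n=3):
    """Return the n most common whole-comment texts, ignoring blank comments."""
    cleaned = (c.strip() for c in comments)
    counts = Counter(c for c in cleaned if c)
    return counts.most_common(n)


# Hypothetical sample data, not taken from the scraped comments
sample = ["生日快乐", "哥哥", " 生日快乐 ", "加油", "哥哥", "生日快乐"]
print(top_phrases(sample, 2))  # [('生日快乐', 3), ('哥哥', 2)]
```

Counting whole strings is enough to surface short, repeated messages like the ones listed above; longer free-form comments would need segmentation to contribute to the counts.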

Here is the code used to scrape the comments.

# coding:utf-8
import json
import time
import requests
from fake_useragent import UserAgent
import random
import multiprocessing
import sys
#reload(sys)
#sys.setdefaultencoding('utf-8')

ua = UserAgent(verify_ssl=False)

song_list = [{'186453':'春夏秋冬'},{'188204':'沉默是金'},{'188175':'倩女幽魂'},{'188489':'风继续吹'},{'187374':'我'},{'186760':'风再起时'}]
headers = {
    'Origin':'https://music.163.com',
    'Referer': 'https://music.163.com/song?id=26620756',
    'Host': 'music.163.com',
    'User-Agent': ua.random
}

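def get_comments(page, ite):
    """Fetch one page of comments for a song.

    page is the comment offset; ite is a one-entry dict {song_id: song_name}.
    """
    song_id, song_name = next(iter(ite.items()))
    # Proxy IPs are elided in the original article; add your own proxies here
    ip_list = []
    url = 'http://music.163.com/api/v1/resource/comments/R_SO_4_' + song_id + '?limit=20&offset=' + str(page)
    # get_random_ip is the author's helper; its definition is not included in the original listing
    proxies = get_random_ip(ip_list)
    try:
        response = requests.get(url=url, headers=headers, proxies=proxies)
    except Exception as e:
        # Report which page and song failed, then give up on this page
        print(page)
        print(ite)
        return 0
    result = json.loads(response.text)
    items = result['comments']
    for item in items:
        # User name (ASCII commas are swapped for full-width ones)
        user_name = item['user']['nickname'].replace(',', '，')
        # User ID
        user_id = str(item['user']['userId'])
        print(user_id)
        # Comment text, flattened to one line with ASCII commas replaced
        comment = item['content'].strip().replace('\n', '').replace(',', '，')
        # Comment ID
        comment_id = str(item['commentId'])
        # Number of likes on the comment
        praise = str(item['likedCount'])
        # Comment time: keep the first 10 digits of the timestamp (seconds)
        date = time.localtime(int(str(item['time'])[:10]))
        date = time.strftime("%Y-%m-%d %H:%M:%S", date)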
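The listing above calls `get_random_ip` without showing it and takes `page` as an offset into the comments. The sketch below is only a guess at how those two pieces might look. It assumes each proxy entry is a `host:port` string, which the article does not state, and `page_offsets` is a hypothetical helper name. The returned dict uses the scheme-to-URL form that `requests` accepts for its `proxies` argument.

```python
import random


def get_random_ip(ip_list):
    """Pick one proxy at random and return it as a requests-style proxies dict.

    Assumes entries look like 'host:port'. Returns None for an empty list,
    in which case requests connects directly.
    """
    if not ip_list:
        return None
    proxy = random.choice(ip_list)
    return {'http': 'http://' + proxy, 'https': 'http://' + proxy}


def page_offsets(total, limit=20):
    """Offsets 0, limit, 2*limit, ... that together cover `total` comments."""
    return list(range(0, total, limit))


print(get_random_ip(['127.0.0.1:8080']))
print(page_offsets(65))  # [0, 20, 40, 60]
```

Each offset could then be passed as the `page` argument of `get_comments`, one call per page of 20 comments.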

The code for fetching the lyrics of the eight songs:

import requests
from bs4 import BeautifulSoup
import re
import json
import time
import random
import os

headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/66.0.3355.4 Safari/537.36',
    'Referer': 'http://music.163.com',
    'Host': 'music.163.com'
}


# Fetch the page source; return None if the request fails
def GetHtml(url):
    try:
        res = requests.get(url=url, headers=headers)
    except requests.RequestException:
        return None
    return res.text


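# Extract the singer's song info (song IDs and names)
def GetSongsInfo(url):
    print('[INFO]:Getting Songs Info...')
    Info = {'ID': [], 'NAME': []}
    html = GetHtml(url)
    if html is None:
        print('[Warning]:_GetSongsInfo request failed...')
        return Info
    soup = BeautifulSoup(html, 'lxml')
    # The song links sit inside the hidden <ul class="f-hide"> element
    song_ul = soup.find('ul', class_='f-hide')
    links = song_ul.find_all('a') if song_ul else []
    if len(links) < 1:
        print('[Warning]:_GetSongsInfo <links> not found...')
    for link in links:
        # The song ID is the value after '=' in the link's href
        SongID = link.get('href').split('=')[-1]
        SongName = link.get_text()
        Info['ID'].append(SongID)
        Info['NAME'].append(SongName)
    # print(Info)
    return Info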


def GetLyrics(SongID):
    print('[INFO]:Getting %s lyric...' % SongID)
    ApiUrl = 'http://music.163.com/api/song/lyric?id={}&lv=1&kv=1&tv=-1'.format(SongID)
    html = GetHtml(ApiUrl)
    if html is None:
        # Request failed; return an empty lyric instead of crashing
        return ''
    html_json = json.loads(html)
    temp = html_json['lrc']['lyric']
    # Strip the bracketed tags (such as timestamps) from each line
    rule = re.compile(r'\[.*\]')
    lyric = re.sub(rule, '', temp).strip()
    print(lyric)
    return lyric


def main():
    SingerId = input('Enter the Singer ID:')
    url = 'http://music.163.com/artist?id={}'.format(SingerId)
    # url = "http://music.163.com/artist?id=6457"
    Info = GetSongsInfo(url)
    # Walk the song IDs and names together
    for ID, name in zip(Info['ID'], Info['NAME']):
        lyric = GetLyrics(ID)
        SaveLyrics(name, lyric)
        # Pause for a random time of up to 3 seconds between requests
        time.sleep(random.random() * 3)
    # print('[INFO]:All Done...')


def SaveLyrics(SongName, lyric):
    print('[INFO]: Start to Save {}...'.format(SongName))
    if not os.path.isdir('./results'):
        os.makedirs('./results')
    with open('./results/{}.txt'.format(SongName), 'w', encoding='utf-8') as f:
        f.write(lyric)


# Run the scraper when the file is executed as a script
if __name__ == '__main__':
    main()
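One detail of the lyric cleanup is worth a closer look: the pattern `r'\[.*\]'` is greedy, so on a line that carries another bracketed note after the timestamps it removes everything from the first `[` to the last `]`. The sketch below shows a narrower alternative that only matches time tags. The tag format it assumes (`[mm:ss]` with an optional fractional part), the name `strip_time_tags`, and the sample text are illustrative assumptions, not something the article specifies.

```python
import re

# Matches one time tag such as [00:12.34] or [01:02]; an assumed format,
# narrower than the article's r'\[.*\]'
TIME_TAG = re.compile(r'\[\d{1,2}:\d{2}(?:[.:]\d{1,3})?\]')


def strip_time_tags(raw):
    """Remove time tags from each line and drop lines that end up empty."""
    lines = (TIME_TAG.sub('', line).strip() for line in raw.splitlines())
    return '\n'.join(line for line in lines if line)


# Hypothetical LRC-style text, not real lyrics
sample = "[00:01.00]first line\n[00:12.34][01:20.00]second line [chorus]\n[00:15.00]\n"
print(strip_time_tags(sample))

# The greedy pattern wipes out the whole second line, note included
print(repr(re.sub(r'\[.*\]', '', "[00:12.34][01:20.00]second line [chorus]")))  # ''
```

For lyrics that contain nothing in brackets except timestamps, both patterns give the same text, so the difference only matters when a line carries extra bracketed notes.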

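I wonder whether you still sing, wherever you are now. Do you still act?

Do you know how many people are missing you?

Thank you for leaving us so many songs and films.

May you, in another world, never know sorrow or sadness again.

That's all.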


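Disclaimer: this article was collected from the web and does not represent the views of 好得很程序员自学网. When reposting, please credit the source: http://www.haodehen.cn/did81096

Last updated: 2022-10-19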