Twint is a Twitter scraping tool written in Python that lets you scrape Tweets from Twitter profiles without using Twitter's API.
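Twint uses Twitter's search syntax to let you search for Tweets from specific users, on specific topics, hashtags and related Tweets, or to pick out sensitive information such as email addresses and phone numbers from Tweets.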
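How Twint itself matches emails and phone numbers is not described here. As a rough illustration of what "picking out" such information from Tweet text involves, here is a minimal sketch with deliberately naive, hypothetical patterns; `EMAIL_RE`, `PHONE_RE` and `find_contacts` are illustrative names, not Twint's code. Expect false positives: the phone pattern also matches date-like strings such as 2015-12-20.

```python
import re

# Hypothetical, deliberately simple patterns; NOT the ones Twint uses.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
# A digit, at least 7 digits/spaces/brackets/dots/dashes, then a final digit.
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def find_contacts(text):
    """Return (emails, phones) found in a piece of Tweet text."""
    return EMAIL_RE.findall(text), PHONE_RE.findall(text)


if __name__ == "__main__":
    print(find_contacts("Mail jane.doe@example.com or call +1 555-123-4567"))
```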
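Twint also makes special queries to Twitter that let you scrape a Twitter user's followers, the Tweets a user has liked, and the users they follow, without using the API, Selenium, or browser emulation.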
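Some benefits of using Twint instead of the Twitter API: it can fetch almost all Tweets (the Twitter API only returns the last 3200 Tweets); initial setup is fast; it can be used anonymously, without signing up for Twitter; and there are no rate limits.
Limits imposed by Twitter: Twitter limits how far you can scroll through a user's timeline. This means that with .Profile or .Favorites you can only get about 3200 Tweets.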
For more, see the twint project on GitHub (https://github.com/twintproject/twint).
Installation:
git+pip3:
git clone https://github.com/twintproject/twint.git
cd twint
pip3 install -r requirements.txt
pip3 install twint
Or install straight from GitHub with pip3, or with pipenv:
pip3 install --user --upgrade -e git+https://github.com/twintproject/twint.git@origin/master#egg=twint
pipenv install -e git+https://github.com/twintproject/twint.git#egg=twint
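Whichever route you take, one quick way to confirm that the Python package is importable by the interpreter you plan to use is `importlib.util.find_spec` from the standard library. This is a minimal sketch that assumes the package's import name is `twint`; `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if `module_name` can be found by this interpreter's import system."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    # Assumes the package installed above is importable as "twint".
    print("twint importable:", is_installed("twint"))
```

This only checks the Python package; whether the `twint` command runs from your shell also depends on PATH.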
If you get a "command not found" error when you try to run twint after installation, the directory that `pip3 install --user` puts scripts in is probably not on your PATH. On Ubuntu, add ~/.local/bin to your PATH with:
export PATH=$PATH:~/.local/bin
To add ~/.local/bin to your PATH permanently, put the same line in your ~/.bashrc file.
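To check from Python whether the per-user script directory is already on PATH, here is a minimal sketch using `site.getuserbase()` from the standard library. It assumes a Linux/macOS layout where `pip --user` scripts go to `<user base>/bin` (~/.local/bin on Ubuntu); Windows uses a different directory. `user_bin_dir` and `is_on_path` are hypothetical helper names.

```python
import os
import site


def user_bin_dir():
    """Per-user script directory on Linux/macOS (assumption: <user base>/bin, e.g. ~/.local/bin)."""
    return os.path.join(site.getuserbase(), "bin")


def is_on_path(directory, path_value=None):
    """Return True if `directory` is one of the entries in PATH (or in `path_value` if given)."""
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    # Normalise both sides so "~/.local/bin/" and "/home/me/.local/bin" compare equal.
    target = os.path.normpath(os.path.expanduser(directory))
    entries = (
        os.path.normpath(os.path.expanduser(p))
        for p in path_value.split(os.pathsep)
        if p
    )
    return target in entries


if __name__ == "__main__":
    d = user_bin_dir()
    print(d, "is on PATH" if is_on_path(d) else "is NOT on PATH")
```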
Usage:
Run the twint command with arguments to get results. A few simple examples to help you understand the basics:
twint -u username - Scrape all the Tweets from the user's timeline.
twint -u username -s pineapple - Scrape all Tweets from the user's timeline containing pineapple.
twint -s pineapple - Collect every Tweet containing pineapple from everyone's Tweets.
twint -u username --year 2014 - Collect Tweets that were tweeted before 2014.
twint -u username --since "2015-12-20 20:30:15" - Collect Tweets that were tweeted since 2015-12-20 20:30:15.
twint -u username --since 2015-12-20 - Collect Tweets that were tweeted since 2015-12-20 00:00:00.
twint -u username -o file.txt - Scrape Tweets and save them to file.txt.
twint -u username -o file.csv --csv - Scrape Tweets and save them as a CSV file.
twint -u username --email --phone - Show Tweets that might contain phone numbers or email addresses.
twint -s "Donald Trump" --verified - Display Tweets by verified users that Tweeted about Donald Trump.
twint -g="48.880048,2.385939,1km" -o file.csv --csv - Scrape Tweets from a radius of 1km around a place in Paris and export them to a CSV file.
twint -u username -es localhost:9200 - Output Tweets to Elasticsearch.
twint -u username -o file.json --json - Scrape Tweets and save them as a JSON file.
twint -u username --database tweets.db - Save Tweets to a SQLite database.
twint -u username --followers - Scrape a Twitter user's followers.
twint -u username --following - Scrape who a Twitter user follows.
twint -u username --favorites - Collect all the Tweets a user has favorited (gathers ~3200 Tweets).
twint -u username --following --user-full - Collect full user information about the accounts a person follows.
twint -u username --profile-full - Use a slow but effective method to gather Tweets from a user's profile (gathers ~3200 Tweets, including Retweets).
twint -u username --retweets - Use a quick method to gather the last 900 Tweets (including Retweets) from a user's profile.
twint -u username --resume resume_file.txt - Resume a search starting from the last saved scroll-id.
More details about the commands and options are in the twint wiki.
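When calling twint from a Python script, building the argument list yourself avoids shell-quoting problems with values like "Donald Trump" or "2015-12-20 20:30:15". This is a minimal sketch that assumes the `twint` command is on PATH; `build_twint_args` and `run_twint` are hypothetical helpers that only use flags from the examples above.

```python
import shlex
import subprocess


def build_twint_args(username=None, search=None, since=None,
                     output=None, fmt=None, geo=None):
    """Build an argv list for the twint CLI (hypothetical helper, flags from the examples above)."""
    args = ["twint"]
    if username:
        args += ["-u", username]
    if search:
        args += ["-s", search]        # one argv entry, so spaces need no quoting
    if since:
        args += ["--since", since]
    if geo:
        # geo = (lat, lon, radius) as strings, e.g. ("48.880048", "2.385939", "1km")
        lat, lon, radius = geo
        args.append(f"-g={lat},{lon},{radius}")
    if output:
        args += ["-o", output]
    if fmt in ("csv", "json"):
        args.append("--" + fmt)       # --csv or --json
    return args


def run_twint(args):
    """Run twint; raises subprocess.CalledProcessError on a non-zero exit code."""
    subprocess.run(args, check=True)


if __name__ == "__main__":
    args = build_twint_args(username="username", since="2015-12-20 20:30:15",
                            output="file.csv", fmt="csv")
    # Print the equivalent shell command line for inspection.
    print(shlex.join(args))
```

Passing a list to `subprocess.run` (rather than one string with `shell=True`) hands each value to twint as a single argument, so no extra quoting is needed.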