I'm looking for a solution to what is probably a simple problem and would really appreciate some help or a hint. I have basic knowledge of Python and web scraping.
I want to explore a certain hashtag and the community behind it on Twitter. Using twint, I downloaded all tweets mentioning the hashtag into a .csv file. I then cleaned up the .csv so that each user appears only once (many users had several tweets with the hashtag) and saved the usernames as a .txt file. I would now like to get more information about the approximately 1,500 users in that list: mainly the date they joined Twitter; the number of tweets would be a bonus.
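For reference, the clean-up step described above can be sketched roughly as follows. This is only an illustration, not the exact script used: the column name "username", the case-insensitive comparison, and the function names are assumptions.

```python
import csv


def unique_usernames(csv_path, column="username"):
    """Return usernames from the CSV in first-seen order, without duplicates.

    The column name "username" is an assumption about the CSV layout.
    """
    seen = set()
    ordered = []
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            name = (row.get(column) or "").strip()
            key = name.lower()  # compare case-insensitively (assumption)
            if name and key not in seen:
                seen.add(key)
                ordered.append(name)
    return ordered


def write_userlist(usernames, txt_path):
    """Write one username per line, the layout a plain user list file needs."""
    with open(txt_path, "w", encoding="utf-8") as f:
        for name in usernames:
            f.write(name + "\n")
```

Calling write_userlist(unique_usernames("tweets.csv"), "userlist.txt") would produce the one-name-per-line file; both file names here are placeholders.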
What I've tried: Twint should be able to do this, but it didn't work (I'm using the Docker image provided on their GitHub). I tried to get the user info with:
twint --userlist /bin/userlist.txt --user-full -o userlistfull.csv --csv
Twint prints a long error message which, if I understand it correctly, is related to an open bug in twint:
CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/usr/local/lib/python3.6/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/usr/local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/usr/local/lib/python3.6/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
  File "/usr/local/bin/twint", line 8, in <module>
    sys.exit(run_as_command())
  File "/usr/local/lib/python3.6/site-packages/twint/cli.py", line 339, in run_as_command
    main()
  File "/usr/local/lib/python3.6/site-packages/twint/cli.py", line 324, in main
    run.Lookup(c)
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 386, in Lookup
    run(config)
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 488, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 235, in main
    await task
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 270, in run
    await self.Lookup()
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/usr/local/lib/python3.6/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/usr/local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/usr/local/lib/python3.6/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
I've also tried to loop over the list and let twint look up each username individually, but that doesn't work either:
import twint
import os
import sys
import nest_asyncio

nest_asyncio.apply()
c = twint.Config()
with open("userlist.txt", "r") as a_file:
    for line in a_file:
        stripped_line = line.strip()
        stripped_line = c.Username
        twint.run.Search(c)
Running it in Google Colab gives me:
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 1.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 8.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 27.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 64.0 secs
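One thing that stands out in the loop above is that the line stripped_line = c.Username overwrites the name just read instead of passing it to the config, and that it calls Search rather than Lookup. A minimal sketch of what the loop is presumably meant to do follows. The helper names (read_usernames, lookup_all, twint_lookup) are hypothetical, the output file name is a placeholder, and c.Store_csv / c.Output are assumed to behave as in twint's usual configuration. This only corrects the assignment and keeps going past failures; it does not work around the KeyError: 'url' raised inside twint's Lookup path, and in a notebook nest_asyncio.apply() would still be needed first.

```python
def read_usernames(path):
    """Yield non-empty, stripped usernames, one per line of the text file."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            name = line.strip()
            if name:
                yield name


def lookup_all(path, lookup):
    """Call lookup(username) for every name; collect failures instead of aborting."""
    failed = []
    for name in read_usernames(path):
        try:
            lookup(name)
        except Exception as exc:  # e.g. the KeyError: 'url' shown earlier
            failed.append((name, repr(exc)))
    return failed


def twint_lookup(username, output="userlistfull.csv"):
    """Hypothetical wrapper: one twint Lookup per user, written to a CSV."""
    import twint  # imported lazily so the helpers above work without twint

    c = twint.Config()
    c.Username = username  # assign TO the config, not from it
    c.Store_csv = True     # assumed config options for CSV output
    c.Output = output
    twint.run.Lookup(c)
```

It would be called as failed = lookup_all("userlist.txt", twint_lookup); the returned list shows which usernames raised an error, so one bad profile does not stop the whole run.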
What I'm looking for: What is the easiest way to get the join dates of the users in the list? Should I use a different library? Could I loop over the list with something like BeautifulSoup and scrape the join dates? How would I do this?
Help would be very much appreciated, thanks in advance!