Scrape join-dates/user info from a list (csv) of Twitter-users

Question

I'm looking for a solution to a probably quite simple problem and would really appreciate some help or a hint. I have basic knowledge of Python and web scraping.

I want to explore a certain hashtag and the community behind it on Twitter. Using twint, I've downloaded all tweets mentioning the hashtag into a .csv file. After that I cleaned up the .csv so that each user appears only once (instead of once per tweet with the hashtag) and saved the result as a .txt. I would now like to get some more information about the approximately 1,500 users in that list: mainly the date they joined Twitter; the number of tweets would be a bonus.
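For readers following along, the de-duplication step described above could look roughly like the sketch below. It assumes the twint CSV has a column named `username` (an assumption about the export, adjust as needed); the helper name `unique_usernames` and the sample data are made up for illustration. Handles are lower-cased because Twitter usernames are case-insensitive.

```python
import csv
import io
from typing import Iterable, List


def unique_usernames(rows: Iterable[dict], column: str = "username") -> List[str]:
    """Return each username once, lower-cased, in order of first appearance."""
    seen = set()
    ordered = []
    for row in rows:
        # Missing or empty cells are skipped rather than treated as a user.
        name = (row.get(column) or "").strip().lower()
        if name and name not in seen:
            seen.add(name)
            ordered.append(name)
    return ordered


# Made-up sample standing in for the real twint export.
sample = io.StringIO("username,tweet\nAlice,hi\nbob,yo\nalice,again\n")
users = unique_usernames(csv.DictReader(sample))
print(users)  # ['alice', 'bob']
```

With a real file, the `io.StringIO` sample would be replaced by `open("tweets.csv", newline="")`, and the result written out one name per line.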

What I've tried: Twint should be able to do this, but it didn't work (I'm using the Docker image provided on their GitHub). I tried to get the user info with:

twint --userlist /bin/userlist.txt --user-full -o userlistfull.csv --csv

Twint prints a long error message which, if I understand it correctly, is related to an open bug in twint:

CRITICAL:root:twint.get:User:'url'
ERROR:root:twint.run:Twint:Lookup:Unexpected exception occurred.
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/usr/local/lib/python3.6/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/usr/local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/usr/local/lib/python3.6/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'
Traceback (most recent call last):
  File "/usr/local/bin/twint", line 8, in <module>
    sys.exit(run_as_command())
  File "/usr/local/lib/python3.6/site-packages/twint/cli.py", line 339, in run_as_command
    main()
  File "/usr/local/lib/python3.6/site-packages/twint/cli.py", line 324, in main
    run.Lookup(c)
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 386, in Lookup
    run(config)
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 329, in run
    get_event_loop().run_until_complete(Twint(config).main(callback))
  File "/usr/local/lib/python3.6/asyncio/base_events.py", line 488, in run_until_complete
    return future.result()
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 235, in main
    await task
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 270, in run
    await self.Lookup()
  File "/usr/local/lib/python3.6/site-packages/twint/run.py", line 307, in Lookup
    await get.User(self.config.Username, self.config, db.Conn(self.config.Database))
  File "/usr/local/lib/python3.6/site-packages/twint/get.py", line 228, in User
    await Users(j_r, config, conn)
  File "/usr/local/lib/python3.6/site-packages/twint/output.py", line 177, in Users
    user = User(u)
  File "/usr/local/lib/python3.6/site-packages/twint/user.py", line 31, in User
    _usr.url = ur['data']['user']['legacy']['url']
KeyError: 'url'

I've also tried to loop over the list and let twint look up each username individually, but that doesn't work either:

import twint 
import os
import sys
import nest_asyncio 
nest_asyncio.apply()

c = twint.Config()

with open("userlist.txt", "r") as a_file:

  for line in a_file:

    stripped_line = line.strip()
    stripped_line = c.Username
    twint.run.Search(c)

Running it in Google Colab gives me:

CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 1.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 8.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 27.0 secs
CRITICAL:root:twint.run:Twint:Feed:noDataExpecting value: line 1 column 1 (char 0)
sleeping for 64.0 secs
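A note on the loop in the script: the line `stripped_line = c.Username` assigns in the wrong direction, so the config never receives a username, and `twint.run.Search` searches tweets rather than looking up a profile. A minimal sketch of a per-user loop follows. The helpers `read_usernames`, `lookup_all` and `twint_lookup` are hypothetical names; `twint_lookup` assumes twint is installed and that `twint.run.Lookup` works for the account, which the traceback earlier in the question shows is not guaranteed.

```python
from typing import Callable, Iterable, List


def read_usernames(lines: Iterable[str]) -> List[str]:
    """Return non-empty, whitespace-stripped usernames in file order."""
    return [line.strip() for line in lines if line.strip()]


def lookup_all(usernames: Iterable[str], lookup: Callable[[str], None]) -> List[str]:
    """Call `lookup` once per username and return the names that failed."""
    failed = []
    for name in usernames:
        try:
            lookup(name)
        except Exception:
            # One broken profile should not abort the whole run.
            failed.append(name)
    return failed


def twint_lookup(name: str) -> None:
    """Hypothetical wrapper around twint's profile lookup (assumes twint is installed)."""
    import twint  # imported lazily so the helpers above run without twint

    c = twint.Config()
    c.Username = name  # assign the name to the config, not the other way round
    twint.run.Lookup(c)
```

Usage would be along the lines of `failed = lookup_all(read_usernames(open("userlist.txt")), twint_lookup)`; passing the lookup function in as an argument makes it easy to swap in a different library if twint keeps failing.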

What I'm looking for: What is the easiest way to get the join dates of the users in the list? Should I use a different library? Could I loop over the list with something like BeautifulSoup and scrape the join dates? How would I do this?

Help would be very much appreciated, thanks in advance!

score 0 · Answer 1 · answered May 20 '21 at 16:24

Try to install it using

pip3 install --user --upgrade git+https://github.com/twintproject/twint.git@origin/master#egg=twint

and make sure your Python version is higher than 3.6.

Dery Sudrajat

score 0 · Answer 2 · answered Oct 24 '21 at 19:42

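Replace this line in twint/user.py:

_usr.url = ur['data']['user']['legacy']['url']

with this, so that profiles without a 'url' field no longer raise the KeyError shown in the question:

try:
    _usr.url = ur['data']['user']['legacy']['url']
except KeyError:
    # Some profiles have no 'url' key; fall back to an empty string.
    _usr.url = ''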
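The same fallback can also be written with `dict.get`, which avoids the try/except entirely. The sketch below is not twint code: `legacy_field` is a hypothetical helper and the two payloads are made-up stand-ins that only mimic the nesting seen in the traceback (`data` → `user` → `legacy`).

```python
from typing import Any, Mapping


def legacy_field(ur: Mapping[str, Any], field: str, default: Any = "") -> Any:
    """Read ur['data']['user']['legacy'][field], or `default` if a key is missing."""
    legacy = ur.get("data", {}).get("user", {}).get("legacy", {})
    return legacy.get(field, default)


# Made-up payloads: one profile with a url, one without.
with_url = {"data": {"user": {"legacy": {"url": "https://example.com"}}}}
without_url = {"data": {"user": {"legacy": {"created_at": "Mon Jan 01 00:00:00 +0000 2018"}}}}

print(legacy_field(with_url, "url"))            # https://example.com
print(repr(legacy_field(without_url, "url")))   # ''
```

Either form only works around the missing key; the rest of the profile (including the join date) is still parsed as before.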

Mohammad Zarchi
