
I want to scrape only Urdu-language tweets for my project using Python. I started researching how to scrape tweets from Twitter, and so far I have found three prominent approaches:

  1. Tweepy (using the Twitter API)
  2. Twint (which scrapes Twitter without the official API)
  3. Selenium

However, I still can't figure out how to specifically target Urdu-language tweets. I would be very grateful for any help, guidance, or leads on this. Thanks!

Umair Mayo

2 Answers


After researching the topic further, I found two ways. First, with Twint you can set the tweet language through the Lang attribute of the config object, e.g. c.Lang = "language_code".

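import twint

c = twint.Config()
c.Username = "elonmusk"   # only search this user's tweets
c.Limit = 100             # stop after roughly 100 tweets
c.Store_csv = True        # store the results as CSV
c.Output = "none3.csv"    # output file name
c.Lang = "en"             # language code: "en" is English, "ur" is Urdu
twint.run.Search(c)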

(Note: The above method did not work for me, so I tried the other method.)
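If c.Lang does not filter the results, one possible workaround is to let Twint collect everything and then keep only the Urdu rows afterwards. This is a minimal sketch, not part of the original answer: it assumes the CSV Twint writes has a header row with a "language" column holding codes such as "ur", and the function name filter_csv_by_language is hypothetical. Check your file's header (and its delimiter) before relying on it.

```python
import csv


def filter_csv_by_language(in_path, out_path, lang="ur",
                           lang_column="language", delimiter=","):
    """Copy only the rows whose `lang_column` equals `lang` to `out_path`.

    Assumption: the input CSV has a header row containing `lang_column`
    (Twint is assumed to name it "language"; verify in your own file).
    Returns the number of rows kept.
    """
    kept = 0
    with open(in_path, newline="", encoding="utf-8") as src, \
         open(out_path, "w", newline="", encoding="utf-8") as dst:
        reader = csv.DictReader(src, delimiter=delimiter)
        # fieldnames is None for an empty file, so fall back to an empty header
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames or [],
                                delimiter=delimiter)
        writer.writeheader()
        for row in reader:
            if row.get(lang_column) == lang:
                writer.writerow(row)
                kept += 1
    return kept
```

For example, filter_csv_by_language("none3.csv", "urdu_only.csv") would keep only the rows marked "ur". Pass delimiter="\t" if your output file is tab-separated. UTF-8 is used on both ends so the Urdu script survives the round trip.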

Second, use the snscrape module and set the language in the search query with the lang: operator. (This works nicely.)

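import itertools
import snscrape.modules.twitter as sntwitter

query = 'lang:ur'  # ur is the language code for Urdu
limit = 10
# get_items() returns a lazy generator; islice takes only the first `limit` tweets
urduTweets = list(itertools.islice(sntwitter.TwitterSearchScraper(query).get_items(), limit))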
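A slightly fuller sketch of the same snscrape approach might collect the tweets into plain dicts and save them to CSV. It assumes the snscrape Tweet objects expose id, date and lang, and that the text is in rawContent (newer releases) or content (older releases); the helper names and the output file name are hypothetical. The since: operator is a standard Twitter search operator used here only to illustrate narrowing the date range.

```python
import csv
import itertools


def collect_tweets(items, limit):
    """Take at most `limit` tweets from a snscrape get_items() generator
    and convert each one to a plain dict."""
    rows = []
    for tweet in itertools.islice(items, limit):
        # Newer snscrape versions keep the text in `rawContent`; older ones used `content`.
        text = getattr(tweet, "rawContent", None) or getattr(tweet, "content", "")
        rows.append({"id": tweet.id, "date": tweet.date, "lang": tweet.lang, "text": text})
    return rows


def save_csv(rows, path):
    """Write the collected rows to a UTF-8 CSV file so the Urdu script stays intact."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=["id", "date", "lang", "text"])
        writer.writeheader()
        writer.writerows(rows)


if __name__ == "__main__":
    import snscrape.modules.twitter as sntwitter

    # lang:ur keeps Urdu tweets; since: sets the start date.
    # Prepend a keyword (e.g. "YOUR_KEYWORD lang:ur") to narrow the search further.
    query = "lang:ur since:2021-01-01"
    rows = collect_tweets(sntwitter.TwitterSearchScraper(query).get_items(), limit=100)
    save_csv(rows, "urdu_tweets.csv")  # hypothetical output file name
    print(f"saved {len(rows)} tweets")
```

Keeping collect_tweets independent of snscrape itself makes it easy to test, and since scrapers like this depend on Twitter's web interface, they can stop working when the site changes.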
Umair Mayo
for tweet in tweepy.Cursor(api.search_tweets, q=keyword, lang='en', count=100).items(50000):
    print(tweet.text)

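The snippet above iterates over up to 50,000 English tweets matching keyword, given an authenticated tweepy API object named api. Use lang='ur' to get Urdu tweets instead. The search endpoint returns at most 100 tweets per page (count), and its since_id parameter takes a tweet ID rather than a date.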
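A self-contained version of the Tweepy approach, targeting Urdu, could look like the sketch below. It assumes Tweepy v4 (where API.search_tweets, OAuth2BearerHandler and Cursor exist) and an app bearer token; the function names and the placeholder token and keyword are hypothetical. tweet_mode="extended" asks for untruncated text, which Tweepy then exposes as full_text.

```python
def tweet_text(status):
    """Return the full text of a tweepy Status.

    With tweet_mode="extended" the text is in `full_text`;
    otherwise it is the (possibly truncated) `text` attribute.
    """
    return getattr(status, "full_text", None) or status.text


def search_urdu_tweets(bearer_token, keyword, limit=1000):
    """Collect up to `limit` Urdu tweets matching `keyword` (standard search, ~7 days)."""
    import tweepy  # imported here so tweet_text() can be used without tweepy installed

    auth = tweepy.OAuth2BearerHandler(bearer_token)
    # wait_on_rate_limit makes tweepy sleep instead of failing when the
    # 15-minute search quota is used up, which matters for large limits
    api = tweepy.API(auth, wait_on_rate_limit=True)
    cursor = tweepy.Cursor(
        api.search_tweets,
        q=keyword,
        lang="ur",              # language code for Urdu
        count=100,              # maximum tweets per page for standard search
        tweet_mode="extended",  # request untruncated tweet text
    )
    return [tweet_text(status) for status in cursor.items(limit)]


if __name__ == "__main__":
    texts = search_urdu_tweets("YOUR_BEARER_TOKEN", "YOUR_KEYWORD", limit=500)
    print(len(texts))
```

Asking for a large number of items mainly costs time: with wait_on_rate_limit=True, Tweepy pauses between 15-minute windows rather than raising an error.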

Note: the standard search API only returns tweets from about the past 7 days; to access older tweets you need Twitter API Academic Research access.
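With Academic Research access, full-archive search goes through the v2 endpoint, which Tweepy v4 exposes as Client.search_all_tweets. The sketch below assumes Tweepy v4 and a bearer token from an academic project; the function names are hypothetical. In v2 queries, lang: cannot be used on its own and must be combined with a keyword or another standalone operator, so build_query refuses an empty keyword.

```python
import datetime


def build_query(keyword, lang="ur", exclude_retweets=True):
    """Build a v2 search query such as "keyword lang:ur -is:retweet"."""
    keyword = keyword.strip()
    if not keyword:
        raise ValueError("v2 search needs a keyword; lang: cannot be used on its own")
    parts = [keyword, f"lang:{lang}"]
    if exclude_retweets:
        parts.append("-is:retweet")
    return " ".join(parts)


def search_full_archive(bearer_token, keyword, start_time, limit=1000):
    """Collect up to `limit` tweet texts from full-archive search (academic access)."""
    import tweepy  # imported here so build_query() works without tweepy installed

    client = tweepy.Client(bearer_token=bearer_token, wait_on_rate_limit=True)
    paginator = tweepy.Paginator(
        client.search_all_tweets,
        query=build_query(keyword),
        start_time=start_time,
        max_results=500,  # full-archive search allows up to 500 tweets per request
        tweet_fields=["created_at", "lang"],
    )
    return [tweet.text for tweet in paginator.flatten(limit=limit)]


if __name__ == "__main__":
    start = datetime.datetime(2021, 1, 1, tzinfo=datetime.timezone.utc)
    texts = search_full_archive("YOUR_BEARER_TOKEN", "YOUR_KEYWORD", start, limit=1000)
    print(len(texts))
```

Paginator follows the next_token for you, and flatten(limit=...) stops once enough tweets have been yielded.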

Jay Patel