I am collecting tweets with tweepy from a file containing more than 700,000 tweet IDs. Locally I have an i7 processor and 16 GB of RAM, and I run a for loop over these IDs to fetch each tweet's full text, geo, coordinates, and place attributes.
Using the tqdm library, I measured about 2.5 iterations per second, which is very slow.
After that I tried two SageMaker instances:

1. ml.m5.4xlarge = 64 GB RAM & 16 vCPUs
2. ml.m5.12xlarge = 192 GB RAM & 48 vCPUs
Both instances gave me the same result as my local machine, about 2.5 iterations per second. How can that be?
Then I tried SageMaker Studio Lab, which is free and gives me 12 hours per day. It was the fastest, at about 6 iterations per second.
```python
import tweepy
import pandas as pd
from tqdm import tqdm

sample = anger['id'][:100000]

tweets = []
geos = []
coords = []
places = []
ids = []

for i in tqdm(sample.index):
    tweet_id = sample[i]  # renamed from `id` to avoid shadowing the built-in
    try:
        # fetch the tweet with its full (untruncated) text
        status = api.get_status(tweet_id, tweet_mode='extended')
        text = status._json['full_text']
        geo = status.geo
        coordinates = status.coordinates
        place = status.place
        ids.append(tweet_id)
        tweets.append(text)
        geos.append(geo)
        coords.append(coordinates)
        places.append(place)
    except tweepy.errors.TweepyException:
        # skip deleted, protected, or otherwise unavailable tweets
        pass
```

Creating a DataFrame and saving it:

```python
data = {
    'id': ids,
    'text': tweets,
    'geo': geos,
    'coordinates': coords,
    'places': places,
}
anger_dataframe = pd.DataFrame(data)
anger_dataframe.to_csv('anger_dataframe.csv', encoding='utf-8-sig')
```
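I suspect the loop is network-bound rather than CPU-bound, since `get_status` makes one HTTP request per tweet. I considered batching the lookups instead. Here is a rough, untested sketch using Tweepy's `lookup_statuses`, which accepts up to 100 IDs per request (so 100,000 IDs would become about 1,000 requests):

```python
# Rough sketch (untested): batch the lookups with lookup_statuses,
# which takes up to 100 tweet IDs per call. Note that it silently
# drops deleted/unavailable tweets unless map=True is passed.
id_list = list(sample)
rows = []
for start in tqdm(range(0, len(id_list), 100)):
    batch = id_list[start:start + 100]
    try:
        statuses = api.lookup_statuses(batch, tweet_mode='extended')
    except tweepy.errors.TweepyException:
        continue  # skip the whole batch on an API error
    for status in statuses:
        rows.append({
            'id': status.id,
            'text': status.full_text,
            'geo': status.geo,
            'coordinates': status.coordinates,
            'places': status.place,
        })

anger_dataframe = pd.DataFrame(rows)
```

Would this work around the per-request overhead, or is the bottleneck the API rate limit itself?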
My first question: why do the SageMaker instances, with all those resources, give me the same speed as my local machine? And although SageMaker Studio Lab gives me about 6 iterations per second, that is still not enough for me: I have 6 million IDs to collect.
Second question: is there any way to enhance the for loop so it runs faster?
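For example, I thought about overlapping the requests with a thread pool, since each iteration spends most of its time waiting on the network. A sketch of the idea (assuming the shared `api` object can safely be used from multiple threads, which I have not verified):

```python
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch(tweet_id):
    """Fetch one tweet; return None on any API error."""
    try:
        status = api.get_status(tweet_id, tweet_mode='extended')
        return {
            'id': tweet_id,
            'text': status._json['full_text'],
            'geo': status.geo,
            'coordinates': status.coordinates,
            'places': status.place,
        }
    except tweepy.errors.TweepyException:
        return None

rows = []
with ThreadPoolExecutor(max_workers=8) as pool:
    futures = [pool.submit(fetch, tweet_id) for tweet_id in sample]
    for future in tqdm(as_completed(futures), total=len(futures)):
        result = future.result()
        if result is not None:
            rows.append(result)
```

Would something like this actually help, or would Twitter's rate limit cap the throughput no matter how many workers I use?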