
I am trying to follow a tutorial on Twitter data mining. The steps I followed are:

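    import json
    import pandas as pd

    tweets_data_path = '/home/ambijat/ipythonnbs/twitter/twitter_data.txt'
    tweet_data = []
    tweets_file = open(tweets_data_path, "r")
    # the file holds one JSON document per line
    for line in tweets_file:
        try:
            tweet = json.loads(line)
            tweet_data.append(tweet)
        except:
            # skip any line that json.loads fails to parse
            continue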
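Here is a minimal sketch of a stricter version of this loading loop. It assumes Python 3 and one JSON document per line, and `load_tweets` is a hypothetical helper name. `json.loads` parses a line that contains only a number into an `int` without raising, so a bare `except` cannot filter such lines out. Checking `isinstance(record, dict)` keeps only JSON objects, which is what tweets are.

```python
import io
import json


def load_tweets(lines):
    """Parse one JSON document per line, keeping only JSON objects (dicts).

    Lines that fail to parse are skipped, and so are lines that parse to
    something other than an object, such as a bare number like "12345".
    """
    tweets = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except ValueError:  # json.JSONDecodeError is a subclass of ValueError
            continue
        if isinstance(record, dict):
            tweets.append(record)
    return tweets


# Usage with an in-memory stand-in for the real file:
sample = io.StringIO('12345\n{"text": "hello", "lang": "en", "place": null}\nnot json\n')
tweet_data = load_tweets(sample)
print(len(tweet_data))  # 1
```

With the real file, `with open(tweets_data_path) as f: tweet_data = load_tweets(f)` would take the place of the loop.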

And then:

    tweets = pd.DataFrame()
    tweets['text'] = map(lambda tweet: tweet['text'], tweet_data)
    tweets['lang'] = map(lambda tweet: tweet['lang'], tweet_data)
    tweets['country'] = map(lambda tweet: tweet['place']['country'] if tweet['place'] != None else None, tweet_data)
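The same three columns can also be built with list comprehensions instead of `map`. This sketch assumes Python 3, where `map` returns a lazy iterator rather than a list. The two sample records are hypothetical stand-ins for parsed tweets. The change only affects how the columns are built: any non-dict entry in `tweet_data` would still raise a `TypeError` here.

```python
import pandas as pd

# Hypothetical sample records standing in for the parsed tweets.
tweet_data = [
    {"text": "first tweet", "lang": "en", "place": {"country": "Pakistan"}},
    {"text": "second tweet", "lang": "ur", "place": None},
]

tweets = pd.DataFrame()
# List comprehensions produce real lists, which pandas accepts as column
# values in both Python 2 and Python 3.
tweets["text"] = [tweet["text"] for tweet in tweet_data]
tweets["lang"] = [tweet["lang"] for tweet in tweet_data]
tweets["country"] = [
    tweet["place"]["country"] if tweet.get("place") is not None else None
    for tweet in tweet_data
]
print(tweets)
```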

And the outcome is:

    ---------------------------------------------------------------------------
    TypeError                                 Traceback (most recent call last)
    <ipython-input-9-a42fce63cc05> in <module>()
          1 tweets = pd.DataFrame()
    ----> 2 tweets['text'] = map(lambda tweet: tweet['text'], tweet_data)
          3 tweets['lang'] = map(lambda tweet: tweet['lang'], tweet_data)
          4 tweets['country'] = map(lambda tweet: tweet['place']['country'] if tweet['place'] != None else None, tweet_data)

    <ipython-input-9-a42fce63cc05> in <lambda>(tweet)
          1 tweets = pd.DataFrame()
    ----> 2 tweets['text'] = map(lambda tweet: tweet['text'], tweet_data)
          3 tweets['lang'] = map(lambda tweet: tweet['lang'], tweet_data)
          4 tweets['country'] = map(lambda tweet: tweet['place']['country'] if tweet['place'] != None else None, tweet_data)

    TypeError: 'int' object has no attribute '__getitem__'

Could someone please help me find my mistake? I am practically a novice.

ambrish dhaka

1 Answer

You can also pass the `tweet_data` list directly to `json_normalize`:

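    import pandas as pd

    # pandas >= 1.0 exposes json_normalize at the top level; older versions
    # import it with `from pandas.io.json import json_normalize`.
    tweets = pd.json_normalize(tweet_data)[["text", "lang", "place.country"]]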

        text                                                lang    place.country
    0   This not the 1st. They hv 1 in Faisalabad alre...   en      پاکستان
    1   RT @TOLOnews: Pakistan Trying To Create Third ...   en      NaN
    2   RT @murtazasolangi: JuD establishes parallel "...   en      NaN
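Here is a self-contained illustration of the flattening step. It assumes pandas >= 1.0, where `json_normalize` is available as `pd.json_normalize`. The records are hypothetical and not taken from the question's data file.

```python
import pandas as pd

# Hypothetical nested records shaped like a small subset of tweet JSON.
tweet_data = [
    {"text": "first tweet", "lang": "en", "place": {"country": "Pakistan"}},
    {"text": "second tweet", "lang": "en", "place": None},
]

# json_normalize flattens nested dicts into dotted column names,
# so place -> country becomes the column "place.country".
flat = pd.json_normalize(tweet_data)
tweets = flat[["text", "lang", "place.country"]]
print(tweets)
```

A tweet whose `place` is `null` gets `NaN` in `place.country`, as in the output shown above.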
ayhan
  • Thanks to all. I have figured out one more thing: the real reason my pasted code did not work was that I had some garbage data (numbers) in the first two lines of the text file collecting all the tweets. I removed those two lines and then the code worked just fine. – ambrish dhaka Apr 08 '16 at 01:54
  • I have one more task at hand: I want to save the output of running the cell `tweets['country']` to a text file. The output shown in the notebook is truncated, and I want the entire output. I am using `%%capture cap --no-stderr`, then `tweets['country']`, then `with open('output.txt', 'w') as f: f.write(cap.stdout)`. – ambrish dhaka Apr 08 '16 at 04:34
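To save the full column, this sketch avoids capturing notebook output altogether. It assumes Python 3, and the file names `output.txt` and `countries.txt` and the sample data are illustrative. `Series.to_string()` renders every row instead of the truncated notebook display, and `Series.to_csv()` writes the values straight to a file.

```python
import pandas as pd

# Hypothetical stand-in for the tweets DataFrame built earlier.
tweets = pd.DataFrame({"country": ["Pakistan", None, "India"]})

# to_string() with its default max_rows=None renders every row, unlike the
# notebook display, which is limited by pd.options.display.max_rows.
# UTF-8 keeps non-ASCII country names intact.
with open("output.txt", "w", encoding="utf-8") as f:
    f.write(tweets["country"].to_string())

# Alternatively, write one value per line without the index or header.
tweets["country"].to_csv("countries.txt", index=False, header=False)
```

If you only need to see every row in the notebook, `pd.set_option('display.max_rows', None)` lifts the display limit.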