I have a question about rehydrating tweet text. Any help would be appreciated.

This is the source of my data, which is a collection of coronavirus tweets:

source of data set

I downloaded one data set from it, shown in the photo (named 01-feb-2020):

photo of my data set

Then I filtered this data to keep only the tweets from 'GB', which leaves almost 24,000 tweets:

total number of my tweet IDs

I used twarc to hydrate the tweet text as follows:

First, install twarc using pip.

Then, type this in the command line: twarc configure

Then, enter the consumer key and consumer secret.

Then, run this command:

twarc hydrate id.txt > tweet_hydrated.jsonl

But I only get the text of 18 tweets out of 24,000 tweet IDs:

all that I could hydrate

I have tried the Hydrator app as well, but the result was the same. What am I doing wrong? Is it reasonable to get only 18 tweets out of such a large number of IDs? Any other suggestion for hydrating tweet text is appreciated. (Sorry for my bad English, I am not a native speaker.)

zaraa s
  • How are you getting from JSON format downloaded, into a CSV format? I'm wondering whether the Tweet ID values are valid. – Andy Piper Aug 05 '20 at 12:51
  • I've managed to reproduce this now, and I believe that in the process of converting your JSON input to CSV / Excel to a list of Tweet IDs to hydrate, you are probably using JavaScript (?) and the Tweet IDs are losing their accuracy. The clue was when I noticed all of the Tweet IDs ending in 0000 in my Excel column. You'll need to use a more precise method of getting the Tweet IDs into twarc. – Andy Piper Aug 05 '20 at 13:23
  • You are correct. I was able to get the data after changing my method of collecting the Tweet IDs. At first, since the amount was small, I just copy-pasted the Tweet IDs. Then I was told to write proper code to extract them, which solved my problem. Thank you so much. – zaraa s Aug 05 '20 at 18:15
  • If you add your comment as an answer I can mark it as the accepted answer. – zaraa s Aug 05 '20 at 18:27

2 Answers

My method of collecting the Tweet IDs (copy-pasting them) was not correct. After I wrote proper code to save the Tweet IDs into a text file, the problem was solved.
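For anyone hitting the same issue, here is a minimal sketch of what such code might look like. It is not the exact script I used. It assumes the downloaded file is in JSON Lines format (one JSON object per line) and that each record has fields named `tweet_id` and `country_code`; those field names and the sample records are assumptions for illustration, so adjust them to your data set. The point is that the IDs go straight from the file to text, without passing through a spreadsheet.

```python
import io
import json


def extract_tweet_ids(lines, country="GB"):
    """Yield tweet IDs as strings for records from the given country.

    Assumes one JSON object per line with hypothetical fields
    'tweet_id' and 'country_code'.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)  # Python parses large integers exactly
        if record.get("country_code") == country:
            yield str(record["tweet_id"])


def write_ids(ids, out_file):
    """Write one tweet ID per line, the format 'twarc hydrate' reads."""
    for tweet_id in ids:
        out_file.write(tweet_id + "\n")


# In-memory demonstration with made-up IDs (not real tweets).
sample = io.StringIO(
    '{"tweet_id": 1223456789012345679, "country_code": "GB"}\n'
    '{"tweet_id": 1223456789012345681, "country_code": "US"}\n'
)
out = io.StringIO()
write_ids(extract_tweet_ids(sample), out)
print(out.getvalue(), end="")
```

With real files, you would pass an opened input file in place of `sample` and an opened `id.txt` in place of `out`, then run `twarc hydrate id.txt > tweet_hydrated.jsonl` as before.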

Andy Piper also pointed this out in the comments, which I quote here:

How are you getting from JSON format downloaded, into a CSV format? I'm wondering whether the Tweet ID values are valid. – Andy Piper 5 hours ago

I've managed to reproduce this now, and I believe that in the process of converting your JSON input to CSV / Excel to a list of Tweet IDs to hydrate, you are probably using JavaScript (?) and the Tweet IDs are losing their accuracy. The clue was when I noticed all of the Tweet IDs ending in 0000 in my Excel column. You'll need to use a more precise method of getting the Tweet IDs into twarc

zaraa s

I've managed to reproduce this now, and I believe that in the process of converting your JSON input to CSV / Excel to a list of Tweet IDs to hydrate, you are probably using JavaScript (?) and the Tweet IDs are losing their accuracy. The clue was when I noticed all of the Tweet IDs ending in 0000 in my Excel column. You'll need to use a more precise method of getting the Tweet IDs into twarc.
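To illustrate the kind of precision loss I mean, here is a small Python sketch. The ID is made up, and the 15-digit cutoff is a simplified model of how a spreadsheet stores large numbers, not a description of exactly what happened to your file. Tweet IDs are 64-bit integers, which have more digits than a 64-bit float (the number type JavaScript uses) can hold exactly.

```python
def through_float(tweet_id: int) -> int:
    """Round-trip an ID through a 64-bit float, as a JavaScript number would."""
    return int(float(tweet_id))


def keep_15_digits(tweet_id: int) -> int:
    """Keep the first 15 significant digits and zero out the rest.

    Simplified model (an assumption) of a spreadsheet that stores
    numbers with 15 significant digits.
    """
    digits = str(tweet_id)
    return int(digits[:15] + "0" * max(len(digits) - 15, 0))


original = 1223456789012345679  # made-up 19-digit ID, not a real tweet

print("original:       ", original)
print("through float:  ", through_float(original))
print("15-digit cutoff:", keep_15_digits(original))
print("float round-trip changed the ID:", through_float(original) != original)
```

Either way the result is a different number from the original, so the hydrate request looks up a Tweet that most likely does not exist. Keeping the IDs as strings (or exact integers) from end to end avoids this.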

Andy Piper