Why can I not rehydrate more than 18 tweets out of 24000 tweet IDs using twarc / the Hydrator app? Does anyone know a better way?

Question

I have a question about rehydrating tweet text. Any help would be appreciated.

This is the source of my data, which is about corona tweets:

source of data set

I downloaded a data set from it, which is shown in the photo (named 01-feb-2020).

Then I filtered this data to show only the tweets from 'GB', which is almost 24000 tweets.

I used twarc to hydrate the tweet text as follows:

first, install twarc using pip

then, type this in the command line: twarc configure

then, enter the consumer key and consumer secret

then, write a command:

twarc hydrate id.txt > tweet_hydrated.jsonl

But I only get the text of 18 tweets out of 24000 tweet IDs.

I have used the Hydrator app as well, but the result was the same. What am I doing wrong? Is it reasonable to get only 18 out of that large amount of data? Any new suggestion for hydrating tweet text is appreciated. (Sorry for my bad English, I am not a native speaker.)

How are you getting from JSON format downloaded, into a CSV format? I'm wondering whether the Tweet ID values are valid. — Andy Piper, Aug 05 '20 at 12:51
I've managed to reproduce this now, and I believe that in the process of converting your JSON input to CSV / Excel to a list of Tweet IDs to hydrate, you are probably using JavaScript (?) and the Tweet IDs are losing their accuracy. The clue was when I noticed all of the Tweet IDs ending in 0000 in my Excel column. You'll need to use a more precise method of getting the Tweet IDs into twarc. — Andy Piper, Aug 05 '20 at 13:23
You are correct. I was able to get the data after changing my method of getting the Tweet IDs. At first, since the amount was small, I just copy-pasted the Tweet IDs. Then I was told to write proper code to extract the Tweet IDs, which solved my problem. Thank you so much. — zaraa s, Aug 05 '20 at 18:15
If you add your comment as an answer I can mark it as the accepted answer. — zaraa s, Aug 05 '20 at 18:27

score 1 · Answer 1 · answered Aug 05 '20 at 18:24

The Tweet ID collection method (copy-pasting) was not correct. After writing proper code to save the Tweet IDs into a text file, the problem was solved.

Also, Andy Piper mentioned the same thing in the comments, which I copy-paste here.
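
For illustration, here is a minimal sketch of what such code might look like. It assumes the downloaded data is a JSON Lines file (one JSON object per line) and that each record has `tweet_id` and `country_code` fields; those field names, the function name `extract_tweet_ids`, and the file names are assumptions and may not match the actual data set. The point is that the IDs are read and written by code, without passing through a spreadsheet.

```python
import json


def extract_tweet_ids(json_path, ids_path, country="GB"):
    """Write one tweet ID per line for records whose country code matches.

    Field names "tweet_id" and "country_code" are assumptions about the
    data set; adjust them to whatever the real file uses.
    """
    count = 0
    with open(json_path, encoding="utf-8") as src, \
            open(ids_path, "w", encoding="utf-8") as dst:
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            record = json.loads(line)  # Python ints keep every digit
            if record.get("country_code") != country:
                continue
            dst.write(f"{record['tweet_id']}\n")
            count += 1
    return count
```

Calling `extract_tweet_ids("01-feb-2020.json", "id.txt")` (hypothetical file name) would produce an `id.txt` that can be passed to `twarc hydrate id.txt` as in the question.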

How are you getting from JSON format downloaded, into a CSV format? I'm wondering whether the Tweet ID values are valid. – Andy Piper 5 hours ago

I've managed to reproduce this now, and I believe that in the process of converting your JSON input to CSV / Excel to a list of Tweet IDs to hydrate, you are probably using JavaScript (?) and the Tweet IDs are losing their accuracy. The clue was when I noticed all of the Tweet IDs ending in 0000 in my Excel column. You'll need to use a more precise method of getting the Tweet IDs into twarc

score 0 · Accepted Answer · answered Aug 06 '20 at 12:21

I've managed to reproduce this now, and I believe that in the process of converting your JSON input to CSV / Excel to a list of Tweet IDs to hydrate, you are probably using JavaScript (?) and the Tweet IDs are losing their accuracy. The clue was when I noticed all of the Tweet IDs ending in 0000 in my Excel column. You'll need to use a more precise method of getting the Tweet IDs into twarc.
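
To see why the IDs lose accuracy, here is a small sketch. Tweet IDs are 64-bit integers, larger than 2^53, so they cannot all be represented exactly as double-precision floats (the number type JavaScript uses), and Excel keeps only about 15 significant digits. The ID below is made up for illustration, and `keep_15_digits` is only a rough imitation of what a spreadsheet does, not Excel's exact behaviour.

```python
def as_double(tweet_id: int) -> int:
    """Round-trip an ID through a 64-bit float, as a JavaScript number would."""
    return int(float(tweet_id))


def keep_15_digits(tweet_id: int) -> int:
    """Roughly imitate a spreadsheet: keep 15 significant digits, zero the rest."""
    digits = str(tweet_id)
    if len(digits) <= 15:
        return tweet_id
    return int(digits[:15] + "0" * (len(digits) - 15))


example_id = 1223456789012345679  # hypothetical 19-digit tweet ID

print(example_id)
print(as_double(example_id))       # differs from the original ID
print(keep_15_digits(example_id))  # last four digits become 0000
```

An ID altered this way almost never matches a real tweet, which would explain why only a handful of IDs could be hydrated. Keeping the IDs as text (or as integers in code) from start to finish avoids the problem.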
