I am extracting extra fields from a JSONL file using json2csv.py (compiled using twarc), and am having trouble extracting some text fields that are held within an array. This is the array; I want to pull out just the hashtag text:
"entities": {
    "hashtags": [
        {
            "text": "NoJusticeNoPeace",
            "indices": [
                65,
                82
            ]
        },
        {
            "text": "justiceforNaledi",
            "indices": [
                83,
                100
            ]
        },
I am able to extract other fields that don't contain arrays using this command:
python json2csv.py tweets_may.jsonl -e full_text retweeted_status.extended_tweet.full_text > testfull_text.csv
However, I can't work out how to pull out the array, or elements of it. The text of an individual hashtag can be identified using the following path: retweeted_status.extended_tweet.entities.hashtags.0.text
I've tried using:
python json2csv.py tweets_may.jsonl -e all_hashtags retweeted_status.extended_tweet.entities.hashtags.0.text > testhash.csv
But this just returns an empty column. Ideally I would like to be able to pull out all occurrences of 'text' within the 'hashtags' array into either a single column or separate columns.
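To make the desired output concrete, here is a minimal standalone sketch of the result I'm after, written in plain Python rather than as a json2csv.py option. It assumes each line of the JSONL file is one tweet object and that the hashtags live under retweeted_status.extended_tweet.entities.hashtags, as above. The function names (hashtag_texts, jsonl_to_rows, write_csv) and the all_hashtags column name are my own placeholders, not part of twarc.

```python
import csv
import io
import json

# Path to the hashtag array, taken from the question; an assumption about the data.
HASHTAG_PATH = ("retweeted_status", "extended_tweet", "entities", "hashtags")


def hashtag_texts(tweet):
    """Return every hashtag 'text' found in a tweet dict, or [] if none."""
    node = tweet
    # Walk the nested keys one at a time; a missing key gives an empty list.
    for key in HASHTAG_PATH:
        node = node.get(key) if isinstance(node, dict) else None
        if node is None:
            return []
    return [h["text"] for h in node if isinstance(h, dict) and "text" in h]


def jsonl_to_rows(lines):
    """Turn JSONL lines into [full_text, space-joined hashtags] rows."""
    rows = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        tweet = json.loads(line)
        rows.append([tweet.get("full_text", ""), " ".join(hashtag_texts(tweet))])
    return rows


def write_csv(rows, out):
    """Write a header plus the rows to any file-like object."""
    writer = csv.writer(out)
    writer.writerow(["full_text", "all_hashtags"])
    writer.writerows(rows)
```

With this, something like `with open("tweets_may.jsonl") as f: rows = jsonl_to_rows(f)` followed by `write_csv(rows, open("testhash.csv", "w", newline=""))` would put all hashtag text for a tweet into a single column, separated by spaces. Splitting into separate columns would need a fixed maximum number of hashtags per row.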