
I have a list in which each element contains JSON data, and I am trying to parse it with ijson because the data volume will be huge.


This is what I am trying to achieve:

import ijson

article_data = ...  # variable which contains the list
parser = ijson.parse(article_data)
for article in ijson.items(parser, 'item'):
    if article['article_type'] not in ("Monthly Briefing", "Conference"):
        data_article_id.append(article['article_id'])
        data_article_short_desc.append(article['short_desc'])
        data_article_long_desc.append(article['long_desc'])

This is the error I get:

AttributeError: 'generator' object has no attribute 'read'

I also tried converting the list to a string and parsing that with ijson, but it fails with the same error.

Any suggestions please?

data_article_id = []
data_article_short_desc = []
data_article_long_desc = []

for item in article_data:
    parser = ijson.parse(item)
    for article in ijson.items(parser, 'item'):
        if article['article_type'] not in ("Monthly Briefing", "Conference"):
            data_article_id.append(article['article_id'])
            data_article_short_desc.append(article['short_desc'])
            data_article_long_desc.append(article['long_desc'])

Since the data is in a list, I also tried parsing each element separately (above), but I get the same error:

AttributeError: 'generator' object has no attribute 'read'

  • Is each element in your list a JSON object? You may need to call parse on each one individually. I recommend removing lines from your code or stepping through it until you find exactly which line gives you this error. – scowan Aug 18 '17 at 11:05
  • I assume `article_data` should be a file handle instead of type `list`. – stovfl Aug 18 '17 at 18:20

1 Answer


I am assuming you have a list of JSON objects stored as byte strings that you want to parse.

ijson.items(JSON, prefix) takes a readable binary input, not a list or a generator: an opened file or a file-like object whose read() returns bytes.
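
A rough way to see why the error mentions `read`: a file-like object is anything exposing a `read()` method, and the error message suggests ijson tried to call `read()` on the generator it was handed. The stdlib-only sketch below checks for that method; `looks_like_binary_file` is a hypothetical helper for illustration, not part of ijson.

```python
import io


def looks_like_binary_file(obj):
    """Hypothetical helper: True if obj has a read() method that returns bytes."""
    read = getattr(obj, "read", None)
    if not callable(read):
        return False
    # read(0) consumes nothing, so the stream position is left unchanged.
    return isinstance(read(0), bytes)


events = (event for event in [])  # a generator, standing in for what ijson.parse() returned in the question
print(looks_like_binary_file(events))                       # False: no read() at all
print(looks_like_binary_file([b'{"id": "ab"}']))            # False: a list is not a stream
print(looks_like_binary_file(io.StringIO('{"id": "ab"}')))  # False: read() returns str
print(looks_like_binary_file(io.BytesIO(b'{"id": "ab"}')))  # True
```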

If you are using Python 3, you can use io.BytesIO from the io module to create an in-memory binary stream.
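
If the list elements are `str` rather than `bytes` (for example, after converting them to strings as the question describes), they can be encoded before wrapping them. A minimal sketch, assuming the text is UTF-8; `to_binary_stream` is a hypothetical name:

```python
import io


def to_binary_stream(item, encoding="utf-8"):
    """Hypothetical helper: wrap one JSON document (bytes or str) in an in-memory binary stream."""
    if isinstance(item, str):
        item = item.encode(encoding)  # assumption: the text is UTF-8
    return io.BytesIO(item)


stream = to_binary_stream('{"id": "ab"}')
print(stream.read())  # b'{"id": "ab"}'
```

The resulting stream can then be passed to ijson.items in place of the raw string.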

Example

Suppose the input is [b'{"id": "ab"}', b'{"id": "cd"}']:

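import io
import ijson

list_json = [b'{"id": "ab"}', b'{"id": "cd"}']
for raw in list_json:
    # Wrap each byte string in an in-memory binary stream; prefix "" selects the top-level object.
    items = ijson.items(io.BytesIO(raw), "")
    for item in items:
        print(item["id"])

Output:
    ab
    cd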
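
Since each element here is already a complete document held in memory, wrapping it in a stream and calling ijson.items(..., "") still builds the whole object, so the standard-library `json.loads` is a simpler option at that point. This sketch swaps in `json.loads` for ijson and applies the filter from the question; the field names come from the question, while the sample records are made up for illustration.

```python
import json

# Made-up sample data in the shape the question describes (one JSON document per element).
article_data = [
    b'{"article_id": 1, "article_type": "News", "short_desc": "s1", "long_desc": "l1"}',
    b'{"article_id": 2, "article_type": "Conference", "short_desc": "s2", "long_desc": "l2"}',
    b'{"article_id": 3, "article_type": "Monthly Briefing", "short_desc": "s3", "long_desc": "l3"}',
]

EXCLUDED_TYPES = {"Monthly Briefing", "Conference"}

data_article_id = []
data_article_short_desc = []
data_article_long_desc = []

for raw in article_data:
    article = json.loads(raw)  # json.loads accepts bytes as well as str (Python 3.6+)
    if article["article_type"] not in EXCLUDED_TYPES:
        data_article_id.append(article["article_id"])
        data_article_short_desc.append(article["short_desc"])
        data_article_long_desc.append(article["long_desc"])

print(data_article_id)  # [1]
```

If the list itself came from one large JSON file, opening that file in binary mode and streaming it with ijson is where incremental parsing actually pays off.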
Tai