21

I'm looping over requests to a JSON API. Here is what I have in my loop:

response_item = requests.request('GET',url_item,params=None,verify=False)
response_item = json.loads(response_item.text)
response_item = ast.literal_eval(json.dumps(response_item, ensure_ascii=False).encode('utf8'))

I scan around 45,000 JSON objects and generate the "url_item" variable for each iteration. Every object has the same structure. I can fetch about 7,000 objects, then I get the following error when I reach the 7064th:

Traceback (most recent call last):
  File "C:\Python27\tools\api_item.py", line 47, in <module>
    response_item = ast.literal_eval(json.dumps(response_item, ensure_ascii=False).encode('utf8'))
  File "C:\Python27\lib\ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "C:\Python27\lib\ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "C:\Python27\lib\ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "C:\Python27\lib\ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "C:\Python27\lib\ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "C:\Python27\lib\ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

I print the second and third "response_item" for debugging. In this case the third one isn't displayed, since the error is raised just before it. Here is what the print after json.loads shows:

{u'restrictions': [], u'name': u'Sac \xe0 dos de base', u'level': 0, u'rarity': u'Basic', u'vendor_value': 11, u'details': {u'no_sell_or_sort': False, u'size': 20}, u'game_types': [u'Activity', u'Wvw', u'Dungeon', u'Pve'], u'flags': [u'NoSell', u'SoulbindOnAcquire', u'SoulBindOnUse'], u'icon': u'https://render.guildwars2.com/file/80E36806385691D4C0910817EF2A6C2006AEE353/61755.png', u'type': u'Bag', u'id': 8932, u'description': u'Un sac de 20 emplacements pour les personnages d\xe9butants.'}

Every item I get before this one has the same type and the same format, and I don't get any error except for the 7064th!

Thank you for your help!

Łukasz Rogalski
Aurélien

2 Answers

47

You should not use ast.literal_eval() on JSON data. JSON and Python literals may look like the same thing, but they are very much not.

In this case, your data contains a boolean flag, set to false in JSON. A proper Python boolean uses title-case, so False:

>>> import json, ast
>>> s = '{"no_sell_or_sort": false, "size": 20}'
>>> json.loads(s)
{u'no_sell_or_sort': False, u'size': 20}
>>> ast.literal_eval(s)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 80, in literal_eval
    return _convert(node_or_string)
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 63, in _convert
    in zip(node.keys, node.values))
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 62, in <genexpr>
    return dict((_convert(k), _convert(v)) for k, v
  File "/Users/mj/Development/Library/buildout.python/parts/opt/lib/python2.7/ast.py", line 79, in _convert
    raise ValueError('malformed string')
ValueError: malformed string

Other differences include JSON's null in place of Python's None, and Unicode escape sequences: JSON writes \uXXXX escapes inside what Python 2 sees as a plain (bytes) string, and it uses UTF-16 surrogate pairs to escape non-BMP code points.
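As a rough illustration of those two differences, here is a minimal sketch. It is written to run on Python 3 (an assumption; the traceback above comes from Python 2.7, where the string types differ), and the sample keys "owner" and "emoji" are made up for the example:

```python
import ast
import json

# Hypothetical JSON text: a null value and a non-BMP character
# (U+1F600) escaped as a UTF-16 surrogate pair, as JSON requires.
s = '{"owner": null, "emoji": "\\ud83d\\ude00"}'

data = json.loads(s)
# json.loads maps null to None and joins the surrogate pair
# into a single code point.
print(data["owner"] is None)
print(len(data["emoji"]))

# ast.literal_eval rejects the same text: `null` is not a Python literal.
try:
    ast.literal_eval(s)
except ValueError:
    literal_ok = False
else:
    literal_ok = True
print(literal_ok)

# Even without null, the escapes are read differently: as a Python 3
# literal, the two \u escapes stay two separate surrogate code points.
as_literal = ast.literal_eval('"\\ud83d\\ude00"')
print(len(as_literal))
```

Both parsers accept the escaped string on its own, but only json.loads() produces the character the JSON text actually encodes.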

Load your data with json.loads(), not ast.literal_eval(). Not only will it handle proper JSON just fine, it is also faster.

In your case, it appears you are calling json.dumps() and then trying to load the result again with ast.literal_eval(). There is no need for that step; you already had a Python object.

In other words, the line:

response_item = ast.literal_eval(json.dumps(response_item, ensure_ascii=False).encode('utf8'))

is redundant at best, and very, very wrong, at worst. Re-encoding response_item to a JSON string does not produce something that can be interpreted as a Python literal.
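A minimal sketch of why that round trip only sometimes works, and of the simpler alternative. The helper name roundtrip_ok and the sample dictionaries are hypothetical, loosely modelled on the item in the question. One plausible reason earlier items got through is that their JSON output happened to contain no true, false, or null, though the question does not confirm this:

```python
import ast
import json

def roundtrip_ok(obj):
    """Return True if json.dumps(obj) happens to parse as a Python literal."""
    try:
        ast.literal_eval(json.dumps(obj))
    except ValueError:
        return False
    return True

# Hypothetical samples: numbers and strings look the same in JSON and
# in Python literals, but booleans do not.
without_bool = {"size": 20, "rarity": "Basic"}
with_bool = {"size": 20, "no_sell_or_sort": False}

print(roundtrip_ok(without_bool))  # True
print(roundtrip_ok(with_bool))     # False

# Decoding the response body once with json.loads() is all that is needed.
body = '{"details": {"no_sell_or_sort": false, "size": 20}, "id": 8932}'
item = json.loads(body)
print(item["details"]["no_sell_or_sort"])  # False
```

In the loop from the question, this would amount to keeping only the json.loads(response_item.text) line and dropping the ast.literal_eval() line entirely.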

Martijn Pieters
  • And safer, might I add. – This company is turning evil. Sep 21 '15 at 13:04
  • @Kroltan: `ast.literal_eval()` is perfectly safe. It parses the string to an abstract syntax tree, then only maps literals (strings, lists, numbers, etc) to Python objects. Everything else raises an exception. – Martijn Pieters Sep 21 '15 at 13:05
  • I've deleted the line with ast.literal_eval() as you told me. Now I don't get the error, but I get output with the Unicode prefix "u'". I've tried many things to remove those characters, because they end up inserted into the database. That's the reason why I added the line with ast. – Aurélien Sep 21 '15 at 14:01
  • @Aurélien: All JSON data is *Unicode* data. If you see `u` prefixes show up in your database, you have a different problem there, where you are inserting `repr(value)` into the database rather than `value` itself. – Martijn Pieters Sep 21 '15 at 14:37
  • @Aurélien: You need to fix that code; the problem lies there, not here in how you load the data. – Martijn Pieters Sep 21 '15 at 14:38
  • @Aurélien: Perhaps you can post a *new* question with a sample of how you insert your data into the database. `u'...'` objects are Python Unicode text objects; the prefix is there to distinguish the type from Python bytestrings. – Martijn Pieters Sep 21 '15 at 14:39
  • Thank you for your advice. I import the JSON into my database via CSV, so I will look at converting to UTF-8 either in the CSV file or in the database. – Aurélien Sep 21 '15 at 16:26
-4

ast.literal_eval is safe against SQL injection if you are using it, because when an unwanted character is inserted it raises a syntax error, which prevents an injection.

Sajid
  • This has nothing to do with SQL injection. And SQL injection relies on string *contents*, not types. – Martijn Pieters Nov 27 '17 at 13:59
  • Note to reviewers: This shouldn't have been flagged as Very Low Quality in the first place because the VLQ queue is not for technical inaccuracies - that's what downvoting and comments are for. This is already downvoted with a comment explaining why it's incorrect, so no further action is needed on it. – EJoshuaS - Stand with Ukraine Oct 05 '20 at 15:09
  • So click looks ok in the review queue. – 10 Rep Oct 05 '20 at 16:18