
I have a file with over 15k lines, each line having one key and one value. I can modify the file's content if any formatting is required for faster reading. Currently I have made the entire file look like a dict and I am doing an eval on it. Is this the best way to read the file, or is there a better approach we can follow? Please suggest. File mymapfile.txt:

{
'a':'this',
'b':'that',
.
.
.
.
'xyz':'message can have "special" char %s etc '
}

and on this file I am doing eval:

f_read = eval(open('mymapfile.txt', 'r').read())

My concern is that my file keeps growing, and values can have quotes, special characters, etc., where we would need to wrap the value in ''' or """. With the dictionary format, even a small syntax error makes eval fail. So is it better to use readlines() without making the file a dict and then build the dict from the lines, or is eval faster if we keep the dict in the file? With readlines I can simply write text on each line, split on ':', and need not worry about any special characters.

File for readlines:

a:this
b:that
.
.
.
.
xyz:message can have "special" char %s etc
RAFIQ
    use [json.load](https://docs.python.org/2/library/json.html) – nu11p01n73R Jun 03 '15 at 07:18
    Use JSON. The JSON parser will turn it into a dict for you and the syntax is very simple. Depending on how large the file gets, reading it all into memory might become impractical at some point, the you would need to think about streaming the file. – Boris the Spider Jun 03 '15 at 07:19
    Don't do that, instead use `Pickle` module. – ZdaR Jun 03 '15 at 07:20
  • @BoristheSpider JSON won't work here - it requires `"` quotes - not `'` quotes - it'll throw parse errors – Jon Clements Jun 03 '15 at 07:24
  • @JonClements I am suggesting changing the format, as the OP seems open to that idea. – Boris the Spider Jun 03 '15 at 07:24
  • @BoristheSpider sorry, the file will not grow beyond 20k lines, but while adding entries I just need to make sure it stays readable by eval. I just wanted to know: is it still better to use JSON? Is there a performance improvement over eval? Assume no memory constraints – RAFIQ Jun 03 '15 at 07:29
  • Not sure about speed - you'll need to benchmark your code to work that out. Never take anyone's word on speed - benchmark benchmark benchmark. The main reason to switch to JSON would be maintainability. – Boris the Spider Jun 03 '15 at 07:42
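The benchmarking advice above can be sketched with the stdlib timeit module; the strings and repeat count below are illustrative only, so measure against the real 15k-line file:

```python
import json
import timeit

# Two equivalent mappings, one as a Python literal and one as JSON
py_literal = "{'a': 'this', 'b': 'that'}"
json_text = '{"a": "this", "b": "that"}'

# Time each parser over many iterations; absolute numbers depend on the machine
t_eval = timeit.timeit(lambda: eval(py_literal), number=10000)
t_json = timeit.timeit(lambda: json.loads(json_text), number=10000)
print('eval: %.4fs  json: %.4fs' % (t_eval, t_json))
```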

3 Answers


@Mahesh24's answer returns a set with values that look like dict entries but are not. Also, his variable shadows the builtin dict. Rather use these two lines:

s = {i.strip() for i in open('ss.txt', 'r').readlines()}
d = {i.split(':', 1)[0]: i.split(':', 1)[1] for i in s}

d will then be a dict with the read-in values (splitting on the first ':' only, so values may contain colons). A bit of thinking could probably get this into a one-liner. There is a csv reader in the Python standard library that will give you some more options and robustness; if your data is in any other standard format, using the appropriate standard library is preferable. The above two-liner will, however, give you a quick and dirty way of doing it. You can change the ':' for commas or whatever separator your data has.
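The standard-library csv reader mentioned above can consume the colon-separated format directly; a quick sketch (the sample file contents are assumed for illustration):

```python
import csv

# Write a tiny sample in the line-per-entry format (contents assumed)
with open('mymapfile.txt', 'w') as f:
    f.write('a:this\nb:that\nxyz:message can have "special" char %s etc\n')

with open('mymapfile.txt', 'r') as f:
    reader = csv.reader(f, delimiter=':')
    # rejoin the tail in case a value itself contains ':'
    d = {row[0]: ':'.join(row[1:]) for row in reader if row}
```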

Joop

Assuming you'll stick to JSON, you might want to take a look at ultrajson (ujson). It seems to be very fast at dumping and loading data (even if with a memory penalty).

Here are two articles that have some benchmarks and might help you make a decision:

https://medium.com/@jyotiska/json-vs-simplejson-vs-ujson-a115a63a9e26

http://jmoiron.net/blog/python-serialization/
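If the file is regenerated as valid JSON (double quotes everywhere — an assumed reformatting of mymapfile.txt), loading becomes a single call, and ujson exposes the same load/loads interface as the stdlib json used here:

```python
import json

# A JSON version of the mapping; \" escapes the embedded quotes
with open('mymap.json', 'w') as f:
    f.write('{"a": "this", "b": "that", '
            '"xyz": "message can have \\"special\\" char %s etc"}')

with open('mymap.json', 'r') as f:
    mapping = json.load(f)  # parses straight into a dict, no eval needed
```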

bergonzzi

Please avoid eval if you only want to load data.

All you need is to read the lines and recognize the key and the value, so your proposed file format:

a:this
b:that
...

is fully suitable.
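A minimal sketch of that approach (file contents assumed), splitting each line on the first ':' only so the value may itself contain colons:

```python
# Write a small sample file in the proposed format (contents assumed)
with open('mymapfile.txt', 'w') as f:
    f.write('a:this\nb:that\nxyz:message can have "special" char %s etc\n')

d = {}
with open('mymapfile.txt', 'r') as f:
    for line in f:
        line = line.rstrip('\n')
        if not line:
            continue  # skip blank lines
        key, _, value = line.partition(':')  # split on the first ':' only
        d[key] = value
```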

dlask
  • do you think it's better to readlines and then split each line and add it to my dict? – RAFIQ Jun 03 '15 at 07:31
  • Actually it depends on your preferences. If you are able to *export* your data in any format, then use e.g. XML or JSON. Then you will *import* them using the corresponding parser. However, if you want to avoid special data formats, then you can use the proposed approach. – dlask Jun 03 '15 at 07:34