Python: Reading values from output of file and interpreting as dictionary as key value

Question

I am new to dictionaries and am having trouble understanding how to interpret the output of a file as a dictionary and read its key-value pairs.

Here is my script, which reads each line of that output as a dictionary:

dicts = {}
for line in sys.stdin:
   d = ast.literal_eval(line)
   for k,v in d.items():
      dicts.setdefault(k, []).append(v)
      charcount = sum(int(d['charcount']) for d in dicts[k])
      output_dict = {k: {'charcount': charcount}}
      print output_dict

Here is the file output that the script takes as input:

{ 262968617233162240 : {'@': False, '#': False, 'word': 'good#1st#time#will',    'longword': True, 'title': False, 'charcount': 18, 'uppercase': False, 'stop': False, 'sscore': False, 'url': False, '!!!': False} }
{ 262968617233162240 : {'@': False, '#': False, 'word': 'be', 'longword': False, 'title': False, 'charcount': 2, 'uppercase': False, 'stop': True, 'sscore': False, 'url': False, '!!!': False} }
{ 262968617233162240 : {'@': False, '#': False, 'word': 'going', 'longword': False, 'title': False, 'charcount': 5, 'uppercase': False, 'stop': False, 'sscore': False, 'url': False, '!!!': False} }
{ 262968617233162240 : {'@': False, '#': False, 'word': 'back#', 'longword': False, 'title': False, 'charcount': 5, 'uppercase': False, 'stop': False, 'sscore': False, 'url': False, '!!!': False} }
{ 263790847424880641 : {'@': False, '#': False, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0', 'longword': True, 'title': False, 'charcount': 33, 'uppercase': False, 'stop': False, 'sscore': False, 'url': True, '!!!': False} }

When I run the script, I get repetitive values instead of it parsing the entire input.

Thanks.

Please post a representative sample of the contents of your file — inspectorG4dget, Oct 30 '13 at 22:26
It looks like k,v are just grabbing the first set of keys so '@' and 'uppercase' — Dylan Lawrence, Oct 30 '13 at 22:26
First, whatever you're trying to do here, `eval(line)` is probably a very bad idea. Repeatedly updating the same dictionary is probably not what you wanted either, and naming a single dictionary `dicts` is a good way to confuse yourself into thinking you have a dict or list or other collection full of dictionaries… — abarnert, Oct 30 '13 at 22:27
Meanwhile, `eval` on those particular lines is going to raise a `SyntaxError`, because a number, a space, and a dict display is not a valid Python expression. — abarnert, Oct 30 '13 at 22:29
Above output is just a part of the entire output but I want to understand, how can I print them properly. I feel like I am reading incorrectly. — fscore, Oct 30 '13 at 22:29
Your second 'for' loop is run for each line in stdin. That's part of the reason why it all seems repetitive - you keep printing the stuff you've already added over and over again. Dedent (that is, move it so it is in the same column as the first 'for') and it will only run once, after that first for loop is complete. — tdelaney, Oct 30 '13 at 22:32
@abarnert Sorry I updated as I pasted wrong output there. I edited now to the correct one and eval does not give me error. — fscore, Oct 30 '13 at 22:32
@kulkarni.ankita09 When you loop through dicts in python, it returns the keys, not the values associated with those keys. — Dylan Lawrence, Oct 30 '13 at 22:34
@DylanLawrence how do I loop the dictionary in a way that it returns proper values for keys. — fscore, Oct 30 '13 at 22:37
@kulkarni.ankita09 inside of your loop you just need to reference the value, so you would do dicts[k] or dicts[v] to call the value at the given key. — Dylan Lawrence, Oct 30 '13 at 22:39
If I do, print dicts[v] inside the loop, it gives me no output at all — fscore, Oct 30 '13 at 22:47
@kulkarni.ankita09: First, as the [`eval`](http://docs.python.org/2/library/functions.html#eval) docs explicitly say: "See [`ast.literal_eval()`](http://docs.python.org/2/library/ast.html#ast.literal_eval) for a function that can safely evaluate strings with expressions containing only literals." But a better solution would be to change the code that generates this output so it uses a format that's meant to be used for interchange, like JSON. — abarnert, Oct 30 '13 at 23:21
Anyway, each line is a dict with a single key and value (the value itself being a dict). Most, but not all, of the keys are the same. So, when you do `dicts.update` with each line, most, but not all, of the time you'll be replacing an existing key-value pair with the new one. If that isn't what you wanted… maybe you can show us the intended output for this input, instead of making us guess what you don't like about it? — abarnert, Oct 30 '13 at 23:24
@DylanLawrence: He's looping over `sorted(dicts.items())`. And `items` returns key-value pairs. So `v` is _already_ each value, and `dicts[v]` will almost certainly raise a `TypeError` or `KeyError`. — abarnert, Oct 30 '13 at 23:25
@abarnert I want the output the same what I take as input but transformed into a dictionary. — fscore, Oct 30 '13 at 23:47
@kulkarni.ankita09: The input is the string representations of five separate dictionaries. There is no way a single dictionary can be the same as that. Or, rather, there are various ways you could "merge" them all into one big dictionary that all seem equally silly, but I have no idea which one is the one you want. Or maybe you didn't actually want a dictionary, but rather a list of dictionaries? — abarnert, Oct 31 '13 at 00:03
You've now edited the question to be completely different from the one you originally asked. Don't do that; it makes the question useless for any future searchers, and very hard to follow for anyone trying to answer you. If you have a followup question that's big enough to require rewriting the whole question, write a new question instead. — abarnert, Oct 31 '13 at 20:03

abarnert · Accepted Answer · 2013-10-31T02:06:48.460

I suspect what you're actually looking for here is not one big dict, but rather a list of dicts, one for each line. For example:

dicts = []
for line in sys.stdin:
    dicts.append(eval(line))

I would actually write this with ast.literal_eval (as the eval docs suggest),* and simplify it into a list comprehension:

dicts = [ast.literal_eval(line) for line in sys.stdin]

But either way, now each element in dicts is a dict. So, to print them all out:

for d in dicts:
    print d
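
As a rough, self-contained sketch of the literal_eval version: the list `sample_lines` below is a hypothetical stand-in for sys.stdin, with records shortened from the sample input, and `print()` is used so it runs on Python 3.

```python
import ast

# Hypothetical stand-in for sys.stdin: each line is the repr of a one-key dict.
sample_lines = [
    "{ 262968617233162240 : {'word': 'be', 'charcount': 2} }\n",
    "{ 263790847424880641 : {'charcount': 33} }\n",
]

# literal_eval accepts only Python literals, so it cannot run arbitrary code.
dicts = [ast.literal_eval(line) for line in sample_lines]

for d in dicts:
    print(d)
```

In a real script you would iterate over sys.stdin instead of `sample_lines`.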

The only thing is, you wanted to sort them. I'm not sure how you want to sort them. In general, sorting dictionaries doesn't make any sense (which is why Python 2 gives you a meaningless order, and Python 3 gives you a TypeError). There are, of course, particular cases where there is some meaningful order, but each such case is different.

Maybe in your case, you want to rely on the fact that each dict has a single key, and sort on that key? If so:

for d in sorted(dicts, key=lambda d: d.keys()[0]):
    print d

But that's just a guess.
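
If you are on Python 3, `d.keys()[0]` will not work, because key views cannot be indexed. A sketch of the same sort that should run on either version, assuming each dict really has exactly one key (the two records are shortened, made-up stand-ins for your data):

```python
dicts = [
    {263790847424880641: {'charcount': 33}},
    {262968617233162240: {'charcount': 18}},
]

# next(iter(d)) returns the first (here, the only) key of d
# without building a list of keys.
ordered = sorted(dicts, key=lambda d: next(iter(d)))

for d in ordered:
    print(d)
```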

From a comment:

how do I do a count on let say, charcount (it exists in the value part of the dict) of all dictionaries with same key.

If you're trying to do that, you have two options.

First, you can always just search the whole list of dictionaries, like this:

charcounts = []
for d in dicts:
    for k, v in d.items():
        if k == key:
            charcounts.append(v['charcount'])

But in this case, you might be better off with a "multidict" structure—that is, a dict whose values are all lists (of dicts, in this case).

There are two easy ways to build a multidict: the setdefault method on dict, or the defaultdict class in collections. Both are equally simple; the difference is that the first one gives you a regular dict, so it's a KeyError to look for a key that doesn't exist, while the second one gives you a defaultdict, so you'll get an empty list looking for a key that doesn't exist. I'll show the first, but really, you have to decide which one you want.

import ast
import sys

dicts = {}
for line in sys.stdin:
    d = ast.literal_eval(line)
    for k, v in d.items(): # should only be one
        dicts.setdefault(k, []).append(v)

This is a bit more work to set up, but less work to search through. For example, the whole mess above can be replaced by one line:

charcounts = [d['charcount'] for d in dicts[key]]

… and, if dicts is very big, it'll be a lot faster, because it only has to look through the ones with matching keys, rather than all of them.

To give you an idea of what this looks like, here's dicts with your sample input:

{262968617233162240: 
    [
        {'!!!': False, '#': False, '@': False, 'charcount': 18, 'longword': True, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': False, 'word': 'good#1st#time#will'},
        {'!!!': False, '#': False, '@': False, 'charcount': 2, 'longword': False, 'sscore': False, 'stop': True, 'title': False, 'uppercase': False, 'url': False, 'word': 'be'},
        {'!!!': False, '#': False, '@': False, 'charcount': 5, 'longword': False, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': False, 'word': 'going'},
        {'!!!': False, '#': False, '@': False, 'charcount': 5, 'longword': False, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': False, 'word': 'back#'}
    ],
 263790847424880641: 
    [
        {'!!!': False, '#': False, '@': False, 'charcount': 33, 'longword': True, 'sscore': False, 'stop': False, 'title': False, 'uppercase': False, 'url': True, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0'}
    ]
}
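
For comparison, here is a sketch of the defaultdict alternative mentioned earlier. `sample_lines` is a hypothetical stand-in for sys.stdin with shortened records, and 12345 is just a made-up key that is not in the data.

```python
import ast
from collections import defaultdict

# Hypothetical stand-in for sys.stdin, with shortened records.
sample_lines = [
    "{ 262968617233162240 : {'word': 'be', 'charcount': 2} }",
    "{ 262968617233162240 : {'word': 'going', 'charcount': 5} }",
    "{ 263790847424880641 : {'charcount': 33} }",
]

dicts = defaultdict(list)
for line in sample_lines:
    d = ast.literal_eval(line)
    for k, v in d.items():  # should only be one pair per line
        dicts[k].append(v)  # a missing key starts out as an empty list

print(len(dicts[262968617233162240]))  # 2
print(dicts[12345])                    # [] (no KeyError)
```

One thing to keep in mind with this variant: merely looking up a missing key, as in the last line, also stores that key with an empty list.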

From another comment:

So the output that I am looking for is: { 262968617233162240, charcount: 30}

Well, that isn't a valid anything in Python. It looks like something half-way between a set and a dict. A dict is a bunch of key-value pairs, with a colon between each key and value.

Here's something that is valid in Python:

{262968617233162240: {'charcount': 30}}

How would you get that?

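Well, I already showed you how to get the list of charcounts for any given key. Before you add them up, make sure they're all numbers; in your sample they're already ints, so the int() call here is just a safeguard in case some were saved as strings: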

charcounts = [int(d['charcount']) for d in dicts[key]]

Then, to add them up, just call sum:

charcount = sum(int(d['charcount']) for d in dicts[key])

Now, how do we build the output you wanted?

charcount = sum(int(d['charcount']) for d in dicts[key])
output_dict = {key: {'charcount': charcount}}

If you want to do that for each key in the multidict:

for key, values in dicts.items():
    charcount = sum(int(d['charcount']) for d in values)
    output_dict = {key: {'charcount': charcount}}
    # now do something with output_dict
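
Putting the two pieces together, a sketch of the whole thing might look like this. The function name `sum_charcounts` and the list `sample_lines` are made up for illustration, the records are shortened from the sample input, and `print()` is used so it runs on Python 3. The important point is that the summing loop runs once, after all the input has been read, not inside the loop that reads the lines.

```python
import ast


def sum_charcounts(lines):
    """Return {key: {'charcount': total}} for lines that each hold a one-key dict."""
    multidict = {}
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        d = ast.literal_eval(line)
        for k, v in d.items():  # should only be one pair per line
            multidict.setdefault(k, []).append(v)

    # Sum only after the whole input has been grouped: one total per key.
    totals = {}
    for key, values in multidict.items():
        totals[key] = {'charcount': sum(int(d['charcount']) for d in values)}
    return totals


# Hypothetical stand-in for sys.stdin, shortened from the sample input.
sample_lines = [
    "{ 262968617233162240 : {'word': 'good#1st#time#will', 'charcount': 18} }",
    "{ 262968617233162240 : {'word': 'be', 'charcount': 2} }",
    "{ 262968617233162240 : {'word': 'going', 'charcount': 5} }",
    "{ 262968617233162240 : {'word': 'back#', 'charcount': 5} }",
    "{ 263790847424880641 : {'charcount': 33} }",
]

totals = sum_charcounts(sample_lines)
for key, value in sorted(totals.items()):
    print({key: value})
```

With the sample above this prints one dict per key, with totals of 30 and 33. To use it on real input, pass sys.stdin instead of `sample_lines`.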

* Or, better yet, change the saving code to use a format actually meant for data interchange, like JSON or pickle.
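
A sketch of what the JSON route could look like, assuming you control the code that writes the file (the record is a shortened, made-up example). The one catch is that JSON object keys are always strings, so integer keys have to be converted back when reading.

```python
import json

record = {262968617233162240: {'word': 'be', 'charcount': 2}}

line = json.dumps(record)   # what the saving code would write, one record per line
loaded = json.loads(line)   # what the reading code gets back

# The integer key has come back as the string '262968617233162240',
# so convert the keys if you need ints again.
restored = {int(k): v for k, v in loaded.items()}
print(restored == record)   # True
```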

You really helped me get very close to the solution but only question I have is that, how do I do a count on let say, charcount (it exists in the value part of the dict) of all dictionaries with same key. — fscore, Oct 31 '13 at 00:31
Thanks so much. So the output that I am looking for is: { 262968617233162240, charcount: 30} {263790847424880641, charcount:33} I do not want to do any type of search for a key but I want to output the sum of charcounts of all identical keys. — fscore, Oct 31 '13 at 01:58
I am getting an error when I use the above: `for key, values in dicts.items():` raises `AttributeError: 'list' object has no attribute 'items'`. The code I ran: `for key, values in dicts.items(): charcount = sum(int(d['charcount']) for d in values) output_dict = {key: {'charcount': charcount}}` — fscore, Oct 31 '13 at 02:26
@kulkarni.ankita09: As the explanation says, that code is for the multidict solution. If you used the flat solution, obviously you'll need different code. — abarnert, Oct 31 '13 at 02:45
Please check my initial code edit. I am still getting error. KeyError: 'charcount' — fscore, Oct 31 '13 at 07:04
@kulkarni.ankita09: Your new code is not doing the same thing as the code I gave you. In my code, `key` and `values` are each key and list of sub-dicts in the multidict `dicts`; in your code, `k` and `v` are each key and value within each sub-dict. You have to actually understand what the code is doing, and what the objects are, not just blindly copy and paste code that looks similar. — abarnert, Oct 31 '13 at 20:05
Thanks a lot for the explanation. But, I am still confused. I gave you the input, all I want to incorporate is counting all values of charcount in the dictionaries and doing it for all multi-dictionaries in the loop whose keys match. I do not understand how to incorporate in my code. — fscore, Oct 31 '13 at 21:56
@kulkarni.ankita09: I have code that builds up the multi-dict `dicts`, with an example of what the resulting object should look like if you print it out. I have code that loops over that object and creates each `output_dict` for you. Both of them work. If you actually understand them both, but don't know how to put them together, I don't know how to help you. If you're looking for someone to write all your code for you so you don't have to understand it, you'll have to hire someone. — abarnert, Oct 31 '13 at 22:07
I am new to dicts and Python, which is why I posted my question here. I do not want to hire anyone; I'm trying to understand it myself. So, please cut me some slack. I may not be a pro like you, and I am sure everyone has gone through this phase. Thanks for your help. — fscore, Oct 31 '13 at 22:19
@kulkarni.ankita09: This site is not good for tutoring people on basic concepts or open-ended interactive debugging of your code. It's great for answering specific questions from people who have a grasp of what they're doing, but that's all it's good for. — abarnert, Oct 31 '13 at 22:28

score 0 · Answer 2 · answered Oct 30 '13 at 23:31

You have two main problems:

1)

print dicts[v]

cannot work, because a dict is indexed by key, and v is the value. Since your values are themselves dicts, this call should raise:

TypeError: unhashable type: 'dict'

Change it to

print dicts[k]

and the program will run.

2)

The first four lines in your file have the same key, so each one overwrites the previous one when you update the dictionary. At the end you have only two entries, and so only two outputs (shown in four lines, because that includes both print calls):

{'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'back#', 'longword': False, 'title': False, 'url': False, 'sscore': False, 'charcount': 5}
262968617233162240 {'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'back#', 'longword': False, 'title': False, 'url': False, 'sscore': False, 'charcount': 5}
{'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0', 'longword': True, 'title': False, 'url': True, 'sscore': False, 'charcount': 33}
263790847424880641 {'@': False, 'uppercase': False, 'stop': False, '!!!': False, '#': False, 'word': 'http://instagr.am/p/rx9939civ8/\xc2\xa0', 'longword': True, 'title': False, 'url': True, 'sscore': False, 'charcount': 33}
Script terminated.
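
The overwriting is easy to see in isolation. A small sketch with shortened, made-up records, using `print()` so it runs on Python 3:

```python
dicts = {}
dicts.update({262968617233162240: {'word': 'going', 'charcount': 5}})
dicts.update({262968617233162240: {'word': 'back#', 'charcount': 5}})  # same key: replaces the entry
dicts.update({263790847424880641: {'charcount': 33}})

print(len(dicts))                        # 2
print(dicts[262968617233162240]['word'])  # back#
```

Only the last value stored under each key survives, which is why 'back#' is the word that shows up in the output above.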

Thanks. How do I print the values as I take them as input. I know my data has duplicate keys and I want that to be there. What should I use to update my dict "dicts" with all values from standard input including duplicates. — fscore, Oct 30 '13 at 23:44
@kulkarni.ankita09: A dict cannot have two different values for the same key. The whole point of a dict is that it maps each key to exactly one value. — abarnert, Oct 31 '13 at 00:04
@abarnert I want to print a list of dictionaries like you see on top and then my script on top should interpret them as a key-value pairs at every line — fscore, Oct 31 '13 at 00:18
