
I have a list of about 58,000 rows, and each row is a dictionary.

Example:

my_list_of_dicts = [
    {'id': '555', 'lang': 'en'},
    {'id': '444', 'lang': 'en'},
    {'id': '333', 'lang': 'fr'},
    {'id': '222', 'lang': 'es'},
    {'id': '111', 'lang': 'ge'},
    {'id': '666', 'lang': 'fr'},
    {'id': '777', : 'du'}]

Inside each dictionary, you'll see that I have a key 'lang' with a corresponding value, which is an abbreviation for one of several languages ('en', 'es', 'fr', 'du', 'ge', etc.).

I have successfully written the code I need to produce a Series containing the value counts of all the unique values for this key.

When I do this, however, I get a KeyError, because apparently a few dictionaries do not contain the 'lang' key.

I wrapped the lookup in a try/except block so that these rows are skipped. It looks like about 5 of the 58,000 rows are missing the 'lang' key.
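A minimal sketch of that try/except approach, assuming the counting loop looks roughly like this (the actual code isn't shown, and `rows`, `lang_counts` and `missing_rows` are hypothetical names). Instead of silently skipping a row in the `except` branch, the loop can record the row's position:

```python
from collections import Counter

# Hypothetical sample data: the row at position 2 has no 'lang' key.
rows = [
    {'id': '555', 'lang': 'en'},
    {'id': '444', 'lang': 'en'},
    {'id': '777'},
    {'id': '333', 'lang': 'fr'},
]

lang_counts = Counter()
missing_rows = []  # 0-based positions of rows that raised KeyError

for index, row in enumerate(rows):
    try:
        lang_counts[row['lang']] += 1
    except KeyError:
        # Remember where the key was missing instead of just skipping the row.
        missing_rows.append(index)

print(lang_counts)   # Counter({'en': 2, 'fr': 1})
print(missing_rows)  # [2]
```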

I want to find where these rows are. In other words, out of about 58,000 rows, how can I find which 5 are missing the 'lang' key?

TJE
  • `[item for item in my_list_of_dicts if "lang" not in item]` – Sraw Oct 26 '17 at 01:06
  • Thanks. This produces a list of the rows that have the missing 'lang' value. However, this doesn't show me the location of these rows within the list of 58,000. I am hoping to identify, for example, that the rows with the missing 'lang' keys are rows 10,453 and 20,432 and 22,304 and 52,302 and 55,211. Is there a way to do that? – TJE Oct 26 '17 at 01:15
  • `[index for index in range(len(my_list_of_dicts)) if "lang" not in my_list_of_dicts[index]]` – Sraw Oct 26 '17 at 01:19

4 Answers


You can use get and enumerate:

my_list_of_dicts = [
    {'id': '555', 'lang': 'en'},
    {'id': '444', 'lang': 'en'},
    {'id': '333', 'lang': 'fr'},
    {'id': '222', 'lang': 'es'},
    {'id': '111', 'lang': 'ge'},
    {'id': '666', 'lang': 'fr'},
    {'id': '777', "missing_lang": 'du'},
]
# enumerate() pairs each dict with its position; get() returns False when 'lang' is absent
missing_vals = [i for i, a in enumerate(my_list_of_dicts) if not a.get("lang", False)]

Bear in mind that the last dictionary in your original example contained : 'du', which is an invalid key-value pair and raises a SyntaxError when you run your file. Therefore, I added a placeholder key for the purposes of demonstration.
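One caveat worth checking: `not a.get("lang", False)` also flags rows where the key exists but holds a falsy value such as an empty string, whereas `"lang" not in a` flags only rows where the key is absent. A small comparison on hypothetical rows:

```python
rows = [
    {'id': '555', 'lang': 'en'},
    {'id': '888', 'lang': ''},   # key present, value empty
    {'id': '777'},               # key absent
]

# Flags rows whose 'lang' is missing OR falsy (e.g. '' or None).
falsy_or_missing = [i for i, row in enumerate(rows) if not row.get('lang', False)]

# Flags only rows where the 'lang' key is absent.
key_absent = [i for i, row in enumerate(rows) if 'lang' not in row]

print(falsy_or_missing)  # [1, 2]
print(key_absent)        # [2]
```

Which one is right depends on whether empty language codes should count as "missing" in your data.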

Ajax1234

Building on the answer in the comments above:

counter = 0
for item in my_list_of_dicts:
    if "lang" not in item:
        print(counter)  # 0-based position of a row without 'lang'
    counter += 1

To print the 'id' of each such row instead:

for item in my_list_of_dicts:
    if "lang" not in item:
        print(item['id'])
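Since the goal is to report row numbers such as 10,453, here is a sketch that collects the position and `id` together. It assumes rows should be counted from 1 rather than by Python's 0-based index; drop `start=1` if you want the raw list index.

```python
my_list_of_dicts = [
    {'id': '555', 'lang': 'en'},
    {'id': '444', 'lang': 'en'},
    {'id': '777'},
]

# enumerate(..., start=1) numbers rows from 1, so the third dict is "row 3".
missing = [
    (row_number, item.get('id'))  # .get avoids a second KeyError if 'id' is also absent
    for row_number, item in enumerate(my_list_of_dicts, start=1)
    if 'lang' not in item
]

for row_number, row_id in missing:
    print(f"row {row_number}: id={row_id}")  # prints: row 3: id=777
```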
elanor

Your list is not a valid example, because the last dict has a value with no key. Let's assume it looks like this:

my_list_of_dicts = [{'id': '555', 'lang': 'en'}, {'id': '444', 'lang': 'en'}, {'id': '777', 'x': 'du'}]

You can get the list of tuples with the index and item using:

[(index, item) for index, item in enumerate(my_list_of_dicts) if 'lang' not in item]

zalun

Since this question is tagged pandas, you could try the DataFrame constructor:

In [11]: my_list_of_dicts = \
    ...: [{'id': '555', 'lang': 'en'},
    ...: {'id': '444', 'lang': 'en'},
    ...: {'id': '333', 'lang': 'fr'},
    ...: {'id': '222', 'lang': 'es'},
    ...: {'id': '111', 'lang': 'ge'},
    ...: {'id': '666', 'lang': 'fr'},
    ...: {'id': '777', }]  # example one with no lang

In [12]: df1 = pd.DataFrame(my_list_of_dicts)

In [13]: df1
Out[13]:
    id lang
0  555   en
1  444   en
2  333   fr
3  222   es
4  111   ge
5  666   fr
6  777  NaN

In [14]: df1[df1.lang.isnull()]  # rows with a NaN (missing) lang
Out[14]:
    id lang
6  777  NaN
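Two follow-ups in the same pandas style, sketched on a smaller hypothetical list. First, the positions of the missing rows can be pulled out of the index. This assumes the default RangeIndex, which a DataFrame built from a list gets, so index labels equal list positions. Second, `value_counts(dropna=False)` gives the per-language counts from the question with the missing values as their own NaN bucket, with no try/except needed. `isna()` is an alias of the `isnull()` used above.

```python
import pandas as pd

my_list_of_dicts = [
    {'id': '555', 'lang': 'en'},
    {'id': '444', 'lang': 'en'},
    {'id': '333', 'lang': 'fr'},
    {'id': '777'},  # no 'lang' key, becomes NaN in the DataFrame
]

df1 = pd.DataFrame(my_list_of_dicts)

# With the default RangeIndex, these labels are the 0-based list positions.
missing_positions = df1.index[df1['lang'].isna()].tolist()
print(missing_positions)  # [3]

# dropna=False keeps NaN as a separate bucket in the counts.
counts = df1['lang'].value_counts(dropna=False)
print(counts)
```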
Andy Hayden