
I hope this is a fairly easy question, but without much Python background I can't find the answer myself.

import pandas as pd

df = pd.DataFrame(
    {'Messung': ['10bar', '10bar', '10bar', '20bar', '20bar'],
     'Zahl': [1, 2, 3, 4, 5],
     'Buchstabe': ['a', 'b', 'c', 'd', 'e']})

I have a DataFrame (I made a simpler test DataFrame for this post) where I loop through one column and group the rows by the first two characters of each string. The full column ends up with about 20 keys. Every time a key is found, the whole row should be appended to that key.

d = {}
for row, item in enumerate(df['Messung']):
    key = item[0:2]              # first two characters, e.g. '10'
    if key not in d:             # first time this key appears
        d[key] = []
    d[key].append(df.iloc[row])  # add the whole row (a Series)
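For comparison, the same working loop can be written with collections.defaultdict(list), which creates the empty list the first time a key is accessed. This is an alternative sketch using the test DataFrame, not what the question originally used.

```python
from collections import defaultdict

import pandas as pd

df = pd.DataFrame(
    {'Messung': ['10bar', '10bar', '10bar', '20bar', '20bar'],
     'Zahl': [1, 2, 3, 4, 5],
     'Buchstabe': ['a', 'b', 'c', 'd', 'e']})

# defaultdict(list) builds an empty list on first access to a missing key,
# so the explicit "if key not in d" check is not needed.
d = defaultdict(list)
for row, item in enumerate(df['Messung']):
    key = item[0:2]              # first two characters, e.g. '10'
    d[key].append(df.iloc[row])  # whole row as a Series

print({k: len(v) for k, v in d.items()})  # {'10': 3, '20': 2}
```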

This code works, but my first attempt was different: I wanted a dictionary whose keys are named 'RP_10', 'RP_20', and so on.

d={}
for row, item in enumerate(df['Messung']):
    key=item[0:2]
    if key not in d:
        d['RP_'+key] = []
    d['RP_'+key].append(df.iloc[row])

Can someone explain why this doesn't work, and how I could write the code so that I get 'RP_10', 'RP_20', and so on?

Rabinzel
  • What happens instead of what you expect? – mkrieger1 Mar 31 '21 at 10:41
  • Each key ends up with only one row as its value (always the last row where that key occurs) instead of all rows with the same key. – Rabinzel Mar 31 '21 at 10:46
  • `key` (`item[0:2]`) will never be in `d` since the actual key you use to update the dictionary is `"RP_" + item[0:2]` – Wouter Mar 31 '21 at 10:56
  • can you add that test dataframe as text? `print(df.to_dict())` and paste the output in your question body – Umar.H Mar 31 '21 at 11:03
  • Yes, I can. I edited my question. – Rabinzel Mar 31 '21 at 11:09
  • @Wouter If that were correct, my output would be only empty dictionaries, right? But if I run my "wrong" code and then look at d['RP_10'], the output is the row with index 2 (Messung: 10bar, Zahl: 3, Buchstabe: c). – Rabinzel Mar 31 '21 at 11:14
  • @Rabinzel No, your output would be exactly as you describe, only containing the last row for each key. – Wouter Mar 31 '21 at 12:15
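To see the effect Wouter describes, this sketch reruns the first-attempt loop on the test DataFrame and prints which rows survive under each key. The membership test looks for '10' or '20', but only 'RP_10' and 'RP_20' are ever stored, so the list is recreated on every row.

```python
import pandas as pd

df = pd.DataFrame(
    {'Messung': ['10bar', '10bar', '10bar', '20bar', '20bar'],
     'Zahl': [1, 2, 3, 4, 5],
     'Buchstabe': ['a', 'b', 'c', 'd', 'e']})

d = {}
for row, item in enumerate(df['Messung']):
    key = item[0:2]
    # Checks '10' / '20', which are never stored, so this is always True
    # and any list collected so far is replaced by a fresh empty one.
    if key not in d:
        d['RP_' + key] = []
    d['RP_' + key].append(df.iloc[row])

# Show the 'Zahl' value of every row kept under each key.
print({k: [int(r['Zahl']) for r in v] for k, v in d.items()})  # {'RP_10': [3], 'RP_20': [5]}
```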

2 Answers


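Can you please try the code below? You made a small mistake in the if condition: it checks whether `key` (e.g. '10') is in `d`, but the lists are stored under 'RP_' + key (e.g. 'RP_10'). The check is therefore always true, the list is recreated on every row, and only the last row per key survives.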

d = {}
for row, item in enumerate(df['Messung']):
    key = item[0:2]
    key = "RP_" + key            # build the final key before the check
    if key not in d:             # now tests the same key that is stored
        d[key] = []
    d[key].append(df.iloc[row])

Also, you can use Python's dict.setdefault(). Then your code looks like this:

d = {}
for row, item in enumerate(df['Messung']):
    key = item[0:2]
    key = "RP_" + key
    # setdefault returns the existing list, or inserts [] and returns it
    d.setdefault(key, []).append(df.iloc[row])
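Why the one-liner works: dict.setdefault(key, default) returns d[key] if the key already exists; otherwise it stores default under key and returns that same object, so .append() always lands in the list held by the dictionary. A small pure-Python illustration (the 'row 0' / 'row 1' strings are placeholders for DataFrame rows):

```python
d = {}

# Key missing: setdefault inserts the [] and returns that very list.
first = d.setdefault('RP_10', [])
first.append('row 0')

# Key present: the new [] argument is ignored and the stored list is returned.
second = d.setdefault('RP_10', [])
second.append('row 1')

print(second is first)  # True
print(d)                # {'RP_10': ['row 0', 'row 1']}
```

One design note: the empty list passed as the default is built on every call, even when the key already exists. That is harmless here; collections.defaultdict(list) avoids it if it ever matters.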
  • On my way to the solution I also tried setdefault, but failed... Thanks! – Rabinzel Mar 31 '21 at 11:00
  • @Rabinzel taking a guess (and curious) but wouldn't something like `d = {f'RP_{k}': g.to_dict(orient='records') for k, g in df.groupby(df['Messung'].str[:2])}` be more useful? – Jon Clements Mar 31 '21 at 12:07
  • @JonClements First of all, yes, it does the same; the only difference is how the data is selected. For example, in d['RP_10'], if I want the value 2, I write d['RP_10'][1][1]; with your line of code it is d['RP_10'][1]['Zahl']. I don't know if I understood correctly, but in my way every key holds a list of rows (Series), while in your way every key holds a list of dictionaries, right? – Rabinzel Mar 31 '21 at 18:20
  • @Rabinzel just build a group then... so the equivalent would be: `d = {f'RP_{k}': g for k, g in df.groupby(df['Messung'].str[:2])}` then you get the same as here – Jon Clements Mar 31 '21 at 18:24
  • Since the support here is so nice and I can't answer the "more useful" question myself, let me explain a little more. I get lots of data at different rail pressures (RP) from an engine test bench. For every RP I have to do some calculations and plotting. My workflow is: Excel -> DataFrame, separate the rows by RP with the dict, convert to np.array, do some calculations, and plot each case. I solved this step by step, so I don't know whether my approach is too complicated or has unnecessary steps. – Rabinzel Mar 31 '21 at 18:49
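For the workflow described in the last comment, here is a minimal sketch of the groupby route Jon Clements suggested: one DataFrame per RP, then a NumPy array per group. It assumes the calculation only needs the numeric 'Zahl' column; the mean is a hypothetical stand-in, since the real per-RP calculation is not described in the thread.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {'Messung': ['10bar', '10bar', '10bar', '20bar', '20bar'],
     'Zahl': [1, 2, 3, 4, 5],
     'Buchstabe': ['a', 'b', 'c', 'd', 'e']})

# One DataFrame per rail pressure, keyed 'RP_10', 'RP_20', ...
groups = {f'RP_{k}': g for k, g in df.groupby(df['Messung'].str[:2])}

for name, g in groups.items():
    values = g['Zahl'].to_numpy()  # numeric column as a NumPy array
    # Placeholder calculation (assumption): replace with the real per-RP math.
    print(name, values, np.mean(values))
```

Keeping each group as a DataFrame means columns stay addressable by name (g['Zahl']) until the point where a plain array is actually needed.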

While trying your solution I noticed that I can even delete the line key=item[0:2] and build the key directly from 'RP_' and item[0:2]:

d={}
for row, item in enumerate(df['Messung']):
    key = "RP_"+item[0:2]
    d.setdefault(key, []).append(df.iloc[row])
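The same loop can also be written with DataFrame.iterrows(), which yields (index label, row as a Series) pairs, so the enumerate/iloc combination is not needed. This is an alternative sketch on the test DataFrame, not something proposed in the thread.

```python
import pandas as pd

df = pd.DataFrame(
    {'Messung': ['10bar', '10bar', '10bar', '20bar', '20bar'],
     'Zahl': [1, 2, 3, 4, 5],
     'Buchstabe': ['a', 'b', 'c', 'd', 'e']})

d = {}
# iterrows() hands over each row directly; the index label is not needed here.
for _, row in df.iterrows():
    d.setdefault('RP_' + row['Messung'][:2], []).append(row)

print({k: len(v) for k, v in d.items()})  # {'RP_10': 3, 'RP_20': 2}
```

With the default integer index, each row Series is equivalent to what df.iloc[row] returns in the loop above.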
Rabinzel