Select specific rows from a DataFrame and perform calculations for a new column

Question

I have a DataFrame that looks like this:

       Task[ms]                              Funktion  ...     min     max
  0        1              CALL_TK_CDDio_PFC_BEGIN_1MS  ...   0.640000   3.360000
  1        1                       vAdcD_MainFunction  ...  21.280001  25.920000
  2        1                          vPressE_Main1ms  ...  17.120001  81.279999
  3        1  vPositionSensorPwm_MainFunction_Fast_In  ...   9.920000  13.760000
  4        1                           CDDIO_1MS_1_IN  ...   2.240000   5.280000

I have to select rows based on the values in the Messvariable column. The DataFrame has 146 rows. This is df['Messvariable']:

0      timeslices[0].profilerDataProcess[0]_C0[us]
1      timeslices[0].profilerDataProcess[1]_C0[us]
2      timeslices[0].profilerDataProcess[2]_C0[us]
3      timeslices[0].profilerDataProcess[3]_C0[us]
4      timeslices[0].profilerDataProcess[4]_C0[us]
                
141    timeslices[9].profilerDataProcess[0]_C0[us]
142    timeslices[9].profilerDataProcess[1]_C0[us]
143    timeslices[9].profilerDataProcess[2]_C0[us]
144    timeslices[9].profilerDataProcess[3]_C0[us]
145    timeslices[9].profilerDataTask_C0[us]

I want to select specific rows by this column and perform an operation like this:

 while  df['Messvariable'].str.contains("timeslices[1]"):
   df['CPU_LOAD']=df['max']/(10000*2)

and similarly for all the remaining timeslices, each with a different calculation. It does not work.

str.contains returns an empty DataFrame.

Is there any other way of doing it?

@DanilaGanchar Yes, added. It has 146 rows with timeslices ranging from 0 to 9. – shankar ram Jul 20 '20 at 13:59

Danila Ganchar · Accepted Answer · 2020-07-21T07:31:58.507


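The main problem is that str.contains() defaults to regex=True, so pat is interpreted as a regular expression. In a regex, [1] is a character class that matches the single character 1, so the pattern 'timeslices[1]' looks for the text timeslices1 and never matches the literal brackets. Set regex=False, or use startswith() or find() instead: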

import pandas as pd

df = pd.DataFrame.from_dict({
    'Messvariable': ('timeslices[1]', 'timeslices[1]', 'empty', 'empty'),
    'max': (1, 2, 3, 4),
})

# regex=False makes pandas treat the pattern as a literal substring
mask = df['Messvariable'].str.contains('timeslices[1]', regex=False)
# or
# mask = df['Messvariable'].str.find('timeslices[1]') != -1
# or
# mask = df['Messvariable'].str.startswith('timeslices[1]')
df['CPU_LOAD'] = 0.0  # float default, so the float results fit the column dtype
# update only the rows selected by the mask
df.loc[mask, 'CPU_LOAD'] = df.loc[mask, 'max'] / (10000 * 2)
print(df.head())

#    Messvariable  max  CPU_LOAD
# 0  timeslices[1]    1   0.00005
# 1  timeslices[1]    2   0.00010
# 2          empty    3   0.00000
# 3          empty    4   0.00000
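
To see why the default fails, the small sketch below compares the three ways of interpreting the same pattern. The third value in the Series (`timeslices1`) is a made-up string that does not occur in the question's data; it is only there to show what the regex version really matches. Escaping with re.escape() is an alternative I am adding here, not something from the original answer.

```python
import re
import pandas as pd

s = pd.Series([
    "timeslices[1].profilerDataProcess[0]_C0[us]",
    "timeslices[9].profilerDataTask_C0[us]",
    "timeslices1",  # hypothetical value, not part of the question's data
])

# Default (regex=True): "[1]" is a character class, so this searches for "timeslices1"
as_regex = s.str.contains("timeslices[1]")

# Literal substring search
as_literal = s.str.contains("timeslices[1]", regex=False)

# Regex search with the brackets escaped, equivalent to the literal search here
as_escaped = s.str.contains(re.escape("timeslices[1]"))

print(pd.DataFrame({"regex": as_regex, "literal": as_literal, "escaped": as_escaped}))
```

Only the made-up third value matches with the default settings, while the literal and escaped searches pick out the real timeslices[1] row.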

Update: if each timeslice needs a different calculation, it is better to use apply() with a custom function:

df['CPU_LOAD'] = 0

def set_cpu_load(x):
    if x['Messvariable'].startswith('timeslices[1]'):
        x['CPU_LOAD'] = x['max'] / (10000 * 2)
    elif x['Messvariable'].startswith('timeslices[2]'):
        pass  # other calculation
    # elif ...
    return x

df = df.apply(set_cpu_load, axis=1)
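
If the calculations differ only by a constant, a vectorized variant may be simpler than a row-wise apply(). The following is a minimal sketch under that assumption: it extracts the timeslice index with str.extract() and looks up a divisor per index. The `divisors` dictionary is hypothetical; only the value for timeslices[1] (10000 * 2) comes from the question, and the other entries are placeholders to be replaced with the real factors.

```python
import pandas as pd

df = pd.DataFrame({
    "Messvariable": [
        "timeslices[0].profilerDataProcess[0]_C0[us]",
        "timeslices[1].profilerDataProcess[0]_C0[us]",
        "timeslices[9].profilerDataTask_C0[us]",
    ],
    "max": [3.36, 25.92, 81.28],
})

# Hypothetical divisors keyed by timeslice index (as strings);
# only the entry for "1" is taken from the question.
divisors = {"0": 10000 * 1, "1": 10000 * 2, "9": 10000 * 10}

# Capture the number between the brackets; rows that do not match yield NaN
timeslice = df["Messvariable"].str.extract(r"^timeslices\[(\d+)\]", expand=False)

# Look up the divisor for each row and divide in one vectorized step
df["CPU_LOAD"] = df["max"] / timeslice.map(divisors)
print(df)
```

Rows whose index is missing from `divisors`, or whose name does not match the pattern, end up with NaN in CPU_LOAD, which makes unhandled cases easy to spot.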

edited Jul 21 '20 at 07:31

answered Jul 20 '20 at 14:26

Danila Ganchar


Thank you for your answer, but doing this selects all 146 rows even when timeslices[1] is specified. – shankar ram Jul 20 '20 at 14:56
@shankarram My solution works only for `timeslices[1]`. Do you need to update all rows except `timeslices[1]`? – Danila Ganchar Jul 20 '20 at 15:17
I need to update all of them, i.e. each timeslice from 0 to 9. There is a different calculation for each timeslice. For example, when I run .str.contains('timeslices[1]', regex=False), it should return only the rows (with all their columns) that correspond to timeslices[1], but I still get all 146 rows. – shankar ram Jul 20 '20 at 20:52
I tried the same, but I had not used apply before. Thank you. – shankar ram Jul 21 '20 at 13:12
