0

I have a dataframe with 5000 records.

Data:
        Month   Heat Number  wts   gcs 
         1        HA        8.2    98
         1        HB        7.6    86
         2        HB        4.2    76
         3        HC        6.9    46
         4        HD        7.4    36
         5        HD        9.8    26
         6        HF        10.8   16

The 2nd letter of the Heat number column denotes month.

A for January

B for Feb

C for March and so on..

Condition:

If the record in month column is 1 , the corresponding value in heat number column should always be 'HA'

If month is 2, the corresponding value in heat number column should always be 'HB'

If month is 3, the corresponding value in heat number column should always be 'HC'

If month is 4, the corresponding value in heat number column should always be 'HD'

If month is 5, the corresponding value in heat number column should always be 'HE'

Any records with incorrect match should be deleted(that entire row in the dataframe should be deleted) Example in the data the 2nd row should be deleted since the month is 1 and heat number is HB(heat number should have been HA)

Desired Output:

Month   Heat Number  wts   gcs 
 1        HA        8.2    98
 2        HB        4.2    76
 3        HC        6.9    46
 4        HD        7.4    36
 6        HF        10.8   16
Ken Y-N
  • 14,644
  • 21
  • 71
  • 114
Kaaviya V
  • 77
  • 8
  • 1
    What have you tried? [This question and answers](https://stackoverflow.com/q/18172851/1270789) should give you hints. – Ken Y-N Jul 16 '20 at 00:06

2 Answers2

1

We need create the condition dataframe first then merge

df=df.merge(pd.DataFrame({'Month': [1,2,3,4,5,6],'Heat Number':['HA','HB','HC','HD','HE','HF']}),how='inner')
   Month Heat Number   wts  gcs
0      1         HA   8.2   98
1      2         HB   4.2   76
2      3         HC   6.9   46
3      4         HD   7.4   36
4      6         HF  10.8   16
BENY
  • 317,841
  • 20
  • 164
  • 234
0

This plays off of the ord function in python to filter out rows that do not match with the Month column :

(
    df.assign(temp=lambda x: [ord(word[-1]) - 64 for word in x["Heat Number"]])
    .query("Month == temp")
    .drop("temp", axis=1)
)

    Month   Heat Number wts gcs
0   1       HA          8.2 98
2   2       HB          4.2 76
3   3       HC          6.9 46
4   4       HD          7.4 36
6   6       HF         10.8 16
sammywemmy
  • 27,093
  • 4
  • 17
  • 31