Compare values in multiple columns and add a new value in another column in Python

Question

I have house rent price data as follows:

import pandas as pd
import numpy as np
data = {
    "HouseName": ["A", "A", "B", "B", "B"],
    "Type": ["OneRoom", "TwoRooms", "OneRoom", "TwoRooms", "ThreeRooms"],
    "Jan_S": [1100, 1776, 1228, 1640, np.nan],
    "Feb_S": [1000, 1805, 1231, 1425, 1800],
    "Mar_S": [1033, 1748, 1315, 1591, 2900],
    "Jan_L": [1005, np.nan, 1300, np.nan, 7000]
}
df = pd.DataFrame.from_dict(data)
print(df)

  HouseName        Type   Jan_S  Feb_S  Mar_S   Jan_L 
0         A     OneRoom  1100.0   1000   1033  1005.0 
1         A    TwoRooms  1776.0   1805   1748     NaN 
2         B     OneRoom  1228.0   1231   1315  1300.0 
3         B    TwoRooms  1640.0   1425   1591     NaN 
4         B  ThreeRooms     NaN   1800   2900  7000.0

I need to do two things. First, I want to find a reasonable rent price for January based on the columns 'Jan_S', 'Feb_S', 'Mar_S' and 'Jan_L'. Here S and L are two different data sources. Both may contain outliers and NaNs, but data from S takes priority as the final price for January. Second, for the same HouseName, I need to check that the price of one room is lower than that of two rooms, and the price of two rooms is lower than that of three rooms. My final result should look like this:

  HouseName        Type   Jan_S  Feb_S  Mar_S   Jan_L  Result(Jan)
0         A     OneRoom  1100.0   1000   1033  1005.0         1100
1         A    TwoRooms  1776.0   1805   1748     NaN         1776
2         B     OneRoom  1228.0   1231   1315  1300.0         1228
3         B    TwoRooms  1640.0   1425   1591     NaN         1640
4         B  ThreeRooms     NaN   1800   2900  7000.0         1800

My idea is to check whether Jan_S is within 0.95 to 1.05 times Jan_L. If yes, take Jan_S as the final result; otherwise, continue by checking the value from Feb_S in place of Jan_S.

Please share any ideas that you might have to deal with this problem in Python. Thanks! Here are some references which may help.

Find nearest value from multiple columns and add to a new column in Python

Compare values under multiple conditions of one column in Python

Check if values in one column is in interval values of another column in Python

Shouldn't `Result(Jan)[0]` be 1000 since 1100 > 1100 * .95 (or 1100 > 1000 * 1.05)? — gosuto, Dec 29 '18 at 15:40
And what exactly is the role of the L columns? Why are you falling back onto `Feb_S` and not `Jan_L` when it is not within the 5% range or when it is `NaN`? — gosuto, Dec 29 '18 at 15:41
Thanks for your reply. Column L give another different source of the room rent prices for reference. — ah bon, Dec 29 '18 at 16:20

MrE · Accepted Answer · 2018-12-31T17:19:36.547


You can use fillna for this.

If you want the choice of columns to be conditional, you need to work out the logic that selects which columns to take the values from.

Here I'm showing the idea using the min() of all the other price columns.

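# select the price columns to fall back on (everything except the labels and Jan_S itself)
price_cols = df.columns[~df.columns.isin(['HouseName', 'Type', 'Jan_S'])]

# then work out the logic for picking a fallback value and pass it to fillna
# here the row-wise min of the other price columns is used as an example
# (DataFrame.min skips NaN by default, unlike the built-in min)
df['Jan_S'] = df['Jan_S'].fillna(df[price_cols].min(axis=1))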
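For the conditional case, a minimal sketch of the 5% rule described in the question might look like the following. It rests on several assumptions that are not confirmed anywhere in the thread: when Jan_L is missing, a present Jan_S is trusted as is; when Jan_S is missing or outside the band, Feb_S is taken without any further check; and `tol` and `room_order` are hypothetical names introduced only for illustration. Applied literally, the rule falls back to Feb_S for rows 0 and 2, so it does not reproduce the Result(Jan) column in the question exactly. Row 0 is the discrepancy raised in the comments on the question.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "HouseName": ["A", "A", "B", "B", "B"],
    "Type": ["OneRoom", "TwoRooms", "OneRoom", "TwoRooms", "ThreeRooms"],
    "Jan_S": [1100, 1776, 1228, 1640, np.nan],
    "Feb_S": [1000, 1805, 1231, 1425, 1800],
    "Mar_S": [1033, 1748, 1315, 1591, 2900],
    "Jan_L": [1005, np.nan, 1300, np.nan, 7000],
})

tol = 0.05  # hypothetical name: the +/- 5% band from the question

# True where Jan_S lies within [0.95 * Jan_L, 1.05 * Jan_L];
# any comparison involving NaN evaluates to False
in_band = df["Jan_S"].between(df["Jan_L"] * (1 - tol), df["Jan_L"] * (1 + tol))

# Assumption: with no Jan_L to compare against, keep a present Jan_S
no_reference = df["Jan_L"].isna() & df["Jan_S"].notna()

# Keep Jan_S where it passes, otherwise fall back to Feb_S
df["Result(Jan)"] = df["Jan_S"].where(in_band | no_reference, df["Feb_S"])

# Second requirement: within each house, prices should rise with room count
room_order = {"OneRoom": 1, "TwoRooms": 2, "ThreeRooms": 3}  # hypothetical mapping
ordered = (
    df.assign(rooms=df["Type"].map(room_order))
      .sort_values(["HouseName", "rooms"])
)
price_ok = ordered.groupby("HouseName")["Result(Jan)"].apply(
    lambda s: s.diff().dropna().gt(0).all()  # strictly increasing within the house
)

print(df[["HouseName", "Type", "Result(Jan)"]])
print(price_ok)
```

`price_ok` holds one boolean per HouseName, so any house flagged False is one where the room-size ordering is violated and the chosen prices need a second look.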

edited Dec 31 '18 at 17:19

answered Dec 30 '18 at 05:54

MrE


Thanks. I have tried with your method, it doesn’t change the column of Jan_S. – ah bon Dec 30 '18 at 07:22
No problem. But for Jan_S of last row in example, I still get NaN with your solution, is that ok? Shouldn't it apply with value of Feb_S 1800? – ah bon Dec 31 '18 at 15:43
It's an example. Sorry, I messed up again: you need to exclude the Jan_S column from the columns to use. Editing. – MrE Dec 31 '18 at 17:18
