I need to clean up some data. For items in a dataframe that are of the format '<x' I want to return 'x/2' so if the cell contents is '<10' it should be replaced with '5', if the cell contents is '<0.006' it should be replace with 0.003 etc. I want changed cells to be formatted red and bold. I have the following code which operates in two steps and each step does what I want (almost) but I get a TypeError: 'float' object is not iterable when I try and chain them using : fixed_df=df.style.apply(color_less_than,axis=None).applymap(lessthan)
Note that the actual dataset may be thousands of rows and will contain mixed and Dummy data and code :
import pandas as pd
df = pd.DataFrame({'A': ['<10', '20', 'foo', '<30', '40'],
'B': ['baz', '<dlkj', 'bar', 'foo', '<5']})
def color_less_than(x):
c1 = 'color: red; font-weight: bold'
c2 = ''
df1 = pd.DataFrame(c2, index=x.index, columns=x.columns)
for col in x.columns:
mask = x[col].str.startswith("<")
#display(mask)
df1.loc[mask, col] = c1
return df1
def lessthan(x):
#for x in df:
if isinstance(x, np.generic):
return x.item()
elif type(x) is int:
return x
elif type(x) is float:
return x
elif type(x) is str and x[0]=="<":
try:
return float(x[1:])/2
except:
return x
elif type(x) is str and len(x)<10:
try:
return float(x)
except:
return x
else:
return x
coloured=df.style.apply(color_less_than,axis=None)
halved=df.applymap(lessthan)
display(coloured)
display(halved)
Note that the df item <dlkj does not display at all after applying color_less_than and I don't know why, I want it to be returned unformatted as it should not be changed (it's a string and cant be 'halved'). I have been trying to use the boolean mask to do both the calculation and the formatting but I can't get it to work.