
I need to clean up some data. For items in a dataframe of the form '<x', I want to return x/2: if a cell contains '<10' it should be replaced with 5, if it contains '<0.006' it should be replaced with 0.003, and so on. I want the changed cells to be formatted red and bold. I have the following code, which works in two steps. Each step does (almost) what I want, but I get TypeError: 'float' object is not iterable when I try to chain them using: fixed_df=df.style.apply(color_less_than,axis=None).applymap(lessthan)

Note that the actual dataset may be thousands of rows and will contain mixed data. Dummy data and code:

import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['<10', '20', 'foo', '<30', '40'],
                   'B': ['baz', '<dlkj', 'bar', 'foo', '<5']})



def color_less_than(x):
    c1 = 'color: red; font-weight: bold'
    c2 = ''
    df1 = pd.DataFrame(c2, index=x.index, columns=x.columns)
    for col in x.columns:
        mask = x[col].str.startswith("<")
        #display(mask)
        df1.loc[mask, col] = c1
    return df1

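def lessthan(x):
    # Unwrap NumPy scalars to plain Python values.
    if isinstance(x, np.generic):
        return x.item()
    elif type(x) is int:
        return x
    elif type(x) is float:
        return x
    elif type(x) is str and x.startswith("<"):
        # '<10' -> 5.0; non-numeric text such as '<dlkj' is returned unchanged.
        try:
            return float(x[1:]) / 2
        except ValueError:
            return x
    elif type(x) is str and len(x) < 10:
        # Convert short numeric strings to floats; leave other text alone.
        try:
            return float(x)
        except ValueError:
            return x
    else:
        return x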
     
coloured=df.style.apply(color_less_than,axis=None)

halved=df.applymap(lessthan)


display(coloured)
display(halved)
 

Note that the df item '<dlkj' does not display at all after applying color_less_than, and I don't know why. I want it returned unformatted, since it should not be changed (it's a string and can't be 'halved'). I have been trying to use the boolean mask to do both the calculation and the formatting, but I can't get it to work.

flashliquid
  • @jezrael styling component based on this answer https://stackoverflow.com/questions/50141630/python-pandas-style – flashliquid Feb 17 '23 at 08:31

2 Answers


This code loops through the entire dataset and replaces any value of the form '<' followed by an int or float with half that number. It then also checks whether the value is a non-numeric string such as '<dlkj', which is where the color/bold style would be added to the cell. I did not test the commented-out styling line, so it may need adjusting.

for col in df:
    for value in df[col].values:
        if '<' in value:
            num = value.split('<')[1]
            try:
                df[col] = df[col].replace([value], int(num)/2)
            except ValueError:
                try:
                    df[col] = df[col].replace([value], float(num)/2)
                except ValueError:
                    print(num) # <-- should be your '<dlkj' value
                    # not sure if this line of code will work or not; I wasn't able to test it
                    #df.style.set_properties(subset=df[col][value],**{'color': 'red', 'font-weight': 'bold'})
Nolan Walker

Without the style mapping, the desired output DF can be reached like so:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['<10', '20', 'foo', '<30', '40'],
                   'B': ['baz', '<dlkj', 'bar', 'foo', '<5']})
for col in df.columns:
    mask = df[col].str.match('<[0-9]+$|<[0-9]+[.][0-9]+$')
    tmp = pd.to_numeric(df[col].str.slice(1), errors='coerce')
    df[col] = np.where(mask, tmp/2, df[col])

print(df)

#       A      B
# 0   5.0    baz
# 1    20  <dlkj
# 2   foo    bar
# 3  15.0    foo
# 4    40    2.5
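
To also get the red/bold formatting, one option is to compute the mask once, halve the data first, and only then style the result. As far as I know, Styler functions are expected to return CSS strings and never change cell values, which is probably why chaining .applymap(lessthan) onto the Styler fails. The sketch below assumes that approach; the names halve_less_than, highlight_changed, style_changed and HIGHLIGHT are made up for illustration and are not from the question.

```python
import numpy as np
import pandas as pd

# Same dummy data as in the question.
df = pd.DataFrame({'A': ['<10', '20', 'foo', '<30', '40'],
                   'B': ['baz', '<dlkj', 'bar', 'foo', '<5']})

HIGHLIGHT = 'color: red; font-weight: bold'  # CSS for changed cells (hypothetical name)


def halve_less_than(frame):
    """Return (halved, mask): the cleaned values and a boolean frame of changed cells."""
    halved = frame.copy()
    mask = pd.DataFrame(False, index=frame.index, columns=frame.columns)
    for col in frame.columns:
        text = frame[col].astype(str)  # tolerate mixed, non-string cells
        # True only for '<' followed by an integer or a decimal number.
        is_lt = text.str.match(r'<[0-9]+(?:[.][0-9]+)?$')
        nums = pd.to_numeric(text.str.slice(1), errors='coerce')
        halved[col] = np.where(is_lt, nums / 2, frame[col])
        mask[col] = is_lt
    return halved, mask


def highlight_changed(frame, mask):
    """Callback for Styler.apply(axis=None): CSS strings with the same shape as frame."""
    return pd.DataFrame(np.where(mask, HIGHLIGHT, ''),
                        index=frame.index, columns=frame.columns)


def style_changed(frame):
    """Halve the '<x' cells, then return a Styler with those cells red and bold."""
    halved, mask = halve_less_than(frame)
    return (halved.style
            .apply(highlight_changed, axis=None, mask=mask)
            .format(escape='html'))  # escape '<' so text like '<dlkj' stays visible


halved, mask = halve_less_than(df)
css = highlight_changed(halved, mask)
```

In a notebook, display(style_changed(df)) should then show the halved values with the changed cells in red and bold. The escape='html' argument needs pandas 1.3 or later; my guess is that '<dlkj' disappears in the question because the unescaped text is rendered as an HTML tag.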
JarroVGIT