I wrote a function to strip HTML tags from the strings in my DataFrame. It passes every value through a remove_html function and returns the cleaned DataFrame. After converting the DataFrame to strings and cleaning it, I want to convert the values back to integers wherever possible. I have tried try/except but don't get the result I want. This is what I have at the moment:

def clean_df(df):
    df = df.astype(str)
    list_of_columns = list(df.columns)
    for col in list_of_columns:
        column = []
        for row in list(df[col]):
            column.append(remove_html(row))
            try:
                return int(row)
            except ValueError:
                pass

        del df[col]

        df[col] = column

    return df

Without the try/except, the function returns a clean df in which the integers are strings, so it's only the try/except that seems to be the issue. I've tried the try/except statements in several ways and none of them returns a df. The current code, for example, returns an 'int' object.

RF_PY

4 Answers

Move the column.append call inside the try block:

for col in list_of_columns:
    column = []
    for row in list(df[col]):
        try:
            column.append(remove_html(row))
        except ValueError:
            pass

    del df[col]

    df[col] = column

return df
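
A note on why the loop in the question returns an 'int': a return inside the loop ends clean_df at the first row that parses, so the converted value has to be stored in the list rather than returned. Also, if remove_html just returns a string and never raises ValueError (an assumption, since its code isn't shown), the try/except around the append alone would leave every value as a string. A minimal sketch of the per-value conversion, where to_int_if_possible is a made-up helper name:

```python
def to_int_if_possible(value):
    """Return int(value) when the text parses as an integer, else the value unchanged."""
    try:
        return int(value)
    except (ValueError, TypeError):
        # Not an integer literal (or not a string/number at all): keep it as-is.
        return value


# Already-cleaned strings, as they might look after remove_html has run.
cleaned = ["12", "abc", " 7 ", "3.5", ""]

# Store each result instead of returning it, so the loop visits every value.
converted = [to_int_if_possible(v) for v in cleaned]
print(converted)  # [12, 'abc', 7, '3.5', '']
```

Inside clean_df this would become column.append(to_int_if_possible(remove_html(row))).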
Steven G

Consider the pd.DataFrame df:

df = pd.DataFrame(dict(A=[1, '2', '_', '4']))

You want to use the function pd.to_numeric.
Note: pd.to_numeric operates on scalars and pd.Series. It doesn't operate on a pd.DataFrame.
Also: use the parameter errors='coerce' to get numbers where you can and NaN elsewhere.

pd.to_numeric(df['A'], errors='coerce')

0    1.0
1    2.0
2    NaN
3    4.0
Name: A, dtype: float64

Or, to get numbers where you can and keep what you already had elsewhere:

pd.to_numeric(df['A'], errors='coerce').combine_first(df['A'])

0    1
1    2
2    _
3    4
Name: A, dtype: object

You can then assign it back to your df:

df['A'] = pd.to_numeric(df['A'], errors='coerce').combine_first(df['A'])
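
Since pd.to_numeric works on one column at a time, the same idea can be applied to a whole frame with DataFrame.apply. This is only a sketch: numeric_where_possible is a made-up helper name and the second column is invented sample data. Because the intermediate result holds NaN, the parsed numbers may come back as floats rather than ints.

```python
import pandas as pd

# Sample frame; column B is invented here for illustration.
df = pd.DataFrame({"A": [1, "2", "_", "4"], "B": ["x", "10", "y", "20"]})


def numeric_where_possible(col):
    # Parse what can be parsed; entries that fail become NaN ...
    converted = pd.to_numeric(col, errors="coerce")
    # ... and the NaN slots are then filled from the original column.
    return converted.combine_first(col)


# DataFrame.apply calls the helper once per column and reassembles the frame.
result = df.apply(numeric_where_possible)
print(result)
```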
piRSquared

Works like this:

def clean_df(df):
    df = df.astype(str)
    list_of_columns = list(df.columns)
    for col in list_of_columns:
        column = []
        for row in list(df[col]):
            # Clean once, then keep the int if the cleaned text parses as one.
            cleaned = remove_html(row)
            try:
                column.append(int(cleaned))
            except ValueError:
                column.append(cleaned)

        del df[col]

        df[col] = column

    return df
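
For anyone who wants to run this end to end, here is a self-contained sketch. The remove_html below is a hypothetical regex stand-in (the real one isn't shown in the question), clean_value is a made-up helper name, and the sample frame is invented. It uses Series.map per column instead of the explicit loops, which should give the same result.

```python
import re

import pandas as pd


def remove_html(text):
    # Hypothetical stand-in for the asker's function: drops anything shaped like <...>.
    return re.sub(r"<[^>]+>", "", text)


def clean_value(value):
    # Strip tags first, then convert to int only if the remaining text parses.
    cleaned = remove_html(str(value))
    try:
        return int(cleaned)
    except ValueError:
        return cleaned


def clean_df(df):
    # Apply clean_value to every cell, one column at a time.
    return df.apply(lambda col: col.map(clean_value))


raw = pd.DataFrame({"id": ["<b>1</b>", "2"], "name": ["<i>Ann</i>", "Bob"]})
clean = clean_df(raw)
print(clean)
```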
RF_PY

Put the try/except in a function and apply that function to every cell with DataFrame.applymap(). (In pandas 2.1 and later, applymap is deprecated in favor of DataFrame.map.)

import pandas as pd

df = pd.DataFrame([['a','b','1'],
                   ['2','c','d'],
                   ['e','3','f']])

def foo(thing):
    # Return an int when the value parses as one; otherwise return it unchanged.
    try:
        return int(thing)
    except ValueError:
        return thing

>>> df[0][2]
'e'
>>> df[0][1]
'2'
>>> df = df.applymap(foo)
>>> df[0][2]
'e'
>>> df[0][1]
2
>>>
wwii