I have a dataset that sometimes contains extraneous comments whose size prevents insertion into SQL. The comments do not pertain to what I'm doing, but they are not formatted consistently, so I cannot reliably find them by, for example, looking for a symbol that marks where they begin.
What I need is to find every cell longer than 250 characters and replace it with cell_data[:250] (the first 250 characters of that cell's data). Bonus points if this can be done by column, because there are a couple of columns in each file I would like to preserve; the idea would be something like for x in dataframe.columns: if x != (column_name to preserve): do the thing. A brute-force sketch of what I mean follows the example data below.
Example code below:
import numpy as np
import pandas as pd
data = {'country': ['Italy','Spain','Greece','France','Portugal'],
'popu': [61, 46, 11, 65, 10],
'percent': ['fgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgddsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgs','ff','da','vv','d']}
df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'])
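To make the goal concrete, here is a brute-force version of what I mean (the column names come from the example above, and 'country' just stands in for the columns I want to preserve; I would rather end up with something more pandas-like than this):
# brute force: loop over the columns, skip the ones to preserve, and cut any
# string cell longer than 250 characters down to its first 250 characters
keep = {'country'}  # columns to leave untouched
for col in df.columns:
    if col in keep:
        continue
    df[col] = df[col].apply(
        lambda v: v[:250] if isinstance(v, str) and len(v) > 250 else v
    )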
Ideally, though, I would like to be able to pass a function that looks at each column with pandas operations and, wherever a cell has more than 250 characters (as in df.percent), replaces that cell with only its first 250 characters.
np.where and df.loc seemed promising at first, but I can't seem to make the condition depend on a cell's length and, at the same time, use that same selection to reassign the values at each position.
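To show where I get stuck, both of these run fine for me with simple conditions; it is expressing "this cell's text is longer than 250 characters" as the condition, and assigning the truncated text back through that same selection, that I can't work out:
# np.where with an exact-match condition is no problem; a length-based condition is the part I'm missing
demo = np.where(df['percent'] == 'ff', 'placeholder', df['percent'])

# likewise with .loc: selecting rows by an exact value and assigning works,
# but I can't see how to select by cell length and assign the truncated text
df.loc[df['country'] == 'Italy', 'popu'] = 61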