0

I have a dataset that sometimes contains extraneous comments that prevent insertion into SQL because of their size. the comments do not pertain to what Im doing, but are not formatted well, so I cannot routinely find them by for example looking for a symbol that represents their beginning.

What I need is to find every cell that has a length of over 250 characters, and replace it with cell_data[:250] (the first 250 characters of that cells data) bonus points if you can do this by column, as at the end of the day, there are a couple columns in each file I would like to preserve, so I can call for x in dataframe.columns: if x != (column_name to preserve) do the thing

example code below

import numpy as np 
import pandas as pd

data = {'country': ['Italy','Spain','Greece','France','Portugal'],
        'popu': [61, 46, 11, 65, 10],
        'percent': ['fgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgddsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgsfgdsgfdsgsdgfdsgsdfgsgsgsfdgsdfgsgsgfsgsgfsgsgsgsdfgdfgdsfgdsgsfgdsgs','ff','da','vv','d']}

df = pd.DataFrame(data, index=['ITA', 'ESP', 'GRC', 'FRA', 'PRT'])

I would like to be able to pass a function that looks into each column, and if there is more than 250 characters (as in df.percent) then replace those cells with only their first 250 characters.

np.where and df.loc seemed promising at first, but I cant seem to make the conditional depend on length, AND use that area to reassign values at each position

AlbinoRhino
  • 467
  • 6
  • 23
  • Does this answer your question? [How to extract first 8 characters from a string in pandas](https://stackoverflow.com/questions/51607400/how-to-extract-first-8-characters-from-a-string-in-pandas) – AMC Mar 05 '20 at 01:28

1 Answers1

3

You can simply extract the 250 first characters for all rows, if a string has less than 250 characters it won't change it

df['percent'] =df['percent'].str[:250]
fmarm
  • 4,209
  • 1
  • 17
  • 29
  • wow! I should have expected this since I knew python is good at handling boundaries (did you know range(0,0) and range (0,-1) both work without erroring out and have a length of 0 ) fantastic, thank you so much! – AlbinoRhino Mar 04 '20 at 23:25