
I am trying to read a batch of TSV dataset files that normally have three columns.

A pandas DataFrame of one file looks like this

But some of the files have two extra values here and there, also separated by tabs. So I wrote code to work around this by reading the files as having five columns and then removing two of them.

import pandas as pd

for i in range(101, 154):
    print(i)
    # read a file into a pandas df, naming five columns so longer rows still fit
    thisfile = pd.read_csv(f'pgc-csv/2022_06_22_TRPV1_AAV488_6x10-11_No1/2022-06-22_TRPV1_AAV488_6x10-11_No{i}.txt',
                      skiprows=10, header = None,
                      names = ['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2'],
                      encoding = 'unicode_escape', delimiter = '\t',
                      )
    # delete the two extra columns
    del thisfile['empty1']
    del thisfile['empty2']

But for those problem files I get a warning:

"DtypeWarning: Columns (3) have mixed types. Specify dtype option on import or set low_memory=False."

I tried to use a method from this article: https://www.roelpeters.be/solved-dtypewarning-columns-have-mixed-types-specify-dtype-option-on-import-or-set-low-memory-in-pandas/

for i in range(101, 154):
    print(i)
    # read a file into pandas df
    thisfile = pd.read_csv(f'pgc-csv/2022_06_22_TRPV1_AAV488_6x10-11_No1/2022-06-22_TRPV1_AAV488_6x10-11_No{i}.txt', 
                      skiprows=10, header = None, 
                      names = ['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2'], 
                      encoding = 'unicode_escape', delimiter = '\t', 
                      dtype={'Время, s': float, 'Laser, V':float, 'ECG lead': float, 'empty1': 'str', 'empty2': 'str'})
    #delete extra columns
    del thisfile['empty1']
    del thisfile['empty2']

But I still get the errors: Screenshot

The first question is: how can I remove this error?

The second question is that, as I understand it, there are some values in the df with data types other than float.
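One direction that might help, sketched here on made-up data and not tested on the real files: since the two extra columns are deleted anyway, `usecols` can tell `read_csv` to skip them, so their mixed types are never inferred. The helper name `read_three_columns` and the inline sample are hypothetical; `skiprows` and `encoding` from the code above are left out only to keep the sketch short.

```python
import io

import pandas as pd

# All five possible fields, as in the code above; only the first three are kept.
ALL_COLUMNS = ['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2']
WANTED_COLUMNS = ALL_COLUMNS[:3]


def read_three_columns(source):
    """Read a tab-separated source, parsing only the three real columns."""
    return pd.read_csv(
        source,
        sep='\t',
        header=None,
        names=ALL_COLUMNS,       # name every field a row might contain
        usecols=WANTED_COLUMNS,  # but parse only the three we need
    )


# Made-up sample: the second row carries two stray extra values.
sample = "0.0\t1.5\t0.25\n0.5\t1.5\t0.5\tabc\t7\n1.0\t1.5\t0.75\n"
df = read_three_columns(io.StringIO(sample))
print(df.dtypes)
```

If the three real columns themselves contain non-numeric text, this alone will not make them float; it only keeps the extra columns out of the type inference.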

I tried to get them with this:

ecgfile[lambda x: not isinstance(x['Время, s'], float)]

And this:

ecgfile[lambda x: type(x['Время, s']) is not float]

But I didn't succeed. So I need advice on this part, too.

The last question is: is there perhaps a better overall way to do all of this? Thank you)
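For what it's worth, the two attempts above probably fail because the callable receives the whole DataFrame, so `x['Время, s']` is an entire Series and the `isinstance` check yields a single boolean instead of one per row. A possible alternative, shown only as a sketch on made-up data: read everything as text, then use `pd.to_numeric(..., errors='coerce')` to see which values cannot be parsed. The helper name `find_non_numeric` is hypothetical.

```python
import io

import pandas as pd


def find_non_numeric(df, column):
    """Return the rows of df whose value in column cannot be parsed as a number."""
    as_number = pd.to_numeric(df[column], errors='coerce')
    # A value that was present but turned into NaN after coercion is not numeric.
    bad = as_number.isna() & df[column].notna()
    return df[bad]


# Made-up sample with one non-numeric value in the second column.
sample = "0.0\t1.5\t0.25\n0.5\toops\t0.5\n1.0\t1.5\t0.75\n"
raw = pd.read_csv(
    io.StringIO(sample),
    sep='\t',
    header=None,
    names=['Время, s', 'Laser, V', 'ECG lead'],
    dtype=str,  # keep the raw text so nothing is converted silently
)
bad_rows = find_non_numeric(raw, 'Laser, V')
print(bad_rows)
```

Once the offending rows are known, the same `pd.to_numeric` call can convert the column to float, with the unparseable values becoming NaN.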

Diana
    Maybe have a look at: https://stackoverflow.com/questions/24251219/pandas-read-csv-low-memory-and-dtype-options ? – thoroc Nov 28 '22 at 10:59
