I am trying to read a bunch of TSV dataset files that (normally) have three columns.
Pandas df of a file looks like this
But some of the files have two extra values here and there, also separated by tabs. So I wrote some code to work around this by reading the files as having five columns and then removing two of them.
import pandas as pd

for i in range(101, 154):
    print(i)
    # read a file into pandas df
    thisfile = pd.read_csv(f'pgc-csv/2022_06_22_TRPV1_AAV488_6x10-11_No1/2022-06-22_TRPV1_AAV488_6x10-11_No{i}.txt',
                           skiprows=10, header=None,
                           names=['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2'],
                           encoding='unicode_escape', delimiter='\t',
                           )
    # delete extra columns
    del thisfile['empty1']
    del thisfile['empty2']
But for those problem files I get a warning:
"DtypeWarning: Columns (3) have mixed types. Specify dtype option on import or set low_memory=False."
I tried to use a method from this article: https://www.roelpeters.be/solved-dtypewarning-columns-have-mixed-types-specify-dtype-option-on-import-or-set-low-memory-in-pandas/
for i in range(101, 154):
    print(i)
    # read a file into pandas df
    thisfile = pd.read_csv(f'pgc-csv/2022_06_22_TRPV1_AAV488_6x10-11_No1/2022-06-22_TRPV1_AAV488_6x10-11_No{i}.txt',
                           skiprows=10, header=None,
                           names=['Время, s', 'Laser, V', 'ECG lead', 'empty1', 'empty2'],
                           encoding='unicode_escape', delimiter='\t',
                           dtype={'Время, s': float, 'Laser, V': float, 'ECG lead': float, 'empty1': 'str', 'empty2': 'str'})
    # delete extra columns
    del thisfile['empty1']
    del thisfile['empty2']
But I still get the errors: Screenshot
The first question is: how can I get rid of this error?
The second question: as I understand it, there are some values in the df with data types other than float.
I tried to find them with this:
ecgfile[lambda x: not isinstance(x['Время, s'], float)]
And this:
ecgfile[lambda x: type(x['Время, s']) is not float]
But I didn't succeed, so I need advice on this part, too.
The last question: is there maybe some better overall way to do all of these procedures? Thank you)
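For reference, here is a minimal sketch of one way to locate such values, assuming the column is first read as strings (for example with dtype=str). It relies on pd.to_numeric with errors="coerce", which turns anything unparseable into NaN. The sample values below are made up for illustration and do not come from the real files.

```python
import pandas as pd

# Hypothetical sample data: the column name matches the question,
# the values are invented for illustration only.
df = pd.DataFrame({"Время, s": ["0.001", "0.002", "abc", None, "0.004"]})

col = df["Время, s"]
# Values that cannot be parsed as numbers become NaN instead of raising.
as_number = pd.to_numeric(col, errors="coerce")
# Keep only rows that had a value but failed to convert.
bad_rows = df[as_number.isna() & col.notna()]
print(bad_rows)
```

The two snippets above probably fail because the lambda receives the whole DataFrame, so x['Время, s'] is a Series rather than a single value, and isinstance or type is never applied element by element.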
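One possible alternative, sketched here under assumptions: keep the five column names so that rows with extra fields still fit, but pass usecols so that only the first three columns are parsed. The extra columns are then never read, so they cannot trigger a mixed-type warning and do not need to be deleted afterwards. The in-memory sample below is a hypothetical stand-in for one file; the real files would also need the skiprows and encoding arguments from the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for one file: most rows have three fields,
# one row has two extra tab-separated values.
sample = "0.0\t1.5\t0.10\n0.1\t1.5\t0.12\tfoo\tbar\n0.2\t1.6\t0.11\n"

names = ["Время, s", "Laser, V", "ECG lead", "empty1", "empty2"]
df = pd.read_csv(
    io.StringIO(sample),
    sep="\t",
    header=None,
    names=names,        # five names, so five-field rows are accepted
    usecols=[0, 1, 2],  # parse only the first three columns
)
print(df)
```

Whether this is enough depends on where the extra values actually sit in the problem rows. If they appear before the ECG column rather than after it, the data would still need cleaning.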