
I am trying to load tabular data into pandas.

The original data look like this (.txt file):

µm nm
1.34E+00 1.39E+00
1.34E+00 1.61E+00
...

When I manually convert the file from .txt to .csv, by opening it in Excel and saving it as a .csv file, I obtain something like this:

µm;nm
1.339216;1.388997
1.340324;1.612847
1.341462;1.587352
1.342533;1.686544
...

This works fine in pandas with the following code:

import pandas as pd

file = 'filename.csv'
df = pd.read_csv(file, sep=";")  # the Excel-generated file uses ';' between fields
df

[Screenshot: DataFrame loaded from the manually converted .csv file]

This is what I want. But since I plan to deal with many of these files, I need to process them in batch, which means I need to obtain the same DataFrame directly from the original files, which come as .txt.
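A minimal batch sketch follows. It assumes (not confirmed in the question) that the original .txt files are tab-separated, as srishtigarg's comment suggests, and Latin-1 encoded, which is consistent with the 0xb5 byte in the error message quoted below. The function name `convert_folder` and the sample file names are hypothetical. Writing with `sep=";"` reproduces the layout of the Excel-generated .csv; if only the DataFrames are needed, the `to_csv` step can be dropped.

```python
import tempfile
from pathlib import Path

import pandas as pd


def convert_folder(folder: Path) -> list[Path]:
    """Convert every .txt file in `folder` into a ';'-separated .csv next to it."""
    written = []
    for txt_path in sorted(folder.glob("*.txt")):
        # Assumptions: columns are tab-separated and the file is Latin-1 encoded.
        df = pd.read_csv(txt_path, sep="\t", encoding="latin-1")
        csv_path = txt_path.with_suffix(".csv")
        # Same layout as the Excel output: ';' between fields, no extra index column.
        df.to_csv(csv_path, sep=";", index=False, encoding="latin-1")
        written.append(csv_path)
    return written


# Demo on two throwaway files; point convert_folder at the real data folder instead.
with tempfile.TemporaryDirectory() as tmp:
    folder = Path(tmp)
    sample = "µm\tnm\n1.339216\t1.388997\n1.340324\t1.612847\n"
    for name in ("a.txt", "b.txt"):
        (folder / name).write_bytes(sample.encode("latin-1"))
    shapes = {
        p.name: pd.read_csv(p, sep=";", encoding="latin-1").shape
        for p in convert_folder(folder)
    }

print(shapes)  # {'a.csv': (2, 2), 'b.csv': (2, 2)}
```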

However, if I try to do the same with the original data, the result looks like this:

[Screenshot: DataFrame read from the original .txt file]

The code is as follows:

import pandas as pd

df2 = pd.read_csv('filename.txt', sep=";", encoding='unicode_escape')
df2.to_csv('filename-2.csv', sep='\t', index=False)  # write without the index column
df2

Note that I use encoding='unicode_escape' to avoid the error message "'utf-8' codec can't decode byte 0xb5 in position 0: invalid start byte".
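Byte 0xb5 is the micro sign µ in Latin-1 (ISO-8859-1) and Windows-1252, so the file was most likely saved in one of those encodings rather than UTF-8 (an inference, not confirmed by the asker). `unicode_escape` happens to decode µ, but it also interprets backslash sequences in the data, so naming the real encoding is more direct. The sketch below uses an invented in-memory sample and assumes tab-separated columns.

```python
import io

import pandas as pd

# Hypothetical file contents, encoded the way the original .txt appears to be.
raw = "µm\tnm\n1.339216\t1.388997\n1.340324\t1.612847\n".encode("latin-1")
print(hex(raw[0]))  # 0xb5: the micro sign in Latin-1 / Windows-1252

error_message = ""
try:
    raw.decode("utf-8")  # fails on the first byte, like read_csv's UTF-8 default
except UnicodeDecodeError as exc:
    error_message = str(exc)
print(error_message)

# Naming the real encoding decodes µ directly, with no escape processing.
df = pd.read_csv(io.BytesIO(raw), sep="\t", encoding="latin-1")
print(df.columns.tolist())  # ['µm', 'nm']
```

If µ comes out wrong with `latin-1`, `cp1252` is the other likely candidate for a file produced on Windows.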

I have tried specifying various separators, without success so far.
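Rather than trying separators by hand, one option is to let Python's `csv.Sniffer` guess the delimiter from a sample and pass the result to `read_csv`. This is a minimal sketch: the helper name `read_sniffed` and the sample strings are hypothetical, and the guess is restricted to tab, semicolon, comma and space so that letters or digits are never picked.

```python
import csv
import io

import pandas as pd


def read_sniffed(text: str) -> pd.DataFrame:
    """Guess the field delimiter from the start of `text`, then parse it with pandas."""
    sample = text[:4096]
    # Only consider common delimiters; raises csv.Error if none fits consistently.
    dialect = csv.Sniffer().sniff(sample, delimiters="\t;, ")
    return pd.read_csv(io.StringIO(text), sep=dialect.delimiter)


tab_df = read_sniffed("µm\tnm\n1.339216\t1.388997\n1.340324\t1.612847\n")
semi_df = read_sniffed("µm;nm\n1.339216;1.388997\n1.340324;1.612847\n")
print(tab_df.shape, semi_df.shape)  # (2, 2) (2, 2)
```

For real files, the text would first be read with the appropriate encoding (for example `Path(...).read_text(encoding="latin-1")`, assuming Latin-1).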

I hope someone will be able to help.

Thanks,

Sébastien.

  • When reading `df2`, have you tried using `\t` as the separator? – srishtigarg Jul 18 '22 at 19:43
  • Post your code as text in the question itself. Images can't be copied, tested, executed or googled. CSV *is* text. It's a text file with fields separated by commas. Python strings are Unicode so there's no reason to escape or unescape anything. – Panagiotis Kanavos Jul 18 '22 at 19:53
  • You'll have to check which separator is used in the file you want and specify it in `read_csv`. CSV is just a text file with separators. If the file uses a different separator, you can still use `read_csv` by specifying the correct one. `;`, for example, isn't a comma; it's commonly used in the half of the world that uses `,` as the decimal separator. – Panagiotis Kanavos Jul 18 '22 at 19:56
  • Something happens when I open the .txt file in Excel and save it as .csv that I cannot reproduce with pandas. The .txt file looks like this: µm nm 1.34E+00 1.39E+00 1.34E+00 1.61E+00 1.34E+00 1.59E+00 1.34E+00 1.69E+00. After conversion to .csv through Excel, I obtain this: µm nm 1.339216 1.388997 1.340324 1.612847 1.341462 1.587352 1.342533 1.686544 1.343659 1.910365 1.344734 1.660482 1.345845 1.585136 1.346938 1.634419 1.348139 1.908091 – Sébastien Zappa Jul 18 '22 at 20:14
  • The problem is that pandas won't open my .txt file unless I specify an encoding. Otherwise, I get the error message "'utf-8' codec can't decode..." – Sébastien Zappa Jul 18 '22 at 20:16
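Panagiotis Kanavos's point about `;` and decimal commas maps onto two `read_csv` parameters: `sep` for the field separator and `decimal` for the decimal mark. The sample below is invented for illustration (the asker's Excel output uses `.` decimals); it only shows that both parameters together yield numeric columns.

```python
import io

import pandas as pd

# Hypothetical European-style export: ';' between fields and ',' as the decimal mark.
text = "µm;nm\n1,339216;1,388997\n1,340324;1,612847\n"

df = pd.read_csv(io.StringIO(text), sep=";", decimal=",")
print(df.dtypes.tolist())  # both columns are parsed as float64
```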

1 Answer


When reading df2, you should use the same separator as the one used when writing it (assuming it was written from df):

df2 = pd.read_csv('filename.txt', sep = ';')

Or, if the .txt file is a separate file altogether:

df2 = pd.read_csv('filename.txt', sep = '\t')

This should give a correctly formatted DataFrame.
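Assuming the .txt file really is tab-separated (not confirmed in the thread), the minimal sketch below contrasts the two separators on an in-memory stand-in for `filename.txt`. With a separator that never occurs in the data, pandas keeps each whole line as a single field, which matches the symptom described in the question.

```python
import io

import pandas as pd

# Hypothetical stand-in for filename.txt, assuming tab-separated columns.
text = "µm\tnm\n1.339216\t1.388997\n1.340324\t1.612847\n"

wrong = pd.read_csv(io.StringIO(text), sep=";")   # no ';' anywhere in the data
right = pd.read_csv(io.StringIO(text), sep="\t")  # matches the actual delimiter

print(wrong.shape)  # (2, 1): each whole line ends up in a single column
print(right.shape)  # (2, 2)
```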

srishtigarg