Issue: I have gotten a csv
file (wtih delimiter ~
) that came from a third party, and about 4000 records, and has 150 columns with real column names such as FirstName~LastName~OrderID~City~.....
. But when the file is loaded into a pandas dataframe df
and when I use print(list(df.columns))
it displays the column names as follows (I've simplified it for brevity):
['ÿþA', 'Unnamed: 1', 'Unnamed: 2', 'Unnamed: 3', 'Unnamed: 4',,,,,'Unnamed: 49']
Question: What I may be doing wrong, and how can we fix the issue to simply display the real column names in df? I'm using latest version of python
. I see some relevant articles such as this one but they are all related to one column.
Remark: It's a UTF-16 LE BOM
file. I discovered the issue when in my code, I referenced a column as df['OrdeID']
and I got well know KeyError that means you are refencing a column that does not exist.
Code:
import pandas as pd
df = pd.read_csv('/dbfs/FileStore/tables/MyDataFile.txt', sep='~', low_memory=False, quotechar='"', header='infer', encoding='cp1252')
print(df['OrdId'])
MyDataFile.txt sample:
FirstName~LastName~OrderID~City~.....
Kim~Doe~1234~New York~...............
Bob~Mason~456~Seattle~...............
..................