
I would like to import about 10K CSV files generated by a third-party app with UCS-2 LE encoding. I would rather not use the csv reader as in the example "Python UTF-16", because there are so many files.
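For background: UCS-2 covers only the Basic Multilingual Plane, and for those characters its bytes are identical to UTF-16, so Python's UTF-16 codecs can read such files. The short sketch below uses a made-up sample line (not taken from the real file) and assumes the file starts with a byte-order mark, as files labelled "UCS-2 LE" by Windows tools commonly do. It shows that every ASCII character is followed by a NUL byte, which is why a byte-oriented codec such as the Windows-only "mbcs" does not read the data as intended.

```python
import codecs

# Hypothetical sample line; every character is in the Basic Multilingual
# Plane, where UCS-2 and UTF-16 produce identical bytes.
text = "Wartość;488,0;GJ\n"

# A "UCS-2 LE" file is typically a BOM (FF FE) followed by
# little-endian 16-bit code units.
raw = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# Each ASCII character occupies two bytes: the character, then a NUL byte.
print(raw[2:4])  # b'W\x00'

# The generic 'utf-16' codec reads the BOM, picks little-endian order,
# and strips the BOM from the decoded text.
assert raw.decode("utf-16") == text

# 'utf-16-le' decodes the code units too, but keeps the BOM as U+FEFF.
assert raw.decode("utf-16-le") == "\ufeff" + text
```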

Below is my code, which tries to read just one file. I'm using Python 3.4 and pandas 0.18.1.

Sample file to download.

MWE:

import pandas as pd

df = pd.read_csv('1.csv',
                 encoding="mbcs",
                 skip_blank_lines=True,
                 error_bad_lines=False,
                 decimal=',',
                 sep=r'\s+')

I got an error:

CParserError: Error tokenizing data. C error: EOF inside string starting at line 17

Asked by Michal

1 Answer


I don't know exactly what your expected output looks like, but I'm reading your files with:

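df = pd.read_csv('1.csv', encoding="utf-16", skip_blank_lines=True, error_bad_lines=False, decimal=',', sep=r'\s+', skiprows=5)
# In pandas >= 1.3, use on_bad_lines='skip' instead of error_bad_lines=False (removed in pandas 2.0).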

obtaining something like:

In [17]: df.head()
Out[17]: 
  Oznaczenie techniczne  Wartość Jednostka                Opis obiektu  \
0  PPHS:LPlt'Ahu'CumEg1    488.0        GJ  Energia skumulowana chłodu   
1  PPHS:LPlt'Ahu'CumVlm  57263.0        m3        Objętość skumulowana   
2      PPHS:LPlt'Ahu'Fl     31.6      m3/h                    Przepływ   
3     PPHS:LPlt'Ahu'Pwr    111.0        kW                         Moc   
4     PPHS:LPlt'Ahu'TFl     12.7        °C       Temperatura zasilania   

  Parameter   Value Timestamp  
0     PrVal  2016-07-27 19:55  
1     PrVal  2016-07-27 19:55  
2     PrVal  2016-07-27 19:55  
3     PrVal  2016-07-27 19:55  
4     PrVal  2016-07-27 19:55  

I'm skipping the first 5 rows, which contain the report header and break the file's tabular layout. Hope that helps.
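Since the question involves about 10K files, here is a minimal sketch of applying the same read_csv settings to every file and stacking the results into one DataFrame. It assumes all files share the sample's layout (five header rows, whitespace-separated fields, comma decimals); the function name load_reports, the source_file column and the glob pattern are hypothetical. error_bad_lines=False is left out because it was deprecated in pandas 1.3 and removed in 2.0: on pandas 1.3 or later, on_bad_lines='skip' is the replacement, and on older versions such as 0.18.1 you can add error_bad_lines=False back.

```python
import glob
import os

import pandas as pd


def load_reports(pattern):
    """Read every UCS-2 LE report matching `pattern` into one DataFrame.

    Hypothetical helper: assumes all files share the sample's layout of
    five report-header rows, whitespace-separated fields and comma decimals.
    """
    frames = []
    for path in sorted(glob.glob(pattern)):
        df = pd.read_csv(path,
                         encoding="utf-16",  # BOM tells Python the byte order
                         sep=r"\s+",
                         decimal=",",
                         skiprows=5,         # skip the report header rows
                         skip_blank_lines=True)
        df["source_file"] = os.path.basename(path)  # remember the origin
        frames.append(df)
    if not frames:
        raise FileNotFoundError("no files match " + pattern)
    # A single concat at the end avoids re-copying data on every iteration.
    return pd.concat(frames, ignore_index=True)
```

For example, load_reports("reports/*.csv") would read every .csv file in a hypothetical reports directory. Collecting the frames in a list and calling pd.concat once is cheaper than concatenating inside the loop, which copies the accumulated data again on each pass.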

Answered by Fabio Lamanna
Comment from Michal (Aug 16 '16 at 15:16): I tried your code using pandas 0.16.1 and got very weird symbols, but now everything is OK. Thank you!