I'm trying to read a fixed-width file into Python using Pandas but the first two columns are returned as one.

Here is a sample of the file I am trying to read in:

Some header information
          Date            day           value
    01/01/2015         000001           3.14
    01/02/2015              2           1.59

and here is my code:

import pandas as pd

my_data = pd.read_fwf(my_file, skiprows=1)

but upon inspection of my_data the first two columns are not separated:

> my_data.keys()
array(['Date            day', 'value'])

I know that my columns all have a width of 15 characters -- however, I have several files with different numbers of columns, and the widths and colspecs options seem to expect a known number of columns (e.g. colspecs of [(0, 15), (15, 30), ...]) rather than letting me specify the width without the number of columns.

Does anyone know how to get pandas to recognize the first two columns are distinct?
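
One possible way around the unknown column count, as a minimal sketch: read the column-header line first, derive the number of columns from its length, and build the `widths` list from that. The helper name `read_uniform_fwf` and the constant `COL_WIDTH` are made up for illustration, and the sketch assumes every column really is 15 characters wide and that exactly one free-text line precedes the column header.

```python
import pandas as pd

COL_WIDTH = 15  # assumption from the question: every column is 15 characters wide


def read_uniform_fwf(path, col_width=COL_WIDTH, skiprows=1):
    """Hypothetical helper: infer the column count from the header line length."""
    with open(path) as fh:
        for _ in range(skiprows):
            fh.readline()  # discard the free-text line(s) above the column header
        header = fh.readline().rstrip("\n")
    # Ceiling division, so a header whose trailing padding was trimmed still counts.
    n_cols = -(-len(header) // col_width)
    # Explicit widths mean pandas does not have to infer the column boundaries.
    return pd.read_fwf(path, widths=[col_width] * n_cols, skiprows=skiprows)
```

Called as `my_data = read_uniform_fwf(my_file)`, this should work for files with any number of 15-character columns, since the count is recomputed per file.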

unutbu
Ellis Valentiner
  • You may get better joy just reading it in as a plain text file: `my_data = pd.read_csv(my_file, skiprows=1, sep='\s+')` – EdChum Sep 28 '15 at 16:37
  • I edited the data you posted so the code exhibits the behavior you describe. Please check to see if the change is consistent with your real data. – unutbu Sep 28 '15 at 16:37
  • @unutbu thanks for solving the problem -- apparently this is easily replicated by making the value longer than the header. – Ellis Valentiner Sep 28 '15 at 16:51
  • @user12202013: Ah -- that's insightful. It appears the `FixedWidthReader.detect_colspecs` method does not ignore rows specified by `skiprows=1`. A workaround would be to open the file, advance the filehandle to skip the first row, and then pass the filehandle as the first argument to `pd.read_fwf`. – unutbu Sep 28 '15 at 16:59
