Splitting a data file in Python or pandas

Question

I have a data file in which each line is a single string (no tabs, no spaces, and no column names). The first two characters are one piece of data, the third character is another, characters 4 through 7 are something else, and so on.

How can I get these strings into a dataframe with named columns? All the answers I've seen assume that I have tabs, spaces etc.

Could you give an example of your data? I'm not sure what it means to have "columns" but also say that there are no "tabs, spaces etc." between the values. How do you know where one value stops and the next starts? — user94559, Jul 17 '16 at 19:51
Are you describing a "fixed-width format" file where each column is defined by a strict number of characters? If so, look at `pandas.read_fwf`. — BrenBarn, Jul 17 '16 at 19:58

score 3 · Answer 1 · answered Jul 17 '16 at 19:59

You can use `pd.read_fwf` with the `widths` parameter, which takes the number of characters in each field. A file with these contents:

ieafxfrjzyxfxkymiwuy
lqqmceegjnbjpxnidygr
zssawojanxbrfwkgbvnl
ahcwwhtayjwozzrgfftt

Becomes this:

import pandas as pd
pd.read_fwf('test.txt', widths=[2, 4, 3, 11], names=['first', 'second', 'third', 'fourth'])
Out[226]: 
  first second third       fourth
0    ie   afxf   rjz  yxfxkymiwuy
1    lq   qmce   egj  nbjpxnidygr
2    zs   sawo   jan  xbrfwkgbvnl
3    ah   cwwh   tay  jwozzrgfftt
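
If the layout is easier to describe by character position than by width, `colspecs` may be a closer fit to the question's "first two, third, 4 through 7" description. The following is only a sketch under assumptions: the column names are hypothetical, the sample rows are borrowed from the file above, and `io.StringIO` stands in for the real file. `dtype=str` is an optional choice here so that numeric-looking fields would keep any leading zeros.

```python
import io

import pandas as pd

# Two sample rows copied from the file above; StringIO stands in for the file.
data = io.StringIO(
    "ieafxfrjzyxfxkymiwuy\n"
    "lqqmceegjnbjpxnidygr\n"
)

# Assumed layout from the question: characters 1-2, character 3, characters 4-7.
# colspecs uses zero-based, half-open (start, end) character offsets.
df = pd.read_fwf(
    data,
    colspecs=[(0, 2), (2, 3), (3, 7)],
    names=["first", "second", "third"],  # hypothetical column names
    header=None,  # the file has no header row
    dtype=str,    # keep every field as text
)
print(df)
```

Characters beyond the last span (position 8 onward in this sketch) are simply ignored, so additional `(start, end)` pairs would be needed to capture the rest of each line.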

That is exactly what I needed. @ayhan - Thank you. — TIll, Jul 19 '16 at 02:09
