Importing a weird text file into a Pandas DataFrame

Question

I'm having a lot of trouble importing a weirdly formatted text file into a Pandas DataFrame—or at all for that matter. Here are the first several lines of the text file:

 Name        RA       Dec  B      vh   sig  Type   D1  D2
00006-0211 000032.0 -021129 14.3  7323  31   4X   1.3 0.8
00006+2142 000035.6  214054 14.4  6605  32   5  P 1.0 0.7
N7814      000041.1  155203 12.0  1050   4   2A s 6.5 2.7
00010+2256 000101.2  225519 14.0  7301  34   5    1.9 1.0
N7816      000115.2  071203 14.0  5241   5   4    2.0 2.0
N7817      000124.9  202818 12.7  2309   5   4A   4.0 1.1
N7819      000150.3  311138 14.3  4953  10   3B s 2.0 1.8
N7820      000156.7  045513 13.9  3064  19   0    1.6 0.7
N7824      000232.2  063833 14.5  6134  28   2    1.9 1.3
0003+1955  000345.1  195527 14.0  7730  19  -6    0.3 0.3
N   1      000441.3  272550 13.4  4534   6   3A s 1.8 1.2
00056+2644 000535.3  264331 14.4  8741  36   3    1.0 .55
N  12      000610.9  042005 14.5  3941   4   4B R 2.0 1.7

I think the 'Name' column is really throwing everything off. I'm able to import everything from the 'B' column onward perfectly by using:

import pandas as pd

df1 = pd.read_fwf('cfa1.txt', skiprows=11, header=None, names=['GARBAGE', 'B', 'vh', 'sig', 'Type', 'D1', 'D2'])
df2 = df1[['B', 'vh', 'sig', 'Type', 'D1', 'D2']]
df2.head()

where GARBAGE is all of the junk that isn't importing properly.

I've also tried:

df = pd.read_table('cfa1.txt', skiprows=11, header=None, names=['Name', 'RA', 'Dec', 'B', 'vh', 'sig', 'Type', 'D1', 'D2'])
df.head()

which didn't work. (I have many more failed attempts, which I figure aren't worth including.) Thank you in advance for your time and consideration!

Just to be clear: If we use your example file, should we still be trying to skip 11 rows? — saintsfan342000, Oct 25 '17 at 02:33
Can you explain more clearly what the problem is? And what you expected instead. — Bill, Oct 25 '17 at 02:34
Oh no sorry! When I pasted in the text file, I didn't include the first 11 rows which had no important data. The 10th row has the column labels. — Bobby Stiller, Oct 25 '17 at 02:35
That string seems to work fine for me, perhaps you can just get away without passing names/header to read_fwf? — Andy Hayden, Oct 25 '17 at 02:36
The problem is that when I go to import the text file, Pandas and NumPy both think the first 3 columns (Name, RA, Dec) are just 1 column, when they contain unique data. — Bobby Stiller, Oct 25 '17 at 02:38
When I tried to read your file with `pd.read_fwf("textfile.txt")` it reads all columns except the 'Dec' and 'B' columns which it joins into one as expected given the alignment of the 'B' column label. If I shift the 'B' label one character to the right all 9 columns read perfectly. — Bill, Oct 25 '17 at 02:44
@BobbyStiller are you using an old version of pandas? 0.20.3 is the latest... — Andy Hayden, Oct 25 '17 at 03:43
@BobbyStiller is the text file actually delimited with varying delimiters? it looks like single space delimiters between the first couple columns, then double space delimiters. — OverflowingTheGlass, Oct 25 '17 at 12:04
Can you convert the file to some *not weird*, valid format prior to the pandas import? I'm assuming that pandas would accept a .csv or something. — alex, Oct 31 '17 at 17:42
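
Following up on the comments about column alignment: one possible approach is to stop relying on width inference and pass explicit column boundaries to `pd.read_fwf` via `colspecs`. The sketch below is untested against the real `cfa1.txt`; the offsets were counted by hand from the sample rows in the question and assume every row in the file follows the same layout. Reading 'Name', 'RA' and 'Dec' as strings is also an assumption, made so that leading zeros and names with embedded spaces such as `N   1` survive.

```python
import io

import pandas as pd

# A few rows copied from the question, header line included.
sample = "\n".join([
    " Name        RA       Dec  B      vh   sig  Type   D1  D2",
    "00006-0211 000032.0 -021129 14.3  7323  31   4X   1.3 0.8",
    "00006+2142 000035.6  214054 14.4  6605  32   5  P 1.0 0.7",
    "N   1      000441.3  272550 13.4  4534   6   3A s 1.8 1.2",
    "00056+2644 000535.3  264331 14.4  8741  36   3    1.0 .55",
])

names = ["Name", "RA", "Dec", "B", "vh", "sig", "Type", "D1", "D2"]

# Half-open (start, end) character offsets, counted from the sample rows.
# These are assumptions about the layout, not documented field widths.
colspecs = [
    (0, 10),   # Name (may contain spaces, e.g. "N   1")
    (11, 19),  # RA
    (20, 27),  # Dec (leading sign allowed)
    (28, 32),  # B
    (33, 38),  # vh
    (39, 42),  # sig
    (43, 49),  # Type (e.g. "4X", "5  P", "-6")
    (50, 53),  # D1
    (54, 57),  # D2
]

df = pd.read_fwf(
    io.StringIO(sample),   # for the real file, pass 'cfa1.txt' instead
    colspecs=colspecs,
    skiprows=1,            # skips the header here; the question uses skiprows=11
    header=None,
    names=names,
    dtype={"Name": str, "RA": str, "Dec": str},  # keep leading zeros
)

print(df)
```

With explicit boundaries, the misaligned 'B' label in the header no longer matters, because the header row is skipped and the names are supplied directly. If the real file has rows that deviate from these offsets, the boundaries would need adjusting.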
