
I have a large .txt file and I'd like to read each column as a list. The file has 11 columns of whitespace-delimited floats; the first line (of a few thousand) is:

0.49406565E-323  0.29532530E+003  0.89244837E+001  0.20901651E-002  0.34989878E+001  0.11594090E+000  0.34025716E-001  0.33723126E+001  0.27954433E+000  0.80757378E-001  0.50813056E+001

I'm reading my file like this:

colnames = ['weight', 'likelihood', 'A_0', 'w_0', 'p_0', 'A_1', 'w_1', 'p_1', 'A_2', 'w_2', 'p_2']
data = pandas.read_csv('data.txt', names=colnames)

weights = data.weight.tolist()
A_0     = data.A_0.tolist()

The first column is the weight and the rest are parameters. I want to compute a weighted average of each parameter with respect to the weights.

But if I print weights, for example, it returns the entire file, and weights[0] is the first row of the file.

For completeness, my weighted average would be something like:

weighted_A_0 = numpy.average(A_0, weights=weights)

Perhaps there's a neater way with pandas and numpy?

Thanks!

rh1990

3 Answers


Since you have not passed a separator to read_csv, it uses a comma as the default delimiter. Your file data.txt doesn't contain any commas, so each whole line ends up in the first column (weight).

data = pandas.read_csv('data.txt', names=colnames, delim_whitespace=True)

delim_whitespace : boolean, default False

Specifies whether or not whitespace (e.g. ' ' or '\t') will be used as the sep.

Equivalent to setting sep='\s+'. If this option is set to True, nothing should be passed in for the delimiter parameter.
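To make the difference concrete, here is a minimal sketch that reads the same text twice, once with the default comma delimiter and once with a whitespace separator, and then computes the weighted averages in one call. The two rows in `sample` are made-up stand-ins for data.txt (not values from the real file), and averaging every column except `weight` is an assumption based on the question's description.

```python
import io

import numpy as np
import pandas as pd

colnames = ['weight', 'likelihood', 'A_0', 'w_0', 'p_0',
            'A_1', 'w_1', 'p_1', 'A_2', 'w_2', 'p_2']

# Two made-up rows standing in for data.txt (illustrative values only)
sample = (
    "0.25E+000  0.29E+003  0.10E+001  0.20E-002  0.35E+001  0.11E+000"
    "  0.34E-001  0.33E+001  0.28E+000  0.81E-001  0.51E+001\n"
    "0.75E+000  0.30E+003  0.30E+001  0.40E-002  0.15E+001  0.31E+000"
    "  0.14E-001  0.13E+001  0.48E+000  0.21E-001  0.11E+001\n"
)

# Default comma delimiter: each whole line lands in 'weight' as a string,
# and the remaining columns are filled with NaN
bad = pd.read_csv(io.StringIO(sample), names=colnames)

# Whitespace separator: one float column per name
data = pd.read_csv(io.StringIO(sample), names=colnames, sep=r'\s+')

# Weighted average of every column except 'weight' (assumed to be the parameters)
params = data.drop(columns=['weight'])
weighted_means = pd.Series(
    np.average(params.to_numpy(), axis=0, weights=data['weight'].to_numpy()),
    index=params.columns,
)
print(weighted_means)
```

With this approach there is no need to convert each column to a list first; `weighted_means['A_0']` plays the role of `weighted_A_0` from the question. As far as I know, newer pandas releases (2.2 onward) deprecate `delim_whitespace` in favour of `sep=r'\s+'`, which is why the sketch uses the latter.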

Chankey Pathak

By default, pd.read_csv expects comma-separated values, but you can specify your delimiter using the sep argument, e.g.:

df = pd.read_csv('data.txt', names=colnames, sep='\t')

for tab-separated data. Would that help?


Final:

It turned out the file was delimited by two spaces, so we made it work with

df = pd.read_csv('data.txt', names=colnames, sep=r'\s+')
Stael
  • Nope, just tried it, I get the exact same results, thanks though! – rh1990 Jul 25 '17 at 09:09
  • What's your file separated by? The snippet looks like 2 spaces. Have you tried that (`sep='  '`)? – Stael Jul 25 '17 at 09:09
  • I assumed it was tab separated, but with `sep='  '` it worked. I do get this warning though: `ParserWarning: Falling back to the 'python' engine because the 'c' engine does not support regex separators (separators > 1 char and different from '\s+' are interpreted as regex); you can avoid this warning by specifying engine='python'.` – rh1990 Jul 25 '17 at 09:11
  • a) that's probably fine and b) you could use `sep='\s+'` as that should do the same thing. – Stael Jul 25 '17 at 09:12
  • from the [docs](https://pandas.pydata.org/pandas-docs/stable/generated/pandas.read_csv.html): >The C engine is faster while the python engine is currently more feature-complete. – Stael Jul 25 '17 at 09:14
  • ah nice, your (b) answer worked better with no errors, cheers! – rh1990 Jul 25 '17 at 09:14
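import pandas as pd

# Column labels from the question
labels = ['weight', 'likelihood', 'A_0', 'w_0', 'p_0', 'A_1', 'w_1', 'p_1', 'A_2', 'w_2', 'p_2']

with open(r'C:/input_data.txt') as f:
    # Split each non-empty line on whitespace and convert the fields to floats
    records = [[float(x) for x in line.split()] for line in f if line.strip()]

df = pd.DataFrame.from_records(records, columns=labels)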