1

I am trying to load multiple columns of data from a .txt file using Python.

My file contains multiple sets of data, and each set has a title.

I want to choose a set and then choose 2 columns from it. I am using genfromtxt to read the .txt file, but it reads the title of the set as a column, so it gives me this kind of error:

Line #2 (got 4 columns instead of 1)

This is an example of my .txt file, where TC_14TeV_NLO and TC_13TeV_LO are the titles. I want to take the first 2 columns of each set:

TC_14TeV_NLO 
1000 1.51100e+01 6.2e-03 4.1e-02%
2000 7.36556e-01 4.4e-04 5.9e-02%
3000 7.85092e-02 5.1e-05 6.5e-02%
4000 1.17810e-02 7.4e-06 6.3e-02%
5000 2.39873e-03 1.3e-06 5.2e-02%
6000 7.18132e-04 2.7e-07 3.7e-02%
7000 3.10281e-04 8.1e-08 2.6e-02%
8000 1.67493e-04 3.3e-08 1.9e-02%
9000 1.01369e-04 2.2e-08 2.2e-02%
10000 6.54776e-05 1.6e-08 2.4e-02%

TC_13TeV_LO
1000 1.04906e+01 1.7e-03 1.7e-02%
2000 4.53170e-01 8.1e-05 1.8e-02%
3000 4.25722e-02 7.9e-06 1.9e-02%
4000 5.80036e-03 1.1e-06 1.9e-02%
5000 1.17278e-03 2.1e-07 1.8e-02%
6000 3.82330e-04 6.1e-08 1.6e-02%
7000 1.78036e-04 2.7e-08 1.5e-02%
8000 9.91945e-05 1.9e-08 1.9e-02%
9000 6.05766e-05 1.6e-08 2.6e-02%
10000 3.92631e-05 1.2e-08 3.0e-02%
Moe

2 Answers

1

For your example file you can do this:

import pandas as pd

# Read the first set of data: start at the beginning of the file and read 10 rows.
# Note: this treats the first line of each set as a space-separated column header.
df1 = pd.read_csv('exfile.txt', sep=" ", skiprows=None, nrows=10)

# Read the second set of data: skip the first 11 lines, then read the next 10 rows.
df2 = pd.read_csv('exfile.txt', sep=" ", skiprows=11, nrows=10)

# Choose any two columns, for example:
print(df1['TC'])
print(df2['13TeV'])

Otherwise, I suggest splitting the file so that each set has its own file, then using pandas.read_csv to read them in.
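The splitting suggested above can also be done in memory rather than by writing separate files. The following is only a sketch under some assumptions: sets are separated by blank lines, the first line of each set is its title, and the columns are whitespace-separated. The function name `read_sets` and the shortened `sample` text are hypothetical, introduced here for illustration.

```python
import io
import re

import pandas as pd


def read_sets(text):
    """Split text on blank lines and parse each set with pandas.read_csv.

    Returns a dict mapping each set's title to a DataFrame holding
    the first two columns of that set.
    """
    frames = {}
    for block in re.split(r"\n\s*\n", text.strip()):
        # The first line of a block is the title; the rest is the data.
        title, _, body = block.partition("\n")
        frames[title.strip()] = pd.read_csv(
            io.StringIO(body), sep=r"\s+", header=None, usecols=[0, 1]
        )
    return frames


# Hypothetical shortened sample in the same layout as the question's file.
sample = (
    "TC_14TeV_NLO \n"
    "1000 1.51100e+01 6.2e-03 4.1e-02%\n"
    "2000 7.36556e-01 4.4e-04 5.9e-02%\n"
    "\n"
    "TC_13TeV_LO\n"
    "1000 1.04906e+01 1.7e-03 1.7e-02%\n"
    "2000 4.53170e-01 8.1e-05 1.8e-02%\n"
)

frames = read_sets(sample)
print(frames["TC_13TeV_LO"])
```

Because `header=None` is passed, the title line is never interpreted as column names, and the two selected columns are simply labelled 0 and 1.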

zabop
  • (Also, a suitable title for each of your columns would make your example file better; the header is missing the last column name.) – zabop Sep 01 '18 at 08:42
  • Thanks for your answer, Pal. The thing is that I have a large number of sets, so counting lines would be too much work. I was thinking of a way to let the code recognise each set from its title. The titles are not (TC 13TeV LO) but (TC_13TeV_LO), so each one is not a title for a column but for the whole set. – Moe Sep 01 '18 at 13:11
1

First, define a function to split a file into sections. This is a generator, which produces a sequence of lists of lines:

def split_sections(infile):
    """Generate a sequence of lists of lines from infile delimited by blank lines.
    """
    section = []
    for line in infile:
        if not line.strip():
            if section:
                yield section
                section = []
        else:
            section.append(line)
    if section: # last section may not have blank line after it
        yield section

Then your actual task is fairly simple:

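import numpy as np

# path is the location of your .txt file
with open(path) as infile:
    for lines in split_sections(infile):
        heading = lines[0].rstrip()  # first line of each section is its title
        data = np.genfromtxt(lines[1:], usecols=[0,1])  # first two columns only
        print(heading)
        print(data)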
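To pick a set by its title rather than by its position, the sections can be collected into a dictionary. This is a minimal sketch, not part of the tested answer above: `load_sets` and the shortened `sample` text are hypothetical names introduced for illustration, and `split_sections` is repeated only so the block runs on its own. It assumes a NumPy version whose `genfromtxt` accepts a list of strings.

```python
import io

import numpy as np


def split_sections(infile):
    """Yield lists of non-blank lines, one list per blank-line-delimited section."""
    section = []
    for line in infile:
        if not line.strip():
            if section:
                yield section
                section = []
        else:
            section.append(line)
    if section:  # last section may not have a blank line after it
        yield section


def load_sets(infile, usecols=(0, 1)):
    """Return a dict mapping each section title to an array of the chosen columns."""
    sets = {}
    for lines in split_sections(infile):
        title = lines[0].strip()
        sets[title] = np.genfromtxt(lines[1:], usecols=usecols)
    return sets


# Hypothetical shortened sample in the same layout as the question's file.
sample = io.StringIO(
    "TC_14TeV_NLO \n"
    "1000 1.51100e+01 6.2e-03 4.1e-02%\n"
    "2000 7.36556e-01 4.4e-04 5.9e-02%\n"
    "\n"
    "TC_13TeV_LO\n"
    "1000 1.04906e+01 1.7e-03 1.7e-02%\n"
    "2000 4.53170e-01 8.1e-05 1.8e-02%\n"
)

sets = load_sets(sample)
x, y = sets["TC_13TeV_LO"].T  # first and second column of the chosen set
print(x, y)
```

With a real file, the same call would be `load_sets(infile)` inside a single `with open(path) as infile:` block, so one pass over the file covers every title it contains.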
John Zwinck
  • Thanks for your answer, John. I've tried your code but I'm still getting the same error: ValueError: Some errors were detected ! Line #1 (got 1 columns instead of 2) Line #14 (got 1 columns instead of 2). Another thing is that I have different titles (TC, SSM, NU, ...), so do I have to use multiple with open() as infile blocks? – Moe Sep 01 '18 at 13:02
  • @Moe: I have completely reworked my answer to be more general and handle the cases you've asked about. I also tested it on your original example text. – John Zwinck Sep 02 '18 at 01:53