-2

I have a text file with multiple columns. The columns differ in width and are separated by varying numbers of spaces. I just want to extract one column from the file. For example, I want to get the fifth column and print it as a list.

I want the output to read [155000, 9200, , 570, 75000,......]

Thank you for your help.

1010          Mob/Demob                                             1.000         LS   155,000.00  155,000.00
1020          Provide Site Office                                   1.000         LS     9,200.00    9,200.00
101010        100% Perf. & Maint., and 100% Mater. and Lab. Bond    1.000        LS
101020        Advertise for Substantial Completion                  1.000         LS      570.00      570.00
101030        Contractor Layout                                     1.000     ALOW      75,000.00   75,000.00
101040        Prepare Construction Management Plan                  1.000         LS     2,850.00    2,850.00
101050        S&I 10x20 Mud Mat                                     1.000         LS    10,400.00   10,400.00
101060        Pre-Construction Survey                               1.000         LS      370.00      370.00
101070        Post-Construction Survey                              1.000         LS      370.00      370.00
Robert Columbia
  • What language do you use? Also, don't post pictures. Post the data in text form (with correct formatting). – Andrej Kesely Jun 02 '21 at 17:18
  • I'm using Python – codernumber5 Jun 02 '21 at 17:25
  • to post the data in correct formatting can i just copy it from the txt file and paste it on the question? (sorry im still new here) – codernumber5 Jun 02 '21 at 17:43
  • You can press `Ctrl`+`K` when edit your code to format the text – Andrej Kesely Jun 02 '21 at 17:44
  • i tried formatting it, not sure if thats what you meant or not – codernumber5 Jun 02 '21 at 17:51
  • How are the "columns" delimited? Meaning, is a person typing this into a text file and just using tabs to line things up? Or is this output by some other software? It's a little strange for a piece of software to output a txt file unless it's comma delimited. I'm going to assume that it's just multiple spaces to make it "look pretty." I'll create a post below based on this assumption. – pedwards Jun 02 '21 at 18:00
  • If this is one-off thing it might be easier to use Excel's Text to Columns feature with Fixed width to create and separate your columns. You can save as a csv and read that in to python easily – Keverly Jun 02 '21 at 18:00
  • Please don't vandalize your question. – Robert Columbia Jun 02 '21 at 23:29

3 Answers

2

Here is an example of how to parse the text file:

import re

data = []

with open("your_file.txt", "r") as f_in:
    for line in map(str.strip, f_in):
        # skip empty lines
        if not line:
            continue
        # two or more whitespace characters act as our column separator:
        line = re.split(r"\s{2,}", line)
        # if we have 6 columns add last column to our list:
        if len(line) == 6:
            n = int(float(line[5].replace(",", "")))
            data.append(n)
        # if not, add empty string:
        else:
            data.append("")

print(data)

Prints:

[155000, 9200, '', 570, 75000, 2850, 10400, 370, 370]
Andrej Kesely
0

If you're working in Python, you can use the readlines() method to read the file into a list of strings, one per line. From there you can split each line on whitespace and then index into the result to get the relevant column.

See here for examples of how to work with text files in Python.
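
A minimal sketch of that idea, under some assumptions that are not in the original answer: the description column contains spaces, so counting tokens from the left is unreliable, and the sketch counts from the right instead. It assumes a complete row ends with two numeric tokens (unit price, then total) and records None for rows where that is not the case. The helper names to_number and fifth_column are hypothetical.

```python
def to_number(text):
    """Return text such as '9,200.00' as a float, or None if it is not numeric."""
    try:
        return float(text.replace(",", ""))
    except ValueError:
        return None


def fifth_column(lines):
    """Collect the unit price (second-to-last token) from each non-empty line."""
    result = []
    for line in lines:
        tokens = line.split()  # split on any run of whitespace
        if not tokens:
            continue  # skip empty lines
        # Assumption: a complete row ends with two numbers (unit price, total).
        tail = [to_number(t) for t in tokens[-2:]]
        if len(tail) == 2 and None not in tail:
            result.append(tail[0])
        else:
            result.append(None)  # row has no price columns
    return result


# With a real file, the lines would come from something like:
#     with open("your_file.txt") as f:
#         lines = f.readlines()
sample = [
    "1010          Mob/Demob                  1.000         LS   155,000.00  155,000.00\n",
    "1020          Provide Site Office        1.000         LS     9,200.00    9,200.00\n",
    "101010        100% Perf. & Maint. Bond   1.000        LS\n",
    "101020        Advertise for Completion   1.000         LS      570.00      570.00\n",
]
print(fifth_column(sample))  # [155000.0, 9200.0, None, 570.0]
```

Counting from the right only works while the trailing columns never contain spaces; if they might, a fixed-width or regex-based parser is likely the safer choice.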

Tristan
0

Pandas has a function, read_fwf, for reading fixed-width data files. Here the widths argument specifies the width of each column in characters. I've used io.StringIO to provide the file data as a file-like object for a minimal, reproducible example, but a filename can be passed as well:

import pandas as pd
import io

data = '''\
1010          Mob/Demob                                             1.000         LS   155,000.00  155,000.00
1020          Provide Site Office                                   1.000         LS     9,200.00    9,200.00
101010        100% Perf. & Maint., and 100% Mater. and Lab. Bond    1.000        LS
101020        Advertise for Substantial Completion                  1.000         LS      570.00      570.00
101030        Contractor Layout                                     1.000     ALOW      75,000.00   75,000.00
101040        Prepare Construction Management Plan                  1.000         LS     2,850.00    2,850.00
101050        S&I 10x20 Mud Mat                                     1.000         LS    10,400.00   10,400.00
101060        Pre-Construction Survey                               1.000         LS      370.00      370.00
101070        Post-Construction Survey                              1.000         LS      370.00      370.00
'''

df = pd.read_fwf(io.StringIO(data), widths=(14, 54, 10, 9, 10, 12), header=None, thousands=',')
print(list(df[4]))

Output:

[155000.0, 9200.0, nan, 570.0, 75000.0, 2850.0, 10400.0, 370.0, 370.0]
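
The list above holds floats and nan, while the question asked for whole numbers with a blank for the missing entry. One possible post-processing step is sketched below in plain Python. It assumes every amount is a whole number (int() would silently drop any cents), and the helper name tidy is hypothetical.

```python
import math

# Values copied from the pandas output above.
values = [155000.0, 9200.0, float("nan"), 570.0, 75000.0, 2850.0, 10400.0, 370.0, 370.0]


def tidy(value):
    """Turn NaN into an empty string and a whole-number float into an int."""
    if math.isnan(value):
        return ""
    return int(value)  # assumption: no fractional part worth keeping


cleaned = [tidy(v) for v in values]
print(cleaned)  # [155000, 9200, '', 570, 75000, 2850, 10400, 370, 370]
```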
Mark Tolonen