-1

Consider a tool generated report file use case.

A tabular data is available as plain text in the report file and I would like to parse through the report to gather some data from the table and present in an html page.

Source   Arrival   Arrival   Departure   Departure   Destination
         Expected  Actual    Expected    Actual      
X        10:00     10:15     10:10       10:25       Z
A        8:30      8:30      8:45        8:50        B

Is it possible to get the exact column name by considering the two rows of header and then fetch value for the records?

Currently I had used crude way of setting flag to get the corresponding values from the required column

mozway
  • 194,879
  • 13
  • 39
  • 75

1 Answers1

0

I would use and read_fwf for this:

import pandas as pd
import io

text = '''Source   Arrival   Arrival   Departure   Departure   Destination
         Expected  Actual    Expected    Actual      
X        10:00     10:15     10:10       10:25       Z
A        8:30      8:30      8:45        8:50        B'''


df = pd.read_fwf(io.StringIO(text), header=[0, 1])
# or from file
# df = pd.read_fwf('file.txt', header=[0, 1])

df.columns = df.columns.map(lambda level: ' '.join([x for x in level if not x.startswith('Unnamed:')]))

print(df)

Output:

  Source Arrival Expected Arrival Actual Departure Expected Departure Actual Destination
0      X            10:00          10:15              10:10            10:25           Z
1      A             8:30           8:30               8:45             8:50           B
mozway
  • 194,879
  • 13
  • 39
  • 75