Reading in ASCII file with uncommon formatting in Python

Question

I've never imported data from an ASCII file before, and I've noticed that different ASCII files use different layouts, so finding a general solution that works for any format has proven challenging.

I have a .dat (ASCII) file that I need to read in and extract the variables from (see the sample of the file at the bottom of the question). Below is code from my different attempts (separated by ###) at reading in the data.

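f_41 = open(fileRS41, 'r')
data_41 = f_41.read()
for line in data_41.splitlines():  # looping over the string itself yields single characters
    print(repr(line))
f_41.close()  # close the file object; the string returned by read() has no close()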

############################

f = open(fileRS41, 'r')

# Read and ignore header lines
header1 = f.readline()
header2 = f.readline()
header3 = f.readline()

# Loop over lines and extract variables of interest
for line in f:
    line = line.strip()
    columns = line.split()
    name = columns[1] # Not sure what the different numbers do but this was code from another solution
    j = float(columns[1]) # ERROR: string can't be converted to float
    print(name, j)
f.close()

############################
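from astropy.io import ascii
data = ascii.read(fileRS41, guess=False)  # pass the path; f_41 is a file handle opened earlier
print(data)
############################
import numpy as np
x = np.genfromtxt(fileRS41, dtype=None)  # pass the path here as well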

Another option would be to convert the file to CSV first and then use pandas to work with it. However, when I convert it, the variable names end up stacked in a single column, one per row, instead of one variable name at the top of each data column.

# convert ASCII to CSV
import csv

with open(file, 'r') as f:
    lines = f.readlines()

with open("FILEOUT.csv", 'w', newline='') as csvfile:
    writer = csv.writer(csvfile)
    for l in lines:
        # split on whitespace; every line of the input becomes one CSV row
        writer.writerow(l.split())
print("out?")

.dat file relevant sample:

Generated by Rfunction:  Get.mw41.edt.func2 
============> Radisonde_info:
RS_type:        RS41-SGP
RS_config:      -32768
RS_serialnum:   R3340183
RS_freq:        403
RS__windtype:   ccGPS
=============> Station_info:
Station:        HUBV_RS41SGP
Latitude:       39.0563
Longitude:      -76.8755
Altitude:       52.3
SW version:     MW41 2.15.0
Start time:     2020-01-23 06:46:41
=============> Variables & units - Vaisala EDT
NA_numeric value:  -9999
NA_string:  xx or NA
-----------------------------
      Variable       Unit
          time        sec
            xx         NA
            Ta          K
            RH          %
       v(S->N)        m/s
       u(E->W)        m/s
        Height          m
         press        hPa
            Td          K
            MR       g/Kg
            DD        dgr
            FF        m/s
    Ascend_FLG  (0-N,1-Y)
            xx         NA
            xx         NA
           Lon        dgr
           Lat        dgr
            xx         NA
            xx         NA
            xx         NA
=============> Data:
   0.00  -9999.    268.37    85.00      0.00      0.00         52.3   1023.19    266.24     2.22       0.00      0.00 1  -9999.  -9999.   -76.8755    39.0563  -9999.  -9999.  -9999.
   0.81  -9999.    268.46    83.38      0.46      0.86         54.5   1022.90    266.08     2.19     241.86      0.98 1  -9999.  -9999.   -76.8757    39.0564  -9999.  -9999.  -9999.

Images of text are not very useful as we can't parse them! Pictures of code and data are discouraged. — Mark Setchell, May 27 '20 at 16:12
I don’t want to click a link - please edit a relevant sample the text of your dat file into your question. — DisappointedByUnaccountableMod, May 27 '20 at 19:24
I've updated the question to include a relevant sample of the txt file and removed the link. Thanks for sharing tips on how to better ask questions. — Natasha, May 27 '20 at 20:32

score 1 · Answer 1 · answered May 28 '20 at 23:14

I found one solution that works, but I would like to avoid hard-coding the column headers and instead read them directly from the ASCII file:

import numpy as np
import pandas as pd

# path to the data file
fileRS41 = 'filename.dat'

# load the numeric block, skipping the 40 header lines that precede it
f41 = np.loadtxt(fileRS41, skiprows=40)

# column names for the 20 variables, in file order
c = ['time', 'xx0', 'temp', 'RH', 'v(S_N)', 'u(E_W)', 'height', 'pressure', 'Td', 'mixingratio', 'DD', 'FF', 'Ascend_FLG', 'xx1', 'xx2', 'lon', 'lat', 'xx3', 'xx4', 'xx5']

# columns to leave out of the DataFrame
skip = ['xx0', 'DD', 'FF', 'Ascend_FLG', 'xx1', 'xx2', 'xx3', 'xx4', 'xx5']

# convert to a pandas DataFrame, dropping the skipped columns
df_f41 = pd.DataFrame.from_records(f41, exclude=skip, columns=c)
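
One possible way to avoid the hard-coded list is sketched below. It is not tested against a full file. It assumes every file has a "Variable  Unit" table followed by a line ending in "Data:", as in the sample in the question. The function name read_edt and the shortened sample string are made up for illustration. The sketch takes the first token of each table row as the column name, numbers the repeated "xx" placeholders so the labels are unique, and replaces the -9999 missing-value marker from the header with NaN.

```python
import numpy as np
import pandas as pd


def read_edt(text, na_value=-9999):
    """Build a DataFrame from EDT-style text, taking column names from its header."""
    lines = text.splitlines()

    # Row index of the "Variable  Unit" table header and of the "Data:" marker.
    table_start = next(i for i, ln in enumerate(lines)
                       if ln.split()[:2] == ["Variable", "Unit"])
    data_start = next(i for i, ln in enumerate(lines)
                      if ln.startswith("=") and ln.rstrip().endswith("Data:"))

    # The first token of each table row is the variable name.
    names = [ln.split()[0] for ln in lines[table_start + 1:data_start] if ln.strip()]

    # Number repeated placeholders (xx -> xx0, xx1, ...) so labels are unique.
    seen = {}
    columns = []
    for name in names:
        if names.count(name) > 1:
            index = seen.get(name, 0)
            seen[name] = index + 1
            name = f"{name}{index}"
        columns.append(name)

    # np.loadtxt accepts a list of strings as well as a filename.
    rows = [ln for ln in lines[data_start + 1:] if ln.strip()]
    data = np.loadtxt(rows, ndmin=2)

    # Swap the numeric missing-value marker for NaN.
    return pd.DataFrame(data, columns=columns).replace(na_value, np.nan)


# Shortened stand-in for the real file (hypothetical sample, 4 variables only).
sample = """\
NA_numeric value:  -9999
-----------------------------
      Variable       Unit
          time        sec
            xx         NA
            Ta          K
            xx         NA
=============> Data:
   0.00  -9999.    268.37  -9999.
   0.81  -9999.    268.46  -9999.
"""

df = read_edt(sample)
print(df.columns.tolist())  # ['time', 'xx0', 'Ta', 'xx1']
```

For the real file, the text would come from something like open(fileRS41).read(), and unwanted columns can then be dropped with df.drop(columns=[...]). Because the header markers are located by searching, the sketch does not depend on the header being exactly 40 lines long.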
