Importing .dat files in python without knowing how it is structured

Question

I am trying to load and view the contents of data which can be downloaded from here, after which I need to analyze it. I had already posted one question about this, but I could not get any solution.

Now, I went through their label file located here. In that, it is mentioned that

“Will code useful Python based letters to describe each object
see http://docs.python.org/library/struct.html for codes
formats will comma separated beginning with "RJW," as key then
{NAME}, {FORMAT}, {Number of dims}, {Size Dim 1}, {Size Dim 2}, ...
where {FORMAT} is the Python code for the type, i.e. I for uint32
and there are as many Size Dim's as number of dimensions.”

So, I guess one can try Python, of which I have a working knowledge. I started with this program, which I got from here (for simplicity, the Python file and the data files are in the same folder):

import numpy as np
data = np.genfromtxt('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.dat')
print(data)

I got the error “UnicodeDecodeError: 'cp949' codec can't decode byte 0xff in position 65: illegal multibyte sequence”.

If I change the code to (as mentioned here):

data=open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', encoding='utf-8')
print(data)

The error message disappears, but all I get is:

<_io.TextIOWrapper name='JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT' mode='r' encoding='utf-8'>

I had checked other answers on StackOverflow, but could not find a solution. My question may be closely related to what is posted here.

I need to first see the contents of this dat file and then export to other format, say .csv.

Any help will be deeply appreciated...

You will need to call `.read()` on what was returned using `open(...)` in order to read the contents of the file to see what is in the data file. What you got was a `repr` output telling you that `data` is a `TextIOWrapper`. — metatoaster, Mar 18 '21 at 07:21
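To illustrate the comment above: opening a file in text mode does not decode anything until the file is actually read, which is probably why the error only seemed to disappear. Below is a minimal sketch using a small throwaway file; the file name `demo.dat` and its bytes are made up for illustration and are not the real JADE data.

```python
import os
import tempfile

# Hypothetical stand-in for a binary .DAT file: ASCII text followed by raw bytes.
payload = b"2018-091T00:00:00.000" + bytes([0xFF, 0x00, 0x6A])

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "demo.dat")
    with open(path, "wb") as f:
        f.write(payload)

    # Text mode: open() succeeds, but decoding fails once the bytes are read.
    try:
        with open(path, encoding="utf-8") as f:
            f.read()
        text_mode_ok = True
    except UnicodeDecodeError:
        text_mode_ok = False

    # Binary mode: no decoding is attempted, so the raw bytes come back as-is.
    with open(path, "rb") as f:
        raw = f.read()

print(text_mode_ok)
print(raw[:21])
```

Binary mode is the safer choice for any file whose layout is described by fixed-size records rather than lines of text.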

buran · Accepted Answer · 2021-03-19T05:32:36.817

You need to open the file in binary mode.

with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    while True:
        # 160036 bytes is the record size as per the LBL file
        chunk = f.read(160036)
        if not chunk:  # end of file
            break
        print(chunk)
        # the file is huge, so pause after each record;
        # hit Enter for the next one or Ctrl+C to interrupt
        input('Hit Enter...')

Note that you can parse the LBL file, construct a format string for the struct module, and parse each chunk into meaningful fields. That is what the comment you quote is saying.

"""Example of reading NASA JUNO JADE CALIBRATED SCIENCE DATA
https://pds-ppi.igpp.ucla.edu/search/view/?f=yes&id=pds://PPI/JNO-J_SW-JAD-3-CALIBRATED-V1.0/DATA/2018/2018091/ELECTRONS/JAD_L30_LRS_ELC_ANY_CNT_2018091_V03&o=1
https://stackoverflow.com/a/66687113/4046632
"""

import struct
from functools import reduce
from operator import mul
from collections import namedtuple

__author__ = "Boyan Kolev, https://stackoverflow.com/users/4046632/buran"

with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.LBL') as f:
    rjws = [line.strip('\n/* ') for line in f if line.startswith('/* RJW')]

# create the format string for struct
rjws = rjws[2:] # exclude first 2 RJW comments related to file itself
names = []
FMT = '='
print(f'Number of objects: {len(rjws)}')
for idx, rjw in enumerate(rjws):
    _, name, fmt, num_dim, *dims = rjw.split(', ')
    fstr = f'{reduce(mul, map(int, dims))}{fmt}'
    FMT = f'{FMT} {fstr}'
    names.append(name)
    print(f'{idx}:{name}, {fstr}')
FMT = FMT.replace('c', 's') # for convenience treat 21c as 21s, i.e. one char[] bytes object
print(f"Format string: {repr(FMT)}")

# parse DAT file
s = struct.Struct(FMT)
print(f'Struct size:{s.size}')
with open('JAD_L30_LRS_ELC_ANY_CNT_2018091_V03.DAT', 'rb') as f:
    n = 0
    while True: # in python3.8+ this loop can be simplified with walrus operator
        chunk = f.read(s.size)
        if not chunk:
            break
        data = s.unpack_from(chunk)
        # process data further, e.g. split data in 2D containers where appropriate
        n += 1

print(f'Number of records: {n}')

# make a named tuple to represent first 10 fields
# for nice display. This basic use of namedtuple works only
# for first 23 objects, which have single item.
num_fields = 10
Record = namedtuple('Record', names[:num_fields])
record = Record(*data[:num_fields])
print('\n----------------------\n')
print(f'First {num_fields} fields of the last record.')
print(record)

output:

Number of objects: 49
0:DIM0_UTC, 21c
1:PACKETID, 1B
2:DIM0_UTC_UPPER, 21c

--- omitted for sake of brevity ---

46:DIM2_AZIMUTH_DESPUN_LOWER, 3072f
47:MAG_VECTOR, 3f
48:ESENSOR, 1H
Format string: '= 21s 1B 21s 1b 21s 1b 1H 1B 1B 1B 1B 1h 1h 1f 1f 1f 1f 1f 1f 1f 1f 1f 1f 3f 3f 3f 1f 9f 9f 9f 1f 1I 1I 1H 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3072f 3f 1H'
Struct size:160036
Number of records: 1101

----------------------

First 10 fields of the last record.
Record(DIM0_UTC=b'2018-091T23:56:08.925', PACKETID=106, DIM0_UTC_UPPER=b'2018-092T00:01:08.925', PACKET_MODE=1, DIM0_UTC_LOWER=b'2018-091T23:51:08.925', PACKET_SPECIES=-1, ACCUMULATION_TIME=600, DATA_UNITS=2, SOURCE_BACKGROUND=3, SOURCE_DEAD_TIME=0)
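The script above leaves two things open: the read loop can be shortened with the walrus operator on Python 3.8+, and the flat tuple returned by `unpack` still has to be split back into per-object containers. The following is only a sketch of one way to do both. The `objects` list, the `regroup` helper and the synthetic in-memory records are hypothetical stand-ins for the real LBL and DAT contents, and row-major ordering of the 2D objects is an assumption that should be checked against the label file.

```python
import io
import struct
from math import prod

# Hypothetical miniature layout: (name, struct code, dims). The real list
# would come from the RJW comments in the LBL file.
objects = [
    ("UTC", "s", (21,)),
    ("PACKETID", "B", (1,)),
    ("MAG_VECTOR", "f", (3,)),
    ("GRID", "f", (2, 3)),   # small stand-in for the 3072-item 2D objects
]

fmt = "=" + " ".join(f"{prod(dims)}{code}" for _, code, dims in objects)
s = struct.Struct(fmt)

def regroup(flat):
    """Turn the flat tuple from unpack() into a dict keyed by object name."""
    record, pos = {}, 0
    for name, code, dims in objects:
        count = 1 if code == "s" else prod(dims)  # 'Ns' yields one bytes item
        values = flat[pos:pos + count]
        pos += count
        if len(dims) == 2:
            rows, cols = dims  # assumes row-major order
            record[name] = [list(values[r * cols:(r + 1) * cols]) for r in range(rows)]
        elif count == 1:
            record[name] = values[0]
        else:
            record[name] = list(values)
    return record

# Two synthetic records in memory instead of the real DAT file.
one = s.pack(b"2018-091T23:56:08.925", 106, 1.0, 2.0, 3.0, *map(float, range(6)))
stream = io.BytesIO(one * 2)

records = []
while chunk := stream.read(s.size):   # Python 3.8+
    if len(chunk) < s.size:           # ignore a truncated trailing record
        break
    records.append(regroup(s.unpack(chunk)))

print(len(records), records[0]["GRID"])
```

With the real file, `stream` would be the DAT file opened in `'rb'` mode and `objects` would be built from the parsed RJW lines.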

Link to GitHub gist

Many thanks for this wonderful piece of code. I am still trying to understand it, but it seems to be doing the job. As I mentioned in the question, is it possible to export it into a .csv file with columns 'DIM0_UTC, PACKETID, ..., MAG_VECTOR, ESENSOR' and rows with the corresponding values? As this is more or less sorted out, you may also look at this question [https://stackoverflow.com/questions/59782618/reading-dat-file-in-python]. — sreeraj t, Mar 18 '21 at 21:53
You can always save in csv, but there are 40004 separate fields. And there are 2D containers of data as per the description (all with 3072 = 64 * 48 items). So you need to decide how to deal with that. By the way, the NASA viewer mentioned in your other question shows 39996 columns, but for some of the objects that are 1D with 3 items, like SC_POS_JUPITER_J2000XYZ, it shows just 1 value. — buran, Mar 18 '21 at 22:16
Thanks for the speedy reply. Apart from SC_POS_JUPITER_J2000XYZ, the other quantities which have 3 components are SC_VEL_JUPITER_J2000XYZ, SC_VEL_ANGULAR_J2000XYZ and MAG_VECTOR. I guess they are vectors of some kind. So, if we take into account the remaining 8 columns (2 per vector), we get 40004 (= 39996 + 8). I checked the software mentioned in my previous question, but could not find a way to export the data values. — sreeraj t, Mar 18 '21 at 22:56
Yes, these 4 account for the 8 column difference. I didn't check how it displays DESPUN_SC_TO_J2000, J2000_TO_JSSXYZ and J2000_TO_JSSRTP, all of which are 9-item 1D objects, but I guess it's fine. Also, I edited my answer to include a comment that this basic use of namedtuple works only for the first 23 objects, which have a single item. And yes, I also think the NASA software is just a viewer, without an export option. — buran, Mar 19 '21 at 05:31
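One possible way to handle the CSV export discussed in these comments is to give every item its own column and add a numeric suffix for multi-item objects. This is a sketch only: the `names`, `counts` and `rows` values below are made-up examples, and the `_0`, `_1`, ... suffix scheme is an assumption, not something the label file prescribes. Applied to the real file it would produce 40004 columns, so exporting only a chosen subset of objects may be more practical.

```python
import csv
import io

# Hypothetical unpacked records: names and item counts would come from the
# LBL file, and each row from struct.unpack on one DAT record.
names = ["DIM0_UTC", "PACKETID", "MAG_VECTOR"]
counts = [1, 1, 3]
rows = [
    (b"2018-091T00:00:00.000", 106, 0.1, 0.2, 0.3),
    (b"2018-091T00:10:00.000", 106, 0.4, 0.5, 0.6),
]

# One column per item: single-item objects keep their name,
# multi-item objects get a numeric suffix (MAG_VECTOR_0, MAG_VECTOR_1, ...).
header = []
for name, count in zip(names, counts):
    header.extend([name] if count == 1 else [f"{name}_{i}" for i in range(count)])

out = io.StringIO()   # swap for open('out.csv', 'w', newline='') to write a file
writer = csv.writer(out)
writer.writerow(header)
for row in rows:
    # bytes fields (the UTC strings) are decoded so they are written as plain text
    writer.writerow([v.decode("ascii") if isinstance(v, bytes) else v for v in row])

print(out.getvalue())
```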
