
I have a function that outputs a DataFrame generated from a RINEX (GPS) file. At present, I write the DataFrame out to separate per-satellite files (satellites 1-32). I'd like to access the first column (either while it's still a DataFrame or in these new files) in order to convert the date to a timestamp in seconds, as below:

Epochs                  Epochs
2014-04-27 00:00:00 ->  00000
2014-04-27 00:00:30 ->  00030
2014-04-27 00:01:00 ->  00060 

This requires stripping the date away, then converting hh:mm:ss to seconds. I've hit a wall trying to figure out how best to access this first column (Epochs) and then apply the conversion to the entire column. The code I have been working on is:

def read_data(self, RINEXfile):
    obs_data_chunks = []

    while True:
        obss, _, _, epochs, _ = self.read_data_chunk(RINEXfile)

        if obss.shape[0] == 0:
            break

        obs_data_chunks.append(pd.Panel(
            np.rollaxis(obss, 1, 0),
            items=['G%02d' % d for d in range(1, 33)],
            major_axis=epochs,
            minor_axis=self.obs_types
        ).dropna(axis=0, how='all').dropna(axis=2, how='all'))

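        obs_data_chunks_dataframe = obs_data_chunks[0]

        # dropna(axis=0) above removes satellites with no data, so iterate over the
        # items actually present instead of positions 0-31 (which would mismatch PRNs).
        for sv in obs_data_chunks_dataframe.items:
            sat = obs_data_chunks_dataframe[sv]
            print("sat_columns: {0}".format(sat.columns[0]))  # header of first column, e.g. L1
            sat.to_csv('SV_{0}'.format(int(sv[1:])), index_label="Epochs", sep='\t')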
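pd.Panel was deprecated in pandas 0.20 and removed in 0.25, so the read_data code above will not run on current pandas. A minimal sketch of the same per-satellite split using plain DataFrames, assuming obss has the (epochs, satellites, observation types) shape produced by read_data_chunk and that satellite index s maps to PRN s + 1 (as sat_map implies). split_by_satellite is a hypothetical helper, not a library function:

```python
import numpy as np
import pandas as pd

def split_by_satellite(obss, epochs, obs_types):
    """Return a dict mapping 'G01'..'G32' to per-satellite DataFrames.

    Assumes obss has shape (n_epochs, n_sats, n_obs_types) and that
    satellite index s corresponds to PRN s + 1 (hypothetical helper).
    """
    index = pd.DatetimeIndex(epochs, name='Epochs')  # named, so to_csv labels it
    sats = {}
    for s in range(obss.shape[1]):
        df = pd.DataFrame(obss[:, s, :], index=index, columns=obs_types)
        # Drop epochs where this satellite has no data, then signals it never reported.
        df = df.dropna(axis=0, how='all').dropna(axis=1, how='all')
        if not df.empty:  # skip satellites that were never observed
            sats['G%02d' % (s + 1)] = df
    return sats
```

Each value can then be written with, for example, df.to_csv('SV_{0}'.format(int(name[1:])), sep='\t'); because the index is named 'Epochs', index_label is no longer needed. Unlike the Panel version, this also drops epochs in which that particular satellite has no data.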

Should I perform this conversion within the DataFrame, i.e. on "sat", or on the files after calling "to_csv"? I'm a bit lost here. The same question applies to formatting the columns; see the poorly aligned columns below:

Epochs  L1  L2  P1  P2  C1  S1  S2
2014-04-27 00:00:00 669486.833  530073.33   24568752.516    24568762.572    24568751.442    43.0    38.0
2014-04-27 00:00:30 786184.519  621006.551  24590960.634    24590970.218    24590958.374    43.0    38.0
2014-04-27 00:01:00 902916.181  711966.252  24613174.234    24613180.219    24613173.065    42.0    38.0
2014-04-27 00:01:30 1019689.006 802958.016  24635396.428    24635402.41 24635395.627    42.0    37.0
2014-04-27 00:02:00 1136478.43  893962.705  24657620.079    24657627.11 24657621.828    42.0    37.0

UPDATE: By saying that I've hit a wall trying to figure out how best to access this first column (Epochs), I mean that the "sat" DataFrame originally had no "Epochs" in its header. It only had the signals:

L1  L2  P1  P2  C1  S1  S2

The index (date and time) was missing from the header. To work around this in my CSV output files, I "forced" the name with:

sat.to_csv(('SV_{0}').format(sv+1), index_label="Epochs", sep='\t')

I would expect that, before generating the CSV files, I should be able (though I don't know how) to access this index (date and time) column and convert all the dates/times in one go, so that the timestamps are written out instead.
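Since the date/time lives in the DataFrame's index rather than in a column, it can be converted in one step before to_csv is called. A minimal sketch, assuming sat has a DatetimeIndex as produced by the Panel code; index_to_seconds_of_day is a hypothetical helper. Fractional seconds are truncated, and epochs past midnight on a second day would wrap back to 0:

```python
import pandas as pd

def index_to_seconds_of_day(df):
    """Return a copy of df whose DatetimeIndex is replaced by integer
    seconds since midnight (00:00:00 -> 0, 23:59:59 -> 86399)."""
    idx = df.index
    # normalize() moves every timestamp to midnight of its own day, so the
    # difference is the time of day; astype(int) truncates fractional seconds.
    seconds = (idx - idx.normalize()).total_seconds().astype(int)
    out = df.copy()
    out.index = pd.Index(seconds, name='Epochs')
    return out
```

For example, sat = index_to_seconds_of_day(sat) followed by sat.to_csv('SV_1', sep='\t') writes 0, 30, 60, ... in the Epochs column. If the zero-padded form from the question (00030) is wanted, the index could additionally be mapped with '{:05d}'.format.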

UPDATE: The epochs array is created in another function (read_data_chunk, below) like so:

epochs = np.zeros(CHUNK_SIZE, dtype='datetime64[us]')
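Because the epochs start out as a NumPy datetime64 array, the same conversion could also be done before the data ever reaches pandas. A sketch using NumPy datetime arithmetic, assuming an array like the one allocated above; seconds_of_day is a hypothetical name:

```python
import numpy as np

def seconds_of_day(epochs):
    """Whole seconds since midnight for a datetime64 array such as the
    'datetime64[us]' epochs array built in read_data_chunk."""
    midnight = epochs.astype('datetime64[D]')  # each epoch truncated to its day
    elapsed = epochs - midnight                # timedelta64 in the finer unit (us)
    # Floor-dividing by one second yields an int64 array of whole seconds.
    return elapsed // np.timedelta64(1, 's')
```

It could be applied to epochs[:i] at the end of read_data_chunk, or later to the DataFrame index via its .values.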

UPDATE:

def read_data_chunk(self, RINEXfile, CHUNK_SIZE=10000):
    obss = np.full((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), np.nan, dtype=np.float64)
    llis = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    signal_strengths = np.zeros((CHUNK_SIZE, TOTAL_SATS, len(self.obs_types)), dtype=np.uint8)
    epochs = np.zeros(CHUNK_SIZE, dtype='datetime64[us]')
    flags = np.zeros(CHUNK_SIZE, dtype=np.uint8)

    i = 0
    while True:
        hdr = self.read_epoch_header(RINEXfile)
        #print hdr
        if hdr is None:
            break
        epoch, flags[i], sats = hdr
        epochs[i] = np.datetime64(epoch)
        sat_map = np.ones(len(sats)) * -1
        for n, sat in enumerate(sats):
            if sat[0] == 'G':
                sat_map[n] = int(sat[1:]) - 1
        obss[i], llis[i], signal_strengths[i] = self.read_obs(RINEXfile, len(sats), sat_map)
        i += 1
        if i >= CHUNK_SIZE:
            break

    return obss[:i], llis[:i], signal_strengths[:i], epochs[:i], flags[:i]

UPDATE:

Apologies if my description was somewhat vague. I'm modifying existing code and I'm not a software developer, so it's a steep learning curve for me. To explain further: the "Epochs" are read by another function:

import datetime

def read_epoch_header(self, RINEXfile):
    epoch_hdr = RINEXfile.readline()
    if epoch_hdr == '':
        return None

    year = int(epoch_hdr[1:3])
    if year >= 80:
        year += 1900
    else:
        year += 2000
    month = int(epoch_hdr[4:6])
    day = int(epoch_hdr[7:9])
    hour = int(epoch_hdr[10:12])
    minute = int(epoch_hdr[13:15])
    second = int(epoch_hdr[15:18])
    microsecond = int(epoch_hdr[19:25])  # Discard the least significant digits (use microseconds only).
    epoch = datetime.datetime(year, month, day, hour, minute, second, microsecond)

    flag = int(epoch_hdr[28])
    if flag != 0:
        # Use % so the message is actually formatted (a comma would pass a tuple instead).
        raise ValueError("Don't know how to handle epoch flag %d in epoch header:\n%s" % (flag, epoch_hdr))

    n_sats = int(epoch_hdr[29:32])
    sats = []
    for i in range(0, n_sats):
        if ((i % 12) == 0) and (i > 0):
            epoch_hdr = RINEXfile.readline()  # more than 12 satellites continue on the next line
        sats.append(epoch_hdr[(32+(i%12)*3):(35+(i%12)*3)])

    return epoch, flag, sats

In the read_data function above, these are collected into a DataFrame (a Panel). I want to split it along its satellite axis, so that each satellite file has the epochs in the first column, followed by the 7 signals. The last part of the read_data function (below) does this:

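# Iterate over the satellites actually present (dropna may have removed some).
for sv in obs_data_chunks_dataframe.items:
    sat = obs_data_chunks_dataframe[sv]
    print("sat_columns: {0}".format(sat.columns[0]))  # header of first column, e.g. L1
    sat.to_csv('SV_{0}'.format(int(sv[1:])), index_label="Epochs", sep='\t')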

The problems are: (1) I want the first column to be timestamps (i.e. strip the date and convert so that midnight = 00000 s and 23:59:59 = 86399 s) rather than what it is now, and (2) I want the columns aligned, so that I can later process them further in a different class to perform other calculations, e.g. L1 minus L2 plotted against time.
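Tab-separated output only lines up when every field happens to fit the tab stops. One way to get aligned columns, sketched under the assumption that sat is a per-satellite DataFrame whose index holds the epochs, is DataFrame.to_string, which pads every column to a fixed width; format_aligned is a hypothetical helper:

```python
import pandas as pd

def format_aligned(sat, decimals=3):
    """Return sat as fixed-width text with right-aligned columns.

    The index (the epochs) becomes an ordinary 'Epochs' column so that it
    gets a header cell that lines up like every other column.
    """
    table = sat.rename_axis('Epochs').reset_index()
    fmt = '{:.%df}' % decimals  # e.g. '{:.3f}' renders 530073.33 as 530073.330
    return table.to_string(index=False, float_format=fmt.format)
```

A file could then be written with open('SV_1', 'w').write(format_aligned(sat) + '\n') and read back with pd.read_csv('SV_1', sep=r'\s+'). Once the epochs are seconds of day, L1 minus L2 against time is simply sat['L1'] - sat['L2'].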
pymat

2 Answers


It will be much quicker to do this while it's still a df. If the dtype is datetime64, just convert to int64 and integer-divide by 10**9 (the number of nanoseconds in a second):

In [241]:
df['Epochs'].astype(np.int64) // 10**9

Out[241]:
0    1398556800
1    1398556830
2    1398556860
3    1398556890
4    1398556920
Name: Epochs, dtype: int64

If it's a string, convert it with to_datetime first and then do the above:

df['Epochs'] = pd.to_datetime(df['Epochs']).astype(np.int64) // 10**9
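The int64 approach yields seconds since 1970-01-01 (1398556800 is 2014-04-27 00:00:00 UTC), not the seconds since midnight the question asks for. Also, astype(np.int64) counts in the column's own unit: nanoseconds for datetime64[ns], hence 10**9, but in pandas 2.x a datetime64[us] array like the one in read_data_chunk may keep microsecond resolution. A sketch that avoids both points by reading the time-of-day fields through the .dt accessor; epochs_to_seconds_of_day is a hypothetical helper, and since the question's epochs are in the index it would be called as epochs_to_seconds_of_day(sat.index.to_series()):

```python
import pandas as pd

def epochs_to_seconds_of_day(epochs):
    """Seconds since midnight for a Series of date/time strings or datetimes."""
    t = pd.to_datetime(epochs)
    # Only the time-of-day fields are used, so the date is discarded, fractional
    # seconds are dropped, and the result does not depend on the datetime64 unit.
    return t.dt.hour * 3600 + t.dt.minute * 60 + t.dt.second
```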


EdChum
  • I assigned the index label (sat.to_csv(('SV_{0}').format(sv+1), index_label="Epochs", sep='\t')) only when writing the DataFrame to CSV. I somehow need to tell the DataFrame to use the first column, which was not indexed. – pymat Sep 17 '15 at 09:59
  • I really don't understand what you mean. Can you edit your question with raw data, the code that creates your df, and a further explanation? – EdChum Sep 17 '15 at 10:02

I resolved part of this myself in the end: in the read_epoch_header function, I simply computed a variable that converts just hh:mm:ss to seconds, and used that as the epoch. It doesn't look that elegant, but it works. I still need to format the header so that it aligns with the columns (and make sure the columns themselves are aligned too). Cheers, pymat
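The code for that change isn't shown, so here is a minimal sketch of the idea, assuming the same fixed column slices that read_epoch_header uses for a RINEX 2 epoch line; seconds_of_day_from_epoch_header is a hypothetical name:

```python
def seconds_of_day_from_epoch_header(epoch_hdr):
    """Seconds since midnight from a RINEX 2 observation epoch line.

    Uses the same column slices as read_epoch_header in the question.
    """
    hour = int(epoch_hdr[10:12])
    minute = int(epoch_hdr[13:15])
    second = int(epoch_hdr[15:18])       # integer part of the seconds field
    microsecond = int(epoch_hdr[19:25])  # first six fractional digits
    return hour * 3600 + minute * 60 + second + microsecond / 1e6
```

Returning a float keeps sub-second epochs; wrapping the result in int() would give the whole-second form (0 to 86399) from the question.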

pymat