0

I have some Pandas (python) data frames that are created by collecting data roughly every 8 milliseconds. The data is broken down into blocks, where a sequence restarts. All the blocks have a label, and there is a timestamp column indicating the time (from the beginning of the file) that the sample was collected. To get an idea, the frame looks something like this:

|        | EXPINDEX | EXPTIMESTAMP | DATA1 | DATA2 |
-----------------------------------------------------
| BLOCK  | 0        |              |       |       |
| Block1 | 1        | 0            | .423  | .926  |
|        | 2        | 8.215        | .462  | .919  |
|        | 3        | 17.003       | .472  | .904  |
| Block2 | 4        | 55.821       | .243  | .720  |
|        | 5        | 63.521       | .237  | .794  |
| ...    | ...      | ...          | ...   | ...   |
------------------------------------------------------

The EXPTIMESTAMP column is a DateTimeIndex. What I would like to do is keep that column for utility later on, but create a different sub-index with a block-relative DateTimeIndex, e.g.:

|        |                | EXPTIMESTAMP | DATA1 | DATA2 |
----------------------------------------------------------
| BLOCK  | BLOCKTIMESTAMP |              |       |       |
| Block1 | 0              | 0            | .423  | .926  |
|        | 8.215          | 8.215        | .462  | .919  |
|        | 17.003         | 17.003       | .472  | .904  |
| Block2 | 0              | 55.821       | .243  | .720  |
|        | 7.700          | 63.521       | .237  | .794  |
| ...    | ...            | ...          | ...   | ...   |
----------------------------------------------------------

I have gotten this working:

blockreltimestamp = []
blocks = list(df.index.levels[0])
for block in blocks:
   dfblock = df.xs(block, level='BLOCK').copy()
   dfblock["InitialVal"] = dfblock.iloc[0]["EXPTIMESTAMP"]
   reltime = dfsblock["EXPTIMESTAMP"] - dfblock["InitialVal"]
   blockreltimestamp.extend(list(reltime))
df["BLOCKTIMESTAMP"] = blockreltimestamp
df.set_index(["BLOCK","BLOCKTIMESTAMP"], drop=False, inplace=True)

But I was wondering if there were a cleaner / more efficient / more pandas-esque way of doing this type of transformation.

Thanks!

Andrew Schoen
  • 145
  • 1
  • 11

1 Answers1

0

The cleaner solution ended up working on the non-multiindex data frame, where BLOCK was still a column with block IDs, and EXPTIMESTAMP was a column, as I wanted it in the end. From there, I used pandas' groupby functionality:

initialvalmatrix = df.groupby("BLOCK").min()[["EXPTIMESTAMP"]]

This creates a data frame with indices of "BLOCK", and a column "EXPTIMESTAMP" containing the minimum value of "EXPTIMESTAMP" for each block.

For clarity, I renamed the "EXPTIMESTAMP" column to "INITIALBYBLOCK":

initialvalmatrix.columns = ["INITIALBYBLOCK"]

I then used pandas' apply to run a function across the columns to compute the "BLOCKTIMESTAMP" column:

df["BLOCKTIMESTAMP"] = df.apply(apply_zero_timestamp, axis=1, tslookup=initialvalmatrix)
#Keyword arguments, if not used in the apply method, are passed into the function specified.

...Where the "apply_zero_timestamp" function was defined as:

def apply_zero_timestamp(series, tslookup):
    zeroval = series["EXPTIMESTAMP"] - tslookup["INITIALBYBLOCK"][series["BLOCK"]]
    return zeroval

And finally, I just had to set my index as I wanted:

df.set_index(["BLOCK","BLOCKTIMESTAMP"], drop=False, inplace=True)

Hope it helps!

Andrew Schoen
  • 145
  • 1
  • 11