Since Pandas does not have any facilities for dealing with Lorentz vectors, expressing them in terms of their components (pT, eta, phi, mass) and writing your own functions for transforming them is the only practical way to go, especially if you want to save to and load from CSV.
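As a rough sketch of that component approach, the helper below derives Cartesian momentum and energy columns from pt, eta, phi, and mass using the usual collider-physics relations. The function name add_cartesian and the phi and mass values are made up for illustration; nothing here is part of Pandas or any vector library.

```python
import numpy as np
import pandas as pd

def add_cartesian(df):
    """Return a copy of df with px, py, pz, and E columns.

    Hypothetical helper: assumes df has columns named
    "pt", "eta", "phi", and "mass".
    """
    out = df.copy()
    out["px"] = out["pt"] * np.cos(out["phi"])
    out["py"] = out["pt"] * np.sin(out["phi"])
    out["pz"] = out["pt"] * np.sinh(out["eta"])
    # Magnitude of the 3-momentum: |p| = pt * cosh(eta)
    p = out["pt"] * np.cosh(out["eta"])
    out["E"] = np.sqrt(p**2 + out["mass"]**2)
    return out

# Illustrative values only; phi and mass are invented for this example.
particles = pd.DataFrame({
    "pt": [1.1, 3.3, 5.5],
    "eta": [2.2, 4.4, -2.2],
    "phi": [0.0, 1.0, -1.0],
    "mass": [0.0, 0.105, 0.105],
})
cartesian = add_cartesian(particles)
```

Because every column is a plain float, a DataFrame like this survives a round trip through CSV, at the cost of having to call such functions explicitly.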
That said, it is possible to create Lorentz vector objects that retain their "Lorentziness" inside of Pandas, but there are limitations. You can create structured data as Awkward Arrays:
>>> import awkward1 as ak
>>> import pandas as pd
>>> import numpy as np
>>> class Lorentz:
...     @property
...     def p(self):
...         return self.pt * np.cosh(self.eta)
...
>>> class LorentzRecord(Lorentz, ak.Record): pass
...
>>> class LorentzArray(Lorentz, ak.Array): pass
...
>>> ak.behavior["lorentz"] = LorentzRecord
>>> ak.behavior["*", "lorentz"] = LorentzArray
>>> array = ak.Array([{"pt": 1.1, "eta": 2.2},
...                   {"pt": 3.3, "eta": 4.4},
...                   {"pt": 5.5, "eta": -2.2}],
...                  with_name="lorentz")
>>> array
<LorentzArray [{pt: 1.1, eta: 2.2}, ... eta: -2.2}] type='3 * lorentz["pt": floa...'>
The above defined an array of records with fields pt and eta and gave both the single-record and the array-of-records views a new property p, which is derived from pt and eta.
>>> # Each record has a pt, eta, and p.
>>> array[0].pt
1.1
>>> array[0].eta
2.2
>>> array[0].p
5.024699161788051
>>> # The whole array has a pt, eta, and p (columns).
>>> array.pt
<Array [1.1, 3.3, 5.5] type='3 * float64'>
>>> array.eta
<Array [2.2, 4.4, -2.2] type='3 * float64'>
>>> array.p
<Array [5.02, 134, 25.1] type='3 * float64'>
You can put an array of Lorentz records into a Pandas DataFrame:
>>> df = pd.DataFrame({"column": array})
>>> df
column
0 {pt: 1.1, eta: 2.2}
1 {pt: 3.3, eta: 4.4}
2 {pt: 5.5, eta: -2.2}
and do the same things with it:
>>> df.column.values.pt
<Array [1.1, 3.3, 5.5] type='3 * float64'>
>>> df.column.values.eta
<Array [2.2, 4.4, -2.2] type='3 * float64'>
>>> df.column.values.p
<Array [5.02, 134, 25.1] type='3 * float64'>
but that's because we're pulling the Awkward Array back out to apply these operations.
>>> df.column.values
<LorentzArray [{pt: 1.1, eta: 2.2}, ... eta: -2.2}] type='3 * lorentz["pt": floa...'>
Any NumPy functions applied to the DataFrame, such as negation (implicitly calls np.negative), get passed through to the Awkward Array without having to unpack it.
>>> -df
column
0 {pt: -1.1, eta: -2.2}
1 {pt: -3.3, eta: -4.4}
2 {pt: -5.5, eta: 2.2}
but at present, it's the wrong operation: it shouldn't negate the pt. It's possible to further overload that:
>>> def negative_Lorentz(x):
...     return ak.zip({"pt": x.pt, "eta": -x.eta})
...
>>> ak.behavior[np.negative, "lorentz"] = negative_Lorentz
>>> -df
column
0 {pt: 1.1, eta: -2.2}
1 {pt: 3.3, eta: -4.4}
2 {pt: 5.5, eta: 2.2}
We're still building up a suite of functions for Lorentz arrays, but now they work in the array-at-a-time mode that Pandas operates in. There's a project called vector to define all of these functions for 2D, 3D, and Lorentz vectors, but it's in early stages of development.
Getting back to the issue of saving: none of the above helps you with that, because Pandas "saves" these data by printing them out:
>>> df.to_csv("whatever.csv")
writes
,column
0,"{pt: 1.1, eta: 2.2}"
1,"{pt: 3.3, eta: 4.4}"
2,"{pt: 5.5, eta: -2.2}"
which is not something that can be read back. We can try,
>>> df2 = pd.read_csv("whatever.csv")
>>> df2
Unnamed: 0 column
0 0 {pt: 1.1, eta: 2.2}
1 1 {pt: 3.3, eta: 4.4}
2 2 {pt: 5.5, eta: -2.2}
and so far, it looks good, but it isn't good:
>>> df2.column.values
array(['{pt: 1.1, eta: 2.2}', '{pt: 3.3, eta: 4.4}',
'{pt: 5.5, eta: -2.2}'], dtype=object)
They're strings. They are no longer computable. So if you want to save to a file, break it down into components.
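To illustrate the alternative, here is a minimal sketch of a round trip with plain component columns. It writes to an in-memory buffer rather than a file so that it is self-contained, and the variable names are chosen for this example only.

```python
import io

import numpy as np
import pandas as pd

# One plain float column per component instead of one column of records.
components = pd.DataFrame({"pt": [1.1, 3.3, 5.5], "eta": [2.2, 4.4, -2.2]})

# Write CSV to an in-memory buffer (stands in for a file on disk).
buffer = io.StringIO()
components.to_csv(buffer, index=False)

# Read it back: the columns come back as floats, not strings.
buffer.seek(0)
restored = pd.read_csv(buffer)

# Derived quantities are recomputed from the components after loading.
p = restored["pt"] * np.cosh(restored["eta"])
```

The restored columns have a numeric dtype, so p can be computed directly. The price is that the property is no longer attached to the data and has to be recomputed by hand.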
Maybe all of this can be pulled together into a usable system, but some aspects, like saving these arrays with their "Lorentziness" intact, are not ready yet.