1

I have a 2D map where each pixel contains a spectrum. I want to convert the data from this format:

X       Y       Wave        Intensity
-34727.180000   -4204.820000    1.484622    139.193512
-34727.180000   -4204.820000    1.484043    120.991280
-34727.180000   -4204.820000    1.483465    125.905304
-34726.180000   -4204.820000    1.483465    131.262970
-34726.180000   -4204.820000    1.482887    122.784081
-34726.180000   -4204.820000    1.482309    129.853088
-34725.180000   -4204.820000    1.483465    129.655670
-34725.180000   -4204.820000    1.482887    119.567032
-34725.180000   -4204.820000    1.482309    126.097000
-34727.180000   -4203.820000    1.463490    124.331985
-34727.180000   -4203.820000    1.462927    138.189377
-34727.180000   -4203.820000    1.462364    127.824867

to a dictionary where the keys are a tuple of the X,Y coordinate, and the values are a 3-by-2 numpy array. For example:

DICT = {
    (-34727.180000, -4204.820000): [[1.484622, 139.193512], [1.484043, 120.991280], [1.483465, 125.905304]], 
    (-34726.180000, -4204.820000): [[1.482887, 122.784081], [1.482887, 122.784081], [1.482309, 129.853088]],
    (-34725.180000, -4204.820000): [[1.483465, 129.655670], [1.482887, 119.567032], [1.482309, 126.097000]],
    (-34727.180000, -4203.820000): [[1.463490, 124.331985], [1.462927, 138.189377], [1.462927, 138.189377]]}

This example is simplified; my actual map contains more than four pixels (X,Y coordinates), and there are 512 Wave-Intensity pairs for each coordinate. I hope the solution can be generalized from a four pixel map to a 400 pixel map, and each array from a 3-by-2 numpy array to an 512-by-2 numpy array.

The ultimate goal is to take the Wave-Intensity pairs for each coordinate, fit them to a Gaussian distribution, find the (maximum) amplitude for this distribution, and plot that maximum for each X,Y coordinate. This part of the problem does not need to be included in a solution, but if someone adds a solution to this part of the problem, that would be excellent!

I am open to approaches that do not involve a dictionary (e.g. a 4D numpy array), but at the moment I cannot see another way. Feel free to recommend an alternate approach. Currently, I am importing the data in its original format using pandas:

import pandas as pd
IN_PATH = r'PATH_TO_FILE'
FNAME = r'\FILENAME.txt'
data = pd.read_csv(IN_PATH+FNAME, sep='\t', skiprows=1)
data.columns = ["X", "Y", "Wave", "Intensity"]

Thanks in advance!

T Walker
  • 330
  • 1
  • 3
  • 12

2 Answers2

1

First pandas.DataFrame.set_index the coordinates, pandas.DataFrame.agg list along axis=1, pandas.DataFrame.groupby the indices, then pandas.DataFrame.groupby.agg to list, and convert pandas.Series.to_dict:

>>> df.set_index(['X', 'Y']).agg(list, 1).groupby(level=(0,1)).agg(list).to_dict()

{(-34727.18, -4204.82): [[1.484622, 139.193512],
                         [1.484043, 120.99128],
                         [1.483465, 125.905304]],
 (-34727.18, -4203.82): [[1.46349, 124.331985],
                         [1.462927, 138.189377],
                         [1.462364, 127.824867]],
 (-34726.18, -4204.82): [[1.483465, 131.26297],
                         [1.482887, 122.784081],
                         [1.482309, 129.853088]],
 (-34725.18, -4204.82): [[1.483465, 129.65567],
                         [1.482887, 119.567032],
                         [1.482309, 126.097]]}

This will give the results in list, if you want array, you can pandas.Series.transform to numpy.array:

>>> df.set_index(['X', 'Y']).agg(list, 1).groupby(level=(0,1)).agg(list).transform(np.array).to_dict()

{(-34727.18, -4204.82): array([[  1.484622, 139.193512],
        [  1.484043, 120.99128 ],
        [  1.483465, 125.905304]]),
 (-34727.18, -4203.82): array([[  1.46349 , 124.331985],
        [  1.462927, 138.189377],
        [  1.462364, 127.824867]]),
 (-34726.18, -4204.82): array([[  1.483465, 131.26297 ],
        [  1.482887, 122.784081],
        [  1.482309, 129.853088]]),
 (-34725.18, -4204.82): array([[  1.483465, 129.65567 ],
        [  1.482887, 119.567032],
        [  1.482309, 126.097   ]])}
Sayandip Dutta
  • 15,602
  • 4
  • 23
  • 52
  • 1
    This is very close to what I'm looking for, however, the dictionary values are stored as a list instead of a numpy array. Is it possible to store the dictionary values as a numpy array? If not, then I can include a conversion from list to numpy array when I go to iterate over the each dictionary entry. – T Walker Jan 31 '21 at 16:30
  • 1
    @TWalker Sure, just transform np.array before to_dict. Editing. – Sayandip Dutta Jan 31 '21 at 18:51
0

You can simply iterate through the dataframe. Note though, that in your sample data the first few entries have identical X and Y and therefore the dictionary entry will be overwritten.

d = {}

for ix, row in df.iterrows():
    d[(row['X'], row['Y'])] = [row[a] for a in row.keys() if not a=='X' and not a=='Y']

Edit: storing all data for one pixel under the same key:

d = {}

for ix, row in df.iterrows():
    entry = [row[a] for a in row.keys() if not a=='X' and not a=='Y']
    x,y = row['X'], row['Y']

    if d.get((x,y)):
        d[(x,y)] += [entry]
    else:
        d[(x,y)] = [entry]
warped
  • 8,947
  • 3
  • 22
  • 49
  • The entries with identical X and Y represent a single pixel containing an entire spectrum. I cannot have the dictionary entry that represents this pixel be overwritten. The solution is closer to that provided by @Sayandip Dutta. – T Walker Jan 31 '21 at 16:27
  • @TWalker Ok, I must have misunderstood the question. The edit should do what you want, though. – warped Jan 31 '21 at 16:45