
Sorry in advance for the number of images, but they help demonstrate the issue.

I have built a dataframe which contains film thickness measurements, for a number of substrates, for a number of layers, as a function of coordinates:

|    | Sub | Result | Layer | Row | Col |
|----|-----|--------|-------|-----|-----|
|  0 |   1 |   2.95 | 3 - H |   0 |  72 |
|  1 |   1 |   2.97 | 3 - V |   0 |  72 |
|  2 |   1 |   0.96 | 1 - H |   0 |  72 |
|  3 |   1 |   3.03 | 3 - H | -42 |  48 |
|  4 |   1 |   3.04 | 3 - V | -42 |  48 |
|  5 |   1 |   1.06 | 1 - H | -42 |  48 |
|  6 |   1 |   3.06 | 3 - H |  42 |  48 |
|  7 |   1 |   3.09 | 3 - V |  42 |  48 |
|  8 |   1 |   1.38 | 1 - H |  42 |  48 |
|  9 |   1 |   3.05 | 3 - H | -21 |  24 |
| 10 |   1 |   3.08 | 3 - V | -21 |  24 |
| 11 |   1 |   1.07 | 1 - H | -21 |  24 |
| 12 |   1 |   3.06 | 3 - H |  21 |  24 |
| 13 |   1 |   3.09 | 3 - V |  21 |  24 |
| 14 |   1 |   1.05 | 1 - H |  21 |  24 |
| 15 |   1 |   3.01 | 3 - H | -63 |   0 |
| 16 |   1 |   3.02 | 3 - V | -63 |   0 |

This continues for more than 10 subs (per batch), 13 sites per sub, and 3 layers; this df is a composite. I am attempting to present the data as a FacetGrid of heatmaps (adapting code from "How to make heatmap square in Seaborn FacetGrid", thanks!).

I can plot a subset of the df quite happily:

spam = df.loc[(df.Sub == 6) & (df.Layer == '3 - H')]
spam_p = spam.pivot(index='Row', columns='Col', values='Result')

sns.heatmap(spam_p, cmap="plasma")

enter image description here

BUT there are some missing results where the layer measurement errors out (it returns 10000), so I've replaced these with NaNs:

df['Result'] = df['Result'].replace(10000, np.nan)
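
Note that `Series.replace` returns a new Series rather than changing the frame, so the result has to be assigned back for the NaNs to stick. A minimal sketch with made-up values (the toy frame and the names `unchanged` and `n_missing` are only for illustration):

```python
import numpy as np
import pandas as pd

# Toy frame with made-up values; 10000 marks a failed measurement.
df = pd.DataFrame({"Result": [2.95, 10000, 0.96]})

# Without assignment the returned Series is discarded and df is unchanged.
df["Result"].replace(10000, np.nan)
unchanged = int(df["Result"].eq(10000).sum())

# Assigning the result back stores the NaNs in the frame.
df["Result"] = df["Result"].replace(10000, np.nan)
n_missing = int(df["Result"].isna().sum())
```

Here `unchanged` counts the sentinel values still present after the bare call, and `n_missing` counts the NaNs after the assignment.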

Single seaborn heatmap with correct axes

To plot a facetgrid to show all subs/layers, I've written the following code:

def draw_heatmap(*args, **kwargs):
    data = kwargs.pop('data')
    d = data.pivot(columns=args[0], index=args[1],
                   values=args[2])
    sns.heatmap(d, **kwargs)

fig = sns.FacetGrid(spam, row='Wafer',
                    col='Feature', height=5, aspect=1)

fig.map_dataframe(draw_heatmap, 'Col', 'Row', 'Result', cbar=False,
                  cmap="plasma", annot=True, annot_kws={"size": 20})

which yields:

heatmap image with incomplete axes plot

It has automatically adjusted the axes to not show any positions where there is a NaN. I have tried masking (see https://github.com/mwaskom/seaborn/issues/375) but it just errors out with "Inconsistent shape between the condition and the input (got (237, 15) and (7, 7))".

As a result, when using the full dataset instead of the cropped-down one (i.e. df instead of spam), the code generates the following FacetGrid:
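
The shape error suggests the mask and the plotted data were built from different tables. A minimal sketch, assuming the mask was created outside the per-facet function, of building the mask from the pivoted facet itself so the two shapes always agree (the toy data and the names `facet`, `d` and `mask` are hypothetical):

```python
import numpy as np
import pandas as pd

# Made-up data for a single facet with one missing measurement.
facet = pd.DataFrame({
    "Row": [0, 0, 21, 21],
    "Col": [0, 24, 0, 24],
    "Result": [3.01, np.nan, 3.06, 3.09],
})

# Pivot exactly as the plotting function does.
d = facet.pivot(index="Row", columns="Col", values="Result")

# Deriving the mask from d guarantees it has the same shape and labels as d.
mask = d.isnull()

# Inside the per-facet function it could then be passed along,
# e.g. sns.heatmap(d, mask=mask, **kwargs).
```

This only resolves the shape mismatch; it does not by itself restore positions that are absent from the pivoted table.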

enter image description here

Plots featuring missing values at extreme (edge) coordinate positions make the plot shift within the axes - here all apparently to the upper left. Sub #5, layer 3-H should look like:

enter image description here

i.e. blanks in the places where there are NaNs.

Why is the facetgrid shifting the entire plot up and/or left? The alternative is dynamically generating subplots based on a sub/layer-count (ugh!).

Any help very gratefully received.

Full dataset for 2 layers of sub 5:

    Sub Result  Layer   Row     Col
0   5   2.987   3 - H   0       72
1   5   0.001   1 - H   0       72
2   5   1.184   3 - H   -42     48
3   5   1.023   1 - H   -42     48
4   5   3.045   3 - H   42      48 
5   5   0.282   1 - H   42      48
6   5   3.083   3 - H   -21     24 
7   5   0.34    1 - H   -21     24
8   5   3.07    3 - H   21      24
9   5   0.41    1 - H   21      24
10  5   NaN     3 - H   -63     0
11  5   NaN     1 - H   -63     0
12  5   3.086   3 - H   0       0
13  5   0.309   1 - H   0       0
14  5   0.179   3 - H   63      0
15  5   0.455   1 - H   63      0
16  5   3.067   3 - H   -21    -24
17  5   0.136   1 - H   -21    -24
18  5   1.907   3 - H   21     -24
19  5   1.018   1 - H   21     -24
20  5   NaN     3 - H   -42    -48
21  5   NaN     1 - H   -42    -48
22  5   NaN     3 - H   42     -48
23  5   NaN     1 - H   42     -48
24  5   NaN     3 - H   0      -72
25  5   NaN     1 - H   0      -72
BAC83
  • How can I test this? Is "sub" the same as "wafer"? What minimal dataset would reproduce the issue? – ImportanceOfBeingErnest Aug 16 '18 at 12:59
  • Yes - sorry, multiple naming conventions here, I've hacked this together to ask the question. Sub == wafer. – BAC83 Aug 16 '18 at 13:03
  • And i'll make an edit to help generate a test dataset... – BAC83 Aug 16 '18 at 13:06
  • I've added a full dataset; however you could always use these data multiple times to emulate multiple subs (obviously). If you do - it would perhaps be a good idea to include more (fake) values, to force different/new positions to be used ie. replace some NaNs with values. – BAC83 Aug 16 '18 at 13:40

1 Answer


You may create a list of unique column and row labels and reindex the pivot table with them.

cols = df["Col"].unique()
rows = df["Row"].unique()

pivot = data.pivot(...).reindex(index=rows, columns=cols)

as seen in this answer.
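
To illustrate the idea on a small scale: if a facet's subset lacks some positions, its pivot table is smaller than the full grid, and reindexing against the labels of the whole frame pads it back out with NaN. The sketch below uses made-up data and `DataFrame.reindex`, since `reindex_axis` was removed in pandas 1.0; the names `sub`, `small` and `full` are only for illustration.

```python
import numpy as np
import pandas as pd

# Made-up measurements; layer "B" only has a value at one position.
df = pd.DataFrame({
    "Layer": ["A"] * 4 + ["B"],
    "Row": [-21, -21, 21, 21, 21],
    "Col": [-24, 24, -24, 24, 24],
    "Result": [3.0, 3.1, 3.2, 3.3, 1.0],
})

# Labels of the full grid, taken from the whole frame (sorted ascending).
rows = np.unique(df["Row"])
cols = np.unique(df["Col"])

sub = df[df["Layer"] == "B"]
small = sub.pivot(index="Row", columns="Col", values="Result")  # 1 x 1 table
full = small.reindex(index=rows, columns=cols)                  # 2 x 2, NaN where missing
```

Every facet reindexed this way has the same shape and label order, so the heatmap cells line up across the grid.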

Some complete code:

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt

r = np.repeat([0,-2,2,-1,1,-3],2)
row = np.concatenate((r, [0]*2, -r[::-1]))
c = np.array([72]*2+[48]*4 + [24]*4 + [0]* 3)
col = np.concatenate((c,-c[::-1]))

df = pd.DataFrame({"Result" : np.random.rand(26),
                   "Layer" : list("AB")*13,
                   "Row" : row, "Col" : col})

df1 = df.copy()
df1["Sub"] = [5]*len(df1)
# .loc slices are label-based and include both endpoints
df1.loc[10:11,"Result"] = np.nan
df1.loc[20:,"Result"] = np.nan

df2 = df.copy()
df2["Sub"] = [3]*len(df2)
df2.loc[0:2,"Result"] = np.nan

df = pd.concat([df1,df2])

cols = np.unique(df["Col"].values)
rows = np.unique(df["Row"].values)

def draw_heatmap(*args, **kwargs):
    data = kwargs.pop('data')
    d = data.pivot(columns=args[0], index=args[1], 
                   values=args[2])
    # Reindex to the full grid so every facet has the same shape
    d = d.reindex(index=rows, columns=cols)
    print(d)
    sns.heatmap(d,  **kwargs)

grid = sns.FacetGrid(df, row='Sub', col='Layer', height=3.5, aspect=1 )

grid.map_dataframe(draw_heatmap, 'Col', 'Row', 'Result', cbar=False, 
                  cmap="plasma", annot=True)

plt.show()

enter image description here

ImportanceOfBeingErnest
  • **Thank you** very much for this - saved me an enormous headache. If i'm understanding it correctly, your solution explains why the plots were drifting differently; they each need to be reindexed for the col/rows. And extra thanks for the complete code, really helpful to see some pro-level approaches to my problems! Really appreciate it. – BAC83 Aug 17 '18 at 08:00
  • This seems to have gone out of date (the example no longer runs successfully), but I don't have the expertise to fix it. – beyarkay Mar 19 '22 at 16:07