The seaborn documentation is pretty unclear about the differences, and I can't figure them out. It seems like they have very similar, if not identical, functionality.

seaborn.FacetGrid.map

seaborn.FacetGrid.map_dataframe

What exactly are the differences, and when do you use one vs. the other? The seaborn documentation about map_dataframe says "Unlike the map method, a function used here must “understand” Pandas objects." That is the only difference in the documentation of map_dataframe vs. map. What kind of objects are sent to the function in map then if not a dataframe and why does it matter? I also don't really understand the color argument that the target functions have to accept. What information is in that color argument?

Jim

1 Answer

When you use FacetGrid.map(func, "col1", "col2", ...), the function func is passed the values of the columns "col1" and "col2" (and more if needed), filtered to the current facet, as its first and second positional arguments (args[0], args[1], ...). In addition, the function always receives a keyword argument named color=.

When you use FacetGrid.map_dataframe(func, "col1", "col2", ...), the function func is passed the names "col1" and "col2" (and more if needed) as its first and second positional arguments (args[0], args[1], ...), and the dataframe filtered to the current facet as the keyword argument data=. In addition, the function always receives a keyword argument named color=.

Maybe this demonstration would help:

import numpy as np
import pandas as pd
import seaborn as sns

N = 4
df = pd.DataFrame({'col1': np.random.random(N), 'col2': np.random.random(N), 'cat': np.random.choice([True, False], size=N)})

|    |     col1 |      col2 | cat   |
|---:|---------:|----------:|:------|
|  0 | 0.651592 | 0.631109  | True  |
|  1 | 0.981403 | 0.550882  | False |
|  2 | 0.467846 | 0.997084  | False |
|  3 | 0.119726 | 0.0452547 | False |
  • using FacetGrid.map():

code:

def test(*args, **kwargs):
    print(">>> content of ARGS:")
    print(args)
    print(">>> content of KWARGS:")
    print(kwargs)


g = sns.FacetGrid(df, col='cat')
g.map(test, 'col1', 'col2')

output:

>>> content of ARGS:
(1    0.981403
2    0.467846
3    0.119726
Name: col1, dtype: float64, 1    0.550882
2    0.997084
3    0.045255
Name: col2, dtype: float64)
>>> content of KWARGS:
{'color': (0.12156862745098039, 0.4666666666666667, 0.7058823529411765)}
>>> content of ARGS:
(0    0.651592
Name: col1, dtype: float64, 0    0.631109
Name: col2, dtype: float64)
>>> content of KWARGS:
{'color': (0.12156862745098039, 0.4666666666666667, 0.7058823529411765)}
  • using FacetGrid.map_dataframe():

code:

g.map_dataframe(test, 'col1', 'col2')

output:

>>> content of ARGS:
('col1', 'col2')
>>> content of KWARGS:
{'color': (0.12156862745098039, 0.4666666666666667, 0.7058823529411765), 
 'data':        col1      col2    cat
         1  0.981403  0.550882  False
         2  0.467846  0.997084  False
         3  0.119726  0.045255  False}
>>> content of ARGS:
('col1', 'col2')
>>> content of KWARGS:
{'color': (0.12156862745098039, 0.4666666666666667, 0.7058823529411765), 
 'data':        col1      col2   cat
         0  0.651592  0.631109  True}
Diziet Asahi
  • 1
    So a function being called with `map_dataframe` should be: `func(col1_name, col2_name, data, color)`. This function receives the whole dataframe and can then do whatever it wants with it. The columns that it is supposed to plot are given as strings, so they need to be accessed as `data[col1_name]` and `data[col2_name]`. – Jim Sep 25 '20 at 20:02
  • 1
    But a function called with map should be: `func(col1, col2, color)`. Here `col1` and `col2` ARE the columns already, and not just the labels. Outside of those columns the rest of the dataframe is not available. So `map_dataframe` is a generalization of `map`. Right? – Jim Sep 25 '20 at 20:03