I have a pandas dataframe containing indices that have a one-to-many relationship. A very simplified and shortened example of my data is shown in the DataFrame Example link. I want to get a list or Series or ndarray of the unique namIdx values in which nCldLayers <= 1. The final result should show indices of 601 and 603.

  1. I am able to accomplish this with the 3 statements below, but I am wondering if there is a much better, more succinct way with perhaps 'filter', 'select', or 'where'.

    grouped=(namToViirs['nCldLayers']<=1).groupby(namToViirs.index).all(axis=0)
    grouped = grouped[grouped==True]
    filterIndex = grouped.index
    
  2. Is there a better approach to getting this result by applying the logical condition (namToViirs['nCldLayers'] <= 1) in a subsequent part of the chain, i.e., first group, then apply the logical condition, and then retrieve only the namIdx values where the logical result is true for every member of the group?

user1745564
    Not sure about your second question, but for your first, what about `set((namToViirs["nCldLayers"] <= 1).index)` ? – dmn Oct 13 '16 at 18:56

4 Answers

I think your code works nicely; you only need two small changes:

In all, axis=0 can be omitted.
In grouped==True, the ==True can be omitted.

    grouped = (namToViirs['nCldLayers']<=1).groupby(level='namldx').all()
    grouped = grouped[grouped]
    filterIndex = grouped.index
    print (filterIndex)
    Int64Index([601, 603], dtype='int64', name='namldx')

I think it is better to filter by boolean indexing first and then group, because fewer loops mean better performance.
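As a minimal sketch, the groupby/all approach above can be tried on made-up data. The question's real DataFrame is not shown, so the values and the index name below are assumptions chosen to mimic a one-to-many index in which 602 has a row with more than one cloud layer.

```python
import pandas as pd

# Hypothetical stand-in for namToViirs; values are invented for illustration.
namToViirs = pd.DataFrame(
    {"nCldLayers": [0, 1, 1, 2, 3, 1, 0]},
    index=pd.Index([601, 601, 602, 602, 602, 603, 603], name="namIdx"),
)

# True for a group only if every row in that group satisfies the condition.
all_ok = (namToViirs["nCldLayers"] <= 1).groupby(level=0).all()

# Keep only the groups whose flag is True and take their index labels.
filterIndex = all_ok[all_ok].index
print(filterIndex.tolist())  # [601, 603]
```

Grouping with level=0 avoids depending on the exact index name.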

jezrael

For question 1, see jezrael's answer. For question 2, you could treat the indexes as sets:

    namToViirs.index[namToViirs.nCldLayers <= 1] \
              .difference(namToViirs.index[namToViirs.nCldLayers > 1])
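The set-style idea can be sketched on the same kind of made-up data (the values and index name are assumptions, since the question's DataFrame is not shown). Labels that ever appear with more than one layer are subtracted from the labels that appear with at most one.

```python
import pandas as pd

# Hypothetical stand-in for namToViirs; values are invented for illustration.
namToViirs = pd.DataFrame(
    {"nCldLayers": [0, 1, 1, 2, 3, 1, 0]},
    index=pd.Index([601, 601, 602, 602, 602, 603, 603], name="namIdx"),
)

# Labels with at least one row satisfying the condition (duplicates included).
low = namToViirs.index[namToViirs["nCldLayers"] <= 1]
# Labels with at least one row violating the condition.
high = namToViirs.index[namToViirs["nCldLayers"] > 1]

# Index.difference returns the unique labels of `low` that are not in `high`.
filterIndex = low.difference(high)
print(filterIndex.tolist())  # [601, 603]
```

This avoids groupby entirely, at the cost of evaluating two boolean masks.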
Zeugma

You might be interested in this answer.

The implementation is currently a bit hackish, but it should reduce your statement above to:

    filterIndex = ((namToViirs['nCldLayers']<=1)
                    .groupby(namToViirs.index).all(axis=0)[W].index)

EDIT: also see this answer for an analogous approach not requiring external components, resulting in:

    filterIndex = ((namToViirs['nCldLayers']<=1)
                    .groupby(namToViirs.index).all(axis=0)[lambda x : x].index)
Pietro Battiston

Another option is to use .pipe() and a function which applies the desired filtering.

For instance:

    filterIndex = ((namToViirs['nCldLayers']<=1)
                    .groupby(namToViirs.index)
                    .all(axis=0)
                    .pipe(lambda s : s[s])
                    .index)
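A self-contained sketch of the .pipe() chain, run on made-up data (the values and index name are assumptions). Here axis=0 is left out of all(), since newer pandas releases may reject it for a grouped Series.

```python
import pandas as pd

# Hypothetical stand-in for namToViirs; values are invented for illustration.
namToViirs = pd.DataFrame(
    {"nCldLayers": [0, 1, 1, 2, 3, 1, 0]},
    index=pd.Index([601, 601, 602, 602, 602, 603, 603], name="namIdx"),
)

filterIndex = (
    (namToViirs["nCldLayers"] <= 1)   # boolean Series, one flag per row
    .groupby(level=0)                  # group the flags by index label
    .all()                             # True only if every flag in the group is True
    .pipe(lambda s: s[s])              # keep the groups whose flag is True
    .index
)
print(filterIndex.tolist())  # [601, 603]
```

The whole selection stays in one expression, so no intermediate variable such as grouped is needed.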
Pietro Battiston