
What does this code do? I understand that it runs a for loop and checks each array element for nulls, but I don't understand why there are two functions, `isnull` and `any`. Also, what does the `col` before the `for` do? It looks like a for loop can return the iterable and a tuple is being made. Is that in the Python docs?

cols_with_missing = [col for col in X_train.columns
                         if X_train[col].isnull().any()]
givonz
  • You could manually explore what this does. Instead of `col for col in X_train.columns` inside the list-comprehension, just pick any column name e.g. 'B'. Then see for yourself what `X_train['B'].isnull()` returns. – smci Dec 29 '19 at 12:17
  • While list comprehensions are a core Python feature, data frames are not. The question is basically too broad (three questions in one) and a reader will have to guess several things right to properly understand it (and a title which seems to be vaguely about something else doesn't help either). I hope the selected duplicate will help you solve your problem; if not, you should probably spend some time in the [help] before trying to articulate a better question. Also note that Pandas generally prefers vectorized operations over list-based approaches. – tripleee Dec 29 '19 at 12:26
  • `[col for col in X_train.columns ... ]` is plain Python: a list comprehension, which is a way of generating a new list from another list or some other collection of items. – givonz Dec 30 '19 at 19:40

1 Answer


Summary: It returns a list of column names which have missing values.

Explanation:

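  • X_train[col].isnull(): returns a boolean Series, with True where the value in that column is missing and False otherwise.
  • X_train[col].isnull().any(): reduces that Series to a single boolean, which is True if at least one value in the column is missing.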
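
To see each step for yourself, here is a minimal sketch. The small `X_train` below is a made-up stand-in, since the question does not show the real data. The `col` before the `for` is the expression that gets collected for every column that passes the `if` test, so the result is a list, not a tuple. List comprehensions are covered in the data structures chapter of the official Python tutorial.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for X_train; the real data is not shown in the question.
X_train = pd.DataFrame({
    "A": [1.0, 2.0, 3.0],
    "B": [1.0, np.nan, 3.0],
    "C": ["x", None, "z"],
})

# Step 1: isnull() marks each value in one column as missing (True) or not (False).
mask_b = X_train["B"].isnull()
print(mask_b.tolist())     # [False, True, False]

# Step 2: any() collapses that boolean Series into a single True/False.
print(bool(mask_b.any()))  # True

# Step 3: the comprehension keeps `col` only when the condition after `if` holds.
cols_with_missing = [col for col in X_train.columns
                     if X_train[col].isnull().any()]
print(cols_with_missing)   # ['B', 'C']

# The same list built with an explicit loop, to show what the comprehension expands to.
cols_loop = []
for col in X_train.columns:
    if X_train[col].isnull().any():
        cols_loop.append(col)

# A vectorized alternative: DataFrame.isnull().any() checks every column at once.
cols_vectorized = X_train.columns[X_train.isnull().any()].tolist()
```

All three approaches give the same list here. The vectorized form is closer to what the comment about preferring vectorized operations suggests, though the comprehension is perfectly fine for a handful of columns.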
tripleee
YOLO