
I already know how to append values to a list conditionally, using a for loop with an if statement, but I want to know if there is a more concise way to do it.

Here is my current solution:

column=[]
for i in range(movies.shape[1]): 
    if ((movies.dtypes[i]==float) | (movies.dtypes[i]==int)): 
        column.append(movies.columns[i])
print(column)
['title_year', 'aspect_ratio', 'duration', 'duration.1', 'budget', 'imdb_score', 'gross']

Here movies is a pandas DataFrame.

I tried this:

column=[movies.columns[i] if ((movies.dtypes[i]==float) | (movies.dtypes[i]==int)) else 0 for i in range(movies.shape[1])]

But the result is:

[0, 'title_year', 0, 'aspect_ratio', 'duration', 0, 0, 'duration.1', 0, 0, 0, 0, 0, 0, 0, 0, 'budget', 'imdb_score', 'gross']

I had to put that 0 in the else clause because without it I get a syntax error.

So, can I write those 3 lines as a single statement?

marc_s
Arn. Rojas
  • See also the `pandas.DataFrame.select_dtypes()` method: https://stackoverflow.com/questions/21271581/selecting-pandas-columns-by-dtype – NicholasM Dec 04 '19 at 15:01

1 Answer


Firstly, you can simplify (x==y) | (x==z) to x in (y, z). It's also recommended to use the logical operator or rather than the bitwise operator | in Boolean expressions, but that's beside the point.
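
As a rough illustration of that simplification in plain Python, the sketch below compares the chained form with the membership test. The values `y`, `z` and the `candidates` list are hypothetical stand-ins, not the real `movies` dtypes.

```python
# Hypothetical stand-in values; any objects that support == would do.
y, z = float, int

def chained(x):
    # Two equality checks joined with the logical `or`.
    return x == y or x == z

def membership(x):
    # `in` compares x against each element of the tuple in turn.
    return x in (y, z)

candidates = [float, int, str, bool, object]
results = [(c.__name__, chained(c), membership(c)) for c in candidates]
for name, a, b in results:
    print(name, a, b)
```

Both functions agree for every candidate here. Note that `|` binds more tightly than `==`, which is why the original expression needed parentheses around each comparison; `or` does not have that problem.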

To answer your question: yes, you just have the syntax a bit confused. Putting if ... else in the expression part of the comprehension makes it a ternary, which produces a value for every item instead of skipping the ones that don't match. The equivalent in the for loop would be:

for i in range(movies.shape[1]):
    column.append(movies.columns[i] if movies.dtypes[i] in (float, int) else 0)

The way to use an if as a filter is to put it at the end of the comprehension:

column = [movies.columns[i] for i in range(movies.shape[1]) if movies.dtypes[i] in (float, int)]

The syntax for a comprehension is described in the documentation here: Displays for lists, sets and dictionaries. A ternary is called a conditional expression in the Python docs.
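
To make the difference between the two positions of `if` concrete, here is a minimal sketch that runs without pandas. The lists `names` and `kinds` are hypothetical stand-ins for `movies.columns` and `movies.dtypes`.

```python
# Hypothetical stand-ins for movies.columns and movies.dtypes.
names = ["color", "title_year", "genres", "duration"]
kinds = [str, float, str, int]

# `if ... else` before `for` is a conditional expression:
# every input item yields exactly one output item.
mapped = [n if k in (float, int) else 0 for n, k in zip(names, kinds)]

# `if` after `for` is a filter: items that fail the test are skipped.
filtered = [n for n, k in zip(names, kinds) if k in (float, int)]

print(mapped)    # [0, 'title_year', 0, 'duration']
print(filtered)  # ['title_year', 'duration']
```

The first list has the same length as the input, which is where the unwanted zeros in the question come from; the second contains only the matching names.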

wjandrea
  • Using this form of the for loop makes it even shorter: `column = [c for c in movies.columns if movies.dtypes[c] in (float, int)]` – Arn. Rojas Dec 04 '19 at 15:22
  • @Arn.Rojas Definitely! It's always preferable to avoid looping over indexes. I'm not very familiar with Pandas myself; otherwise I would have recommended it :P – wjandrea Dec 04 '19 at 15:27
  • How can I look up the notation used in `x in (y, z)`? I want to use it with the **not equal** and **and** operators – Arn. Rojas Jan 10 '20 at 13:27
  • @Arn.Rojas in plain Python you could use `any` or `all`, but there's probably a better way to do it in Pandas -- maybe with bitwise operators? You might want to ask a new question about that with more details. – wjandrea Jan 10 '20 at 16:02
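
For the plain-Python case raised in the last two comments, one possible sketch is shown below. The values are hypothetical, and this does not cover element-wise comparisons on pandas objects. The notation is the membership test operator (`in` / `not in`), and `x not in (y, z)` is the negated counterpart of `x in (y, z)`.

```python
# Hypothetical sample values for illustration only.
x_values = [1, 2, 3, 4]
y, z = 2, 3

# Chained form: x differs from y AND x differs from z.
chained = [x for x in x_values if x != y and x != z]

# Membership form: `not in` negates the `in` test.
negated = [x for x in x_values if x not in (y, z)]

# `all` form, as suggested in the comment above.
with_all = [x for x in x_values if all(x != other for other in (y, z))]

print(chained, negated, with_all)  # [1, 4] [1, 4] [1, 4]
```

All three produce the same list; `not in` is usually the shortest to read when the comparison values are known up front.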