
How does this work? I understand the intuition: given a movie dataset (loaded with pandas into `md`), we find the rows where 'vote_count' is not null and convert those values to int.

But I don't understand the syntax.

  • Refer to [How to ask a good question on SO](https://stackoverflow.com/help/mcve) – Sociopath Dec 17 '18 at 10:57
  • Use `md.loc[md['vote_count'].notnull(), 'vote_count'].astype(int)` to avoid chained indexing, which is generally a bad idea in pandas. This form also makes it easier to see how it works: the first argument to `loc` is a boolean Series that is True where vote_count is not null, and the second argument is the column to return. The result is then cast to integer. – Scott Boston Dec 17 '18 at 15:23

1 Answer


md[md['vote_count'].notnull()] returns a filtered copy of your current md dataframe that contains only the rows where vote_count is not null. That result is what gets assigned to the variable vote_counts. This technique is called boolean indexing.

# Assume this dataframe
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(5, 3), columns=list('ABC'))
df.loc[2, 'B'] = np.nan  # put a missing value in row 2 of column B

When you call df['B'].notnull(), it returns a boolean Series that is True wherever the value is not null. You can use that Series to filter your data, keeping only the rows where it is True.

df['B'].notnull()

0     True
1     True
2    False
3     True
4     True
Name: B, dtype: bool


df[df['B'].notnull()]

          A         B         C
0 -0.516625 -0.596213 -0.035508
1  0.450260  1.123950 -0.317217
3  0.405783  0.497761 -1.759510
4  0.307594 -0.357566  0.279341
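
To tie this back to the question, here is a minimal sketch of the whole expression split into steps. The small `md` dataframe and its values are made up for illustration (only the `vote_count` column name comes from the question). The `.loc` form is the one suggested in the comments.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the question's `md`; the values are invented.
md = pd.DataFrame({
    "title": ["a", "b", "c", "d"],
    "vote_count": [10.0, np.nan, 3.0, 250.0],
})

# Step 1: boolean mask, True where vote_count is present.
mask = md["vote_count"].notnull()

# Step 2: md[mask] keeps only the rows where the mask is True,
#         and ['vote_count'] then selects that single column.
# Step 3: .astype(int) casts the remaining values to integers.
vote_counts = md[mask]["vote_count"].astype(int)

# Same result with a single .loc call (rows selected by mask, one column),
# which avoids chained indexing.
vote_counts_loc = md.loc[mask, "vote_count"].astype(int)

print(vote_counts)
```

The null rows have to be dropped before the cast because a column that still contains NaN cannot be converted with `astype(int)`; pandas raises an error in that case.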
It_is_Chris