0

Just a quick question. Suppose there is a dataframe, df, from the Boston Homes dataset, and I'm trying to do EDA on a few of its columns. I want to assign those columns to a variable, feature_cols, which I can use afterwards to check for NA values. How would one go about this? I have the following, which is throwing an error: [image]

This is what I was hoping to do after the above: [image]

Any feedback would be greatly appreciated. Thanks in advance.

A.B
daniness

2 Answers

1

To access multiple columns, store only the column names in a list, for example:

feature_cols = ['RM','ZN','B']

Then select those columns with

x = df[feature_cols]

To iterate over those columns of df, you can use

for column in df[feature_cols]:
    print(df[column]) # or anything
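The loop above can be adapted to check each column for missing values. The following is a minimal sketch, assuming a small made-up dataframe in place of the real Boston data (the values and the extra AGE column are hypothetical). It shows that the loop variable is a column label, so the column has to be looked up with df[column] before calling .isna():

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the Boston housing dataframe; values are made up.
df = pd.DataFrame({
    "RM": [6.5, np.nan, 7.1],
    "ZN": [18.0, 0.0, 0.0],
    "B": [396.9, 392.8, np.nan],
    "AGE": [65.2, 78.9, 61.1],
})

feature_cols = ["RM", "ZN", "B"]

# Iterating over a dataframe yields column labels (strings), not Series,
# so each label is used to look up its column before calling .isna().
has_missing = {}
for column in df[feature_cols]:
    has_missing[column] = bool(df[column].isna().any())

print(has_missing)  # {'RM': True, 'ZN': False, 'B': True}
```

Calling column.isna() directly fails because column is a plain str, which is what produces the "'str' object has no attribute 'isna'" error.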

As per your updated comment, if your end goal is only to see the null counts, you can achieve that without looping, e.g.

# show_counts is the current name of this option (older pandas versions used null_counts)
df[feature_cols].info(verbose=True, show_counts=True)
A.B
  • Thanks, @A.B . That seemed to help. Do you know what the issue is now in this for loop: #Check the above columns for missing values for column in x: column.isna().values.any() ...it's throwing an "AttributeError: 'str' object has no attribute 'isna'" error. I've tried for column in feature_cols: column.isna().values.any() as well. – daniness Feb 04 '22 at 21:47
  • 1
    updated the answer for the loop thing – A.B Feb 04 '22 at 21:51
  • Thanks, @A.B. This helped. – daniness Feb 04 '22 at 22:00
1

There are two problems in your pictures. The first is a KeyError: to access a subset of a dataframe's columns, you need to pass the column names in a list, not a tuple, so the first line should be

feature_cols = df[['RM','ZN','B']]

However, this will return a dataframe with three columns. The second problem is the for loop: iterating over a dataframe yields its column labels (strings), not the columns themselves, so calling .isna() on each item fails. You do not need a loop here; one line is enough:

df.isna().sum()

This returns the name of every column in the dataframe along with the number of missing values in that column. If you want to check only a subset of columns, replace df with df[list_of_columns_names].
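As a rough illustration of both points, here is a sketch on a tiny made-up dataframe (the values are hypothetical, not taken from the Boston dataset). It counts missing values for a subset of columns and shows that passing the names as a tuple raises a KeyError, because pandas treats the tuple as a single column label:

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataframe.
df = pd.DataFrame({
    "RM": [6.5, np.nan, 7.1],
    "ZN": [18.0, 0.0, 0.0],
    "B": [396.9, 392.8, np.nan],
})

feature_cols = ["RM", "ZN", "B"]  # a list of column names

# Count missing values per selected column, no loop needed.
missing_counts = df[feature_cols].isna().sum()
print(missing_counts)

# A tuple is looked up as one label, which does not exist here.
try:
    df["RM", "ZN", "B"]
    tuple_lookup_failed = False
except KeyError:
    tuple_lookup_failed = True

print(tuple_lookup_failed)  # True
```

The result of isna().sum() is a Series indexed by column name, so a single count can be read with, for example, missing_counts["RM"].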

Ahmed Elashry