How do I select certain columns from Python?

Question

I must be making a very basic mistake. I am trying to select only certain columns from a dataframe and drop the rows that contain NA values. I am also supposed to reset the row index after removing the rows.
This is what my dataset looks like

     CRIM     ZN     INDUS  CHAS   NOX    ...  TAX  PTRATIO  B        LSTAT  MEDV                                        
0    0.00632  18.0   2.31   0.0    0.538  ...  296     15.3  396.90   4.98   24.0
1    0.02731   0.0   7.07   0.0    0.469  ...  242     17.8  396.90   9.14   21.6
2    0.02729   0.0   7.07   0.0    0.469  ...  242     17.8  392.83   4.03   34.7

This is what I have tried so far

F = HousingData.dropna(subset = ['CRIM', 'ZN', 'INDUS'])

This first attempt just gives no output.

HousingData.select("CRIM").show("CRIM")

This one gives the error message: AttributeError: 'DataFrame' object has no attribute 'select'

cheers!

Try `F=HousingData[['CRIM', 'ZN', 'INDUS']].dropna()` – Redox, Oct 09 '22 at 08:28

score 0 · Accepted Answer · answered Oct 09 '22 at 09:35

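There are a few problems. First, dropna returns a new DataFrame instead of modifying the original, and an assignment prints nothing, which is why your first attempt shows no output. You can either pass the parameter inplace=True or work with the returned value, which in your code you named F.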

Second, I believe you are used to R and not Python. In R you select columns using select, but a pandas DataFrame has no such method; you can use either HousingData.loc[:, "my_column"] or HousingData["my_column"]. The pandas documentation on DataFrame indexing has more info.
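To make the two indexing styles concrete, here is a minimal sketch. The small HousingData frame is a hypothetical stand-in built from a few of the values shown in the question, not the real dataset, and the variable names are only for illustration.

```python
import pandas as pd

# Hypothetical stand-in for HousingData (a few values copied from the question)
HousingData = pd.DataFrame({
    "CRIM": [0.00632, 0.02731, 0.02729],
    "ZN": [18.0, 0.0, 0.0],
    "INDUS": [2.31, 7.07, 7.07],
    "MEDV": [24.0, 21.6, 34.7],
})

one_col = HousingData["CRIM"]                      # single label -> Series
several = HousingData[["CRIM", "ZN"]]              # list of labels -> DataFrame
same_via_loc = HousingData.loc[:, ["CRIM", "ZN"]]  # all rows, chosen columns
```

Note the double brackets when selecting several columns: the inner pair is an ordinary Python list of column names.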

Finally, I'm not sure what you want to do with show(), but it is also not a valid pandas DataFrame method; you can use plot(), head() or values instead.

HousingData.dropna(subset=['CRIM', 'ZN', 'INDUS'], inplace=True)
HousingData["CRIM"].plot() # plot the whole column (requires matplotlib)
# HousingData["CRIM"].head() # show the first 5 values

# if you don't use inplace=True
F = HousingData.dropna(subset=['CRIM', 'ZN', 'INDUS'])
F["CRIM"].plot()
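The question also asks for only certain columns and a reset row index, which the snippet above does not cover. A minimal sketch of the whole chain might look like this, assuming the goal is to keep just CRIM, ZN and INDUS. The sample rows and the NaN positions are made up for illustration and are not taken from the real dataset.

```python
import numpy as np
import pandas as pd

# Made-up sample data with some missing values (illustration only)
HousingData = pd.DataFrame({
    "CRIM": [0.00632, np.nan, 0.02729, 0.03237],
    "ZN": [18.0, 0.0, np.nan, 0.0],
    "INDUS": [2.31, 7.07, 7.07, 2.18],
    "MEDV": [24.0, 21.6, 34.7, 33.4],
})

cols = ["CRIM", "ZN", "INDUS"]

# Select the columns, drop rows with any NA in them,
# then renumber the rows from 0 without keeping the old index
F = HousingData[cols].dropna().reset_index(drop=True)

print(F.head())
```

Keep in mind that dropna(subset=cols) on the full frame keeps every column and only uses cols to decide which rows to drop, whereas HousingData[cols].dropna() also discards the other columns.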
