I am currently trying to count the number of NAs found in each of my dataset's columns.
I am running the following code:
apply(Total_HousingData, 2, function(x) {sum(is.na(x))})
Here is my output:
           Id    MSSubClass      MSZoning   LotFrontage       LotArea        Street
            0             0             0             0             0             0
        Alley      LotShape   LandContour     Utilities     LotConfig     LandSlope
            0             0             0             0             0             0
 Neighborhood    Condition1    Condition2      BldgType    HouseStyle   OverallQual
            0             0             0             0             0             0
  OverallCond     YearBuilt  YearRemodAdd     RoofStyle      RoofMatl   Exterior1st
            0             0             0             0             0             0
  Exterior2nd    MasVnrType    MasVnrArea     ExterQual     ExterCond    Foundation
            0             0             0             0             0             0
     BsmtQual      BsmtCond  BsmtExposure  BsmtFinType1    BsmtFinSF1  BsmtFinType2
            0             0             0             0             1             0
   BsmtFinSF2     BsmtUnfSF   TotalBsmtSF       Heating     HeatingQC    CentralAir
            1             1             1             0             0             0
   Electrical      1stFlrSF      2ndFlrSF  LowQualFinSF     GrLivArea  BsmtFullBath
            0             0             0             0             0             2
 BsmtHalfBath      FullBath      HalfBath  BedroomAbvGr  KitchenAbvGr   KitchenQual
            2             0             0             0             0             0
 TotRmsAbvGrd    Functional    Fireplaces   FireplaceQu    GarageType   GarageYrBlt
            0             0             0             0             0             0
 GarageFinish    GarageCars    GarageArea    GarageQual    GarageCond    PavedDrive
            0             1             1             0             0             0
   WoodDeckSF   OpenPorchSF EnclosedPorch     3SsnPorch   ScreenPorch      PoolArea
            0             0             0             0             0             0
       PoolQC         Fence   MiscFeature       MiscVal        MoSold        YrSold
            0             0             0             0             0             0
     SaleType SaleCondition     SalePrice
            0             0          1459
For some reason, nearly all of the NAs are being counted under the SalePrice variable. When I look at the other variables, there are plenty of NAs that are not showing up in the counts. I tried converting the appropriate variables to factors, but this hasn't fixed the issue.
"Alley", for instance, should read 1, but its NA is not being picked up.
Here is a sample of the data:
     Id MSSubClass MSZoning LotFrontage LotArea Street Alley LotShape LandContour Utilities
  <dbl>      <dbl> <chr>    <chr>         <dbl> <chr>  <chr> <chr>    <chr>       <chr>
1     1         60 RL       65             8450 Pave   NA    Reg      Lvl         AllPub
2     2         20 RL       80             9600 Pave   NA    Reg      Lvl         AllPub
3     3         60 RL       68            11250 Pave   NA    IR1      Lvl         AllPub
4     4         70 RL       60             9550 Pave   NA    IR1      Lvl         AllPub
5     5         60 RL       84            14260 Pave   NA    IR1      Lvl         AllPub
6     6         50 RL       85            14115 Pave   NA    IR1      Lvl         AllPub
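
One check that might help narrow this down is to count real missing values and the literal text "NA" side by side. This is only a sketch. It assumes the uncounted NAs may be stored as the string "NA" in the character columns, which the sample above hints at but does not confirm. The data frame sample_df and its values are made up for illustration and stand in for Total_HousingData.

```r
# Hypothetical stand-in for a few columns of Total_HousingData
# (values invented for illustration only).
sample_df <- data.frame(
  Alley       = c("NA", "Pave", "NA"),
  LotFrontage = c("65", "NA", "80"),
  SalePrice   = c(208500, NA, NA),
  stringsAsFactors = FALSE
)

# Real missing values: the same thing the apply() call above counts.
true_na <- sapply(sample_df, function(col) sum(is.na(col)))

# Literal "NA" text, which is.na() does not treat as missing.
# na.rm = TRUE skips real NAs, because NA == "NA" evaluates to NA.
string_na <- sapply(sample_df, function(col) sum(col == "NA", na.rm = TRUE))

# One row per kind of "missing", one column per variable.
print(rbind(true_na, string_na))
```

If string_na is non-zero for columns such as Alley on the real data, the values are text rather than missing values, which would explain why is.na() does not count them.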