So I was beginning to perform some statistics on a data frame that I have read in using the read_xls function from the readxl package(version:1.1.0) when I realized R wasn't reading in a column the way I wanted it to. The column has a great deal of blanks in the excel spreadsheet, which I believe after doing some research is the issue.After a lot of blank rows, the column does have numeric values that I need to perform my analysis in R. However, when I read it in with the read_xls
function, it gives it a class logical and all NAs... After going over this website readxl it seems clear to me that this problem is due to the blanks in the column. I am still confused though on how to solve this issue since only one of the columns has blanks in the beginning of the dataset. I would appreciate any help or guidance! Thanks! The column that is giving me issues is Rep_Val_Quantity_Avg
dput of data:
dput(head(df_trib,10))
structure(list(NJPDES = c("NJ0020206", "NJ0020532", "NJ0021326",
"NJ0022021", "NJ0022985", "NJ0023361", "NJ0023736", "NJ0024015",
"NJ0024031", "NJ0024040"), Facility_Name = c("ALLENTOWN BORO WWTP",
"HARRISON TWP MULLICA HILL WWTP", "MEDFORD LAKES BOROUGH STP",
"SWEDESBORO WTP", "WRIGHTSTOWN BOROUGH STP", "WILLINGBORO WATER POLLUTION CONTROL PLANT",
"PINELANDS WASTEWATER CO", "MOUNT HOLLY WPCF", "ELMWOOD WTP",
"WOODSTREAM STP"), `Monitored Location Designator` = c("001A",
"001A", "001A", "001A", "001A", "001A", "001A", "001A", "001A",
"001A"), Date = structure(c(1372550400, 1372550400, 1372550400,
1372550400, 1372550400, 1372550400, 1372550400, 1372550400, 1372550400,
1372550400), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
Parameter_Number_DMR = c("00300", "00300", "00300", "00300",
"00300", "00300", "00300", "00300", "00300", "00300"), Parameter = c("Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)",
"Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)", "Oxygen, Dissolved (DO)"
), Sample_Point_Desc = c("Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value", "Effluent Gross Value",
"Effluent Gross Value", "Effluent Gross Value"), Rep_Val_Quantity_Avg = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), X__1 = c(NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA), `Reported Value Quantity Maximum` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), `Quantity Units Description` = c(NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA), Rep_Val_Con_Min = c("7.2",
NA, "7.65", "6.79", NA, NA, "6", NA, "6.6", NA), Val_Con_AVG = c("7.3",
"8.8", NA, "7.58", "7.5", "7.100", "5", "7.8", "6.6", "7.4"
), Rep_Val_Con_Max = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), valunit = c("MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER",
"MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER", "MILLIGRAMS PER LITER"
)), .Names = c("NJPDES", "Facility_Name", "Monitored Location Designator",
"Date", "Parameter_Number_DMR", "Parameter", "Sample_Point_Desc",
"Rep_Val_Quantity_Avg", "X__1", "Reported Value Quantity Maximum",
"Quantity Units Description", "Rep_Val_Con_Min", "Val_Con_AVG",
"Rep_Val_Con_Max", "valunit"), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"))
Code Used:
df_trib<-read_xls("4_Del_Tribs_ DMR data all pull for certain params.xls",
sheet = "NJEMS DATA", col_names = TRUE,
col_types = c("text","text","text","date","text","text","text",
"numeric","numeric","text","numeric",
"numeric","numeric","text","text"))