
I'm trying to impute missing values using a function. The function itself works, but the imputed values are not stored in the dataset.

The function is shown below. gh_Df is the dataset, val is the value of the FacilityName variable to match, and lat and long are the values I want to fill into Latitude and Longitude.

fill_lat_long_na <- function(val, lat, long){
  if(is.na(gh_Df[gh_Df$FacilityName == val,]$Latitude)){
    gh_Df[gh_Df$FacilityName == val,]$Latitude <- lat
    gh_Df[gh_Df$FacilityName == val,]$Longitude <- long
  }
  print(gh_Df[gh_Df$FacilityName == val,])
}

## Check
fill_lat_long_na("Yapesa St.Mary Clinic", 6.43011, -1.33299)

Results

Latitude Longitude
6.43011 -1.33299

However, if I then run the following outside the function, the row still shows NA values.

print(gh_Df[gh_Df$FacilityName == "Yapesa St.Mary Clinic",])

Results

Latitude Longitude
NA NA

Is there a way to make the values actually change in the dataset?

Thanks.

1 Answer


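It is usually not good practice to modify a data frame from inside a function. An assignment to gh_Df inside the function changes only a local copy, which is discarded when the function returns, so the global gh_Df keeps its NA values. Instead, return the modified data frame and assign the result outside the function.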

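fill_lat_long_na <- function(val, lat, long){
  # Fill the coordinates only if Latitude is missing for this facility
  if(is.na(gh_Df[gh_Df$FacilityName == val,]$Latitude)){
    gh_Df[gh_Df$FacilityName == val,]$Latitude <- lat
    gh_Df[gh_Df$FacilityName == val,]$Longitude <- long
  }
  # Return the modified local copy so the caller can store it
  return(gh_Df)
}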

gh_Df <- fill_lat_long_na("Yapesa St.Mary Clinic", 6.43011, -1.33299)
gh_Df
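
To see why the original version printed the filled values but left the dataset unchanged, here is a minimal sketch. The data frame `df` and the function `modify_local` are made up for illustration and are not part of the question's data.

```r
# Toy data frame standing in for gh_Df (values are invented)
df <- data.frame(
  FacilityName = c("A", "B"),
  Latitude = c(NA, 5.5),
  stringsAsFactors = FALSE
)

# Hypothetical helper: the assignment below creates a local copy of df
# inside the function; the global df is not touched
modify_local <- function() {
  df[df$FacilityName == "A", "Latitude"] <- 1.0
  df  # return the modified local copy
}

result <- modify_local()

is.na(df$Latitude[1])  # the global df still has NA here
result$Latitude[1]     # the returned copy holds the new value
```

Only the returned copy carries the change, which is why the result has to be assigned back to a name outside the function.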
Ronak Shah
  • Thank you @Ronak, I really appreciate this. In your opinion, what would be a better overall approach to this kind of problem? I'm coming from Python, where values like these could easily be remapped using `map`; I don't know if there is a similar method in R. – afrologicinsect Feb 25 '21 at 09:04
  • If you are going to have to impute one missing value at a time (like in this example) then I think this approach is good. – Ronak Shah Feb 25 '21 at 09:11
  • I agree with @Ronak Shah. Also, consider adding an argument to the function for the data frame that you wish to modify. That makes it more general. If you make the data frame the first argument, your function can easily be used as part of a pipe, which might improve your workflow. – Limey Feb 25 '21 at 09:30
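
The suggestion in the last comment could look roughly like the sketch below. It is one possible reading, not the commenter's own code: the argument name `df`, the row selection with `which()`, and the toy rows in `gh_Df` are assumptions made for illustration. The native pipe `|>` needs R 4.1 or later; the magrittr pipe `%>%` should work the same way here.

```r
# Hypothetical generalisation: the data frame is the first argument,
# so the function no longer depends on a global gh_Df
fill_lat_long_na <- function(df, val, lat, long) {
  # Rows for this facility whose Latitude is still missing
  rows <- which(df$FacilityName == val & is.na(df$Latitude))
  df$Latitude[rows] <- lat
  df$Longitude[rows] <- long
  df  # return the modified copy
}

# Toy stand-in for gh_Df (the second row is invented)
gh_Df <- data.frame(
  FacilityName = c("Yapesa St.Mary Clinic", "Other Clinic"),
  Latitude = c(NA, 5.60),
  Longitude = c(NA, -0.19),
  stringsAsFactors = FALSE
)

# With the data frame first, the function slots into a pipe
gh_Df <- gh_Df |>
  fill_lat_long_na("Yapesa St.Mary Clinic", 6.43011, -1.33299)

gh_Df
```

Selecting rows with `which()` also avoids the `if (is.na(...))` test, which raises an error in recent R versions when more than one row matches the facility name.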