I'm new to R and am working on code that returns the hospital at a given rank in every state for a specified condition (the last assignment from the Johns Hopkins R Programming class on Coursera). I'm auditing the class to learn R and have been stuck on this last portion.

Essentially I'm taking a data set that contains multiple columns: hospital name, state, and the 30-day death rates for specific conditions. I'm creating a function with arguments outcome and num, where outcome is the medical condition and num is the rank to look up. The end goal is a data frame listing, for each state, the hospital ranked num within that state.

What I'm doing is reading the .csv file into a data frame, dropping the columns I don't need, and relabeling the column headers so they're easier to reference.

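library(dplyr)  # provides select(), arrange(), mutate(), filter(), distinct()

data <- read.csv("outcome-of-care-measures.csv")
# Keep hospital name, state, and the three 30-day death rate columns
data <- select(data, c(2, 7, 11, 17, 23))
colnames(data) <- c("hospital", "state", "heart attack",
                    "heart failure", "pneumonia")
# Non-numeric entries become NA here (R warns about the coercion)
data[[outcome]] <- as.numeric(as.character(data[[outcome]]))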
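As an aside on the numeric conversion above, here is a minimal sketch of letting read.csv() do it while reading. It assumes missing values are stored as the text "Not Available"; the toy file and its column layout are made up for illustration and are not the real data set.

```r
# Toy stand-in for the real file; contents are hypothetical.
tmp <- tempfile(fileext = ".csv")
writeLines(c(
  "hospital,state,heart attack",
  "A,TX,14.2",
  "B,TX,Not Available",
  "C,AL,12.9"
), tmp)

# na.strings turns the placeholder text into NA while reading,
# so the rate column is parsed as numeric without a later coercion.
# check.names = FALSE keeps the space in the "heart attack" header.
toy <- read.csv(tmp, na.strings = "Not Available",
                check.names = FALSE, stringsAsFactors = FALSE)
str(toy)
```

With this approach the as.numeric(as.character(...)) step and its coercion warning may not be needed, provided the placeholder text really is the only non-numeric entry.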

Then I take this new data frame and reduce it to the hospital, the state, and only the outcome column that was requested (so if "heart attack" is entered, that's the only rate column kept). I sort it alphabetically by state abbreviation, then by death rate in ascending order.

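# all_of() selects the column by exact name; matches() treats outcome as a regex
outcomedata <- select(data, hospital, state, all_of(outcome))
# Sort by state, then by the chosen outcome (lowest rate first)
outcomedata <- arrange(outcomedata, state, .data[[outcome]])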

I'm then splitting the large data frame by state, such that each data frame within this new list of data frames contains the hospitals of ONLY that state.

statesplit <- split(outcomedata, outcomedata$state)

My idea was to add a column to each of these data frames holding each hospital's rank within its state, using rank() on that data frame's death rate column. In other words, is there a way to use lapply() to add a rank column to every data frame in the list, computed from that data frame's own rows? I'm trying to do something along the lines of:

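# Rank by the outcome column (column 2 is the state, not the rate);
# ties.method = "first" keeps ranks as whole numbers so == num can match
hospital_rank <- rank(outcomedata[[outcome]], ties.method = "first")
outcomedata <- mutate(outcomedata, Rank = hospital_rank)
specific_rank <- num

rank_hospital <- filter(outcomedata, Rank == specific_rank)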

but do this for EACH data frame and return a data frame with all the hospitals at the specified rank across all states.
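A minimal sketch of that per-state idea with lapply(), using base R only. The toy data stands in for outcomedata and its values are made up; the choices ties.method = "first" and na.last = "keep" are assumptions about how ties and missing rates should be handled, not something the assignment text here specifies.

```r
# Toy data standing in for outcomedata (values are hypothetical).
outcome <- "heart attack"
num <- 2
outcomedata <- data.frame(
  hospital = c("A", "B", "C", "D", "E"),
  state    = c("AL", "AL", "AL", "TX", "TX"),
  rate     = c(13.1, 12.4, 15.0, 11.8, 14.3),
  stringsAsFactors = FALSE
)
names(outcomedata)[3] <- outcome

statesplit <- split(outcomedata, outcomedata$state)

# Rank within each state's data frame, then keep the row at rank `num`.
ranked <- lapply(statesplit, function(df) {
  df$Rank <- rank(df[[outcome]], ties.method = "first", na.last = "keep")
  df[!is.na(df$Rank) & df$Rank == num, c("hospital", "state")]
})

# Stack the per-state results back into one data frame.
result <- do.call(rbind, ranked)
print(result)
```

In this sketch a state with fewer than num hospitals contributes a zero-row data frame, so it simply drops out of the combined result.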

Would appreciate any and all help, thanks!!

EDIT: Intended Results

  • take arguments outcome and num into function
  • display data frame with hospital names and states, all of which are at the rank of num within their own state

For example, I have one aspect of the function set up such that num = "best" finds the #1 rank in each state:

outcomedata <- select(data, hospital, state, all_of(outcome))
outcomedata <- arrange(outcomedata, state, .data[[outcome]])
# Keep the first (lowest-rate) row per state
outcomedata <- distinct(outcomedata, state, .keep_all = TRUE)

return(outcomedata)

This returns the top-ranked hospital for every state. But instead of finding only the #1 ranked hospital, how can I find ANY rank passed as an argument to the function?
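One possible way to generalize the "best" case, sketched with dplyr's group_by() and row_number() on toy data. The helper name rank_in_state, the toy values, dropping NA rates before ranking, and breaking ties by hospital name are all assumptions made for illustration.

```r
library(dplyr)

# Hypothetical helper: returns the hospital at position `num` in each state.
rank_in_state <- function(df, outcome, num = "best") {
  # "best" means rank 1; anything else is treated as a numeric rank.
  target <- if (identical(num, "best")) 1L else as.integer(num)
  df %>%
    filter(!is.na(.data[[outcome]])) %>%          # assumption: ignore missing rates
    group_by(state) %>%
    arrange(.data[[outcome]], hospital, .by_group = TRUE) %>%
    filter(row_number() == target) %>%            # position within each state
    ungroup() %>%
    select(hospital, state)
}

# Toy data (values are made up).
toy <- data.frame(
  hospital = c("A", "B", "C", "D", "E"),
  state    = c("AL", "AL", "AL", "TX", "TX"),
  `heart attack` = c(13.1, 12.4, NA, 11.8, 14.3),
  check.names = FALSE, stringsAsFactors = FALSE
)

best   <- rank_in_state(toy, "heart attack", "best")
second <- rank_in_state(toy, "heart attack", 2)
print(second)
```

Because the filtering happens per group, no split() or lapply() step is needed here; states without a hospital at the requested rank return no row.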
