0

i am trying to reorganize my dataframe with two colums into a dataframe with three columns. The problem looks ass follows:

In my dataframe i have two columns: "Surgery" Information and "Date" of Operation. In the Column "Surgery" there is always the Patient-ID number followed by the different numbers defining the surgery procedure steps (can be from 1-6 surgery steps). In the second column is the date. One row of NAs seperates each Patient from the next.

Surgery Date NA NA 798873 NA 8-136.10 2018-01-03 5-540.21 2018-01-03 5-555.2 2018-01-03 NA NA 797997 NA 1-453.1 2018-01-15 NA NA 799182 NA 5-540.21 2018-01-11 5-92B.X 2018-01-11

How it should look

Patient  Procedures    Date
798873   8-136.10      2018-01-03
798873   5-540.21      2018-01-03
798873   5-555.2       2018-01-03
797997   1-453.1       2018-01-15
799182   5-540.21      2018-01-11
799182   5-92B.X       2018-01-11

Any Genius out there knows how to solve this? I am lost. Thanks!

1 Answers1

0

Here's an approach with tidyverse (using dplyr and tidyr functions).

library(tidyverse)

# load data
df <- read.table(
  header = T,
  stringsAsFactors = F,
  text = "Surgery  Date
  NA       NA
  798873   NA
  8-136.10 2018-01-03
  5-540.21 2018-01-03
  5-555.2  2018-01-03
  NA       NA
  797997   NA
  1-453.1  2018-01-15
  NA       NA
  799182   NA
  5-540.21 2018-01-11
  5-92B.X  2018-01-11"
)

df2 <- df %>%
  # remove rows with blank in Surgery
  drop_na(Surgery) %>%
  # Add Patient column using Surgery where Date is blank
  mutate(Patient = if_else(is.na(Date), Surgery, NA_character_)) %>%
  # Fill Patient down into blank rows
  fill(Patient) %>%
  # Remove rows with blank Date
  drop_na(Date) %>%
  # Rename and sort columns
  select(Patient, Procedures = Surgery, Date)



> df2
  Patient Procedures       Date
2  798873   8-136.10 2018-01-03
3  798873   5-540.21 2018-01-03
4  798873    5-555.2 2018-01-03
6  797997    1-453.1 2018-01-15
8  799182   5-540.21 2018-01-11
9  799182    5-92B.X 2018-01-11
Jon Spring
  • 55,165
  • 4
  • 35
  • 53