-1

How would I combine any row with a single column entry into a single combined input in a new column? e.g. when column A has value, but B-C are empty, I would like to merge the row entries into a single input in column D.

original file is a txt file that looks like this:

A|B|C 
1|2|3
1
text
2
[end]
4|5|6
2
1
[end]
df <-read.delim("file.txt", header=TRUE, sep="|", blank.lines.skip = TRUE)

    A B C
    1 2 3  
    1
    text
    2
    [end]
    4 5 6
    2
    1
    [end]

desired out data table with newly added column D:

    A B C D
    1 2 3 1 text 2 [end]
    4 5 6 2 1 [end]

I imagine this would be combination of is.na and mutate functions but have been unable to find a solution. The code could also include ends_with("[end]") since each row that I want to combine ends with this text. Any thoughts on this?

ctk1100
  • 13
  • 4
  • Is this an existing data frame or are you reading in a source CSV file or what? How is the data currently stored? – Paul Stafford Allen Oct 18 '22 at 07:27
  • Just to be sure: is this what your `df` look like: `df <- data.frame( A = c(1,1, "text", 2, "[end]", 4), B = c(2, "", "", "", "", 5), C = c(3, "", "", "", "", 6) )` – Chris Ruehlemann Oct 18 '22 at 07:28
  • The original data source is a TXT file that is "|" separated, but the last column entry for each row includes line breaks that cause each input from this column to be input as new rows, rather than all being put in that last column. The shown data frame is a simplified example of the txt file that is uploaded using df <- read.delim(df.txt, header =TRUE, sep = "|"). Each row in the df represents a person and the last column is a text report that has line breaks. Thus my solution to the line breaks is merging them per above – ctk1100 Oct 18 '22 at 19:05

1 Answers1

0

Not sure if this is what you need given that the questions about your data structure are unanswered:

library(tidyverse)    
df %>%
  # change empty cells to NA:
  mutate(across(everything(), ~na_if(., ""))) %>%
  # filter rows with NA:
  filter(if_any(everything(), is.na)) %>%
  # contract rows in new column `D`:
  summarise(D = str_c(A, collapse = " ")) %>%
  # bind original `df` (after mutations) to result:
  bind_cols(df %>%
              mutate(across(everything(), ~na_if(., ""))) %>%
              filter(!if_any(everything(), is.na)), .) %>%
  # remove duplicated values in `D`:
  mutate(D = ifelse(duplicated(D), NA, D))
  A B C              D
1 1 2 3 1 text 2 [end]
2 4 5 6           <NA>

Data:

df <- data.frame(
  A = c(1,1, "text", 2, "[end]", 4),
  B = c(2, "", "", "", "", 5), 
  C = c(3, "", "", "", "", 6)
  )
Chris Ruehlemann
  • 20,321
  • 4
  • 12
  • 34
  • Thanks for your reply. This is very close, but it only works for the first row and then duplicates the new entries in column D for the second row. I just posted showing more on the data structure - is this helpful to see? – ctk1100 Nov 04 '22 at 02:04