
I am splitting one column of a dataset with strsplit and want to map the first column back onto the split data.

Here is a sample dataset:

https://drive.google.com/file/d/1jtrn6Htezz6iRhJN0HaxXowT5JZW52ai/view?usp=sharing

My code is as follows:

library(readr)

df <- read_csv("sample for community.csv", col_names = FALSE)[,1:2]

x<-strsplit(df$X2, '\n')

y5<-x[lapply(x, length) ==5]
y4<-x[lapply(x, length) ==4]
y3<-x[lapply(x, length) ==3]


p5<-data.frame(unlist(lapply(y5, `[[`, 1)),unlist(lapply(y5, `[[`, 2)),unlist(lapply(y5, `[[`, 3)),unlist(lapply(y5, `[[`, 4)),unlist(lapply(y5, `[[`, 5)))
p4<-data.frame(unlist(lapply(y4, `[[`, 1)),unlist(lapply(y4, `[[`, 2)),unlist(lapply(y4, `[[`, 3)),unlist(lapply(y4, `[[`, 4)))
p3<-data.frame(unlist(lapply(y3, `[[`, 1)),unlist(lapply(y3, `[[`, 2)),unlist(lapply(y3, `[[`, 3)))

p5[,5]<-NULL
p3[,4]<-NA_character_  # pad the missing fourth line with a real NA, not the string "NA"


colnames(p5)<-c("X1","X2","X3","X4")
colnames(p4)<-c("X1","X2","X3","X4")
colnames(p3)<-c("X1","X2","X3","X4")

final<-rbind(p5,p4,p3)

As you can see, the order of the rows changes because the rows are grouped by how many lines each cell contains before being stacked back together.
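A minimal sketch of the reordering, using made-up data rather than the linked file (the values and the three-row size are assumptions for illustration only). It also shows that the original row positions can be recovered with which(), which may be one way to carry the first column along:

```r
# Hypothetical stand-in data: X1 is the label, X2 holds newline-separated lines
df <- data.frame(
  X1 = c("String1", "String2", "String3"),
  X2 = c("a1\na2\na3\na4", "b1\nb2\nb3", "c1\nc2\nc3\nc4"),
  stringsAsFactors = FALSE
)

x <- strsplit(df$X2, "\n")
n <- lengths(x)            # number of lines in each row: 4, 3, 4

# Grouping by length and stacking the groups puts all 4-line rows first
reordered <- c(x[n == 4], x[n == 3])
first_line <- sapply(reordered, `[[`, 1)
first_line                 # "a1" "c1" "b1"

# The original positions of the stacked rows, recoverable with which()
idx <- c(which(n == 4), which(n == 3))
idx                        # 1 3 2
df$X1[idx]                 # "String1" "String3" "String2"
```

Indexing df$X1 with the same positions keeps the labels aligned with the stacked rows without any string matching.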

I wish to merge the first column onto the final dataset but cannot work out how to do so.

In the real dataset it will not be possible to join by matching strings (e.g. matching "String1" against the cells that contain "String1").

All help is highly appreciated.

Thanks,

Matt


2 Answers


Here is a base R solution. There may be smarter ways to do this.

library(readr)

df <- read_csv("sample for community.csv", col_names = FALSE)[,1:2]

x<-strsplit(df$X2, '\n')

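lmax <- max(lengths(x))  # largest number of lines found in any cell
# Pad every split to lmax entries with NA so the rows keep their original order
p <- t(sapply(x, function(v) c(v, rep(NA, lmax - length(v)))))
p <- p[, -5]  # drop the fifth column, as in the question
colnames(p) <- c("X1", "X2", "X3", "X4")
final <- as.data.frame(p)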
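Because padding keeps the rows in their original order, the first column can be attached with cbind rather than a merge. A minimal sketch of that step, assuming made-up data in place of the CSV and hypothetical column names Line1 to Line4 (chosen here only to avoid clashing with the existing X1):

```r
# Hypothetical stand-in for the CSV: two columns named X1 and X2
df <- data.frame(
  X1 = c("String1", "String2", "String3"),
  X2 = c("a1\na2\na3\na4", "b1\nb2\nb3", "c1\nc2\nc3\nc4\nc5"),
  stringsAsFactors = FALSE
)

x <- strsplit(df$X2, "\n")
lmax <- max(lengths(x))

# Pad each split to lmax entries; row i of p still corresponds to row i of df
p <- t(sapply(x, function(v) c(v, rep(NA, lmax - length(v)))))
p <- p[, 1:4, drop = FALSE]           # keep the first four lines only
colnames(p) <- paste0("Line", 1:4)    # assumed names, not from the question

# Row order is unchanged, so a column bind lines the labels up correctly
final <- cbind(df["X1"], as.data.frame(p, stringsAsFactors = FALSE))
```

No string matching is involved: the alignment relies only on row position.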
Bing

With the tidyverse, separate() splits X2 in place, so X1 stays attached to each row:

library(tidyverse)
df %>% 
   separate(X2, into = paste0("X2_", 1:4), sep="\\s*\n\\s*")
# A tibble: 5 x 5
#  X1      X2_1         X2_2         X2_3         X2_4        
#  <chr>   <chr>        <chr>        <chr>        <chr>       
#1 String1 String1Line1 String1Line2 String1Line3 String1Line4
#2 String2 String2Line1 String2Line2 String2Line3 String2Line4
#3 String3 String3Line1 String3Line2 String3Line3 ""          
#4 String4 String4Line1 String4Line2 String4Line3 String4Line4
#5 String5 String5Line1 String5Line2 String5Line3 String5Line4
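If some cells have fewer or more than four lines, separate() warns by default. A minimal sketch of one way to handle that, using its fill and extra arguments on made-up data (the data frame below is an assumption, not the linked file):

```r
library(tidyr)

# Hypothetical stand-in data with 4, 3 and 5 lines per cell
df <- data.frame(
  X1 = c("String1", "String2", "String3"),
  X2 = c("a1\na2\na3\na4", "b1\nb2\nb3", "c1\nc2\nc3\nc4\nc5"),
  stringsAsFactors = FALSE
)

out <- separate(
  df, X2,
  into  = paste0("X2_", 1:4),
  sep   = "\\s*\n\\s*",
  fill  = "right",   # pad short rows with NA on the right
  extra = "drop"     # discard pieces beyond the fourth
)
```

Here the three-line row gets NA in X2_4 and the fifth line of the last row is dropped, which mirrors what the question's code does by hand.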
akrun
    This is much prettier. I am continuing learning new things from you guys every day. Thanks. – Bing Nov 05 '18 at 21:14