1

my data is in a single column and I want to get the split output into 2 column.I need both output columns.

input column is....

column

HOUSEHOLDS-Total households - Female householder- under 18 years  
Total households - Female householder- under 18 years

I want the output ...

column1

HOUSEHOLDS  
Total households

column2

Total households - Female householder- under 18 years  
Female householder- under 18 years
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56
muntasir kabir
  • 168
  • 2
  • 13

2 Answers2

2

if this is your dataset:

df1<- c("HOUSEHOLDS-Total households - Female householder- under 18 years", "Total households - Female householder- under 18 years")

You can use :

regmatches(df1, regexpr("-", df1), invert = TRUE)
[[1]]
[1] "HOUSEHOLDS-Total households "       "Female householder- under 18 years"

[[2]]
[1] "Total households "                   " Female householder- under 18 years"

The output is a list.

Mac
  • 111
  • 1
  • 2
  • 10
2

Here's how to do that with separate from tidyr. Basically, you are separating on the first occurrence of "-". The other occurrences are ignored because of extra = "merge".

df <- read.table(text="'HOUSEHOLDS-Total households - Female householder- under 18 years'  
                      'Total households - Female householder- under 18 years'",
                       header=FALSE,stringsAsFactors=FALSE)

library(tidyr)
df %>% separate(V1, into = c('Col1', 'Col2'),  sep="-", extra = "merge").
               Col1                                                  Col2
1        HOUSEHOLDS Total households - Female householder- under 18 years
2 Total households                     Female householder- under 18 years
Pierre Lapointe
  • 16,017
  • 2
  • 43
  • 56