0

I am trying to split one column into three columns so I can give a date format. Currently the data set looks like this

YYYYMMDD         Number
20020101         0.21
20020102         0.34
20020103         1.22

I want it to look like this

Year    Month  Day  Number
2002    01     01   0.21
2002    01     02   0.34
2002    01     03   1.22

I have the following code written and it works in the sense that i can split up the column, but in doing so I create new data frames and I am unsure how to then add back in the data.frame to the original data.set

  • dataset=data

Is there a nicer way to do it? or how to do I get new2 + new to combine with data?

res <- strsplit(data$YYYYMMDD, "(?<=.{4})" , perl = TRUE)
new<-do.call(rbind, res)
summary(new)
colnames(new)<-c("Year", "MMDD")
new<-as.data.frame(new)
new$MMDD<-as.character(new$MMDD)
res <- strsplit(new$MMDD, "(?<=.{2})" , perl = TRUE)
new2<-do.call(rbind, res)
summary(new2)
colnames(new2)<-c("Month", "Dom")
new2<-as.data.frame(new2)
Fosulli
  • 13
  • 4

4 Answers4

2

With substring:

x <- mapply(substring, c(1, 5, 7), c(4, 6, 8),
            MoreArgs = list(text = df$YYYYMMDD), SIMPLIFY = F)
names(x) <- c('Year', 'Month', 'Day')
cbind(as.data.frame(x), df[-1])
#   Year Month Day Number
# 1 2002    01  01   0.21
# 2 2002    01  02   0.34
# 3 2002    01  03   1.22
mt1022
  • 16,834
  • 5
  • 48
  • 71
1

We can do this easily with separate

library(tidyr)
separate(df1, YYYYMMDD, into = c('Year', 'Month', 'Day'), sep=c(4, 6))
#   Year Month Day Number
#1 2002    01  01   0.21
#2 2002    01  02   0.34
#3 2002    01  03   1.22
akrun
  • 874,273
  • 37
  • 540
  • 662
1

You can try this (with your variable YYYYMMDD as character):

year = substr(data$YYYYMMDD,1,4)
month = substr(data$YYYYMMDD,5,6)
day = substr(data$YYYYMMDD,7,8)

new_data = as.data.frame(cbind(year,month,day,data$Number))
colnames(new_data)[4] = "Number"
MBnnn
  • 308
  • 2
  • 13
0

You can do it with lubridate like so:


library(tidyverse)
library(lubridate)

data %>% 
  mutate(
    YYYYMMDD = as.Date(as.character(YYYYMMDD), format = "%Y%m%d"),
    year = year(YYYYMMDD),
    month = month(YYYYMMDD),
    day = mday(YYYYMMDD)
    ) 
#>     YYYYMMDD Number year month day
#> 1 2002-01-01   0.21 2002     1   1
#> 2 2002-01-02   0.34 2002     1   2
#> 3 2002-01-03   1.22 2002     1   3
yeedle
  • 4,918
  • 1
  • 22
  • 22