9

I have a dataset that has dates and interest rates in the same column. I need to split these two numbers into two separate columns, however when I use the following code:

 Split <- str_split(df$Dates, "[ ]", n = 2)
 Dates <- unlist(Split)[1]
 Rates <- unlist(Split)[2]

It returns only the first "value" of each element, i.e., "1971-04-01" for Dates and "7.43" for Rates. I need it to return all values for the portion of the string split and the same for the second portion of the string split

Below is a portion of the dataset, total rows = 518.

    1971-04-01   7.31
    1971-05-01   7.43
    1971-06-01   7.53
    1971-07-01   7.60
    1971-08-01   7.70
    1971-09-01   7.69
    1971-10-01   7.63
    1971-11-01   7.55
    1971-12-01   7.48
    1972-01-01   7.44

Thanks

pnuts
  • 58,317
  • 11
  • 87
  • 139
j riot
  • 544
  • 3
  • 6
  • 16

7 Answers7

12

Could do

Split <- strsplit(as.character(df$Dates), " ", fixed = TRUE)
Dates <- sapply(Split, "[", 1)
Rates <- sapply(Split, "[", 2)
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • This worked perfectly, though I had to use 4 for Rates, due to it picking up 4 separate splits. When I used str_split, I was able to specify the number of actual splits to make, if I didn't it picked up the same number of splits. Thanks! – j riot Jun 30 '14 at 18:33
6

You can use reshape2::colsplit

library(reshape2)
colsplit(df$Dates, ' ', names =  c('Dates','Rates'))
#         Dates Rates
# 1  1971-04-01  7.31
# 2  1971-05-01  7.43
# 3  1971-06-01  7.53
# 4  1971-07-01  7.60
# 5  1971-08-01  7.70
# 6  1971-09-01  7.69
# 7  1971-10-01  7.63
# 8  1971-11-01  7.55
# 9  1971-12-01  7.48
# 10 1972-01-01  7.44
mnel
  • 113,303
  • 27
  • 265
  • 254
3

Perhaps I'm biased, but I would suggest my cSplit function for this problem.

First, I'm assuming we are starting with the following (single column) data.frame (where there are multiple spaces between the "date" value and the "rate" value).

df <- data.frame(
  Date = c("1971-04-01   7.31", "1971-05-01   7.43", "1971-06-01   7.53", 
           "1971-07-01   7.60", "1971-08-01   7.70", "1971-09-01   7.69", 
           "1971-10-01   7.63", "1971-11-01   7.55", "1971-12-01   7.48", 
           "1972-01-01   7.44"))

Next, get the cSplit function from my GitHub Gist, and use it. You can split on a regular expression (here, multiple spaces).

cSplit(df, "Date", "\\s+", fixed = FALSE)
#         Date_1 Date_2
#  1: 1971-04-01   7.31
#  2: 1971-05-01   7.43
#  3: 1971-06-01   7.53
#  4: 1971-07-01   7.60
#  5: 1971-08-01   7.70
#  6: 1971-09-01   7.69
#  7: 1971-10-01   7.63
#  8: 1971-11-01   7.55
#  9: 1971-12-01   7.48
# 10: 1972-01-01   7.44

Since the function converts a data.frame to a data.table, you have access to setnames which would let you rename your columns in place.

setnames(cSplit(df, "Date", "\\s+", fixed = FALSE), c("Dates", "Rates"))[]
#          Dates Rates
#  1: 1971-04-01  7.31
#  2: 1971-05-01  7.43
#  3: 1971-06-01  7.53
#  4: 1971-07-01  7.60
#  5: 1971-08-01  7.70
#  6: 1971-09-01  7.69
#  7: 1971-10-01  7.63
#  8: 1971-11-01  7.55
#  9: 1971-12-01  7.48
# 10: 1972-01-01  7.44
A5C1D2H2I1M1N2O1R2T1
  • 190,393
  • 28
  • 405
  • 485
  • I actually ended up using a concat.split as I had to do this for numerous files of a similar set-up. I was running into errors due to slight spacing issues from the files; nothing like government datasets to be clean and easy to use. – j riot Jul 01 '14 at 17:15
2

Using @user2583119's data (please post minimal reproducible code including a data set):

library(qdap)
colsplit2df(data.frame(Split), sep = " ")

##           X1   X2
## 1 1971-06-01 7.53
## 2 1971-05-01 7.43
## 3 1971-06-01 7.53
Tyler Rinker
  • 108,132
  • 65
  • 322
  • 519
1

Also:

 Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")

Your code selects only the first observation.

 Str <- unlist(str_split(Split, "[ ]", n=2))
 Str[1] 
 #[1] "1971-06-01"

If you look at the output of unlist(..), dates are followed by values. So, you can use a logical index.

Str[c(T,F)]
#[1] "1971-06-01" "1971-05-01" "1971-06-01"

as.numeric(Str[c(F,T)])
#[1] 7.53 7.43 7.53

You can convert to two columns of a dataframe from Split by using read.table

  read.table(text=Split, header=F, sep="",stringsAsFactors=F)
 #         V1   V2
 # 1 1971-06-01 7.53
 # 2 1971-05-01 7.43
 # 3 1971-06-01 7.53
akrun
  • 874,273
  • 37
  • 540
  • 662
  • +1. I personally use the `read.table` trick quite a bit, and used it in my "splitstackshape" package. However, because there are faster options than `read.table`, I've been playing around with creating different functions, like [`cSplit`](https://gist.github.com/mrdwab/11380733) and [`concat.split.DT`](https://gist.github.com/mrdwab/6873058). – A5C1D2H2I1M1N2O1R2T1 Jul 01 '14 at 04:16
  • Thanks `Ananda Mahto` for the functions. It would be better to have `cSplit` in a package. – akrun Jul 01 '14 at 06:45
  • It's planned--just haven't gotten around to testing whether everything really seems OK for committing it to the "splitstackshape" package. :-) – A5C1D2H2I1M1N2O1R2T1 Jul 01 '14 at 13:17
1
df <- data.frame(
Date = c("1971-04-01   7.31", "1971-05-01   7.43", "1971-06-01   7.53", 
         "1971-07-01   7.60", "1971-08-01   7.70", "1971-09-01   7.69", 
         "1971-10-01   7.63", "1971-11-01   7.55", "1971-12-01   7.48", 
         "1972-01-01   7.44"))

do.call(rbind, strsplit(as.character(df$Date), split = '\\s+', fixed = FALSE))
0

Try this:

Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")
df <- unlist(str_split(string = Split, pattern = "\\s"))
df
lawyeR
  • 7,488
  • 5
  • 33
  • 63