R - Using str_split and unlist to create two columns

Question

I have a dataset that has dates and interest rates in the same column. I need to split these two numbers into two separate columns, however when I use the following code:

 Split <- str_split(df$Dates, "[ ]", n = 2)
 Dates <- unlist(Split)[1]
 Rates <- unlist(Split)[2]

It returns only the first "value" of each element, i.e., "1971-04-01" for Dates and "7.43" for Rates. I need it to return all values for the portion of the string split and the same for the second portion of the string split

Below is a portion of the dataset, total rows = 518.

    1971-04-01   7.31
    1971-05-01   7.43
    1971-06-01   7.53
    1971-07-01   7.60
    1971-08-01   7.70
    1971-09-01   7.69
    1971-10-01   7.63
    1971-11-01   7.55
    1971-12-01   7.48
    1972-01-01   7.44

Thanks

David Arenburg · Accepted Answer · 2016-03-12T18:03:22.250

12

Could do

Split <- strsplit(as.character(df$Dates), " ", fixed = TRUE)
Dates <- sapply(Split, "[", 1)
Rates <- sapply(Split, "[", 2)

edited Mar 12 '16 at 18:03

answered Jun 30 '14 at 18:24

David Arenburg

91,361
17
137
196

This worked perfectly, though I had to use 4 for Rates, due to it picking up 4 separate splits. When I used str_split, I was able to specify the number of actual splits to make, if I didn't it picked up the same number of splits. Thanks! – j riot Jun 30 '14 at 18:33

mnel · Answer 2 · 2014-07-01T06:50:32.693

6

You can use reshape2::colsplit

library(reshape2)
colsplit(df$Dates, ' ', names =  c('Dates','Rates'))
#         Dates Rates
# 1  1971-04-01  7.31
# 2  1971-05-01  7.43
# 3  1971-06-01  7.53
# 4  1971-07-01  7.60
# 5  1971-08-01  7.70
# 6  1971-09-01  7.69
# 7  1971-10-01  7.63
# 8  1971-11-01  7.55
# 9  1971-12-01  7.48
# 10 1972-01-01  7.44

edited Jul 01 '14 at 06:50

answered Jun 30 '14 at 23:53

mnel

113,303
27
265
254

3

Now you can also look at `tidyr::separate()` that does the same thing – Rekyt Oct 15 '18 at 09:53

score 3 · Answer 3 · answered Jul 01 '14 at 04:13

Perhaps I'm biased, but I would suggest my cSplit function for this problem.

First, I'm assuming we are starting with the following (single column) data.frame (where there are multiple spaces between the "date" value and the "rate" value).

df <- data.frame(
  Date = c("1971-04-01   7.31", "1971-05-01   7.43", "1971-06-01   7.53", 
           "1971-07-01   7.60", "1971-08-01   7.70", "1971-09-01   7.69", 
           "1971-10-01   7.63", "1971-11-01   7.55", "1971-12-01   7.48", 
           "1972-01-01   7.44"))

Next, get the cSplit function from my GitHub Gist, and use it. You can split on a regular expression (here, multiple spaces).

cSplit(df, "Date", "\\s+", fixed = FALSE)
#         Date_1 Date_2
#  1: 1971-04-01   7.31
#  2: 1971-05-01   7.43
#  3: 1971-06-01   7.53
#  4: 1971-07-01   7.60
#  5: 1971-08-01   7.70
#  6: 1971-09-01   7.69
#  7: 1971-10-01   7.63
#  8: 1971-11-01   7.55
#  9: 1971-12-01   7.48
# 10: 1972-01-01   7.44

Since the function converts a data.frame to a data.table, you have access to setnames which would let you rename your columns in place.

setnames(cSplit(df, "Date", "\\s+", fixed = FALSE), c("Dates", "Rates"))[]
#          Dates Rates
#  1: 1971-04-01  7.31
#  2: 1971-05-01  7.43
#  3: 1971-06-01  7.53
#  4: 1971-07-01  7.60
#  5: 1971-08-01  7.70
#  6: 1971-09-01  7.69
#  7: 1971-10-01  7.63
#  8: 1971-11-01  7.55
#  9: 1971-12-01  7.48
# 10: 1972-01-01  7.44

I actually ended up using a concat.split as I had to do this for numerous files of a similar set-up. I was running into errors due to slight spacing issues from the files; nothing like government datasets to be clean and easy to use. — j riot, Jul 01 '14 at 17:15

score 2 · Answer 4 · answered Jun 30 '14 at 18:43

2

Using @user2583119's data (please post minimal reproducible code including a data set):

library(qdap)
colsplit2df(data.frame(Split), sep = " ")

##           X1   X2
## 1 1971-06-01 7.53
## 2 1971-05-01 7.43
## 3 1971-06-01 7.53

answered Jun 30 '14 at 18:43

Tyler Rinker

108,132
65
322
519

akrun · Answer 5 · 2014-06-30T18:54:07.113

1

Also:

 Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")

Your code selects only the first observation.

 Str <- unlist(str_split(Split, "[ ]", n=2))
 Str[1] 
 #[1] "1971-06-01"

If you look at the output of unlist(..), dates are followed by values. So, you can use a logical index.

Str[c(T,F)]
#[1] "1971-06-01" "1971-05-01" "1971-06-01"

as.numeric(Str[c(F,T)])
#[1] 7.53 7.43 7.53

You can convert to two columns of a dataframe from Split by using read.table

  read.table(text=Split, header=F, sep="",stringsAsFactors=F)
 #         V1   V2
 # 1 1971-06-01 7.53
 # 2 1971-05-01 7.43
 # 3 1971-06-01 7.53

edited Jun 30 '14 at 18:54

answered Jun 30 '14 at 18:43

akrun

874,273
37
540
662

+1. I personally use the `read.table` trick quite a bit, and used it in my "splitstackshape" package. However, because there are faster options than `read.table`, I've been playing around with creating different functions, like [`cSplit`](https://gist.github.com/mrdwab/11380733) and [`concat.split.DT`](https://gist.github.com/mrdwab/6873058). – A5C1D2H2I1M1N2O1R2T1 Jul 01 '14 at 04:16
Thanks `Ananda Mahto` for the functions. It would be better to have `cSplit` in a package. – akrun Jul 01 '14 at 06:45
It's planned--just haven't gotten around to testing whether everything really seems OK for committing it to the "splitstackshape" package. :-) – A5C1D2H2I1M1N2O1R2T1 Jul 01 '14 at 13:17

score 1 · Answer 6 · answered Mar 09 '17 at 03:34

1

df <- data.frame(
Date = c("1971-04-01   7.31", "1971-05-01   7.43", "1971-06-01   7.53", 
         "1971-07-01   7.60", "1971-08-01   7.70", "1971-09-01   7.69", 
         "1971-10-01   7.63", "1971-11-01   7.55", "1971-12-01   7.48", 
         "1972-01-01   7.44"))

do.call(rbind, strsplit(as.character(df$Date), split = '\\s+', fixed = FALSE))

answered Mar 09 '17 at 03:34

Liangbo Huang

11
1

1

please give more clarifications.refer this http://stackoverflow.com/help/how-to-answer – isuruAb Mar 09 '17 at 04:11

lawyeR · Answer 7 · 2014-06-30T19:06:51.603

0

Try this:

Split <- c("1971-06-01 7.53", "1971-05-01 7.43", "1971-06-01 7.53")
df <- unlist(str_split(string = Split, pattern = "\\s"))
df

edited Jun 30 '14 at 19:06

answered Jun 30 '14 at 18:24

lawyeR

7,488
5
33
63

R - Using str_split and unlist to create two columns

7 Answers7