I have a question how to create a new column based on another.
Here is my part of data
:
Category Brand Time1 value Time2 number
2 HTC 2015-01-01 1724 NA 1
6 APPLE 2015-10-10 3000 2015-10-30 1
2 APPLE 2016-01-15 430 NA 1
NA Samsung 2016-10-20 860 2016-12-20 1
I show 4 obs. of data
above, and I explain my data
more:
First, see the structure.
> str(data)
Classes ‘data.table’ and 'data.frame': 105907 obs. of 6 variables:
$ Category : num 2 2 2 2 2 2 2 2 2 2 ...
$ Brand : chr "HTC" "APPLE" "INFOCUS" "APPLE" ...
$ Time1 : POSIXct, format: "2015-01-01" "2015-01-01" "2015-01-01" "2015-01-01" ...
$ value : num 1724 2946 330 2946 2946 ...
$ Time2 : POSIXct, format: NA NA NA "2015-01-03" ...
$ number : chr "1" "1" "1" "1" ...
- attr(*, ".internal.selfref")=<externalptr>
Second, I want to replicate each obs. based on Time1
.
This is my code:
data[,rep:=ifelse(year(Time1)==2016, 12-month(Time1)+1, 13)][rep(1:.N,rep)][]
Now, the data
looks like:
Category Brand Time1 value Time2 number rep
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
2 HTC 2015-01-01 1724 NA 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
6 APPLE 2015-10-10 3000 2015-10-30 1 13
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
2 APPLE 2016-01-15 430 NA 1 12
NA Samsung 2016-10-20 860 2016-12-20 1 3
NA Samsung 2016-10-20 860 2016-12-20 1 3
NA Samsung 2016-10-20 860 2016-12-20 1 3
Third, I want to create a new column Lapse
, and the result I want is:
Category Brand Time1 value Time2 number rep Lapse
2 HTC 2015-01-01 1724 NA 1 13 0
2 HTC 2015-01-01 1724 NA 1 13 1
2 HTC 2015-01-01 1724 NA 1 13 2
2 HTC 2015-01-01 1724 NA 1 13 3
2 HTC 2015-01-01 1724 NA 1 13 4
2 HTC 2015-01-01 1724 NA 1 13 5
2 HTC 2015-01-01 1724 NA 1 13 6
2 HTC 2015-01-01 1724 NA 1 13 7
2 HTC 2015-01-01 1724 NA 1 13 8
2 HTC 2015-01-01 1724 NA 1 13 9
2 HTC 2015-01-01 1724 NA 1 13 10
2 HTC 2015-01-01 1724 NA 1 13 11
2 HTC 2015-01-01 1724 NA 1 13 12
6 APPLE 2015-10-10 3000 2015-10-30 1 13 0
6 APPLE 2015-10-10 3000 2015-10-30 1 13 1
6 APPLE 2015-10-10 3000 2015-10-30 1 13 2
6 APPLE 2015-10-10 3000 2015-10-30 1 13 3
6 APPLE 2015-10-10 3000 2015-10-30 1 13 4
6 APPLE 2015-10-10 3000 2015-10-30 1 13 5
6 APPLE 2015-10-10 3000 2015-10-30 1 13 6
6 APPLE 2015-10-10 3000 2015-10-30 1 13 7
6 APPLE 2015-10-10 3000 2015-10-30 1 13 8
6 APPLE 2015-10-10 3000 2015-10-30 1 13 9
6 APPLE 2015-10-10 3000 2015-10-30 1 13 10
6 APPLE 2015-10-10 3000 2015-10-30 1 13 11
6 APPLE 2015-10-10 3000 2015-10-30 1 13 12
2 APPLE 2016-01-15 430 NA 1 12 0
2 APPLE 2016-01-15 430 NA 1 12 1
2 APPLE 2016-01-15 430 NA 1 12 2
2 APPLE 2016-01-15 430 NA 1 12 3
2 APPLE 2016-01-15 430 NA 1 12 4
2 APPLE 2016-01-15 430 NA 1 12 5
2 APPLE 2016-01-15 430 NA 1 12 6
2 APPLE 2016-01-15 430 NA 1 12 7
2 APPLE 2016-01-15 430 NA 1 12 8
2 APPLE 2016-01-15 430 NA 1 12 9
2 APPLE 2016-01-15 430 NA 1 12 10
2 APPLE 2016-01-15 430 NA 1 12 11
NA Samsung 2016-10-20 860 2016-12-20 1 3 0
NA Samsung 2016-10-20 860 2016-12-20 1 3 1
NA Samsung 2016-10-20 860 2016-12-20 1 3 2
Above is the result I want, I try the code like this:
data[,Lapse := seq_len(.N)-1, by = (Category,Brand,Time1,value,Time2,number)]
However, it is wrong.
If it is right,
uniqie(data$Lapse)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12
But, I got 0~999
. I think my code is wrong.
Any suggestion?
Or maybe there are other good ways to do like this?
UPDATE
data <- " Category Brand Time1 value Time2 number
1: 2 HTC 2015-01-01 1724 NA 1
2: 2 APPLE 2015-01-01 2946 NA 1
3: 2 INFOCUS 2015-01-01 330 NA 1
4: 2 APPLE 2015-01-01 2946 2015-01-03 1
5: 2 APPLE 2015-01-01 2946 NA 1
6: 2 Samsung 2015-01-01 2189 NA 1
7: 2 HTC 2015-01-01 730 NA 1
8: 2 Samsung 2015-01-01 2189 NA 1
9: 2 Samsung 2015-01-01 2189 NA 1
10: 2 HTC 2015-01-01 1296 NA 1
11: 2 HTC 2015-01-01 730 NA 1
12: 2 APPLE 2015-01-01 2189 NA 1
13: 2 INFOCUS 2015-01-01 330 2015-01-02 1
14: 2 HTC 2015-01-01 2189 NA 1
15: 2 SONY 2015-01-01 1296 NA 1
16: 2 HTC 2015-01-01 730 NA 1
17: 2 APPLE 2015-01-01 2946 NA 1
18: 2 APPLE 2015-01-01 2946 NA 1
19: 2 HTC 2015-01-01 1724 NA 1
20: 2 Samsung 2015-01-02 1724 NA 1
21: 2 Samsung 2015-01-02 2189 NA 1
22: 2 HTC 2015-01-02 730 NA 1
23: 2 Samsung 2015-01-02 2189 NA 1
24: 2 HTC 2015-01-02 730 NA 1
25: 2 APPLE 2015-01-02 2946 NA 1
26: 2 HTC 2015-01-02 1724 NA 1
27: 2 HTC 2015-01-02 730 NA 1
28: 2 ASUS 2015-01-02 330 NA 1
29: 2 ASUS 2015-01-02 330 NA 1
30: 2 Samsung 2015-01-02 1724 NA 1
31: 2 APPLE 2015-01-02 2189 NA 1
32: 2 HTC 2015-01-02 730 NA 1
33: 2 Samsung 2015-01-02 730 NA 1
34: 2 HTC 2015-01-02 730 NA 1
35: 2 HTC 2015-01-02 730 NA 1
36: 2 HTC 2015-01-02 730 NA 1
37: 2 Samsung 2015-01-02 730 NA 1
38: 2 APPLE 2015-01-03 2189 NA 1
39: 2 APPLE 2015-01-03 2946 NA 1
40: 2 Benten 2015-01-03 330 NA 1
41: 2 APPLE 2015-01-03 2946 NA 1
42: 2 INFOCUS 2015-01-03 330 NA 1
43: 2 Samsung 2015-01-03 1296 NA 1
44: 2 HTC 2015-01-03 730 NA 1
45: 2 Samsung 2015-01-03 2189 NA 1
46: 2 SONY 2015-01-03 2189 NA 1
47: 2 TaiwanMobile 2015-01-03 730 NA 1
48: 2 HTC 2015-01-03 1296 NA 1
49: 2 HTC 2015-01-03 730 NA 1
50: 2 APPLE 2015-01-03 2189 NA 1
51: 2 APPLE 2015-01-03 2189 NA 1
52: 2 HTC 2015-01-03 730 NA 1
53: 2 Samsung 2015-01-03 330 NA 1
54: 2 TaiwanMobile 2015-01-03 730 NA 1
55: 2 HTC 2015-01-03 730 NA 1
56: 2 HTC 2015-01-03 730 NA 1
57: 2 TaiwanMobile 2015-01-03 330 NA 1
58: 2 Samsung 2015-01-03 1724 2015-01-04 1
59: 2 HTC 2015-01-03 730 NA 1
60: 2 INFOCUS 2015-01-03 330 NA 1
61: 2 SONY 2015-01-03 730 NA 1
62: 2 HTC 2015-01-04 730 NA 1
63: 2 HTC 2015-01-04 730 NA 1
64: 2 APPLE 2015-01-04 2189 2015-01-05 1
65: 2 TaiwanMobile 2015-01-04 730 2015-01-05 1"
data <- read.table(text=data, header = TRUE)
data <- as.data.table(data)
data <- data[,rep:=ifelse(year(Time1)==2016, 12-month(Time1)+1, 13)][rep(1:.N,rep)][]
data[, Lapse := seq_len(.N)-1 , .(Category, Brand, Time1, value, Time2, number)]
dput(droplevels(head(data,65)))
structure(list(Category = c(2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2), Brand = c("HTC", "APPLE",
"INFOCUS", "APPLE", "APPLE", "Samsung", "HTC", "Samsung", "Samsung",
"HTC", "HTC", "APPLE", "INFOCUS", "HTC", "SONY", "HTC", "APPLE",
"APPLE", "HTC", "Samsung", "Samsung", "HTC", "Samsung", "HTC",
"APPLE", "HTC", "HTC", "ASUS", "ASUS", "Samsung", "APPLE", "HTC",
"Samsung", "HTC", "HTC", "HTC", "Samsung", "APPLE", "APPLE",
"Benten", "APPLE", "INFOCUS", "Samsung", "HTC", "Samsung", "SONY",
"TaiwanMobile", "HTC", "HTC", "APPLE", "APPLE", "HTC", "Samsung",
"TaiwanMobile", "HTC", "HTC", "TaiwanMobile", "Samsung", "HTC",
"INFOCUS", "SONY", "HTC", "HTC", "APPLE", "TaiwanMobile"), Time1 = structure(c(1420070400,
1420070400, 1420070400, 1420070400, 1420070400, 1420070400, 1420070400,
1420070400, 1420070400, 1420070400, 1420070400, 1420070400, 1420070400,
1420070400, 1420070400, 1420070400, 1420070400, 1420070400, 1420070400,
1420156800, 1420156800, 1420156800, 1420156800, 1420156800, 1420156800,
1420156800, 1420156800, 1420156800, 1420156800, 1420156800, 1420156800,
1420156800, 1420156800, 1420156800, 1420156800, 1420156800, 1420156800,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420243200, 1420243200, 1420243200, 1420243200, 1420243200, 1420243200,
1420329600, 1420329600, 1420329600, 1420329600), class = c("POSIXct",
"POSIXt"), tzone = "UTC"), value = c(1724, 2946, 330, 2946,
2946, 2189, 730, 2189, 2189, 1296, 730, 2189, 330, 2189, 1296,
730, 2946, 2946, 1724, 1724, 2189, 730, 2189, 730, 2946, 1724,
730, 330, 330, 1724, 2189, 730, 730, 730, 730, 730, 730, 2189,
2946, 330, 2946, 330, 1296, 730, 2189, 2189, 730, 1296, 730,
2189, 2189, 730, 330, 730, 730, 730, 330, 1724, 730, 330, 730,
730, 730, 2189, 730), Time2 = structure(c(NA, NA, NA, 1420243200,
NA, NA, NA, NA, NA, NA, NA, NA, 1420156800, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, NA, 1420329600, NA, NA, NA, NA, NA, 1420416000,
1420416000), class = c("POSIXct", "POSIXt"), tzone = "UTC"),
number = c("1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1", "1",
"1", "1", "1", "1", "1", "1", "1")), .Names = c("Category",
"Brand", "Time1", "value", "Time2", "number"), row.names = c(NA,
-65L), .internal.selfref = <pointer: 0x003e24a0>, class = c("data.table",
"data.frame"))
And the problem is the result is weird.
unique(data$Lapse)
[1] 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
[38] 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71 72 73
[75] 74 75 76 77 78 79 80 81 82 83 84 85 86 87 88 89 90