I am working with a big data base and I am trying to chunk a balance panel in balanced panels so I can perform a plm
loop regression.
I have been trying to divide my data frame bb3
(balance grouping) but I can't manage to get what I want, this is my sample data bb3
in my drive as a .RData
First attempt
Following this question I tried using split.default
:
n<-6
f<-gl(n,nrow(bb3) / n )
B<-split.default(bb3, f)
B
is close to what I want, but just for my first list B[["1"]]
the other 5 lists are empty and I get the following error:
Warning message:
In split.default(bb3, f) : data length is not a multiple of split variable
TL:DR I want to generate a list divided in 6 chunks, each with the 102 clvs
with consecutive date
. Even if they are not balanced, if they are consecutive I can balanced them with make.pbalanced
. This will be my ideal solution
Second attempt
I generated Index
to identify every date
of every clvs
(Index
is already generated in bb3
but for the sake of the exercise I am explaining how I did it)
bb3<- bb3 %>%
group_by(clvs) %>%
mutate(Index = row_number())
The problem is that I don't know how to split it:
si<-split.data.frame(bb3,bb3$Index)
Again si
closely relates to what I need, it gives me a list of 551 balance list (102 clvs
and one date
). I want to divide bb3
given 19 segments of Index
. This list (1:19,20:39,40:59,...) will determine how I divide my data frame
I am sure there should be an easy way with function
and lapply
but I can't come with this solution.
TL:DR I want to divide in 19 chunks my data frame, each chunk with 102 clvs
each one with 29 periods date
: a list of 19 chunks each with 2,958 observations
The solution I got by @Allan Cameron cut my data base in the size I need it but it doesn't work as a balance panel because it doesn't cut consecutive my datetime
B <- split(bb3, 1:6)
head(B[["6"]])
clvs fecha hora pml pml_ene pml_per pml_cng datetime date
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <fct> <dttm>
1 07AC~ 2017~ 6 389. 388. 1.07 0 2017-02~ 2017-02-28 07:00:00
2 07AC~ 2017~ 12 677. 674. 3.25 0 2017-02~ 2017-02-28 13:00:00
3 07AC~ 2017~ 18 667. 664. 3.15 0 2017-02~ 2017-02-28 19:00:00
4 07AC~ 2017~ 24 666. 664. 2.44 0 2017-03~ 2017-03-01 01:00:00
5 07AC~ 2017~ 6 664. 662. 2.05 0 2017-03~ 2017-03-01 07:00:00
6 07AC~ 2017~ 12 434. 431. 2.97 0 2017-03~ 2017-03-01 13:00:00
tail(B[["6"]])
clvs fecha hora pml pml_ene pml_per pml_cng datetime date
<fct> <chr> <chr> <dbl> <dbl> <dbl> <dbl> <fct> <dttm>
1 07XC~ 2017~ 18 430. 443. -13.6 0 2017-03~ 2017-03-21 19:00:00
2 07XC~ 2017~ 24 418. 426. -8.41 0 2017-03~ 2017-03-22 01:00:00
3 07XC~ 2017~ 6 442. 444. -2.15 0 2017-03~ 2017-03-22 07:00:00
4 07XC~ 2017~ 12 451. 464. -12.4 0 2017-03~ 2017-03-22 13:00:00
5 07XC~ 2017~ 18 450. 461. -11.2 0 2017-03~ 2017-03-22 19:00:00
6 07XC~ 2017~ 24 452. 467. -15 0 2017-03~ 2017-03-23 01:00:00
So I still can't balance my panel, nor use it with plm
How can I chunk it in 6 with consecutive dates?