
I am working with a large database and I am trying to split a balanced panel into smaller balanced panels so I can run a plm regression loop.

I have been trying to divide my data frame bb3 (balanced grouping), but I can't manage to get what I want. This is my sample data bb3, available in my drive as a .RData file:

bb3

First attempt

Following this question, I tried using split.default:

n <- 6
f <- gl(n, nrow(bb3) / n)
B <- split.default(bb3, f)

B is close to what I want, but only my first list B[["1"]] has data; the other 5 lists are empty, and I get the following warning:

Warning message:
In split.default(bb3, f) : data length is not a multiple of split variable
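A likely cause (an editorial note, not from the original post): `split.default` treats a data frame as a list of its *columns*, so `f` is matched against the handful of columns rather than the rows, which is why everything lands in `B[["1"]]`, the other groups come back empty, and the recycling warning appears. `split` on a data frame dispatches to `split.data.frame`, which groups rows instead. A toy illustration:

```r
df <- data.frame(a = 1:6, b = letters[1:6])

# split.default() groups the 2 *columns* by f: both fall in group "1",
# group "2" is empty, and the same "data length is not a multiple"
# warning as above is raised
lengths(split.default(df, gl(2, 3)))

# split() dispatches to split.data.frame(), which groups the *rows*
sapply(split(df, gl(2, 3)), nrow)   # two chunks of 3 consecutive rows
```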

TL;DR: I want to generate a list divided into 6 chunks, each with the 102 clvs with consecutive dates. Even if they are not balanced, as long as the dates are consecutive I can balance them with make.pbalanced. This would be my ideal solution.

Second attempt

I generated Index to identify every date of every clvs (Index is already present in bb3, but for the sake of the exercise I am explaining how I created it):

bb3<- bb3 %>% 
  group_by(clvs) %>% 
  mutate(Index = row_number())

The problem is that I don't know how to split by it:

si<-split.data.frame(bb3,bb3$Index)

Again, si is close to what I need: it gives me a list of 551 balanced lists (102 clvs and one date each). I want to divide bb3 given 19 segments of Index. This list (1:19, 20:39, 40:59, ...) would determine how I divide my data frame.

I am sure there should be an easy way with a function and lapply, but I can't come up with the solution.

TL;DR: I want to divide my data frame into 19 chunks, each with 102 clvs and 29 date periods each: a list of 19 chunks, each with 2,958 observations.
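A sketch of that split (an editorial note, assuming Index runs 1:551 within each clvs as constructed above): mapping Index to a block id with ceiling() yields 19 blocks of 29 consecutive periods each, and dplyr's group_split() then returns the list of chunks.

```r
library(dplyr)

# Sketch, assuming Index = 1:551 within each clvs:
# ceiling(Index / 29) maps periods 1:29 -> 1, 30:58 -> 2, ..., 523:551 -> 19
bb3_chunks <- bb3 %>%
  mutate(chunk = ceiling(Index / 29)) %>%
  group_split(chunk)      # list of 19 data frames, 2,958 rows each
```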

The solution I got from @Allan Cameron cuts my database into the size I need, but it doesn't work as a balanced panel because it doesn't cut my datetime consecutively:

B <- split(bb3, 1:6)
head(B[["6"]])
clvs  fecha hora    pml pml_ene pml_per pml_cng datetime date               
  <fct> <chr> <chr> <dbl>   <dbl>   <dbl>   <dbl> <fct>    <dttm>             
1 07AC~ 2017~ 6      389.    388.    1.07       0 2017-02~ 2017-02-28 07:00:00
2 07AC~ 2017~ 12     677.    674.    3.25       0 2017-02~ 2017-02-28 13:00:00
3 07AC~ 2017~ 18     667.    664.    3.15       0 2017-02~ 2017-02-28 19:00:00
4 07AC~ 2017~ 24     666.    664.    2.44       0 2017-03~ 2017-03-01 01:00:00
5 07AC~ 2017~ 6      664.    662.    2.05       0 2017-03~ 2017-03-01 07:00:00
6 07AC~ 2017~ 12     434.    431.    2.97       0 2017-03~ 2017-03-01 13:00:00

tail(B[["6"]])

clvs  fecha hora    pml pml_ene pml_per pml_cng datetime date               
  <fct> <chr> <chr> <dbl>   <dbl>   <dbl>   <dbl> <fct>    <dttm>             
1 07XC~ 2017~ 18     430.    443.  -13.6        0 2017-03~ 2017-03-21 19:00:00
2 07XC~ 2017~ 24     418.    426.   -8.41       0 2017-03~ 2017-03-22 01:00:00
3 07XC~ 2017~ 6      442.    444.   -2.15       0 2017-03~ 2017-03-22 07:00:00
4 07XC~ 2017~ 12     451.    464.  -12.4        0 2017-03~ 2017-03-22 13:00:00
5 07XC~ 2017~ 18     450.    461.  -11.2        0 2017-03~ 2017-03-22 19:00:00
6 07XC~ 2017~ 24     452.    467.  -15          0 2017-03~ 2017-03-23 01:00:00

So I still can't balance my panel, nor use it with plm.

How can I chunk it in 6 with consecutive dates?
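An editorial note on why the dates interleave: split(bb3, 1:6) recycles 1:6 down the rows, dealing them out round-robin, so chunk "1" gets every 6th row rather than a consecutive block. A toy illustration:

```r
# 1:3 is recycled over 12 values, so chunk "1" gets every 3rd element,
# not a consecutive block
split(1:12, 1:3)
#> $`1`
#> [1]  1  4  7 10
#>
#> $`2`
#> [1]  2  5  8 11
#>
#> $`3`
#> [1]  3  6  9 12
```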

  • Does your `split` work if you use `split` rather than `split.default`? – Allan Cameron Oct 09 '20 at 18:41
  • When I use `split` it doesn't give me the `Warning` message but each list has only 17 `clvs` and I need the 102 `clvs` in each list `B[[i]]` – Lilia Oct 09 '20 at 18:49
  • Hi Lilia, I see you have unaccepted my answer. Is there a problem with it? Let me know if I can help. – Allan Cameron Oct 11 '20 at 13:19
  • Thanks for asking @AllanCameron, can you check my edit in the original post? My data frame is being cut at non-sequential dates. Each list has 3 non-consecutive hours for each day. I want the first 92 hours of the period (2017-02-28 1:00 to 2017-03-03 23:00) for the 102 `clvs` in the first list, then the following 92 hours in the second, and so on in chronological order. Is there a way? – Lilia Oct 11 '20 at 13:38

2 Answers


You can do

B <- split(bb3, 1:6)

So that B is a list of data frames, each with 9367 rows:

sapply(B, nrow)
#>    1    2    3    4    5    6 
#> 9367 9367 9367 9367 9367 9367 

And each contains all clvs in roughly equal amounts:

sapply(B, function(x) table(x$clvs))
#>            1  2  3  4  5  6
#> 07ACU-115 92 92 92 92 92 91
#> 07APD-230 92 92 92 92 91 92
#> 07BJO-69  92 92 92 91 92 92
#> 07CAE-69  92 92 91 92 92 92
#> 07CES-69  92 91 92 92 92 92
#> 07CHQ-230 91 92 92 92 92 92
#> 07CIP-115 92 92 92 92 92 91
#> 07CME-115 92 92 92 92 91 92
#> 07CNA-69  92 92 92 91 92 92
#> 07CNT-230 92 92 91 92 92 92
#> 07CNZ-161 92 91 92 92 92 92
#> 07COE-115 91 92 92 92 92 92
#> 07CPC-161 92 92 92 92 92 91
#> 07CPD-230 92 92 92 92 91 92
#> 07CPT-230 92 92 92 91 92 92
#> 07CPU-161 92 92 91 92 92 92
#> 07CRO-161 92 91 92 92 92 92
#> 07CSC-230 91 92 92 92 92 92
#> 07CTY-161 92 92 92 92 92 91
#> 07CTY-230 92 92 92 92 91 92
#> 07DRA-69  92 92 92 91 92 92
#> 07EDA-115 92 92 91 92 92 92
#> 07END-69  92 91 92 92 92 92
#> 07ENT-115 91 92 92 92 92 92
#> 07EPC-115 92 92 92 92 92 91
#> 07FCO-69  92 92 92 92 91 92
#> 07FEV-161 92 92 92 91 92 92
#> 07FLO-69  92 92 91 92 92 92
#> 07FMO-115 92 91 92 92 92 92
#> 07FOR-69  91 92 92 92 92 92
#> 07FTA-69  92 92 92 92 92 91
#> 07GER-69  92 92 92 92 91 92
#> 07GLL-115 92 92 92 91 92 92
#> 07GOR-161 92 92 91 92 92 92
#> 07HGO-161 92 91 92 92 92 92
#> 07HMO-69  91 92 92 92 92 92
#> 07HUN-69  92 92 92 92 92 91
#> 07INA-69  92 92 92 92 91 92
#> 07IOP-69  92 92 92 91 92 92
#> 07JAT-115 92 92 91 92 92 92
#> 07JOV-230 92 91 92 92 92 92
#> 07JOY-115 91 92 92 92 92 92
#> 07KCH-161 92 92 92 92 92 91
#> 07LGO-69  92 92 92 92 91 92
#> 07LMS-69  92 92 92 91 92 92
#> 07LOS-115 92 92 91 92 92 92
#> 07MAB-69  92 91 92 92 92 92
#> 07MEP-230 91 92 92 92 92 92
#> 07MEP-69  92 92 92 92 92 91
#> 07MND-115 92 92 92 92 91 92
#> 07MOE-161 92 92 92 91 92 92
#> 07MSN-115 92 92 91 92 92 92
#> 07MXC-69  92 91 92 92 92 92
#> 07MXI-161 91 92 92 92 92 92
#> 07MXI-230 92 92 92 92 92 91
#> 07NEL-69  92 92 92 92 91 92
#> 07NZI-161 92 92 92 91 92 92
#> 07ONG-69  92 92 91 92 92 92
#> 07OZA-230 92 91 92 92 92 92
#> 07PAF-69  91 92 92 92 92 92
#> 07PAP-69  92 92 92 92 92 91
#> 07PID-161 92 92 92 92 91 92
#> 07PID-230 92 92 92 91 92 92
#> 07PJZ-115 92 92 91 92 92 92
#> 07PJZ-230 92 91 92 92 92 92
#> 07PKD-161 91 92 92 92 92 92
#> 07PNM-115 92 92 92 92 92 91
#> 07PNU-115 92 92 92 92 91 92
#> 07POP-115 92 92 92 91 92 92
#> 07RII-69  92 92 91 92 92 92
#> 07RIN-161 92 91 92 92 92 92
#> 07RSI-69  91 92 92 92 92 92
#> 07RUB-69  92 92 92 92 92 91
#> 07RUM-69  92 92 92 92 91 92
#> 07RZC-161 92 92 92 91 92 92
#> 07SAF-115 92 92 91 92 92 92
#> 07SAZ-115 92 91 92 92 92 92
#> 07SED-69  91 92 92 92 92 92
#> 07SHA-115 92 92 92 92 92 91
#> 07SIO-69  92 92 92 92 91 92
#> 07SIS-161 92 92 92 91 92 92
#> 07SMN-115 92 92 91 92 92 92
#> 07SQN-115 92 91 92 92 92 92
#> 07SRY-230 91 92 92 92 92 92
#> 07STB-230 92 92 92 92 92 91
#> 07SVE-115 92 92 92 92 91 92
#> 07TCD-69  92 92 92 91 92 92
#> 07TCT-69  92 92 91 92 92 92
#> 07TEE-69  92 91 92 92 92 92
#> 07TEK-230 91 92 92 92 92 92
#> 07TJD-69  92 92 92 92 92 91
#> 07TJI-69  92 92 92 92 91 92
#> 07TOY-230 92 92 92 91 92 92
#> 07TRI-115 92 92 91 92 92 92
#> 07UND-69  92 91 92 92 92 92
#> 07VAS-69  91 92 92 92 92 92
#> 07VIG-69  92 92 92 92 92 91
#> 07VLP-69  92 92 92 92 91 92
#> 07VPA-230 92 92 92 91 92 92
#> 07VPM-69  92 92 91 92 92 92
#> 07WIS-230 92 91 92 92 92 92
#> 07XCO-230 91 92 92 92 92 92

And each contains 17 instances of each datetime:

sapply(B, function(x) table(as.character(x$datetime)))
#>                      1  2  3  4  5  6
#> 2017-02-28 01:00:00 17 17 17 17 17 17
#> 2017-02-28 02:00:00 17 17 17 17 17 17
#> 2017-02-28 03:00:00 17 17 17 17 17 17
#> 2017-02-28 04:00:00 17 17 17 17 17 17
#> 2017-02-28 05:00:00 17 17 17 17 17 17
#> 2017-02-28 06:00:00 17 17 17 17 17 17
#> 2017-02-28 07:00:00 17 17 17 17 17 17
#> 2017-02-28 08:00:00 17 17 17 17 17 17
#> 2017-02-28 09:00:00 17 17 17 17 17 17
#> 2017-02-28 10:00:00 17 17 17 17 17 17
#> 2017-02-28 11:00:00 17 17 17 17 17 17
#> 2017-02-28 12:00:00 17 17 17 17 17 17
#> 2017-02-28 13:00:00 17 17 17 17 17 17
#> 2017-02-28 14:00:00 17 17 17 17 17 17
#> 2017-02-28 15:00:00 17 17 17 17 17 17
#> 2017-02-28 16:00:00 17 17 17 17 17 17
#> 2017-02-28 17:00:00 17 17 17 17 17 17
#> 2017-02-28 18:00:00 17 17 17 17 17 17
#> 2017-02-28 19:00:00 17 17 17 17 17 17
#> 2017-02-28 20:00:00 17 17 17 17 17 17
#> 2017-02-28 21:00:00 17 17 17 17 17 17
#> 2017-02-28 22:00:00 17 17 17 17 17 17
#> 2017-02-28 23:00:00 17 17 17 17 17 17
#> 2017-03-01 00:00:00 17 17 17 17 17 17
#> 2017-03-01 01:00:00 17 17 17 17 17 17
#> 2017-03-01 02:00:00 17 17 17 17 17 17
#> 2017-03-01 03:00:00 17 17 17 17 17 17
#> 2017-03-01 04:00:00 17 17 17 17 17 17
#> 2017-03-01 05:00:00 17 17 17 17 17 17
#> 2017-03-01 06:00:00 17 17 17 17 17 17
#> 2017-03-01 07:00:00 17 17 17 17 17 17
#> 2017-03-01 08:00:00 17 17 17 17 17 17
#> 2017-03-01 09:00:00 17 17 17 17 17 17
#> 2017-03-01 10:00:00 17 17 17 17 17 17
#> 2017-03-01 11:00:00 17 17 17 17 17 17
#> 2017-03-01 12:00:00 17 17 17 17 17 17
#> 2017-03-01 13:00:00 17 17 17 17 17 17
#> 2017-03-01 14:00:00 17 17 17 17 17 17
#> 2017-03-01 15:00:00 17 17 17 17 17 17
#> 2017-03-01 16:00:00 17 17 17 17 17 17
#> 2017-03-01 17:00:00 17 17 17 17 17 17
#> 2017-03-01 18:00:00 17 17 17 17 17 17
#> 2017-03-01 19:00:00 17 17 17 17 17 17
#> 2017-03-01 20:00:00 17 17 17 17 17 17
#> 2017-03-01 21:00:00 17 17 17 17 17 17
#> 2017-03-01 22:00:00 17 17 17 17 17 17
#> 2017-03-01 23:00:00 17 17 17 17 17 17
#> 2017-03-02 00:00:00 17 17 17 17 17 17
#> 2017-03-02 01:00:00 17 17 17 17 17 17
#> 2017-03-02 02:00:00 17 17 17 17 17 17
#> 2017-03-02 03:00:00 17 17 17 17 17 17
#> 2017-03-02 04:00:00 17 17 17 17 17 17
#> 2017-03-02 05:00:00 17 17 17 17 17 17
#> 2017-03-02 06:00:00 17 17 17 17 17 17
#> 2017-03-02 07:00:00 17 17 17 17 17 17
#> 2017-03-02 08:00:00 17 17 17 17 17 17
#> 2017-03-02 09:00:00 17 17 17 17 17 17
#> 2017-03-02 10:00:00 17 17 17 17 17 17
#> 2017-03-02 11:00:00 17 17 17 17 17 17
#> 2017-03-02 12:00:00 17 17 17 17 17 17
#> 2017-03-02 13:00:00 17 17 17 17 17 17
#> 2017-03-02 14:00:00 17 17 17 17 17 17
#> 2017-03-02 15:00:00 17 17 17 17 17 17
#> 2017-03-02 16:00:00 17 17 17 17 17 17
#> 2017-03-02 17:00:00 17 17 17 17 17 17
#> 2017-03-02 18:00:00 17 17 17 17 17 17
#> 2017-03-02 19:00:00 17 17 17 17 17 17
#> 2017-03-02 20:00:00 17 17 17 17 17 17
#> 2017-03-02 21:00:00 17 17 17 17 17 17
#> 2017-03-02 22:00:00 17 17 17 17 17 17
#> 2017-03-02 23:00:00 17 17 17 17 17 17
#> 2017-03-03 00:00:00 17 17 17 17 17 17
#> 2017-03-03 01:00:00 17 17 17 17 17 17
#> 2017-03-03 02:00:00 17 17 17 17 17 17
#> 2017-03-03 03:00:00 17 17 17 17 17 17
#> 2017-03-03 04:00:00 17 17 17 17 17 17
#> 2017-03-03 05:00:00 17 17 17 17 17 17
#> 2017-03-03 06:00:00 17 17 17 17 17 17
#> 2017-03-03 07:00:00 17 17 17 17 17 17
#> 2017-03-03 08:00:00 17 17 17 17 17 17
#> 2017-03-03 09:00:00 17 17 17 17 17 17
#> 2017-03-03 10:00:00 17 17 17 17 17 17
#> 2017-03-03 11:00:00 17 17 17 17 17 17
#> 2017-03-03 12:00:00 17 17 17 17 17 17
#> 2017-03-03 13:00:00 17 17 17 17 17 17
#> 2017-03-03 14:00:00 17 17 17 17 17 17
#> 2017-03-03 15:00:00 17 17 17 17 17 17
#> 2017-03-03 16:00:00 17 17 17 17 17 17
#> 2017-03-03 17:00:00 17 17 17 17 17 17
#> 2017-03-03 18:00:00 17 17 17 17 17 17
#> 2017-03-03 19:00:00 17 17 17 17 17 17
#> 2017-03-03 20:00:00 17 17 17 17 17 17
#> 2017-03-03 21:00:00 17 17 17 17 17 17
#> 2017-03-03 22:00:00 17 17 17 17 17 17
#> 2017-03-03 23:00:00 17 17 17 17 17 17
#> 2017-03-04 00:00:00 17 17 17 17 17 17
#> 2017-03-04 01:00:00 17 17 17 17 17 17
#> 2017-03-04 02:00:00 17 17 17 17 17 17
#> 2017-03-04 03:00:00 17 17 17 17 17 17
#> 2017-03-04 04:00:00 17 17 17 17 17 17
#> 2017-03-04 05:00:00 17 17 17 17 17 17
#> 2017-03-04 06:00:00 17 17 17 17 17 17
#> 2017-03-04 07:00:00 17 17 17 17 17 17
#> 2017-03-04 08:00:00 17 17 17 17 17 17
#> 2017-03-04 09:00:00 17 17 17 17 17 17
#> 2017-03-04 10:00:00 17 17 17 17 17 17
#> 2017-03-04 11:00:00 17 17 17 17 17 17
#> 2017-03-04 12:00:00 17 17 17 17 17 17
#> 2017-03-04 13:00:00 17 17 17 17 17 17
#> 2017-03-04 14:00:00 17 17 17 17 17 17
#> 2017-03-04 15:00:00 17 17 17 17 17 17
#> 2017-03-04 16:00:00 17 17 17 17 17 17
#> 2017-03-04 17:00:00 17 17 17 17 17 17
#> 2017-03-04 18:00:00 17 17 17 17 17 17
#> 2017-03-04 19:00:00 17 17 17 17 17 17
#> 2017-03-04 20:00:00 17 17 17 17 17 17
#> 2017-03-04 21:00:00 17 17 17 17 17 17
#> 2017-03-04 22:00:00 17 17 17 17 17 17
#> 2017-03-04 23:00:00 17 17 17 17 17 17
#> 2017-03-05 00:00:00 17 17 17 17 17 17
#> 2017-03-05 01:00:00 17 17 17 17 17 17
#> 2017-03-05 02:00:00 17 17 17 17 17 17
#> 2017-03-05 03:00:00 17 17 17 17 17 17
#> 2017-03-05 04:00:00 17 17 17 17 17 17
#> 2017-03-05 05:00:00 17 17 17 17 17 17
#> 2017-03-05 06:00:00 17 17 17 17 17 17
#> 2017-03-05 07:00:00 17 17 17 17 17 17
#> 2017-03-05 08:00:00 17 17 17 17 17 17
#> 2017-03-05 09:00:00 17 17 17 17 17 17
#> 2017-03-05 10:00:00 17 17 17 17 17 17
#> 2017-03-05 11:00:00 17 17 17 17 17 17
#> 2017-03-05 12:00:00 17 17 17 17 17 17
#> 2017-03-05 13:00:00 17 17 17 17 17 17
#> 2017-03-05 14:00:00 17 17 17 17 17 17
#> 2017-03-05 15:00:00 17 17 17 17 17 17
#> 2017-03-05 16:00:00 17 17 17 17 17 17
#> 2017-03-05 17:00:00 17 17 17 17 17 17
#> 2017-03-05 18:00:00 17 17 17 17 17 17
#> 2017-03-05 19:00:00 17 17 17 17 17 17
#> 2017-03-05 20:00:00 17 17 17 17 17 17
#> 2017-03-05 21:00:00 17 17 17 17 17 17
#> 2017-03-05 22:00:00 17 17 17 17 17 17
#> 2017-03-05 23:00:00 17 17 17 17 17 17
#> 2017-03-06 00:00:00 17 17 17 17 17 17
#> 2017-03-06 01:00:00 17 17 17 17 17 17
#> 2017-03-06 02:00:00 17 17 17 17 17 17
#> 2017-03-06 03:00:00 17 17 17 17 17 17
#> 2017-03-06 04:00:00 17 17 17 17 17 17
#> 2017-03-06 05:00:00 17 17 17 17 17 17
#> 2017-03-06 06:00:00 17 17 17 17 17 17
#> 2017-03-06 07:00:00 17 17 17 17 17 17
#> 2017-03-06 08:00:00 17 17 17 17 17 17
#> 2017-03-06 09:00:00 17 17 17 17 17 17
#> 2017-03-06 10:00:00 17 17 17 17 17 17
#> 2017-03-06 11:00:00 17 17 17 17 17 17
#> 2017-03-06 12:00:00 17 17 17 17 17 17
#> 2017-03-06 13:00:00 17 17 17 17 17 17
#> 2017-03-06 14:00:00 17 17 17 17 17 17
#> 2017-03-06 15:00:00 17 17 17 17 17 17
#> 2017-03-06 16:00:00 17 17 17 17 17 17
#> 2017-03-06 17:00:00 17 17 17 17 17 17
#> 2017-03-06 18:00:00 17 17 17 17 17 17
#> 2017-03-06 19:00:00 17 17 17 17 17 17
#> 2017-03-06 20:00:00 17 17 17 17 17 17
#> 2017-03-06 21:00:00 17 17 17 17 17 17
#> 2017-03-06 22:00:00 17 17 17 17 17 17
#>  [ reached getOption("max.print") -- omitted 385 rows ]


Although it seems like an easy task, splitting balanced panel data into smaller balanced panels turned out to be very challenging.

@Allan Cameron's answer got the length of the list right, but not the content. My panels were unbalanced: each clvs had 188 or 187 observations in the same chunk, and datetime was not consecutive. B[["1"]] had a sequence of 7:00, 13:00 and 19:00 for one clvs, for example. With unbalanced panels, my loop with an splm function didn't work.

The solution was using gl.unequal :

library(DTK)
f <- gl.unequal(n = 6, k = c(92, 92, 92, 92, 92, 91))
B <- split(bb3, f)
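For what it's worth, the same grouping factor can be built in base R with rep(), avoiding the DTK dependency. This assumes bb3 is sorted by clvs and then by datetime (551 consecutive rows per clvs), so that split() recycles the length-551 factor exactly once per clvs and each chunk keeps consecutive dates:

```r
# 92 + 92 + 92 + 92 + 92 + 91 = 551 periods per clvs; split() recycles
# this length-551 factor 102 times over the rows, once per clvs
f <- rep(1:6, times = c(92, 92, 92, 92, 92, 91))
B <- split(bb3, f)
```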

This way I get balanced panels; for example, B[["1"]]:

head(B[["1"]])
1 07AC~ 2017~ 1      686.    684.    2.19       0 2017-02~ 2017-02-28 02:00:00
2 07AC~ 2017~ 2      665.    664.    1.79       0 2017-02~ 2017-02-28 03:00:00
3 07AC~ 2017~ 3      393.    392.    1.11       0 2017-02~ 2017-02-28 04:00:00
4 07AC~ 2017~ 4      383.    381.    1.4        0 2017-02~ 2017-02-28 05:00:00
5 07AC~ 2017~ 5      383.    381.    1.41       0 2017-02~ 2017-02-28 06:00:00
6 07AC~ 2017~ 6      389.    388.    1.07       0 2017-02~ 2017-02-28 07:00:00

is.pbalanced(B[["1"]])
TRUE
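With balanced chunks, the plm loop the question was building toward might look like this sketch (the formula is a placeholder assembled from the columns shown above; substitute your own model and index):

```r
library(plm)

# Hypothetical loop: one fixed-effects ("within") model per balanced chunk,
# with (clvs, datetime) as the panel index
models <- lapply(B, function(chunk) {
  plm(pml ~ pml_ene + pml_per,      # placeholder formula
      data  = chunk,
      index = c("clvs", "datetime"),
      model = "within")
})
summary(models[[1]])
```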