1

I start with a "wide" data set of GIS data of Origin Destinations (OD), that I want to rearange into a longer data set of one row for each point. I already managed to melt and then dcast to rearange the first layer of the ODs, but I am strungling with the next step

My MWE is on example of a trip from Brussels to London in three legs. But there are many more IDs and they can have more or less legs.

library(data.table)

Start = c("Brussels","Lille","Dover")
Start_lon <- c(4.3570964,3.075685,1.3047866)
Start_lat <- c(50.845504, 50.6390876,51.12623)
Border = c("Baisieux", "Frethun","London")
Border_lon = c(3.075685,1.811221, -0.1244124)
Border_lat <- c(50.61848,  50.90148, 51.53165)

df <- data.table(ID = 1,
                 Sub_ID = 1:3,
                 Start = Start, 
                 Start_lon = Start_lon,
                 Start_lat,
                 Border = Border,
                 Border_lon = Border_lon,
                 Border_lat = Border_lat)

So i have one ID with three Sub_IDs each with two points and finally I would expect 6 rows for each ID for each station. I manage to expand the data from 3 to 6 rows so that one row is the start/origin point an the other the border/destination point indicated by the Type variable.

df_long <- melt(df, id.vars = c("ID", "Sub_ID", "Start", "Border"))
df_long <- df_long[, c("Type", "Coordinates") := tstrsplit(variable, "_", fixed=TRUE)]
df_long <- dcast(df_long, ID+Sub_ID+Start+Border+Type~Coordinates, value.var="value")

Now I do not know how to proceed to get into the structure of ID==1 and six Sub_Ids so that I would get 6 Rows with stations for
Brussels- Baisieux - Lill- Frethun - Dover- London

I was hoping to get something like this

df_goal <- data.table(ID = 1,
                      Sub_ID = 1:6,
                      Stop = c("Brussels","Baisieux","Lille", "Frethun", "Dover", "London"),
                      lat = NA,
                      lon = NA)

Maybe still with the information if the stop is a "Start" or a "Border"

Max M
  • 806
  • 14
  • 29

1 Answers1

1

Let me leave what I tried with tidyverse approach. I split the data by ID. (You have only one ID, but you may have multiple IDs. So I decided to go this way.) For each list component, I chose columns between Start and Border_lat, transposed, unlisted it, then created a matrix. I filled in this matrix with three columns (city, lon, and lat), and converted the matrix to a data frame. For each ID group, I added a new column called Type. I repeated Start and Border here. Finally, I changed column names and converted lon and lat to numeric. I am sure there are concise ways to handle this.

library(dplyr)
library(purrr)

map_dfr(.x = split(df, f = df$ID),
        .f = function(x){dplyr::select(x, Start:Border_lat) %>% 
                         t %>% 
                         unlist %>% 
                         matrix(ncol = 3, byrow = TRUE) %>% 
                         as.data.frame(stringsAsFactors = FALSE)},
        .id = "ID") %>% 
group_by(ID) %>% 
mutate(Type = rep(c("Start", "Border"), times = n()/2)) %>% 
rename(stop = "V1", lon = "V2", lat = "V3") %>% 
mutate_at(vars(lon:lat),
          .funs = list(~as.numeric(.)))

#   ID    stop        lon   lat Type  
#   <chr> <chr>     <dbl> <dbl> <chr> 
# 1 1     Brussels  4.36   50.8 Start 
# 2 1     Baisieux  3.08   50.6 Border
# 3 1     Lille     3.08   50.6 Start 
# 4 1     Frethun   1.81   50.9 Border
# 5 1     Dover     1.30   51.1 Start 
# 6 1     London   -0.124  51.5 Border
# 7 2     Brussels  4.36   50.8 Start 
# 8 2     Baisieux  3.08   50.6 Border
# 9 2     Lille     3.08   50.6 Start 
#10 2     Frethun   1.81   50.9 Border
#11 2     Dover     1.30   51.1 Start 
#12 2     London   -0.124  51.5 Border

Another option

This is an idea with data.table. Given what you said, the number of columns is 8 and the number of rows vary for each ID. Given this, I came up with the followign way.

df[, .(Stop = c(Start, Border),
       Type = c("Start", "Border"),
       lon = c(Start_lon, Border_lon),
       lat = c(Start_lat, Border_lat)),
   by = .(ID, Sub_ID)]

#    ID Sub_ID     Stop   Type        lon      lat
# 1:  1      1 Brussels  Start  4.3570964 50.84550
# 2:  1      1 Baisieux Border  3.0756850 50.61848
# 3:  1      2    Lille  Start  3.0756850 50.63909
# 4:  1      2  Frethun Border  1.8112210 50.90148
# 5:  1      3    Dover  Start  1.3047866 51.12623
# 6:  1      3   London Border -0.1244124 51.53165
# 7:  2      1 Brussels  Start  4.3570964 50.84550
# 8:  2      1 Baisieux Border  3.0756850 50.61848
# 9:  2      2    Lille  Start  3.0756850 50.63909
#10:  2      2  Frethun Border  1.8112210 50.90148
#11:  2      3    Dover  Start  1.3047866 51.12623
#12:  2      3   London Border -0.1244124 51.53165

DATA

df <- structure(list(ID = c(1, 1, 1, 2, 2, 2), Sub_ID = c(1L, 2L, 3L, 
1L, 2L, 3L), Start = c("Brussels", "Lille", "Dover", "Brussels", 
"Lille", "Dover"), Start_lon = c(4.3570964, 3.075685, 1.3047866, 
4.3570964, 3.075685, 1.3047866), Start_lat = c(50.845504, 50.6390876, 
51.12623, 50.845504, 50.6390876, 51.12623), Border = c("Baisieux", 
"Frethun", "London", "Baisieux", "Frethun", "London"), Border_lon = c(3.075685, 
1.811221, -0.1244124, 3.075685, 1.811221, -0.1244124), Border_lat = c(50.61848, 
50.90148, 51.53165, 50.61848, 50.90148, 51.53165)), row.names = c(NA, 
-6L), class = c("data.table", "data.frame"))
jazzurro
  • 23,179
  • 35
  • 66
  • 76
  • I maybe should have mentioned that the ODs have different length. There are ODs with three legs / Sub_IDs, only two, or more than three. Would this work then as well? The Matrix command uses a ´3´. – Max M Jan 10 '20 at 14:30
  • @MaxM What is `OD`? – jazzurro Jan 10 '20 at 14:37
  • Sorry Origin Destination = OD @jazzurro – Max M Jan 10 '20 at 14:42
  • @MaxM Thank you. Another question. Each row represents one "leg". Is that right? – jazzurro Jan 10 '20 at 14:45
  • Yes in the original example we start with 3 legs for the trip from Brussels to London. One leg per row. THere could be more or less legs – Max M Jan 10 '20 at 14:50
  • @MaxM So the minimum number of legs is 1 (non stop trip). In that case, there is just one row with six columns (Start, Start_lon, Start_lat, Border, Border_lon, and Border_lat) as well as ID and Sub_ID. Right? That 3 in matrix, I think, should still work. I am creating a row with stop or border + lon + lat. – jazzurro Jan 10 '20 at 15:04
  • @MaxM I threw another idea into my post. This one is with data.table. I guess this is the straightforward way to get the expected result. – jazzurro Jan 10 '20 at 16:29
  • Thanks for the `data.table` answer. In hindsight it seems pretty simple, but it alwayys does :) – Max M Jan 13 '20 at 16:06
  • @MaxM You are welcome. Once I realized your data structure, I saw the solution in my mind. Till then, I was blind too. :) – jazzurro Jan 14 '20 at 02:52