0

I have a table which have primarily 3 columns I am interested in: old_item_id, new_item_id and date_of_change. I want to traverse the sequence and want to find out latest id of some item ids. Example data below:

old_item_id new_item_id date_of_change
     1            2        2015-01-01
     2            5        2015-01-02
     5            12       2015-10-01
     4            5        2015-01-02
     6            7        2015-02-02

So if I want the latest ids of say item 1, 4, 6 and 8; in such case I should get output of:

item_id    latest_item_id 
     1            12               
     4            12        
     6            7
     8            8

as 1 and 4 could be traced to 12. item id 6 got changed to 7 and item id 8 was never changed.

Currently I am doing this by repeatedly hitting the table in a while loop from another script. However I am looking for query to do it in single database hit.

abhiieor
  • 3,132
  • 4
  • 30
  • 47
  • 1
    So you are expecting an answer in R/Python/SQL ? – Ronak Shah Oct 18 '18 at 09:30
  • 1
    Which database? If you don't have one that supports hierarchical queries, this one's gonna be a bit of ballache with a lot of LEFT joins.... – Caius Jard Oct 18 '18 at 09:31
  • I am working with exasol which support hierarchical joins. Anyway in spirit of SO I am looking for a sample solution query and then I will adapt it for my database. – abhiieor Oct 18 '18 at 09:33

1 Answers1

2

This can be done with package igraph, but it's a graph theory solution, not a database one.

library(igraph)

g <- graph_from_data_frame(dat)
res <- lapply(V(g), function(i) dfs(g, i, unreachable = FALSE)$order)
res <- lapply(res, function(e) e[!is.na(e)])
sapply(res, function(e) names(e)[length(e)])
#   1    2    5    4    6   12    7 
#"12" "12" "12" "12"  "7" "12"  "7"

Note that you can coerce the final result to class integer if needed.

Data.

dat <-
structure(list(old_item_id = c(1L, 2L, 5L, 4L, 6L), new_item_id = c(2L, 
5L, 12L, 5L, 7L), date_of_change = structure(c(16436, 16437, 
16709, 16437, 16468), class = "Date")), row.names = c(NA, -5L
), class = "data.frame")
Rui Barradas
  • 70,273
  • 8
  • 34
  • 66
  • thanks for your answer but unfortunately it might not be practical. For this to work you need to pull entire table into R and then only it could be used. pulling complete table into RAM is not practical. – abhiieor Oct 18 '18 at 10:02