I need to write a Hive query, or define a function, that does the following.
The dataset:
Student || Time    || ComputerPool
-------------------------------------
A       || 9:15AM  || Pool1.Machine2
A       || 9:45AM  || Pool1.Machine7
A       || 10:15AM || Pool1.Machine9
A       || 11:00AM || Pool2.Machine2
A       || 12:05   || Pool2.Machine3
A       || 12:40   || Pool3.Machine5
A       || 13:10   || Pool1.Machine3
A       || 13:50   || Pool1.Machine10
B ..........................
The query should find out how long a particular student spent in a particular computer pool. That duration is the difference between the time he first used a machine in that pool and the time he first used a machine in the next pool. In this example, the time student A spent in Pool1 is 11:00AM - 9:15AM = 1 hour 45 minutes.
My question is: how can I record the time of the first use in a pool, and then use that stored value later, when I reach the first row for the next pool?
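
To make the expected result concrete, here is a minimal sketch of the calculation in Python rather than Hive. It assumes the rows can be sorted by student and time, that the pool is the part of ComputerPool before the dot, and that times without an AM/PM suffix are 24-hour times. The names to_minutes and pool_visits are made up for illustration. A return to an earlier pool (Pool1 at 13:10) is treated as a new visit, and the last visit has no end time, so its duration is left empty.

```python
def to_minutes(text):
    """Convert '9:15AM' or '13:10' to minutes since midnight.

    Assumption: a value without AM/PM is already in 24-hour form.
    """
    text = text.strip().upper()
    suffix = None
    if text.endswith("AM") or text.endswith("PM"):
        suffix, text = text[-2:], text[:-2]
    hours, minutes = (int(part) for part in text.split(":"))
    if suffix == "PM" and hours != 12:
        hours += 12
    if suffix == "AM" and hours == 12:
        hours = 0
    return hours * 60 + minutes


def pool_visits(rows):
    """Return one [student, pool, start_minutes, duration_minutes] per visit.

    A visit starts at the first row in a pool and ends at the first row
    in a different pool for the same student. The duration of a student's
    last visit is None because no later row closes it.
    """
    ordered = sorted(rows, key=lambda r: (r[0], to_minutes(r[1])))
    visits = []
    for student, time_text, machine in ordered:
        pool = machine.split(".")[0]
        start = to_minutes(time_text)
        last = visits[-1] if visits else None
        same_student = last is not None and last[0] == student
        if same_student and last[1] == pool:
            continue  # still in the same pool, the stored start time stays
        if same_student:
            last[3] = start - last[2]  # close the previous visit
        visits.append([student, pool, start, None])
    return visits


# Sample rows copied from the dataset above (student A only).
rows = [
    ("A", "9:15AM", "Pool1.Machine2"),
    ("A", "9:45AM", "Pool1.Machine7"),
    ("A", "10:15AM", "Pool1.Machine9"),
    ("A", "11:00AM", "Pool2.Machine2"),
    ("A", "12:05", "Pool2.Machine3"),
    ("A", "12:40", "Pool3.Machine5"),
    ("A", "13:10", "Pool1.Machine3"),
    ("A", "13:50", "Pool1.Machine10"),
]

for student, pool, start, duration in pool_visits(rows):
    print(student, pool, start, duration)
# The first line printed is: A Pool1 555 105  (105 minutes = 1 hour 45 minutes)
```

The "compare with the previous row" step is the part I do not know how to express in Hive. My guess is that a windowing function such as LAG over rows partitioned by student and ordered by time could play that role, but I have not verified this.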