5

Here is the table:

+------+------+
| Name | Time |   
+------+------+
| A    |    1 |   
| A    |    2 |   
| A    |    3 |   
| A    |    4 |   
| B    |    5 |   
| B    |    6 |   
| A    |    7 |   
| B    |    8 |   
| B    |    9 |   
| B    |   10 |   
+------+------+

I want to write a query to get:

+-------+--------+-----+
| Name  | Start  | End |
+-------+--------+-----+
| A     |     1  |   4 |
| B     |     5  |   6 |
| A     |     7  |   7 |
| B     |     8  |  10 |
+-------+--------+-----+

Does anyone know how to do it?

ekad
  • 14,436
  • 26
  • 44
  • 46
GoGoGo
  • 51
  • 1
  • 3
    This is called islands problem. I don't know about `HIVE` but in `SQL Server` we solve it using window function. Here is the **[DEMO](http://www.sqlfiddle.com/#!3/9eecb7db59d16c80417c72d1/6243)** – Pரதீப் Apr 02 '16 at 07:57

1 Answers1

0

This is not the most efficient way, but it this works.

SELECT name, min(time) AS start,max(time) As end 
FROM (
   SELECT name,time, time- DENSE_RANK()  OVER (partition by name ORDER BY 
   time) AS diff
   FROM foo
) t 
GROUP BY name,diff;

I would suggest try the following query and build a GenericUDF to identify the gaps, much more easier :)

SELECT name, sort_array(collect_list(time)) FROM foo GROUP BY name;
an33sh
  • 1,089
  • 16
  • 27
hlagos
  • 7,690
  • 3
  • 23
  • 41