0

I wonder what is the best way to convert the following source data set to target data set:
Source data set:

ID | Type | Count
111|  A   | 1
111|  B   | 2
111|  C   | 1
222|  A   | 3
333|  C   | 2

Target data set:

ID  |#ofTypeA | #ofTypeB | #ofTypeC
111 |   1     |    2     |    1
222 |   3     |    0     |    0
333 |   0     |    0     |    2

The source data set was created from a raw data set using GroupBy(ID, Type).count() Right now the only way I can think of is join the raw data-set with the source data set multiple times, each time create a new column for counting of each type. But it will resolved in 3 joins. I wonder if there is any an better way to do it

The raw data set is like this:

ID | Type
111| A
111| B
111| B
.......
zero323
  • 322,348
  • 103
  • 959
  • 935
Alex
  • 57
  • 1
  • 5

1 Answers1

3

You can use pivot to with groupby to get the result you want

  val df = Seq(
    (111, "A", 1),
    (111, "B", 2),
    (111, "C", 1),
    (222, "A", 3),
    (333, "C", 2)
  ).toDF("Id", "Type", "Count")

  df.groupBy("Id")
    .pivot("Type")
    .agg(sum($"Count"))
    .na.fill(0)
    .show(false)

You can use na.fill(0) to replace the null with 0

Hope this helps!

koiralo
  • 22,594
  • 6
  • 51
  • 72