
I am doing SUM on multiple columns, and I want to include those columns in the SELECT list.

Below is my work:

val df = df0
  .join(df1, df1("Col1") <=> df0("Col1"))
  .filter(df1("Colum") === "00")
  .groupBy(df1("Col1"), df1("Col2"))
  .agg(sum(df1("Amount").alias("Amount1")), sum(df1("Amount2").alias("Amount2")))
  .select(
    df1("Col1").alias("Col1"),
    df1("Col2").alias("Col2"),
    Amount1, Amount2 // getting error here
  )

How do I include the aliased columns in the SELECT list?

sks

1 Answer


Use the col function or Scala's symbol shorthand ('). Note that the alias must be applied to the result of sum, not to the source column:

import org.apache.spark.sql.functions._
import spark.implicits._
val df = df0
  .join(df1, df1("Col1") <=> df0("Col1"))
  .filter(df1("Colum") === "00")
  .groupBy(df1("Col1"), df1("Col2"))
  .agg(sum(df1("Amount")).alias("Amount1"), sum(df1("Amount2")).alias("Amount2"))
  .select(
    df1("Col1").alias("Col1"),
    df1("Col2").alias("Col2"),
    col("Amount1"), 'Amount2
  )
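For reference, a minimal self-contained sketch of the same pattern, using made-up data and omitting the join and filter from the question. After agg, the DataFrame's columns are the grouping columns plus the aliases (Col1, Col2, Amount1, Amount2), so the aliased aggregates can be referenced by name:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder()
  .master("local[*]")
  .appName("alias-demo")
  .getOrCreate()
import spark.implicits._

// Hypothetical data; column names mirror the question.
val df1 = Seq(
  ("a", "x", 10.0, 1.0),
  ("a", "x", 5.0, 2.0),
  ("b", "y", 3.0, 4.0)
).toDF("Col1", "Col2", "Amount", "Amount2")

val result = df1
  .groupBy($"Col1", $"Col2")
  // Alias the aggregate itself, not the input column.
  .agg(sum($"Amount").alias("Amount1"), sum($"Amount2").alias("Amount2"))
  // The aliases are now real columns and resolve by name.
  .select($"Col1", $"Col2", col("Amount1"), 'Amount2)

result.show()
```

Running this requires a Spark runtime; col("Amount1"), 'Amount2, and $"Amount2" are interchangeable ways to reference the column here.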
T. Gawęda
  • I tried but getting error "User class threw exception: org.apache.spark.sql.AnalysisException: cannot resolve '`Amount1`' given input columns" – sks Mar 13 '17 at 14:04
  • @sks - I've corrected my answer. Order of alias was wrong, it must be done on sum, not on source column – T. Gawęda Mar 13 '17 at 14:07
  • I am using Alias column not source column, but still the same error. Cannot resolve Amount1. – sks Mar 13 '17 at 14:26
  • Yes,I copied your answer. – sks Mar 13 '17 at 14:35
  • @sks Strange, I've tested it and for me it works. Could you please post what's in the cut part of the message? There should be a list of visible columns – T. Gawęda Mar 13 '17 at 14:44
  • Sorry for the confusion, it is working fine, problem with brackets. – sks Mar 13 '17 at 15:04