
I have this:

df = sqlContext.sql(qry)
df2 = df.withColumn("ext", df.lvl * df.cnt)
ttl = df2.agg(F.sum("ext")).collect()

which returns this:

[Row(sum(ext)=1285430)]

How do I reduce this down to just the discrete value 1285430, rather than a list containing a Row(sum())?

I've researched and tried so many things I'm totally stymied.

thebluephantom
user5903880
  • Does this answer your question? [PySpark - Sum a column in dataframe and return results as int](https://stackoverflow.com/questions/47812526/pyspark-sum-a-column-in-dataframe-and-return-results-as-int) – DavidP May 08 '20 at 13:46
  • Also, you flagged this as databricks, but I don't think there's anything here that's specific to databricks and there is a lot more information available if you search for spark or pyspark instead of specifying databricks. – DavidP May 08 '20 at 13:48

3 Answers


Access the first row and then get the first element as an int. In PySpark, collect() returns a Python list of Row objects, so index with brackets:

df2.agg(F.sum("ext")).collect()[0][0]

(The .collect()(0).getInt(0) form is Scala syntax and won't work in Python.) Take a look at the documentation: Spark ScalaDoc.
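The indexing chain works because a PySpark Row behaves like a tuple with named fields. A minimal stand-alone sketch of that behavior, using a namedtuple in place of pyspark.sql.Row so it runs without a Spark session (the field name "total" is made up for illustration):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; both support positional and named access.
Row = namedtuple("Row", ["total"])

# Shape of what df2.agg(F.sum("ext")).collect() returns: a one-element list.
rows = [Row(total=1285430)]

# First row, first column: the same chain as collect()[0][0].
value = rows[0][0]
print(value)  # 1285430
```

The real Row additionally supports string keys (rows[0]['sum(ext)']), which a namedtuple does not.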


No need for collect; first() returns the first Row directly:

n = ...your transformation logic and agg... .first()[0]
thebluephantom

You can also index the collected result from the question directly: ttl[0][0] by position, or ttl[0]['sum(ext)'] by the generated column name.

user5903880