
I have this:

df = sqlContext.sql(qry)
df2 = df.withColumn("ext", df.lvl * df.cnt)
ttl = df2.agg(F.sum("ext")).collect()

which returns this:

[Row(sum(ext)=1285430)]

How do I reduce this down to just the discrete value 1285430, rather than a list containing a Row(sum())?

I've researched and tried so many things I'm totally stymied.

thebluephantom
user5903880
  • Does this answer your question? [PySpark - Sum a column in dataframe and return results as int](https://stackoverflow.com/questions/47812526/pyspark-sum-a-column-in-dataframe-and-return-results-as-int) – DavidP May 08 '20 at 13:46
  • Also, you flagged this as databricks, but I don't think there's anything here that's specific to databricks and there is a lot more information available if you search for spark or pyspark instead of specifying databricks. – DavidP May 08 '20 at 13:48

3 Answers


Access the first row and then get the first element as an int. In PySpark, collect() returns a Python list of Row objects, so index with brackets:

df2.agg(F.sum("ext")).collect()[0][0]

(The .collect()(0).getInt(0) form is Scala syntax and won't work in Python.) Take a look at the documentation: Spark ScalaDoc.
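The indexing chain works because a PySpark Row behaves like a tuple with named fields. A minimal stand-alone sketch of that behavior, using a namedtuple in place of pyspark.sql.Row so it runs without a Spark session (the field name "total" is made up for illustration):

```python
from collections import namedtuple

# Stand-in for pyspark.sql.Row; both support positional and named access.
Row = namedtuple("Row", ["total"])

# Shape of what df2.agg(F.sum("ext")).collect() returns: a one-element list.
rows = [Row(total=1285430)]

# First row, first column: the same chain as collect()[0][0].
value = rows[0][0]
print(value)  # 1285430
```

The real Row additionally supports string keys (rows[0]['sum(ext)']), which a namedtuple does not.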


No need for collect; first() returns the first Row directly:

n = ...your transformation logic and agg... .first()[0]
thebluephantom

You can also index the collected result from the question directly: ttl[0][0] by position, or ttl[0]['sum(ext)'] by the generated column name.

user5903880