I have a PySpark DataFrame and I'd like to get its row count. Once I have the count, I'd like to add it to the top-left corner of the DataFrame, as shown below.

I've tried creating the count row first and doing a union of that mostly empty row with the DataFrame, but the empty row gets overwritten. I've also tried adding the count as a literal in a new column, but I'm having trouble nulling out the rest of that column as well as the rest of the row. Any advice?

dataframe:

col1 col2 col3 ... col13
string string timest ... int

for a few rows.

desired output:

row_count col1 col2 col3 ... col13
numofrows
string string timest ... int

So the row count would sit where an otherwise empty row and empty column meet.

BeRT2me

1 Answer

Assuming df is your dataframe:

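from pyspark.sql import functions as F

# Assumes `spark` is the active SparkSession.
cnt = df.count()

# Remember the original columns before adding the new one.
columns_list = df.columns

# Add an all-null row_count column, then reuse the resulting schema for the count row.
df = df.withColumn("row_count", F.lit(None).cast("int"))
schema = df.schema

# One extra row: null in every original column, the count in row_count.
cnt_line = spark.createDataFrame([[None for x in columns_list] + [cnt]], schema=schema)

# unionAll is deprecated since Spark 2.0; union behaves the same way.
df.union(cnt_line).show()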
Steven