How to output pyspark aggregation function results as a string?

Asked Mar 26 '21 at 21:57

Active Mar 26 '21 at 21:57

Viewed 61 times

I am running some data quality checks with PySpark and want to write all of the results to a single text file. The basic code logic is as follows:

def data_quality_check(df):
    output = ''
    output += func1(df) # func1 and func2 return check results as strings
    output += func2(df)
    return output

The problem is that I don't know how to get the result of an aggregation on a PySpark DataFrame as a string. For example, I want to append a groupBy/count result using the following code:

output += 'Counts group by device type is : ' + str(df.groupBy('DEVICE_TYPE').count().show()) + '\n'

The output below is not what I expected:

Counts group by device type is : None

Thanks for any suggestions in advance!
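For context, `show()` prints the table to stdout and returns `None`, which is why `str(...)` produces the text `None`. One way around it, sketched below, is to build the string from the collected rows instead. The helper name `aggregation_to_string` and the `sep` parameter are hypothetical; the sketch only assumes the DataFrame exposes `columns` and `collect()`, and that the aggregated result is small enough to bring to the driver.

```python
def aggregation_to_string(df, sep=" | "):
    """Render a small, already aggregated DataFrame as plain text.

    Hypothetical helper: relies only on `df.columns` and `df.collect()`.
    `collect()` pulls every row to the driver, so this is meant for small
    results such as a groupBy/count, not for large DataFrames.
    """
    lines = [sep.join(df.columns)]  # header line with the column names
    for row in df.collect():
        # A PySpark Row is tuple-like, so iterating yields values in column order.
        lines.append(sep.join(str(value) for value in row))
    return "\n".join(lines)


# Intended use inside data_quality_check (not executed here):
# output += ('Counts group by device type is :\n'
#            + aggregation_to_string(df.groupBy('DEVICE_TYPE').count()) + '\n')
```

Unlike `show()`, this does not draw the ASCII table borders; it trades that formatting for a return value that can be concatenated like any other string.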

Asked by CathyQian

Another way to ask this question: how can I save the output of xx.show() to a text file? Thanks! – CathyQian Mar 26 '21 at 22:01
df.groupBy('DEVICE_TYPE').count().write.format('csv').save('test', mode="overwrite") should write into a file in csv format. Are you facing any issue with that? – Hussain Bohra Mar 26 '21 at 22:07
@HussainBohra Yes, I can do that. Is there any way I can write multiple such strings into the same csv or txt file as the code runs? Thanks again! – CathyQian Mar 26 '21 at 23:00
Can you provide an example of your input data and output file you are looking for? – Hussain Bohra Mar 27 '21 at 01:04
Does this answer your question? [Saving result of DataFrame show() to string in pyspark](https://stackoverflow.com/questions/55653609/saving-result-of-dataframe-show-to-string-in-pyspark) – mck Mar 27 '21 at 07:01
@mck Yes, that answered my question. Thank you all! – CathyQian Mar 30 '21 at 17:20
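To tie the comments together, here is a minimal sketch of capturing the `show()` text and writing several check results into one text file. It assumes `show()` prints through Python's `sys.stdout`, which is my understanding of how PySpark implements it but is worth verifying on your version. The names `show_to_string` and `write_report` are hypothetical helpers, not part of PySpark.

```python
import contextlib
import io


def show_to_string(df, n=20, truncate=True):
    """Return the text that df.show() would print, instead of None.

    Assumption: show() writes via Python's print(), so redirecting
    sys.stdout captures it.
    """
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        df.show(n, truncate)
    return buffer.getvalue()


def write_report(path, sections):
    """Write (title, text) pairs into a single text file, one after another."""
    with open(path, "w", encoding="utf-8") as handle:
        for title, text in sections:
            handle.write(f"{title}\n{text}\n")


# Intended use (not executed here):
# counts = show_to_string(df.groupBy('DEVICE_TYPE').count())
# write_report('quality_report.txt', [('Counts group by device type is :', counts)])
```

Collecting the sections in a list and writing once keeps the file handling in one place; opening the file with mode `"a"` instead would let each check append its result as the code runs.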

0 Answers