
I want to store the result of the line below in a column of the same DataFrame df.

df.filter(F.abs(df.Px) < 0.005).count()

How can I do that?

  • Have you already seen this [Stack Overflow question](http://stackoverflow.com/questions/33681487/how-do-i-add-a-new-column-to-a-spark-dataframe-using-pyspark)? I think you can find your answer there. – titiro89 May 12 '17 at 13:29
  • Thanks titiro89 for your response, but when I use withColumn as df = df.withColumn("new", df.filter(F.abs(df.Px) < 0.005).count()), I get the following error: col should be Column, because the value returned is an int (see the sketch after these comments). Please provide your inputs on this. – LKA May 12 '17 at 14:10
  • Could you provide a brief example of what your df and your F are? – titiro89 May 12 '17 at 14:20
  • df is my PySpark DataFrame, and F comes from import pyspark.sql.functions as F – LKA May 12 '17 at 14:41
  • Ok, but if you want to be helped by other users, you have to provide more code than that: tell us what your starting point is, what you want to get, and the meaning of your variables, as described at this link: [How to Ask](https://stackoverflow.com/help/how-to-ask) – titiro89 May 12 '17 at 15:06
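
Regarding the error quoted in the comments: withColumn expects a Column expression as its second argument, while .count() returns a plain Python int. A minimal sketch of a likely fix (assuming df and F are as described above), wrapping the scalar in F.lit so it can be stored in a new column on every row:

import pyspark.sql.functions as F

# .count() returns a Python int, not a Column
n_px = df.filter(F.abs(df.Px) < 0.005).count()

# F.lit() wraps the scalar in a Column, so withColumn accepts it
df = df.withColumn("new", F.lit(n_px))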

1 Answer


You can do that using union. However, it is not good practice to append the count as a row below a particular column: your DataFrame may have multiple columns, and this approach only gives you one extra row holding the new count value, mixed in with the data.

An example snippet:

import pandas as pd
import pyspark.sql.functions as F
from pyspark.sql import Row

df = spark.createDataFrame(pd.DataFrame([0.01, 0.003, 0.004, 0.005, 0.02],
                                        columns=['Px']))
n_px = df.filter(F.abs(df['Px']) < 0.005).count()  # rows with |Px| < 0.005

# one-row DataFrame holding the count, cast to float to match Px's double type
df_count = spark.sparkContext.parallelize([Row(Px=float(n_px))]).toDF()
df_union = df.union(df_count)
df_union.show()

+-----+
|   Px|
+-----+
| 0.01|
|0.003|
|0.004|
|0.005|
| 0.02|
|  2.0|
+-----+
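
Note that the count appears as 2.0 rather than 2: the Px column has type double, so the count is cast to float to keep the union schema consistent. If you want the count alongside the data instead of below it, the F.lit sketch above stores it in a separate column on every row.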
titipata