
My goal is to keep all of the DataFrame's columns after a group-by, but after the group-by only the grouping columns come through. So I tried a join to merge the grouped DataFrame back with the original DataFrame, but it is throwing an error.

Code:

```python
from pyspark.sql.types import IntegerType
from pyspark.sql.types import *
from pyspark.sql.functions import *


retail_sales_transaction = glueContext.create_dynamic_frame.from_catalog(
    database="conform_main_mobconv",
    table_name="retail_sales_transaction"
)

df_retail_sales_transaction = retail_sales_transaction.toDF()

df_retail_sales_transaction = df_retail_sales_transaction.select(
    "transaction_id", "transaction_key", "transaction_timestamp",
    "personnel_key", "retail_site_id", "personnel_role",
    "country_code", "business_week"
)

df_retail_sales_transaction = df_retail_sales_transaction.groupBy(
    "business_week", "location_id", "country_code", "personnel_key"
)

df_retail_sales_transaction = df_retail_sales_transaction.join(
    df_retail_sales_transaction, ["business_week"], "outer"
)
```

The error I'm getting, raised at the `join` line, is:

```
AttributeError: 'GroupedData' object has no attribute 'join'
```

Sonam Garg
  • See this https://stackoverflow.com/questions/51820994/groupeddata-object-has-no-attribute-show-when-doing-doing-pivot-in-spark-dat for a similar issue. You can't use `groupby` without using another aggregate function after it. To achieve what you want, try `df_retail_sales_transaction = df_retail_sales_transaction.select("business_week", "location_id", "country_code", "personnel_key").distinct()` – Assaf Segev Sep 05 '21 at 11:53
  • Still getting the error: 'GroupedData' object has no attribute 'distinct' – Sonam Garg Sep 06 '21 at 03:49

0 Answers