I am trying to pivot a table in Databricks SQL on the `FeatureName` column, using the `FeatureDescription` column as the value. However, the query I have returns nothing but null values across the pivoted table.
Here is the original table:
AddressKey | FeatureName | FeatureDescription |
---|---|---|
11101 | Groups | AG WAREHOUSE |
11101 | National Groups | AG WAREHOUSE |
5528 | National Groups | AIRR |
5528 | Groups | AUSTRALIAN INDEPENDENT RURAL RESELLERS |
Here is the SQL query:
```sql
SELECT *
FROM
  (SELECT AddressKey, FeatureName, FeatureDescription
   FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures) src
PIVOT (
  max(FeatureDescription)
  FOR FeatureName IN (
    'Carcasspricecategory' AS Carcasspricecategory,
    'Channel' AS Channel,
    'Class' AS Class,
    'Contractor' AS Contractor,
    'DebtorCategory' AS DebtorCategory,
    'DomesticCustTradingasExport' AS DomesticCustTradingasExport,
    'FarmingOperations' AS FarmingOperations,
    'FarmingProductionType' AS FarmingProductionType,
    'FeatureName' AS FeatureName,
    'FMInternalRegions' AS FMInternalRegions,
    'Groups' AS Groups,
    'MillITSupplierType' AS MillITSupplierType,
    'NationalGroups' AS NationalGroups,
    'PaymentMethod' AS PaymentMethod,
    'PurchaseGroup' AS PurchaseGroup,
    'RegisteredforGST' AS RegisteredforGST,
    'SalesGroup' AS SalesGroup,
    'SiteBaxters' AS SiteBaxters,
    'SiteCorowaMeats' AS SiteCorowaMeats,
    'SiteDVP' AS SiteDVP,
    'SiteFeedmill' AS SiteFeedmill,
    'SiteNSWWholesale' AS SiteNSWWholesale,
    'Type' AS Type
  )
);
```
The result:
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
24 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
1000 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
Every pivoted column is null for every row.
I tried using `any`, `first`, `max`, etc. as the aggregate function, with the same result.
But it works when I do the same thing in Python:
```python
%python
import pandas as pd

df = spark.sql('SELECT * FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures')
df_pivot = df.groupBy('AddressKey').pivot('FeatureName').agg({"FeatureDescription": "first"}).toPandas()
df_pivot
```
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | None | MEAT WHOLESALE | MEAT | None | None | None | None | None | None | EFT | GRAIN GROWERS | No | None | No | No | No | Yes | No | BUTCHER |
24 | None | RETAIL | MEAT | None | None | None | None | None | None | EFT | None | None | None | None | None | None | None | None | BUTCHER |
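To rule out Spark entirely, the same group-and-pivot logic also works on plain pandas with the four sample rows hard-coded from the table at the top of this post (just a local repro, not my real data source):

```python
import pandas as pd

# Sample rows copied from the original table above
df = pd.DataFrame({
    "AddressKey": [11101, 11101, 5528, 5528],
    "FeatureName": ["Groups", "National Groups", "National Groups", "Groups"],
    "FeatureDescription": [
        "AG WAREHOUSE",
        "AG WAREHOUSE",
        "AIRR",
        "AUSTRALIAN INDEPENDENT RURAL RESELLERS",
    ],
})

# Equivalent of groupBy('AddressKey').pivot('FeatureName').agg(first):
# one output column per distinct FeatureName value
wide = df.pivot_table(
    index="AddressKey",
    columns="FeatureName",
    values="FeatureDescription",
    aggfunc="first",
).reset_index()
print(wide)
```

This produces the expected non-null `Groups` and `National Groups` columns, so the data itself pivots fine outside of Databricks SQL as well.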
Why does this happen? Is something wrong with my SQL query?
Thanks.