I am trying to pivot a table in Databricks SQL on the `FeatureName` column, using the `FeatureDescription` column as the value. However, the query I have returns nothing but null values across the pivoted table.
Here is the original table:
AddressKey | FeatureName | FeatureDescription |
---|---|---|
11101 | Groups | AG WAREHOUSE |
11101 | National Groups | AG WAREHOUSE |
5528 | National Groups | AIRR |
5528 | Groups | AUSTRALIAN INDEPENDENT RURAL RESELLERS |
Here is the SQL query:
```sql
SELECT *
FROM
  (SELECT AddressKey, FeatureName, FeatureDescription
   FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures) src
PIVOT (
  max(FeatureDescription)
  FOR FeatureName IN (
    'Carcasspricecategory' AS Carcasspricecategory,
    'Channel' AS Channel,
    'Class' AS Class,
    'Contractor' AS Contractor,
    'DebtorCategory' AS DebtorCategory,
    'DomesticCustTradingasExport' AS DomesticCustTradingasExport,
    'FarmingOperations' AS FarmingOperations,
    'FarmingProductionType' AS FarmingProductionType,
    'FeatureName' AS FeatureName,
    'FMInternalRegions' AS FMInternalRegions,
    'Groups' AS Groups,
    'MillITSupplierType' AS MillITSupplierType,
    'NationalGroups' AS NationalGroups,
    'PaymentMethod' AS PaymentMethod,
    'PurchaseGroup' AS PurchaseGroup,
    'RegisteredforGST' AS RegisteredforGST,
    'SalesGroup' AS SalesGroup,
    'SiteBaxters' AS SiteBaxters,
    'SiteCorowaMeats' AS SiteCorowaMeats,
    'SiteDVP' AS SiteDVP,
    'SiteFeedmill' AS SiteFeedmill,
    'SiteNSWWholesale' AS SiteNSWWholesale,
    'Type' AS Type
  )
);
```
The result:
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
24 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
1000 | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null | null |
Every pivoted column is null for every row.
I tried using `any`, `first`, `max`, etc. as the aggregate function, with the same result.
But it works when I do the same thing in Python:
```python
%python
import pandas as pd

df = spark.sql('SELECT * FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures')
df_pivot = df.groupBy('AddressKey').pivot('FeatureName').agg({"FeatureDescription": "first"}).toPandas()
df_pivot
```
AddressKey | Carcass price category | Channel | Class | Contractor | Debtor Category | Domestic Cust Trading as Export | FM Internal Regions | Farming Operations | Farming Production Type | Payment Method | Purchase Group | Registered for GST | Sales Group | Site Baxters | Site Corowa Meats | Site DVP | Site Feedmill | Site NSW Wholesale | Type |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
9 | None | MEAT WHOLESALE | MEAT | None | None | None | None | None | None | EFT | GRAIN GROWERS | No | None | No | No | No | Yes | No | BUTCHER |
24 | None | RETAIL | MEAT | None | None | None | None | None | None | EFT | None | None | None | None | None | None | None | None | BUTCHER |
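To rule out Spark entirely, the same group-and-pivot logic also works on plain pandas with the four sample rows hard-coded from the table at the top of this post (just a local repro, not my real data source):

```python
import pandas as pd

# Sample rows copied from the original table above
df = pd.DataFrame({
    "AddressKey": [11101, 11101, 5528, 5528],
    "FeatureName": ["Groups", "National Groups", "National Groups", "Groups"],
    "FeatureDescription": [
        "AG WAREHOUSE",
        "AG WAREHOUSE",
        "AIRR",
        "AUSTRALIAN INDEPENDENT RURAL RESELLERS",
    ],
})

# Equivalent of groupBy('AddressKey').pivot('FeatureName').agg(first):
# one output column per distinct FeatureName value
wide = df.pivot_table(
    index="AddressKey",
    columns="FeatureName",
    values="FeatureDescription",
    aggfunc="first",
).reset_index()
print(wide)
```

This produces the expected non-null `Groups` and `National Groups` columns, so the data itself pivots fine outside of Databricks SQL as well.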
Why does this happen? Is something wrong with my SQL query?
Thanks.