1

I am trying to pivot a SQL table in Databricks on the FeatureName column using the FeatureDescription column as the value.

But somehow the SQL query that I have resolves to null values being populated across the table.

Here is the original table:

AddressKey FeatureName FeatureDescription
11101 Groups AG WAREHOUSE
11101 National Groups AG WAREHOUSE
5528 National Groups AIRR
5528 Groups AUSTRALIAN INDEPENDENT RURAL RESELLERS

Here is the SQL query:

SELECT *
FROM
  (SELECT AddressKey, FeatureName, FeatureDescription
   FROM radp_dev_cat.rdp_svr_sales.vw_customerfeatures) src
PIVOT (
  max(FeatureDescription)
  FOR FeatureName IN (
    'Carcasspricecategory' AS Carcasspricecategory,
    'Channel' AS Channel,
    'Class' AS Class,
    'Contractor' AS Contractor,
    'DebtorCategory' AS DebtorCategory,
    'DomesticCustTradingasExport' AS DomesticCustTradingasExport,
    'FarmingOperations' AS FarmingOperations,
    'FarmingProductionType' AS FarmingProductionType,
    'FeatureName' AS FeatureName,
    'FMInternalRegions' AS FMInternalRegions,
    'Groups' AS Groups,
    'MillITSupplierType' AS MillITSupplierType,
    'NationalGroups' AS NationalGroups,
    'PaymentMethod' AS PaymentMethod,
    'PurchaseGroup' AS PurchaseGroup,
    'RegisteredforGST' AS RegisteredforGST,
    'SalesGroup' AS SalesGroup,
    'SiteBaxters' AS SiteBaxters,
    'SiteCorowaMeats' AS SiteCorowaMeats,
    'SiteDVP' AS SiteDVP,
    'SiteFeedmill' AS SiteFeedmill,
    'SiteNSWWholesale' AS SiteNSWWholesale,
    'Type' AS Type
  )
);

The result:

AddressKey Carcass price category Channel Class Contractor Debtor Category Domestic Cust Trading as Export FM Internal Regions Farming Operations Farming Production Type Payment Method Purchase Group Registered for GST Sales Group Site Baxters Site Corowa Meats Site DVP Site Feedmill Site NSW Wholesale Type
9 null null null null null null null null null null null null null null null null null null null
24 null null null null null null null null null null null null null null null null null null null
1000 null null null null null null null null null null null null null null null null null null null

All null values for all columns across the table.

I tried using any, first, max, etc. for the aggregate function. Same result.

But it works why I do the same thing using Python:

%python
import pandas as pd

df = spark.sql('SELECT * FROM  radp_dev_cat.rdp_svr_sales.vw_customerfeatures')

df_pivot = df.groupBy('AddressKey').pivot('FeatureName').agg({"FeatureDescription": "first"}).toPandas()

df_pivot
AddressKey Carcass price category Channel Class Contractor Debtor Category Domestic Cust Trading as Export FM Internal Regions Farming Operations Farming Production Type Payment Method Purchase Group Registered for GST Sales Group Site Baxters Site Corowa Meats Site DVP Site Feedmill Site NSW Wholesale Type
9 None MEAT WHOLESALE MEAT None None None None None None EFT GRAIN GROWERS No None No No No Yes No BUTCHER
24 None RETAIL MEAT None None None None None None EFT None None None None None None None None BUTCHER

Why does this happen? Is something wrong with my SQL query?

Thanks.

Ronak Vachhani
  • 214
  • 2
  • 14
  • please read https://meta.stackoverflow.com/questions/285551/why-not-upload-images-of-code-on-so-when-asking-a-question and edit your question – nbk Jun 22 '23 at 06:52
  • 1
    have you checked for spaces in the Text or add a TRIM to FeatureName ? – nbk Jun 22 '23 at 06:57
  • The images are unreadable, and data must be also in text form – nbk Jun 23 '23 at 01:55
  • Ich no the problem is that you don't need a trim but an upper, the text in the first image is all upper character but the text you compare is only the first character big so none is ever going to be equal the the pivot will find a match – nbk Jun 23 '23 at 02:00
  • @nbk - I have edited the question to include markdown tables instead of images. Thanks for the heads up. – Ronak Vachhani Jun 23 '23 at 02:03
  • All the table and names are not equal for example you have groups in the first table and in the query but not in the result. Also try to find with a simple select of the texted in the IN of the pivot and see if there is a match because the database thinks that you have none – nbk Jun 23 '23 at 02:09

1 Answers1

0

The problem was the underlying view that had the FeatureName, FeatureDescription data. The columns had trailing whitespace in the records so none of the PIVOT columns matched.

FeatureName len(FeatureName) FeatureDescription len(FeatureDescription)
Debtor Category 50 TRADE 70
Type 50 DISTRIBUTOR 70
Class 50 LOCAL 70

I trimmed the underlying view and it worked.

Thanks for the insight @nbk.

Ronak Vachhani
  • 214
  • 2
  • 14