0

I have encountered a strange (until I understand the logical reason) behaviour of group by in a SQL Server database. There are many duplicates in a table, for which when I query I get duplicate rows but when I try to find all dupes using group by or row_number strategy I get 0 records. But when I add "Cast" to the group by / row_number I get correct list of duplicates.

The datatype is nvarchar for all 3 keys.
Can someone tell me why this is happening?

Added the query and its output

enter image description here

select top 10 len(VBELN) len_vblen, len(MANDT) , len(posnr) ,  * from [SRC_SAP_R3].[LIPS] where VBELN = '6316785926'

select cast(MANDT as nvarchar) as "MANDT",cast(VBELN as nvarchar) as "VBELN" , cast(posnr as nvarchar) as "posnr", count(*) from [SRC_SAP_R3].[LIPS]
group by cast(MANDT as nvarchar),cast(VBELN as nvarchar)  , cast(posnr as nvarchar) 
having count(*)>1;

select cast(MANDT as varchar) as "MANDT",cast(VBELN as varchar) as "VBELN" , cast(posnr as varchar) as "posnr", count(*) from [SRC_SAP_R3].[LIPS]
group by cast(MANDT as varchar),cast(VBELN as varchar)  , cast(posnr as varchar) 
having count(*)>1;

select MANDT, VBELN ,posnr, count(1) from [SRC_SAP_R3].[LIPS]
group by MANDT, VBELN ,posnr
having count(1)>1;
Dale K
  • 25,246
  • 15
  • 42
  • 71
Aditya Rathi
  • 59
  • 1
  • 6
  • 1
    As per the question guide, please do not post images of code, data, error messages, etc. - copy or type the text into the question. Please reserve the use of images for diagrams or demonstrating rendering bugs, things that are impossible to describe accurately via text. – Dale K Jan 19 '23 at 06:16
  • If we can't replicate this issues it is unlikely we can provide an explanation. Perhaps provide the DDL of your table and some inserts of the data that is giving you these results. – Paul Maxwell Jan 19 '23 at 06:37
  • 2
    try DATALEN() instead of LEN() --- `select top 10 datalen(VBELN) datalen_vblen, datalen(MANDT) , datalen(posnr) , * from [SRC_SAP_R3].[LIPS] where VBELN = '6316785926'` – Paul Maxwell Jan 19 '23 at 06:41
  • I have also tried to replicate your issue and it's working fine for me see [image](https://i.imgur.com/IxGJLqB.png) – Pratik Lad Jan 19 '23 at 06:54
  • I tried with your approach [it](https://i.imgur.com/Y5HhY2g.png) is working fine for me. – Bhavani Jan 19 '23 at 07:25
  • Did you check count(*) also? – Aswin Jan 19 '23 at 07:50
  • 1
    Casting them without length will truncate them to 30 characters. Maybe they have many trailing spaces and differing amounts – Martin Smith Jan 19 '23 at 08:44
  • Don't `MANDT, VBELN, POSNR` form the key for the `LIPS` table in SAP ABAP by default? How have you got duplicate keys in there? What modifications have you been making? – AlwaysLearning Jan 19 '23 at 09:37
  • Perhaps azure-synapse is the key here. It might handle stuff differently from regular SQL Server. Looks like a bug to me. Can you post the query plan for the "OK" and "Bad" query? – siggemannen Jan 19 '23 at 13:01
  • I figured it out, it was special char issue. If I do cast to string and group by it works.. n also if I do row_number then also it works.. probably due to diff in coallation between sap n sql server – Aditya Rathi Aug 26 '23 at 00:08

1 Answers1

0

I tried to repro this in Azure Synapse Analytics. As @Martin Smith said the len() function will ignore the trailing spaces while computing the total length of the column. When I tried with datalength() function, the length of trailing spaces is also included. Below is the repro.

  • A table is created with varchar column and one data is inserted with trail spaces and other data is without spaces.
 create  table SAP_TAB (VBELN varchar(100))
insert  into SAP_TAB values('500 ')
insert  into SAP_TAB values('500')
  • Then len() function, datalength() function is applied to data. Also, casted the data as varchar and length function is applied to casted data. Below is the query.
select VBELN,len(VBELN) as [length_VBELN],
datalength(VBELN) as [data_length_VBELN],
len(cast(VBELN as  varchar(10))) as 
[length_varchar_casted_VBELN]
from sap_tab

Result

VBELN length_VBELN data_length_VBELN length_varchar_casted_VBELN
500 3 5 3
500 3 3 3

enter image description here

Aswin
  • 4,090
  • 2
  • 4
  • 16
  • len will ignore the spaces.. but it will not work with special characters not detected by sql. len() was giving same result in value. – Aditya Rathi Aug 26 '23 at 00:09