
I have two tables, each with a column that contains customer IDs.

For example,

TABLE A

cust_id 
1234
32145
1235692
9293159

TABLE B

cust_id 
0001234
0032145
1235692
9293159 

I'm trying to apply an inner join in SQL, but since the values do not match exactly, the first two IDs, 1234 (0001234) and 32145 (0032145), are missing from the inner join results.

So here are my questions.

  1. Is there any way to add additional 0s in front of TABLE A's cust_id so that it matches TABLE B's cust_id?

For example, 1234 is only 4 digits, but after the transformation it would be 7 digits, with three additional 0s in front of it --> 0001234.

  2. Are these kinds of tasks easier to handle with Python or SQL? My past experience with data cleansing tells me it's Python, but are there any other suggestions?

If possible, for this specific case, I am hoping I can get help using Python or SQL.

Thanks in advance.

Lee Vincent
  • I think you're looking for something like this: https://stackoverflow.com/questions/9520661/formatting-numbers-by-padding-with-leading-zeros-in-sql-server – PaoloJ42 Jul 14 '21 at 08:49
  • @PaoloJ42 Sounds great! I'll give it a shot! Although transforming the column to an int isn't an option for me, since some of the entries do include characters, for example 123C2F. But I will definitely try that solution, thanks a bunch! – Lee Vincent Jul 14 '21 at 10:58
  • Tag your question with the database you are using. – Gordon Linoff Jul 14 '21 at 11:08
  • @GordonLinoff just did! I'm using SSMS. BTW, I just figured out that SSMS is handling those missing 0s perfectly with the inner join clause. I don't know what the logic behind it is, since the columns I've mentioned are saved as nvarchar(50). To my knowledge, the database literally compares each character when finding a match, for example in an inner join, but apparently it's smart enough to find a match while ignoring the unnecessary 0s in front. – Lee Vincent Jul 15 '21 at 10:23

1 Answer


For Python, this can easily be handled using pandas.

Assuming cust_id is already a string column:

>>> df.cust_id.apply(lambda x: x.zfill(7))
0    0001234
1    0032145
2    1235692
3    9293159
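
Since the end goal in the question is an inner join, here is a minimal sketch of the full flow, assuming the two tables are loaded as DataFrames named df_a and df_b (hypothetical names) with cust_id stored as strings:

import pandas as pd

# Hypothetical DataFrames standing in for TABLE A and TABLE B from the question.
df_a = pd.DataFrame({"cust_id": ["1234", "32145", "1235692", "9293159"]})
df_b = pd.DataFrame({"cust_id": ["0001234", "0032145", "1235692", "9293159"]})

# Pad TABLE A's IDs to 7 characters, then inner-join on the padded key.
df_a["cust_id"] = df_a["cust_id"].str.zfill(7)
merged = df_a.merge(df_b, on="cust_id", how="inner")
print(merged)  # all four IDs now match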

For SQL, you state you're using SSMS, so that implies SQL Server. SQL Server unfortunately doesn't have an LPAD function out of the box, but you can replicate it using:

REPLACE(STR(<column_name>, <desired_length>),' ','0')

For example:

with cust_ids as (
    select * from (
        values 
            ('1234'),
            ('32145'),
            ('1235692'),
            ('9293159')
        ) a (cust_id)
)

select 
    cust_id,
    REPLACE(STR(cust_id, 7),' ','0') as padded_cust_id
from cust_ids
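
Note that STR() expects a numeric value, so it would break on IDs that contain letters, such as the 123C2F mentioned in the comments. A purely string-based sketch for the same padding, assuming a fixed width of 7 and using RIGHT and REPLICATE, avoids any numeric conversion:

with cust_ids as (
    select * from (
        values
            ('1234'),
            ('32145'),
            ('123C2F'),
            ('9293159')
        ) a (cust_id)
)

select
    cust_id,
    -- prepend seven zeros, then keep only the rightmost 7 characters
    RIGHT(REPLICATE('0', 7) + cust_id, 7) as padded_cust_id
from cust_ids
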
greg_data
  • Thank you, it worked out perfectly :) I was able to achieve the data set I wanted with pandas as you suggested, but interestingly, I found that SSMS (or rather SQL Server) handled the missing 0s by itself somehow, and I got the exact same results both when the 0s were missing and after they were filled in. – Lee Vincent Jul 21 '21 at 07:35
  • I noticed that from the comments on the question, but glad the answer was still helpful. It's very interesting seeing what kinds of inference SQL Server is doing under the hood for those comparisons. – greg_data Jul 21 '21 at 08:11
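
One possible explanation for the behaviour described in these comments (an assumption, since the actual table definitions aren't shown): if one of the two join keys is actually numeric rather than nvarchar(50), SQL Server implicitly converts the string side to a number because numeric types have higher data type precedence, so the leading zeros disappear before the comparison. A minimal sketch:

with a as (select cast(1234 as int) as cust_id),
     b as (select cast(N'0001234' as nvarchar(50)) as cust_id)
select a.cust_id as a_id, b.cust_id as b_id
from a
inner join b on a.cust_id = b.cust_id  -- N'0001234' is implicitly converted to 1234, so the rows match

A plain nvarchar-to-nvarchar join compares character by character and would not match '1234' against '0001234'; the implicit conversion above also fails with a conversion error as soon as a non-numeric ID such as 123C2F appears on the string side.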