0

enter image description hereThis code should return the street address without the street number. These EU address have their street number at the end of the address. I am not sure why the error is happening.

    UPDATE STAGING_1_1_FACT_CUSTOMERS_B2B_LGP                       
        SET [StreetAddress] = SUBSTRING([Address], 1, PATINDEX('%[1-9]%', [Address])-1)             
    FROM    [dbo].[STAGING_1_1_FACT_CUSTOMERS_B2B_LGP]
    WHERE [Country Code] IN ('NL','DE','LT','AT','BE','ES','DK','IT', 'SE', 'CZ', 'SI', 'SUI', 'EE','PL','HU','LIE','FI','LV')
Sdot4allg
  • 9
  • 6
  • 3
    Do they all have a street number. If the return is 0, you'll be looking for -1 – Nathan_Sav Apr 20 '22 at 12:32
  • This is a very risky idea since (at least here in Germany) street numbers often includes letters, like 'example street 10a' – Jonas Metzler Apr 20 '22 at 12:37
  • A few do not have street numbers. – Sdot4allg Apr 20 '22 at 12:41
  • So explain what you want in `StreetAddress` when `Address = 'Denmark Street'` with no street numbers. It can't be `SUBSTRING(StreetAddress, 1, -1)` because, well, a string can't be -1 characters long. And actually explain why you want `StreetAddress` to be everything _up to but not including_ the number? – Aaron Bertrand Apr 20 '22 at 12:41
  • And look, I've done some work in the address cleaning business, and you don't want to go down this path. I've been working with T-SQL for two decades and there's no way I'd try to clean up address data in T-SQL. No offense intended but it certainly shouldn't be attempted by someone challenged by an invalid length error message. – Aaron Bertrand Apr 20 '22 at 13:04

1 Answers1

3

Identify rows without a number in the address:

SELECT * FROM dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP -- wow
WHERE PATINDEX('%[1-9]%', [Address]) = 0;

To get the entire address when a number doesn't occur, you can use:

SUBSTRING(Address, 1, COALESCE(NULLIF(
  PATINDEX('%[1-9]%', [Address]), 0),LEN(Address)+1)-1)

Which - finding no number - will add 1 to the length so you can still subtract 1 to get the whole string. That's assuming you want the whole string in that case.

In order to perform the update you're still going to have to prepare for garbage data that you obviously have (or you wouldn't be here) but that you didn't include in your screenshot (also don't post data as screenshots). Given this sample data:

CREATE TABLE dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP
(
  Address       nvarchar(255),
  StreetNumber  nvarchar(255),
  StreetAddress nvarchar(255)
);

INSERT dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP(Address) 
VALUES(N'Gewerbegebiet 5'),
(N'Spännigweg 1'),
(N'Hauptstr 113A'),
(N'Viale Francesco Redi 39'),
(N'Garbage your code does not handle.'),
(N'More garbage 20th promenade 225 W');

You can run the following update:

; /* <--- ensure previous statement terminated */ 
WITH src AS
(
  SELECT *, FirstNumber = 
    COALESCE(NULLIF(PATINDEX('%[1-9]%', [Address]), 0),LEN(Address)+1)
  FROM dbo.STAGING_1_1_FACT_CUSTOMERS_B2B_LGP
  -- WHERE CountryCode IN ('some', 'list');
)
UPDATE src SET 
  StreetNumber  = SUBSTRING(Address, FirstNumber, 255),
  StreetAddress = LEFT(Address, FirstNumber-1);

Output (which shows what happens to garbage):

Address StreetNumber StreetAddress
Gewerbegebiet 5 5 Gewerbegebiet
Spännigweg 1 1 Spännigweg
Hauptstr 113A 113A Hauptstr
Viale Francesco Redi 39 39 Viale Francesco Redi
Garbage your code does not handle. Garbage your code does not handle.
More garbage 20th promenade 225 W 20th promenade 225 W More garbage

Also you don't need the FROM line in the update. You're updating the same table.

Finally, the requirement makes little sense to me.

  • Why do you want StreetAddress to be everything up to but not including the number?
  • What happens if there is a number in a street name?
  • If you're trying to clean up address data, there is very expensive software that does this and still isn't perfect, so trying to re-invent the wheel is going to lead to lots of little frustrating issues like this one.
Aaron Bertrand
  • 272,866
  • 37
  • 466
  • 490
  • This is risky because in Germany some street names include numbers in their name (most of them a date). Maybe it would be a good idea to restrict your condition on the part after the first space (assuming the street name doesn't contain spaces, so the first space occurs before the street number)? – Jonas Metzler Apr 20 '22 at 12:47
  • @Jonas I'm not the OP. I still don't understand why they want everything up to but not including the street number (or, like you say, a number in a street name, like `2nd Avenue 10a`). Also there are lots and lots of street names with spaces, so that is risky, too. – Aaron Bertrand Apr 20 '22 at 12:47
  • You are right, I don't understand, too. And of yourse, your idea will help. I just want to point out such risks. Likely, the only safe way would be to save street name and street number in separate columns. – Jonas Metzler Apr 20 '22 at 12:49
  • You are correct. So I have two columns, StreetAddress and StreetNumber. The addresses with that has numbers, the number will be extracted to the StreetNumber column and the other part of the address will be in the StreetAddress column. While address without a street number will just be in the StreetAddress column, leaving their streetnumber as NULL – Sdot4allg Apr 20 '22 at 12:52
  • @Sdot4allg ok, but (a) you don't even mention StreetNumber in the question, and (b) again, just finding the first number and assuming everything after that is the StreetNumber is naive. What about `South 20th Street 455 W` or `New Wales 455, Unit 5`? – Aaron Bertrand Apr 20 '22 at 12:56
  • Ok, of yourse you can try such a solution like Aaron proposed and in most (perhabs even in all) cases within your DB, this will do fine. But a 100% save solution will not be possible due to the risks he and me mentioned. To be 100% save, I think you have to do this process manually and in future directly separate the columns before filling them or leave it in one and don't try to separate it. – Jonas Metzler Apr 20 '22 at 12:58
  • Apologies, but I was specific with EU address only and they have their street number at the end of it. I can extract the number correctly into a streetaddress column but i am not able to extract the rest of the address in to the street address column – Sdot4allg Apr 20 '22 at 13:06
  • @Sdot4allg So you don't think it's possible for any street in all of Europe to have a number in the name? Or for a full street address (entered by some person manually somewhere at some point) to include multiple numbers that may or may not represent street numbers? Your code needs to handle those cases even if you don't think they're possible. – Aaron Bertrand Apr 20 '22 at 13:09
  • I agree with you @AaronBertrand I have looked at all this scenario but I am tailoring the data I have to what I need as there wont be one fit code so solve this. I have included an image of the out put I have now, to my original question so you can understand clearly what I am trying to achieve. The data I have is limited to a few EU countries which is included in the where clause. – Sdot4allg Apr 20 '22 at 13:13
  • You included DE, i.e. Germany, in your where clause. I am living in Germany and can inform you that here exist a lot of streets having numbers in their name and letters in their house number. You can believe it or not, you can believe Aaron or not and of course, you can do whatever select you want. You just should be aware of the issues that could occur. Good luck anyway! – Jonas Metzler Apr 20 '22 at 13:19
  • @AaronBertrand The CTE statement above has actually fixed the problem for me. Yes, it did throw out some garbage (9 records) which were addresses written wrongly and I can manually correct them in the source system. Thank you for your help and thanks to everyone who made their input. Much appreciated – Sdot4allg Apr 20 '22 at 15:16
  • @Sdot4allg I mean here's an example street from Germany that has its own Wikipedia entry: [Straße des 17. Juni](https://en.m.wikipedia.org/wiki/Stra%C3%9Fe_des_17._Juni). – Aaron Bertrand Apr 20 '22 at 17:02