I need to extract the house number from French addresses.

Here is my regex:

\d+( |\d+|bte|-|boite|[\w] {1}|([a-z] ){1}){0,2}

Example on regex101: https://regex101.com/r/ZP8DSV/1. It partially works, but not for all the lines.

I need to extract the number plus any extra info that follows it.

For each line of this list, the expected result is:

|---------------------------------------|--------------|
|              Original                 |     Result   |
|---------------------------------------|--------------|
| rue hovémont 3                        | 3            |
| rue hovémont 3-5                      | 3-5          |
| rue hovémont 3 5                      | 3 5          |
| Rue hovémont 35                       | 35           |
| Rue hovémont 46 A                     | 46 A         |
| Rue hovémont 46 A1                    | 46 A1        |
| 46 A1 Rue hovémont                    | 46 A1        |
| 46 A Rue hovémont                     | 46 A         |
| Rue du pont de pierre, 1              | 1            |
| Chaussée d alseg 416 c                | 416 c        |
| Chaussée d alseg, 416 c               | 416 c        |
| Chaussée d alseg 416c                 | 416c         |
| Chaussée d alseg, 416c                | 416c         |
| 416 c Chaussée d alseg                | 416 c        |
| 416 c, Chaussée d alseg               | 416 c        |
| 416c Chaussée d alseg                 | 416c         |
| 416c, Chaussée d alseg                | 416c         |
| Square de la demi-lune 7 boite 5      | 7 boite 5    |
| 7 boite 5 Square de la demi-lune      | 7 boite 5    |
| Rue aux laines 150/58                 | 150/58       |
| Rue de la forêt, 95                   | 95           |
| Chaussée d'anvers 294                 | 294          |
| Avenue jean sébastien bach, 24 bte 32 | 24 bte 32    |
| 10 bte 1 rue des volontaires          | 10 bte 1     |
| Rue du 5ème Tïme 5 bte 2              | 5 bte 2      |
| Rue du 5eme Tïme 5 bte 2              | 5 bte 2      |
| Rue du 5 eme Tïme 5 bte 2             | 5 bte 2      |
| Rue du 5 ème Tïme 5 bte 2             | 5 bte 2      |
| Rue du 1 er Tïme 5 bte 2              | 5 bte 2      |
| 20a Test Strasse                      | 20a          |
|---------------------------------------|--------------|

Can you help me with this case? :)

Everno

1 Answer

Here is a working regex that captures the number part, whether it appears at the start or at the end of the line:

^\d\w*(?:\h+(?>boite|bte|\pL\d?|\d)\b)*|\h\K\d+\pL?(?:[-/]\d+|\h+(?:boite|bte|\pL\d?|\d+)\b)*$

For PHP, use the following:

$re = '~^\d\w*(?:\h+(?>boite|bte|\pL\d?|\d)\b)*|\h\K\d+\pL?(?:[-/]\d+|\h+(?:boite|bte|\pL\d?|\d+)\b)*$~miu';
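As a rough sketch of how this pattern might be applied to one address at a time, the snippet below wraps `preg_match` in a small helper. The function name `extractHouseNumber` and the sample array are illustrative assumptions, not part of the original answer; the samples are rows taken from the question's table.

```php
<?php
// Pattern from the answer (flags: m = multiline, i = case-insensitive, u = UTF-8).
$re = '~^\d\w*(?:\h+(?>boite|bte|\pL\d?|\d)\b)*|\h\K\d+\pL?(?:[-/]\d+|\h+(?:boite|bte|\pL\d?|\d+)\b)*$~miu';

// Hypothetical helper: returns the matched number part, or null if nothing matches.
function extractHouseNumber(string $address, string $re): ?string
{
    return preg_match($re, $address, $m) === 1 ? $m[0] : null;
}

// A few rows from the question's table.
$samples = [
    'rue hovémont 3-5',
    '46 A1 Rue hovémont',
    'Chaussée d alseg, 416c',
    'Rue aux laines 150/58',
    '7 boite 5 Square de la demi-lune',
    'Rue du 5ème Tïme 5 bte 2',
];

foreach ($samples as $address) {
    echo $address, ' => ', var_export(extractHouseNumber($address, $re), true), PHP_EOL;
}
```

Because the pattern uses the `m` flag, it could also be run once over a multi-line string with `preg_match_all` instead of looping over lines.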

RegEx Details:

  • ^: Start
  • \d\w*: Match a word starting with a digit
  • (?:: Start a non-capture group
    • \h+: Match 1+ horizontal whitespace characters
    • (?>boite|bte|\pL\d?|\d): Atomic group matching boite, bte, a letter optionally followed by a digit, or a single digit
    • \b: Word boundary
  • )*: End non-capture group. Match 0 or more of this group.
  • |: OR
  • \h: Match a horizontal whitespace character
  • \K: Reset the match start, so the preceding whitespace is not part of the result
  • \d+\pL?: Match 1+ digits followed by an optional letter
  • (?:[-/]\d+|\h+(?:boite|bte|\pL\d?|\d+)\b)*: Match remaining parts
  • $: End
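The breakdown above can also be written directly into the pattern using PCRE's extended (`x`) flag, which ignores unescaped whitespace and allows `#` comments. This is only a sketch of an equivalent layout, assuming the same `m`, `i` and `u` flags; the variable names `$compact` and `$verbose` are illustrative.

```php
<?php
// Compact pattern from the answer.
$compact = '~^\d\w*(?:\h+(?>boite|bte|\pL\d?|\d)\b)*|\h\K\d+\pL?(?:[-/]\d+|\h+(?:boite|bte|\pL\d?|\d+)\b)*$~miu';

// Same pattern laid out with the x flag so each part carries a comment.
$verbose = '~
    ^ \d \w*                      # line start, then a word beginning with a digit
    (?:
        \h+                       # horizontal whitespace
        (?> boite | bte | \pL \d? | \d )   # box keyword, letter plus optional digit, or one digit
        \b                        # word boundary
    )*
  |
    \h \K                         # a whitespace character, dropped from the match by \K
    \d+ \pL?                      # digits followed by an optional letter
    (?:
        [-/] \d+                  # a range or sub-number such as -5 or /58
      | \h+ (?: boite | bte | \pL \d? | \d+ ) \b
    )*
    $                             # line end
~mixu';

// Compare both layouts on a few rows from the question.
foreach (['rue hovémont 3-5', '46 A Rue hovémont', 'Rue du 5 eme Tïme 5 bte 2'] as $address) {
    preg_match($compact, $address, $a);
    preg_match($verbose, $address, $b);
    echo $address, ' => ', $a[0], ' / ', $b[0], PHP_EOL;
}
```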
anubhava