2

I'm trying to figure out the best way to create a regex rule that works for both the IPv4 and IPv6 address space when the format is:

[2a00:1397:4:2a02::a1]:50434
129.13.252.47:13456

I'm close, but basically, I need the match to be on the very last colon character (right before the port), rather than matching the rest of the line, as I don't want to discard the port value. The idea is the last colon character is a delimiter.

I currently have the following regex :[^:]+$ which basically gets me the last colon character but matches the port value as well. I need the match to stop at the colon character. Is this possible?

https://regexr.com/3hpvk

James White
  • I took the freedom to repost this in a more general manner, where I clearly described what kind of formats were allowed based on rfc 3986 here. https://stackoverflow.com/questions/64282039/how-to-match-something-which-may-or-may-not-be-available-in-regex. The answers there will cover the more general case, accepting any allowed formats. – patrik Oct 12 '20 at 15:21

3 Answers

2

The regex ((?::))(?:[0-9]+)$ matches a colon followed by one or more digits at the end of the line. The colon (:) is then captured as group(1).

See https://regexr.com/3hpvt

and this will group all individual parts: (.*)((?::))((?:[0-9]+))$
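
As a rough illustration of the three-group pattern above, here is a minimal Java sketch. The class name `HostPortGroups` and the helper `split` are made up for this example; only the regex comes from the answer. Group 1 holds the address, group 2 the colon, and group 3 the port.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class HostPortGroups {
    // Pattern from the answer: group 1 = address, group 2 = colon, group 3 = port.
    private static final Pattern HOST_PORT =
            Pattern.compile("(.*)((?::))((?:[0-9]+))$");

    /** Returns {address, port}, or null when the input does not end in ":<digits>". */
    static String[] split(String input) {
        Matcher m = HOST_PORT.matcher(input);
        if (!m.find()) {
            return null;
        }
        return new String[] { m.group(1), m.group(3) };
    }

    public static void main(String[] args) {
        String[] samples = { "[2a00:1397:4:2a02::a1]:50434", "129.13.252.47:13456" };
        for (String s : samples) {
            String[] parts = split(s);
            System.out.println(parts[0] + " -> " + parts[1]);
        }
    }
}
```

Because `(.*)` is greedy, the match backtracks to the last colon that is followed only by digits, so the colons inside the IPv6 address stay in group 1.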

Ivonet
  • Thanks for your answer. The regex I'm after is basically matching the very last `:` character for any IPv4 or IPv6 value without the remaining port value being tagged in the group. Is that possible, to restrict the regex match to not match the remaining line content in either example? The idea this will be used for a PHP array to split the IP and port value as [0] and [1]. – James White Dec 10 '17 at 11:15
  • The answer is already in there. it is group(1) or group(2) in the second example – Ivonet Dec 10 '17 at 11:23
  • Thanks. I'm not sure I can use groups in PHP however. Original intention was to use `preg_split` to essentially form the IP and Port value as [0] and [1] in an array. – James White Dec 10 '17 at 11:27
  • Then I would not do it with regex but with simple line splits and string concatenation – Ivonet Dec 10 '17 at 11:34
  • That might be the way to go in this case, but I think I have found a regex pattern that works for what I need `\:(?=[^:]*$)`, after a bit of trial and error. This selects the last colon character and nothing after it, but groups would be much better for this. – James White Dec 10 '17 at 11:37
  • While this is an OK answer, it is not really usable in practice. RFC 3986 provides a regex to filter out host and port https://tools.ietf.org/html/rfc3986#appendix-B, however, the port is optional. Any idea how to deal with this kind of problem (assuming host and port are handled via a second regex)? – patrik Oct 09 '20 at 15:47
  • @patrik Something like this? https://regexr.com/5dspn `\[(.*:.*:.*:.*::.*)\]((?::)((?:[0-9]+)))?$` – Ivonet Oct 10 '20 at 16:15
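
The lookahead pattern mentioned in the comments, `:(?=[^:]*$)`, matches only the last colon and consumes nothing after it, which is what a split function needs. A minimal sketch in Java follows (the class and method names are hypothetical). PHP's `preg_split` should accept the same pattern, but that is an assumption and is not tested here.

```java
import java.util.Arrays;

public class LastColonSplit {
    // A colon that is followed by no further colon before the end of the input.
    private static final String LAST_COLON = ":(?=[^:]*$)";

    /** Splits "address:port" on the last colon only. */
    static String[] split(String input) {
        return input.split(LAST_COLON);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(split("[2a00:1397:4:2a02::a1]:50434")));
        System.out.println(Arrays.toString(split("129.13.252.47:13456")));
    }
}
```

The lookahead is zero-width, so the port is left untouched and ends up as element [1] of the result.
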
0

To match a trailing run of digits that follows a colon, use the pattern /:(\d+)$/gm

If you also need to detect whether the address is in IPv4 or IPv6 format, use a more complex pattern such as /(:(\d+)$)|(]:(\d+)$)/gm

The flags [g] & [m] are optional and redundant; they were used only for debugging on regexr.com.

In actual software I use a regex like:

RegEx.Create(
  '(:(?<IPv4>\d+$))|(]:(?<IPv6>\d+)$)',
  [roCompiled, roExplicitCapture]
);
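
For comparison, the same named-group alternation can be sketched in Java. The class name `PortByFamily` and the method `classify` are invented for this example. Java has no direct counterpart to `roExplicitCapture`, so the unnamed groups also capture here, which does no harm.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PortByFamily {
    // Same alternation as above; the closing bracket is escaped for clarity.
    private static final Pattern PORT =
            Pattern.compile("(:(?<IPv4>\\d+$))|(\\]:(?<IPv6>\\d+)$)");

    /** Returns "IPv4:<port>" or "IPv6:<port>", or null when nothing matches. */
    static String classify(String input) {
        Matcher m = PORT.matcher(input);
        if (!m.find()) {
            return null;
        }
        if (m.group("IPv6") != null) {
            return "IPv6:" + m.group("IPv6");
        }
        return "IPv4:" + m.group("IPv4");
    }

    public static void main(String[] args) {
        System.out.println(classify("[2a00:1397:4:2a02::a1]:50434"));
        System.out.println(classify("129.13.252.47:13456"));
    }
}
```

Note that the pattern only checks whether a `]` precedes the final colon; it does not validate the address itself.
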
  • Your pattern matches only the _port_. From the title of the question it looks like both, _ip_ and _port_ are needed. The question text, on the other hand, implies that just the delimiter between them is needed... Also, you should describe what the flags `g` and `m` do. Depending on the regex api used those flags may have to be provided in another way than you do. – Leviathan Nov 28 '20 at 19:27
  • The flags [g] & [m] are optional and redundant, they were used to debug via https://regexr.com only. In actual software i use regex like `RegEx.Create('(:(?<IPv4>\d+$))|(]:(?<IPv6>\d+)$)', [roCompiled, roExplicitCapture]);` – М. Ю. Брызгалов Nov 29 '20 at 21:19
  • You should edit your answer then and remove the flags, so that future readers aren't distracted. – Leviathan Nov 30 '20 at 16:11
0

If you are using Java, there is no need to do this with a regex. You can simply use the built-in URI class, which works with both IPv4 and IPv6:

        import java.net.URI;

        public class UriDemo {
            public static void main(String[] args) throws Exception {
                String testUri = "ldap://[2001:678:8a0:ff00:250:56ff:fe84:222a]:10389/o=Airius.com??sub?(sn=Jensen)";
                URI uri = new URI(testUri); // throws URISyntaxException on malformed input
                System.out.println(uri.getScheme()); // ldap
                System.out.println(uri.getHost());   // [2001:678:8a0:ff00:250:56ff:fe84:222a]
                System.out.println(uri.getPort());   // 10389
            }
        }
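
The inputs in the question have no scheme, so one possible way to reuse the URI class is to prepend a placeholder scheme before parsing. The sketch below assumes this is acceptable. The `tcp://` prefix is arbitrary, and the class and method names are made up for this example.

```java
import java.net.URI;

public class HostPortUri {
    /** Parses "host:port" by prepending a placeholder scheme (an assumption of this sketch). */
    static URI parse(String hostPort) {
        // URI.create throws IllegalArgumentException on malformed input.
        return URI.create("tcp://" + hostPort);
    }

    public static void main(String[] args) {
        String[] samples = { "[2a00:1397:4:2a02::a1]:50434", "129.13.252.47:13456" };
        for (String s : samples) {
            URI uri = parse(s);
            // getHost() keeps the square brackets around an IPv6 literal.
            System.out.println(uri.getHost() + " -> " + uri.getPort());
        }
    }
}
```
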
Radoslav