
I have a directory of nuget packages that I've downloaded from nuget.org. I'm trying to create a regex that will parse out the package name and version number from the filename. It doesn't seem difficult at first glance; the filenames have a clear pattern:

{PackageName}.{VersionNumber}.nupkg

Edge cases make it challenging though.

  • Package names can have dashes, underscores, and numbers
  • Package names can have effectively unlimited parts separated by dots
  • Version numbers consist of 3-4 groups of numbers, separated by dots
  • Version numbers sometimes are suffixed with pre-release tags (-alpha, -beta, etc)

Here's a sample list of nuget package filenames:

knockoutjs.3.4.2.nupkg
log4net.2.0.8.nupkg
runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg
nuget.core.2.7.0-alpha.nupkg
microsoft.identitymodel.6.1.7600.16394.nupkg

I want to be able to do a search/replace in a Serious Text Editor where the search is a regex with two groups, one for the package name and one for the version number. The output should be "Package: \1 Version: \2". With the 5 packages above, the output should be:

Package: knockoutjs Version: 3.4.2
Package: log4net Version: 2.0.8
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0
Package: nuget.core Version: 2.7.0-alpha
Package: microsoft.identitymodel Version: 6.1.7600.16394

The closest relatively concise regex I've come up with is:

^([^\s]*)\.((?:[0-9]+\.){3,})nupkg$

...which results in the following output:

Package: knockoutjs Version: 3.4.2.
Package: log4net Version: 2.0.8.
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0.
nuget.core.2.7.0-alpha.nupkg
Package: microsoft.identitymodel.6 Version: 1.7600.16394.

It handles the first three decently, although I don't want that trailing dot. It doesn't even match on the fourth one, and the fifth one has the first part of the version number lumped in with the package name.
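For anyone who wants to reproduce the behaviour outside a text editor, here is a minimal Python sketch that applies the attempt above to the five sample filenames. The helper name `rewrite` is only illustrative, and Python's `re` module stands in for the editor's regex engine, which may differ in details.

```python
import re

# The pattern from the attempt above, copied verbatim.
ATTEMPT = re.compile(r"^([^\s]*)\.((?:[0-9]+\.){3,})nupkg$")

SAMPLES = [
    "knockoutjs.3.4.2.nupkg",
    "log4net.2.0.8.nupkg",
    "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg",
    "nuget.core.2.7.0-alpha.nupkg",
    "microsoft.identitymodel.6.1.7600.16394.nupkg",
]

def rewrite(filename: str) -> str:
    # re.sub returns the input unchanged when the pattern does not match.
    return ATTEMPT.sub(r"Package: \1 Version: \2", filename)

if __name__ == "__main__":
    for name in SAMPLES:
        print(rewrite(name))
```

The trailing dot comes from `[0-9]+\.` sitting inside the repeated group, so the dot before `nupkg` is captured as part of the version. The fourth name fails because nothing in the pattern allows `-alpha`, and the fifth splits early because the greedy `[^\s]*` keeps as much as it can while still leaving three digit groups.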

Save the day!

Paolo
BobbyA
  • [No guarantees that this will match everything, but give it a try and let me know](https://regex101.com/r/nJBIAB/1) – emsimpson92 Aug 02 '18 at 22:25

3 Answers


I modified your expression slightly to:

^(.*?)\.((?:\.?[0-9]+){3,}(?:[-a-z]+)?)\.nupkg$

The main changes are that I moved the \. to the front of the digits in the first non-capturing group (and made it optional), and that I added an optional non-capturing group to match a tag such as -alpha in the fourth filename.

Replace with:

Package: \1 Version: \2

Test the regex live here.
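If you would rather check this from a script than in an editor, the following is a small Python sketch of the same search/replace. The function name `describe` is made up for the example; the pattern and replacement string are the ones given above.

```python
import re

# Pattern from this answer: lazy package name, then 3+ numeric groups
# with an optional lowercase/dash suffix, then the .nupkg extension.
PATTERN = re.compile(r"^(.*?)\.((?:\.?[0-9]+){3,}(?:[-a-z]+)?)\.nupkg$")

def describe(filename: str) -> str:
    # Same replacement as in the editor: \1 is the package, \2 the version.
    return PATTERN.sub(r"Package: \1 Version: \2", filename)

if __name__ == "__main__":
    for name in [
        "knockoutjs.3.4.2.nupkg",
        "log4net.2.0.8.nupkg",
        "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg",
        "nuget.core.2.7.0-alpha.nupkg",
        "microsoft.identitymodel.6.1.7600.16394.nupkg",
    ]:
        print(describe(name))
```

Run against the five filenames from the question, this should reproduce the requested output line for line.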

Mikael Dúi Bolinder
Paolo
  • This is great. Could you explain how the regex engine is processing that first '?' that makes '.*' optional? Without that first '?', your regex works in every case but the last. The last case ends up outputting: "Package: microsoft.identitymodel.6.1.7600 Version: 16394". I know '?' to be a greedy operator, so I'm having trouble wrapping my head around why adding it here seems to make .* less greedy. – BobbyA Aug 03 '18 at 14:10
  • Ah ha. The ? operator in that context doesn't mean optional; it makes the * operator lazy instead of greedy. https://www.regular-expressions.info/repeat.html – BobbyA Aug 03 '18 at 14:17
  • @BobbyA That is correct, it will try to match as few characters as possible. `.?` is different from `.*?`. – Paolo Aug 03 '18 at 14:26
  • ^(?.*?)\.(?(?:\.?[0-9]+){3,}(?:[-a-z]+)?)\.nupkg$ adding named groups – honzajscz Dec 29 '18 at 15:33
  • I had a similar use case, but the above regex failed for two (super-weird) filenames: Thinktecture.IdentityServer.v3.1.0.0-beta4-1.nupkg and rhinomocks.3.6.nupkg. So I adjusted it slightly: ^(.*?)\.((?:\.?v?[0-9]+){2,}(?:[-a-z0-9]+)?)\.nupkg$ This seems to match all my packages now. – ciaranj Jul 02 '19 at 21:25
  • Won't work if there is dot notation after the pre-release tag, e.g. nuget.core.2.7.0-alpha.123.nupkg – Leon Jul 16 '23 at 13:15
  • @Leon is that package actually a valid one? – Paolo Jul 17 '23 at 09:58
  • [of course](https://semver.org/spec/v2.0.0.html#spec-item-9), with other examples: 1.0.0-alpha, 1.0.0-alpha.1, 1.0.0-0.3.7, 1.0.0-x.7.z.92, 1.0.0-x-y-z.--. – Leon Jul 18 '23 at 11:23
  • @Leon how about https://regex101.com/r/u4iRKw/1 ? – Paolo Jul 18 '23 at 12:41
  • still does not work correctly - "nuget.core.1.0.0-x.alpha.123-y.nupkg" does not work, also "nuget.core.1.0.0-x+123.nupkg". You can [try here](https://jubianchi.github.io/semver-check/#/1.*/nuget.core.1.0.0-x.alpha.123-y.nupkg) – Leon Jul 18 '23 at 17:29
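Two points from the comments above can be checked with a short Python sketch: the effect of the lazy `.*?` compared with a greedy `.*`, and what happens to the dotted pre-release example. The names `LAZY`, `GREEDY` and `split_name` are invented for this illustration, and Python's `re` is assumed to backtrack the same way as the engine used in the thread.

```python
import re

# The answer's pattern (lazy first group) ...
LAZY = re.compile(r"^(.*?)\.((?:\.?[0-9]+){3,}(?:[-a-z]+)?)\.nupkg$")
# ... and the same pattern with the first group made greedy.
GREEDY = re.compile(r"^(.*)\.((?:\.?[0-9]+){3,}(?:[-a-z]+)?)\.nupkg$")

def split_name(pattern, filename):
    """Return (package, version), or None when the pattern does not match."""
    m = pattern.match(filename)
    return m.groups() if m else None

if __name__ == "__main__":
    name = "microsoft.identitymodel.6.1.7600.16394.nupkg"
    print(split_name(LAZY, name))    # version starts at the first numeric run
    print(split_name(GREEDY, name))  # package name swallows most of the version
    print(split_name(LAZY, "nuget.core.2.7.0-alpha.123.nupkg"))
```

The greedy variant can still find "three groups" inside 16394 because the `\.?` is optional, so the digits may be split across repetitions. The same effect explains the dotted pre-release case: the pattern does match, but it reports `123` as the version and leaves `2.7.0-alpha` in the package name.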

I think this regex will do what you want:

^(.*?)\.(?=(?:[0-9]+\.){2,}[0-9]+(?:-[a-z]+)?\.nupkg)(.*?)\.nupkg$

It uses a positive lookahead to check for a version number, optionally followed by a tag of the form -[a-z]+ (e.g. -alpha), followed by \.nupkg. Requiring \.nupkg right after the version is what prevents it from matching the 4.0.0-armel in the third sample. For your edge cases, substituting with Package: $1 Version: $2, the output is:

Package: knockoutjs Version: 3.4.2
Package: log4net Version: 2.0.8
Package: runtime.tizen.4.0.0-armel.microsoft.netcore.jit Version: 2.0.0
Package: nuget.core Version: 2.7.0-alpha
Package: microsoft.identitymodel Version: 6.1.7600.16394

Demo
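As a rough cross-check of the lookahead approach, here is a Python sketch. Python writes backreferences in the replacement as `\1` and `\2` rather than `$1` and `$2`, and the function name `describe_with_lookahead` is just a placeholder.

```python
import re

# Pattern from this answer: the lookahead validates the version shape,
# then the second group captures everything up to the final ".nupkg".
LOOKAHEAD = re.compile(
    r"^(.*?)\.(?=(?:[0-9]+\.){2,}[0-9]+(?:-[a-z]+)?\.nupkg)(.*?)\.nupkg$"
)

def describe_with_lookahead(filename: str) -> str:
    # Unmatched names are returned unchanged by re.sub.
    return LOOKAHEAD.sub(r"Package: \1 Version: \2", filename)

if __name__ == "__main__":
    for name in [
        "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg",
        "nuget.core.2.7.0-alpha.nupkg",
        "microsoft.identitymodel.6.1.7600.16394.nupkg",
    ]:
        print(describe_with_lookahead(name))
```

Because the lookahead insists on digits separated by dots, this variant does not split a run such as 16394 into several "groups".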

Nick

To include the entire version, everything before ".nupkg":

^(.*?)\.((?:\.?[0-9]+){3,}(?:[-a-z0-9]+?\.?)*)\.nupkg$

This gives these groups for My.Package.1.2.3.4-pre.1.other-thing.nupkg:

  1. My.Package
  2. 1.2.3.4-pre.1.other-thing
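The grouping above can be checked with a short Python sketch. The helper name `split_full_version` is hypothetical, and the second filename is taken from the question to show one trade-off of the wider suffix group.

```python
import re

# Pattern from this answer: after 3+ numeric groups, the suffix group
# accepts any mix of lowercase letters, digits, dashes and dots.
FULL_VERSION = re.compile(
    r"^(.*?)\.((?:\.?[0-9]+){3,}(?:[-a-z0-9]+?\.?)*)\.nupkg$"
)

def split_full_version(filename: str):
    """Return (package, version), or None when the pattern does not match."""
    m = FULL_VERSION.match(filename)
    return m.groups() if m else None

if __name__ == "__main__":
    print(split_full_version("My.Package.1.2.3.4-pre.1.other-thing.nupkg"))
    # The suffix group also accepts dotted lowercase text, so this name
    # splits at the first version-like run rather than the last one.
    print(split_full_version(
        "runtime.tizen.4.0.0-armel.microsoft.netcore.jit.2.0.0.nupkg"
    ))
```

For the third sample from the question, this pattern appears to treat everything from 4.0.0 onward as the version, so it suits packages with dotted pre-release tags better than names that embed a version-like segment.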
Mikael Dúi Bolinder