I cannot find an example that matches what I am looking for.

I am trying to capture ASN numbers in an FAA aeronautical NOTAM. Example below:

Example Text:

2019-AWP-7268-OE

Regex Match (findall):

\d{4}-(?:AAL|ACE|AEA|AGL|ANE|ANM|ASO|ASW|AWP|WTE|WTW)-(?:\d{3,6})-(?:OE|NRA)

However, I also want to capture it when multiple are issued:

  • 2019-AWP-659 THRU 662-NRA
  • 2019-AWP-3823/3825-NRA
  • 2019-AWP-4593/4594/4595/4596-NRA
  • 2019-ASW-4791, 4794 THRU 4796, 4798 THRU 4800-NRA

I get stuck trying to write an expression that allows any number of characters in the middle but still ends in OE or NRA. Is there any way to match the Year (2019), Region (ASW|AWP), Any Text (3823/3825), then Type (OE|NRA)?

The fourth bird

2 Answers

I would use something like this:

r'((\d{4})-(AAL|ACE|AEA|AGL|ANE|ANM|ASO|ASW|AWP|WTE|WTW)-([^-]+)-(OE|NRA))'

When passed as the pattern to re.findall(), this should produce a list of tuples, one tuple per match. Because the whole pattern is wrapped in an outer group, element 0 of each tuple is the full match, and elements 1 to 4 are the year, region, free-text portion, and type, respectively.
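As a minimal sketch of what re.findall() returns for this pattern, the snippet below runs it over a made-up sentence. The sample text is an assumption assembled from the question's examples, not real NOTAM output.

```python
import re

# Pattern from the answer; the outer parentheses make group 1 the whole match.
ASN = re.compile(
    r'((\d{4})-(AAL|ACE|AEA|AGL|ANE|ANM|ASO|ASW|AWP|WTE|WTW)-([^-]+)-(OE|NRA))'
)

# Hypothetical text built from the question's examples.
text = "OBST TOWER 2019-AWP-7268-OE AND CRANE 2019-AWP-3823/3825-NRA"

# findall() returns one tuple per match, with one element per capturing group.
matches = ASN.findall(text)
for full, year, region, ids, kind in matches:
    print(full, year, region, ids, kind)
```

Each tuple unpacks into five names because the pattern has five capturing groups; dropping the outer parentheses would leave four.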

After that, you can run a second pass, with another regex or plain string operations, on just the free-text portion of each match to isolate the individual IDs.
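One possible shape for that second pass is sketched below. The helper name expand_ids is hypothetical, and it assumes that ',' and '/' separate items and that 'A THRU B' means an inclusive range; the question does not confirm how the FAA intends these to be read.

```python
import re

def expand_ids(ids_text):
    """Expand the free-text portion of a match into a list of ints.

    Assumptions (not confirmed by the source): ',' and '/' separate
    items, and 'A THRU B' is an inclusive range.
    """
    numbers = []
    for item in re.split(r'[,/]', ids_text):
        parts = re.findall(r'\d+', item)
        if len(parts) == 2 and 'THRU' in item.upper():
            start, end = int(parts[0]), int(parts[1])
            numbers.extend(range(start, end + 1))
        else:
            numbers.extend(int(p) for p in parts)
    return numbers

print(expand_ids("4791, 4794 THRU 4796, 4798 THRU 4800"))
```

If '/' turns out to denote a range rather than a list, only the split pattern and the range check would need to change.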

Green Cloak Guy

To match any text you could also use .*, which first matches to the end of the line and then backtracks to the last occurrence of -, after which the pattern matches either OE or NRA.

You could shorten the alternation a bit by adding some characters to a character class like A[AG]L to match either AAL or AGL
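A quick way to check that a shortened alternation still accepts exactly the same region codes is to test it against every three-letter uppercase string. This is only a sanity-check sketch; the region list is copied from the question's pattern.

```python
import re
from itertools import product
from string import ascii_uppercase

# Region codes from the original alternation in the question.
regions = ["AAL", "ACE", "AEA", "AGL", "ANE", "ANM",
           "ASO", "ASW", "AWP", "WTE", "WTW"]

# Shortened alternation using character classes.
compact = re.compile(r'ACE|AEA|A[AG]L|AN[EM]|AS[WO]|AWP|WT[EW]')

# Collect every three-letter uppercase string the compact pattern fully matches.
accepted = {
    "".join(letters)
    for letters in product(ascii_uppercase, repeat=3)
    if compact.fullmatch("".join(letters))
}

print(accepted == set(regions))
```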

Note that you don't need the non-capturing group around \d{3,6}; (?:\d{3,6}) can be written as just \d{3,6}.

^(\d{4})-(ACE|AEA|A[AG]L|AN[EM]|AS[WO]|AWP|WT[EW])-(\d{3,6}.*)-(OE|NRA)$
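As a rough illustration, the anchored pattern can be applied to each of the question's examples when every ASN sits on its own line. The variable names here are illustrative only.

```python
import re

ANCHORED = re.compile(
    r'^(\d{4})-(ACE|AEA|A[AG]L|AN[EM]|AS[WO]|AWP|WT[EW])-(\d{3,6}.*)-(OE|NRA)$'
)

# Examples taken from the question, one ASN per line.
examples = [
    "2019-AWP-7268-OE",
    "2019-AWP-659 THRU 662-NRA",
    "2019-AWP-3823/3825-NRA",
    "2019-AWP-4593/4594/4595/4596-NRA",
    "2019-ASW-4791, 4794 THRU 4796, 4798 THRU 4800-NRA",
]

# groups() yields (year, region, free text, type) for each line.
results = [ANCHORED.match(line).groups() for line in examples]
for groups in results:
    print(groups)
```

Because of the ^ and $ anchors, a line with any surrounding text will not match at all.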

Without the anchors you could make the quantifier non greedy and use word boundaries:

\b(\d{4})-(ACE|AEA|A[AG]L|AN[EM]|AS[WO]|AWP|WT[EW])-(\d{3,6}.*?)-(OE|NRA)\b
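The sketch below shows why the non-greedy quantifier matters once the anchors are gone. The sample sentence is made up from the question's examples, and the greedy variant is included only for contrast.

```python
import re

REGION = r'(ACE|AEA|A[AG]L|AN[EM]|AS[WO]|AWP|WT[EW])'
LAZY = re.compile(r'\b(\d{4})-' + REGION + r'-(\d{3,6}.*?)-(OE|NRA)\b')
GREEDY = re.compile(r'\b(\d{4})-' + REGION + r'-(\d{3,6}.*)-(OE|NRA)\b')

# Hypothetical sentence containing two ASNs.
text = ("STUDIES 2019-AWP-659 THRU 662-NRA AND "
        "2019-ASW-4791, 4794 THRU 4796-NRA APPLY")

# The lazy version stops at the first -OE/-NRA and finds both ASNs.
lazy_found = LAZY.findall(text)
# The greedy version runs on to the last -NRA and merges them into one match.
greedy_found = GREEDY.findall(text)

print(lazy_found)
print(greedy_found)
```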

The fourth bird
    ..or `(A(?:AL|CE|EA|GL|N[EM]|S[OW]|WP)|WT[EW])`, which quickly evaluates three-character strings that don't begin with `'A'`. – Cary Swoveland Aug 11 '20 at 18:07