26

I'm trying to extract a JIRA identifier from a line of text.

JIRA identifiers are of the form [A-Z]+-[0-9]+. I have the following pattern:

foreach my $line ( @textBlock ) {
    my ( $id ) = ( $line =~ /[\s|]?([A-Z]+-[0-9]+)[\s:|]?/ );
    push @jiraIDs, $id if ( defined $id && $id !~ /^$/ );
}

This doesn't cope when the pattern appears inside a longer string. For example, blah_blah_ABC-123 would match on ABC-123. I don't want to mandate a space or other delimiter in front of the match, as that would fail if the identifier were at the start of the line.

Can anyone suggest the necessary runes?

Thanks.

DaveG

4 Answers

29

Official JIRA ID Regex (Java):

Atlassian themselves have a couple of webpages floating around that suggest a good (Java) regex is this:

((?<!([A-Z]{1,10})-?)[A-Z]+-\d+)

(Source: https://confluence.atlassian.com/display/STASHKB/Integrating+with+custom+JIRA+issue+key)

Test String:
"BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"

Matches:
BF-18, X-88, ABCDEFGHIJKL-999, DEF-33, ABC-1

Improved JIRA ID Regex (Java):

But, I don't really like it because it will match the "DEF-33" from "abcDEF-33", whereas I prefer to ignore "abcDEF-33" altogether. So in my own code I'm using:

((?<!([A-Za-z]{1,10})-?)[A-Z]+-\d+)

Notice how "DEF-33" is no longer matched:

Test String:
"BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"

Matches:
BF-18, X-88, ABCDEFGHIJKL-999, ABC-1

Improved JIRA ID Regex (JavaScript):

I also needed this regex in JavaScript. Unfortunately, when I wrote this, JavaScript did not support lookbehind, (?<!a)b (it was only added later, in ES2018), so I had to port it to lookahead, a(?!b), and reverse everything:

var jira_matcher = /\d+-[A-Z]+(?!-?[a-zA-Z]{1,10})/g

This means the string to be matched needs to be reversed ahead of time, too:

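// Helper to reverse a string (JavaScript has no built-in string reverse).
function reverse(str) {
    return str.split("").reverse().join("");
}

var s = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1"
s = reverse(s)
var m = s.match(jira_matcher) || [];  // match() returns null when nothing matches

// Also need to reverse all the results!
for (var i = 0; i < m.length; i++) {
    m[i] = reverse(m[i])
}
m.reverse()
console.log(m)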

// Output:
[ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]
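
For readers on a newer engine: ES2018 added lookbehind to JavaScript, so the reverse-match-reverse workaround may no longer be needed. The following is only a sketch, assuming a runtime that supports lookbehind (older engines reject the regex literal with a SyntaxError). The helper name `findJiraIds` is made up for illustration.

```javascript
// Sketch: the two Java patterns from above, written with native lookbehind.
// Assumes an ES2018+ engine; older engines fail to parse these literals.
const officialPattern = /(?<![A-Z]{1,10}-?)[A-Z]+-\d+/g;
const improvedPattern = /(?<![A-Za-z]{1,10}-?)[A-Z]+-\d+/g;

// findJiraIds is a hypothetical helper name, not part of any library.
function findJiraIds(text, pattern) {
    // With a /g regex, match() returns every match, or null if there are none.
    return text.match(pattern) || [];
}

const sample = "BF-18 abc-123 X-88 ABCDEFGHIJKL-999 abc XY-Z-333 abcDEF-33 ABC-1";

console.log(findJiraIds(sample, officialPattern));
// → [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'DEF-33', 'ABC-1' ]

console.log(findJiraIds(sample, improvedPattern));
// → [ 'BF-18', 'X-88', 'ABCDEFGHIJKL-999', 'ABC-1' ]
```

The results come back in their original order, so no reversing is needed.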
Julius Musseau
  • Any idea how to do it in python? The "Official JIRA ID Regex" and the "Improved JIRA ID Regex" cause a python error, "look-behind requires fixed-width pattern". The "Improved JIRA ID Regex" in python seems to be the best bet, but it matches things like 'INXX-2222s'[::-1]. Maybe this is worth a standalone question, rather than a comment? – grayaii Sep 25 '17 at 14:25
  • 1
    @grayaii Ruby has the same problem, I solved it with the JavaScript method (reverse, match, reverse back). However, I prefer the official one (just remove the lowercase `a-z`), as it adds some tolerance to formatting errors (let's say the commit message was supposed to be "Fixed this\nABC-123", but, for some reason, you got "Fixed thisABC-123"). I bet that's the reasoning behind the official regex. – rafasoares Dec 08 '17 at 00:42
  • I had an issue with this because my JIRA Issue ID key was like this: ABC1-123 where it had a number after the letters to the left of the dash. I ended up with this regex that worked: `((?<!([A-Z])-?)[A-Za-z0-9_]+-\d+)` – Rounder Dec 05 '18 at 19:58
  • 1
    The regex in Jira itself [changed](https://jira.atlassian.com/browse/JRASERVER-37162?focusedCommentId=699814&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-699814), such that the "correct" regex for a Jira issue is apparently `[A-Z][A-Z0-9]+-[0-9]+` (the product code must be at least two characters long, must start with a letter and must be all uppercase or digits). – Kit Grose Feb 09 '22 at 23:10
  • Another link describing how JIRA has changed the default project key regex to allow upper case letters and numbers (though the first letter must be uppercase): https://community.atlassian.com/t5/Jira-questions/Numbers-in-Project-Key/qaq-p/1181713 Provided an alternative answer with an updated regex: https://stackoverflow.com/a/73914895/412691 – David Oct 01 '22 at 17:16
6

You can use alternation to make sure that the character before your pattern is either whitespace or the beginning of the string. Similarly, make sure the pattern is followed by either whitespace or the end of the string.

You can use this regex:

my ( $id ) = ( $line =~ /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/ );
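
The same alternation idea carries over to other regex flavours. Here is a rough JavaScript sketch that collects every match on a line rather than just the first. The function name `extractDelimitedIds` is hypothetical, and `String.prototype.matchAll` assumes an ES2020+ runtime.

```javascript
// Hypothetical helper: collect every ID delimited by whitespace or string edges.
function extractDelimitedIds(line) {
    // Leading group: whitespace or start of string (consumed, not captured).
    // Trailing lookahead: whitespace or end of string (not consumed).
    const pattern = /(?:\s|^)([A-Z]+-[0-9]+)(?=\s|$)/g;
    // matchAll yields one match object per hit; index 1 is the captured ID.
    return Array.from(line.matchAll(pattern), (m) => m[1]);
}

console.log(extractDelimitedIds("ABC-123 fixes blah_blah_ABC-123 and XY-9"));
// → [ 'ABC-123', 'XY-9' ]
```

Because the trailing delimiter must be whitespace or end of string, an ID followed directly by punctuation, such as "ABC-123:", is not matched by this sketch. Widen the lookahead if your data needs that.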
Rohit Jain
  • That doesn't quite work ... because the lookbehind is variable length (one character [\s] or none [^]) which causes a `Variable length lookbehind not implemented in regex` error. – DaveG Oct 11 '13 at 16:43
  • Works in Python too! Thank you! – rominf Oct 10 '18 at 07:27
  • For matching project keys use: ```my ( $id ) = ( $line =~ /(?:\s|^)([A-Z0-9_]+)(?=\s|$)/ );``` – rominf Oct 10 '18 at 07:43
4

In ~2015, JIRA started allowing numbers and underscores in JIRA project keys, so an updated regular expression for a JIRA ticket is:

\b[A-Z][A-Z0-9_]+-[1-9][0-9]*

Regex details: https://regex101.com/r/ZEzo2R/1

Sources:

Ensure that you choose a supported project key format. Only formats that meet all of the following rules are supported:

  • The first character must be a letter,
  • All letters used in the project key must be from the Modern Roman Alphabet and upper case, and
  • Only letters, numbers or the underscore character can be used.

Jira issue keys (or issue IDs) are of the format <project key>-<issue number>
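
As a quick illustration of how the updated pattern behaves, here is a small JavaScript sketch. The function name `extractIssueKeys` and the sample sentence are invented for the example.

```javascript
// Hypothetical helper built on the updated pattern above.
function extractIssueKeys(text) {
    // \b           : word boundary, so keys glued to a preceding word character are skipped
    // [A-Z]        : project key starts with an upper-case letter
    // [A-Z0-9_]+   : followed by one or more upper-case letters, digits or underscores
    // -[1-9][0-9]* : dash, then an issue number without leading zeros
    const pattern = /\b[A-Z][A-Z0-9_]+-[1-9][0-9]*/g;
    return text.match(pattern) || [];
}

console.log(extractIssueKeys("Fixed AB1-22 and X_Y-7, not A-5, abc-3 or blah_blah_ABC-123"));
// → [ 'AB1-22', 'X_Y-7' ]
```

Note two consequences of the pattern as written: it requires at least two characters in the project key, so a single-letter key such as A-5 is skipped, and it has no trailing boundary, so "AB-12x" would still yield AB-12.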

David
1

If you include sample data with your question, you get the best shot at answers from those who might not have Jira, etc.

Here's another take on it:

my $matcher = qr/ (?: (?<=\A) | (?<=\s) )
                  ([A-Z]{1,4}-[1-9][0-9]{0,6})
                  (?=\z|\s|[[:punct:]]) /x;

while ( <DATA> )
{
    chomp;
    my @matches = /$matcher/g;
    printf "line: %s\n\tmatches: %s\n",
        $_,
        @matches ? join(", ", @matches) : "none";
}

__DATA__
JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?

Remember that [0-9]+ will match 0001 and friends, which you probably don't want. I think, but can't verify, that Jira truncates issue prefixes to 4 characters max, so my regex only allows 1-4 capital letters; that's easy to change if wrong. 10 million tickets seems like a reasonably high top end for issue numbers. I also allowed for trailing punctuation. You may have to season that kind of thing to taste for wild data. You need the g flag, and to capture to an array instead of a scalar, if you're matching strings that could have more than one issue ID.

line: JIRA-001 is not valid but JIRA-1 is and so is BIN-10000,
        matches: JIRA-1, BIN-10000
line: A-1, and TACO-7133 but why look for BIN-10000000 or BINGO-1?
        matches: A-1, TACO-7133
Ashley
  • 1
    Good point about the `[0-9]` matching 0001. I'll re-use the [1-9][0-9] aspect of your regex. Wouldn't your use of `[:punct:]` mean that you would match "ABZ-123-foo"? FYI: JIRA doesn't truncate prefixes though. For example, we have one of our projects with a key of INCIDENT. – DaveG Oct 14 '13 at 09:25
  • JIRA doesn't care about leading zeroes: https://issues.apache.org/jira/browse/CODEC-0000000069 – Julius Musseau May 29 '15 at 00:35