
ABC is a music notation; I'm working on patterns to parse it as part of an app.

Sometimes multiple renditions of a tune are in an ABC file, and I need to get just the first rendition -- or in an ideal world any rendition I specify. The beginning of a rendition is signified by the X: string.

It's not possible to know in advance how many renditions are in a file.

In JavaScript, how can I return, for example, the first rendition (from the first X: inclusive to the beginning of the second) in the examples below, in a way that also returns the first when there is no second, and when there are more than two renditions?

My work so far yields ([\s\S]*)(?=X:) which succeeds in the two rendition example, but fails with a single rendition or more than two.

Adding an OR'd end-of-file condition to the lookahead, e.g. ([\s\S]*)(?=X:|$), lets the single rendition case work, but fails on the two and three rendition cases.

Any help appreciated ... a good way to parse ABC will be used by many.

A two-rendition example can look like the below -- for a three rendition example just add a line with X: at the end, and for a single chop off everything from the second X:

EDITS: Folks have been kind enough to ask for better examples, and they won't fit in a comment, so here are a few.

Broken pledge is interesting because it has more than one ABC and they're not numbered sequentially:

X:56
T:Broken Pledge, The
R:reel
D:De Dannan: Selected Reels and Jigs.
Z:Also played in Edor, see #734
Z:id:hn-reel-56
M:C|
K:Ddor
dcAG ADDB|cAGF ECCE|D2 (3EFG Addc|AcGc Aefe|
dcAG FGAB|c2Bd cAGE|D2 (3EFG AddB|cAGE FDD2:|
|:dcAG Acde|~f3d ecAB|cAGE GAcd|ec~c2 eage|
dcAG Acde|fedf ecAG|~F3G AddB|cAGE FDD2:|
P:"Variations:"
|:dcAG ~A3B|cAGF ECCE|DEFG Addc|(3ABc Gc Aefe|
dcAG FGAB|c2Bd cAGE|DEFG AddB|A2GE FDD2:|
|:dcAG Acde|~f3d ecAB|cAGE GAcd|ec~c2 eage|
dcAG Acde|~f3d ecAG|FEFG AddB|A2GE FDD2:|

X:2
T:Broken Pledge, The
M:C
L:1/8
Q:250
K:D
dcAG A2 dB | cAGF EDC2 | DEFG Ad ~d2 | AcGc Adfe |
dcAG A2 dB | cAGF EDC2 | DEFG Ad ~d2 | AcGc ADD2 :|
|: dcAG A2 de | fedf edAB | cAGE GAcd | ec ~c2 eage |
dcAG A2 de | fedf edcA | F3 E FGAB | cAGE {F}ED D2 :||

Huish the Cat is interesting because it has lots of renditions, all numbered alike. You can see the X:whatever is totally arbitrary:

X:1
T:Huish the Cat
M:6/8
L:1/8
N:”Author and date unknown.”
R:Air
Q:"Quick"
S:Byrne, the harper, 1802
B:Bunting – Ancient Music of Ireland (1840, p. 3)
Z:AK/Fiddler’s Companion
K:C
(G>A).G c2(e|d<).d.A c2z|(G>A).G .c2 d|(ec).A .A2G|
(G>A).G .c2(e|d<).d.A .c2e|(g>f).e .f2d|(ec).A A2G:|
|:(gf).e .f2d|(ed).c .f2d|(gf).e .f2d|(ec).A A2G|
(gf).e .f2d|(ed).c .f2.d|(G>A).G f2d|(ec).A [F2A2]G:|]

X:1
T:Hunt the Cat
M:6/8
L:1/8
R:Jig
Q:”Allegro”
B:William Forde – 300 National Melodies of the British Isles (c. 1841, p.  26, No. 87)
B: https://www.itma.ie/digital-library/text/300-national-melodies-of-the-british-isles.-vol.-3-100.-irish-airs
N:William Forde (c.1795–1850) was a musician, music collector and scholar from County Cork
Z:AK/Fiddler’s Companion
K:D
A>BA d2f|e<eB d3|A>BA d2e|fdB B2A|
A>BA d2f|e<eB d2f|a>gf g2e|fdB B2A:|
|:agf g2e|FED G2E|agf g2e|fdB B2A|
agf g2e|fed g2e|A>BA g2e|fdB B2A:|]

X:1
T:Huish the Cat
M:6/8
L:1/8
R:Jig
Q:"Quick"
B:P.M. Haverty – One Hundred Irish Airs vol. 1 (1858, No. 87, p. 37)
Z:AK/Fiddler’s Companion
K:C
(G>A).G .c2(e|d<).d.A c2z|(G>A).G .c2d|(ec).A .A2G|
(G>A).G .c2(e|d<).d.A .c2|(g>f).e .f2d|(cA).A A2G:|
|:(gf).e .f2d|(ed).c .f2d|(gf).e .f2d|(ec).A A2G|
(gf).e .f2d|(ed).c .f2.d|(G>A).G f2d|(ec).A [F2A2] G:|]

X:1
T:Huish the Cat
M:6/8
L:1/8
R:Single Jig
S:O'Neill - Dance Music of Ireland: 1001 Gems (1907), No. 382
Z:AK/Fiddler's Companion
K:C
G>AG c2e|d<dA c2e|G>AG c2d|ecA A2c|
G>AG c2e|d<dA c2e|g>fe f2d|ecA A2G:|
|:gfe f2d|edc f2d|gfe f2d|ecA A2G|
gfe f2d|edc f2d|G>AG f2d|ecA A2G:||

X:1
T:Hunt the Cat
M:6/8
L:1/8
B:Roche, vol. 3 (1927, p. 114)
K:Ddor
DED D2A|AGE c3|DED D2A|AGE E2D|
DED D2A|AGE c3|ABc d2B AGE E2D:|
|:dcA AGE|AGE c3|dcA AGE|AGE E2D|
dcA AGE|AGE c3|ABc d2c|AGE E2D:||

Lowbacked Car is pretty messy, with percent signs and the like:

X:1
%
T:Lowbacked Car [1], The
M:6/8
L:1/8
R:Air
S:James Goodman (1828─1896) music manuscript collection, 
S:vol. 3, p. 133. Mid-19th century, County Cork
Z:AK/Fiddler’s Companion
K:G
G|G2B B2d|c2A z2F|G2B d2d|d3 z2G|
c2c A2A|B2B G2B|c2A G2F|G3 z2G|
G2c c2e|e2d d2G|G2c c2e|d3 z2G|
G2g !fermata!g2e|e2d dcB|A2G A2B|!fermata!d3 z2A|
GED G2G|G3 z2B|AGE A2A|A3z B/c/|
dcB dcB|gfe !fermata!d2 B/A/|GED G2G|(G3 G2)||
X:1
%
T:Low Backed Car (1)
M:6/8
L:1/8
B:Howe - Musicians's Omnibus No. 2 (p. 107)
Z:AK/Fiddler's Companion
R:G
G|G2B B2d|c3 A2d|G2 B2 d2d|(d3 d2)B|
c2c A2A |B3 G2G A2A F2F|(G3 G2)||d|
d2g g2e|e2d d2B|d2g g2e|(e3 d2)d|
d2g g2e|e2d d2B|BAG A2B|d2c B2A|
.G.E.E .G2G|(G3 G2)B|AGE A2A|A3 ABc|
(.d.c.B) (.d.c.B)|(.a.a.d) .e.d.B|.G.E.D|(G3 G2)|]
X:1
%
T:Low Backed Car [1], The
M:6/8
L:1/8
R:Jig
B:Kerr - Merry Melodies, vol. 2, No. 257  (c. 1880's)
Z:AK/Fiddler's Companion
K:G
D|G2B B2d|d2c A2F|G2B d2d|(d3 d2) B|
cBc A2A|BAB GAB|c2A G2F|(G3 G2):||
B|G2g g2e|e2d d2B|G2g g2e|d3 cBA|
G2g g2e|e2d dcB|A2G A2B|d3 cBA|
GED G2G|(G3 G2)B|AGE A2A|A3 (ABc)|
dcB dcB|Gfe dBA|GED G2G|(G3 G2)||

And Jaunting Car for Six is the modal case of a single tune, which we need to handle as the most common case:

X:1
T:Jaunting Car for Six
M:9/8
L:1/8
R:Slip Jig
S:Kerr - Merry Melodies, vol. 3, No. 233 (c. 1880's)
Z:AK/Fiddler's Companion
K:A
efe c2c c3|efe cde fga|efe c2c c3|BcB B2c def:|
|:e2a agf ecA|e2a agf e3|e2a agf ecA|BcB B2c def:|| 
rpc
  • Try `text.split(/[\r\n]+(?=X:)/)` – Wiktor Stribiżew Sep 17 '21 at 19:58
  • If the beginning of a rendition starts with a `X:` and continues to the beginning of the next `X:` or end of file. To just get the first rendition would be a regex like `X:([\w\W]*?)(?=X:|$)` and group 1 contains the body of it – sln Sep 17 '21 at 20:26
  • @WiktorStribiżew Thanks! Completely forgot about combining split and regex! Works well for multirenditions, but not singles (which can be detected and handled otherwise). ABC is hand-created and line end detects can be problematic ... current best implementation like: `var split_regex = abc_all.trim().split(/(?=X:)/);` – rpc Sep 18 '21 at 16:24
  • Don't forget that the substring `X:` could appear in other places. If you use a regex, it should match `X:` only at the beginning of a line. – Walter Tross Sep 18 '21 at 16:26
  • @WalterTross Maybe I'm mistaken ... I thought the ABC spec was fairly line agnostic except for music formatting purposes, and X:whatever denoted a new rendition regardless of whether at a line beginning. Have you seen otherwise: – rpc Sep 18 '21 at 16:29
  • Suppose the name of the group was "The X" instead of "De Dannan". Then look at the `D:` line. Anyway, no, [the spec](https://web.archive.org/web/20080309023424/http://www.norbeck.nu/abc/abcbnf.htm) is not line agnostic – Walter Tross Sep 18 '21 at 16:32
  • The BNF is explicit about where newlines can appear. Also about whitespace. E.g., no whitespace is allowed anywhere on an `X:` line. – Walter Tross Sep 18 '21 at 16:51
  • Yeah, you're almost certainly right. Though the colon(:) does qualify a line header. But people pretty much seem to do what they want. – rpc Sep 19 '21 at 01:38

3 Answers


This is a complete rewrite of the answer, sorry. The following function returns the info you are currently interested in (it can be extended to return more info, like, e.g., the titles of the renditions as an array sharing indices with the renditions array).

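function getAbcInfo(abc) {
    // Split before every line that starts with X:<digits> (spaces, tabs and NBSPs are tolerated).
    // The '\n' prefix lets an X: at the very start of the input be found too.
    let renditions = ('\n' + abc).split(/[\r\n]+(?=[ \t\u00a0]*X[ \t\u00a0]*:[ \t\u00a0]*\d+)/);
    // Strip trailing newlines from the last item and leading newlines from the first.
    renditions.push(renditions.pop().replace(/[\r\n]+$/, ''));
    renditions.unshift(renditions.shift().replace(/^[\r\n]+/, ''));
    let x = [''];                // x[i] is the n of the X:n line of renditions[i]
    let indicesOfX = {'': [0]};  // maps n to the indices in renditions that carry X:n
    for (let i = 1; i < renditions.length; i++) {
        let n = renditions[i].match(/^[ \t\u00a0]*X[ \t\u00a0]*:[ \t\u00a0]*(\d+)/)[1];
        x[i] = n;
        if (n in indicesOfX) {
            indicesOfX[n].push(i);
        } else {
            indicesOfX[n] = [i];
        }
    }
    return {renditions: renditions, x: x, indicesOfX: indicesOfX};
}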

console.log(JSON.stringify(getAbcInfo(brokenPledge)));
// {"renditions":["","X:56…","X:2…"],"x":["","56","2"],"indicesOfX":{"2":[2],"56":[1],"":[0]}}
console.log(JSON.stringify(getAbcInfo(huishTheCat)));
// {"renditions":["","X:1…","X:1…","X:1…","X:1…","X:1…"],"x":["","1","1","1","1","1"],"indicesOfX":{"1":[1,2,3,4,5],"":[0]}}
console.log(JSON.stringify(getAbcInfo(lowbackedCar)));
// {"renditions":["","X:1…","X:1…","X:1…"],"x":["","1","1","1"],"indicesOfX":{"1":[1,2,3],"":[0]}}
console.log(JSON.stringify(getAbcInfo(commonCase)));
// {"renditions":["","X:1…"],"x":["","1"],"indicesOfX":{"1":[1],"":[0]}}
console.log(JSON.stringify(getAbcInfo(brokenPledgeWithoutTheFirstLine)));
// {"renditions":["T:Broken Pledge…","X:2…"],"x":["","2"],"indicesOfX":{"2":[1],"":[0]}}

The renditions array always contains what precedes the first X: (if any) at index 0. This will normally be the empty string, but it might be a header with fields that the standard allows there, or even a full rendition if its X: line has simply been omitted (against the standard, but humans don't always follow standards).

From index 1 on, the items of renditions are renditions starting with X: (actually whitespace is allowed, see the regex), and with trailing newlines stripped.

The x array shares indices with the renditions array, giving the n of the X:n line of each rendition. Since the “rendition” at index 0 has no X:n line (it's “unnamed”, or rather, “unnumbered”), the x array will always have the empty string at index 0.

The indicesOfX object allows you to get the array of indices in renditions given the n of X:n. In other words, it inverts the key-value relation of the x array.
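As a usage sketch, two small helpers can pick a rendition from that result, either by position or by its X: number. The names pickByPosition and pickByX are hypothetical, and the sample info object is hand-built to mirror the Broken Pledge output (with abbreviated bodies) rather than produced by running getAbcInfo.

```javascript
// Hypothetical helpers; `info` has the shape returned by getAbcInfo.

// Return the k-th rendition that starts with X: (1-based), or null if there is none.
function pickByPosition(info, k) {
    const ok = Number.isInteger(k) && k >= 1 && k < info.renditions.length;
    return ok ? info.renditions[k] : null;
}

// Return every rendition whose X: number is n (numbers may repeat or be out of order).
function pickByX(info, n) {
    const key = String(n);
    const known = Object.prototype.hasOwnProperty.call(info.indicesOfX, key);
    const indices = known ? info.indicesOfX[key] : [];
    return indices.map(i => info.renditions[i]);
}

// Hand-built sample mirroring the Broken Pledge result (bodies abbreviated).
const info = {
    renditions: ['', 'X:56\nT:Broken Pledge, The', 'X:2\nT:Broken Pledge, The'],
    x: ['', '56', '2'],
    indicesOfX: {'': [0], '56': [1], '2': [2]}
};

console.log(pickByPosition(info, 1)); // the X:56 rendition
console.log(pickByX(info, 2).length); // 1
console.log(pickByPosition(info, 3)); // null, which the caller can turn into a message
```

A null (or an empty array from pickByX) is one way to know that the selection failed, so the app can report it.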

In case you want to extend the function to add, say, a titles array to the output, don't forget that you can't simply match a T:, because you have to consider whitespace (the regexes I used allow spaces, tabs and non-breaking spaces – don't use \s* because that includes \n), and also because the T: must be preceded by a newline, except for the rendition at index 0, where it can be at the start of the string. The text of the T: ends with a newline ([\r\n]).
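A minimal sketch of such a title matcher, under the constraints just described. The helper name getTitles is hypothetical, and the sketch collects every T: line it finds in one rendition without judging which of them is the main title.

```javascript
// Hypothetical helper: collect the text of every T: line in one rendition.
function getTitles(rendition) {
    // (?:^|[\r\n])  start of the string, or right after a newline (no m flag, so ^ is string start only)
    // [ \t\u00a0]*  spaces, tabs and non-breaking spaces, but never newlines
    // ([^\r\n]*)    the title text, up to the next newline
    const re = /(?:^|[\r\n])[ \t\u00a0]*T[ \t\u00a0]*:[ \t\u00a0]*([^\r\n]*)/g;
    const titles = [];
    let m;
    while ((m = re.exec(rendition)) !== null) {
        titles.push(m[1].trimEnd());
    }
    return titles;
}

// The "T:" inside the D: line is not at the start of a line, so it is ignored.
const sample = 'X:56\nT:Broken Pledge, The\nR:reel\nD:The T: Band\nK:Ddor';
console.log(getTitles(sample)); // one title: 'Broken Pledge, The'
```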

BTW, you might want to “normalize” newlines by replacing all \r with nothing, or, if you fear there could be old Mac Classic files around where newlines are just \r, replacing all \r\n with \n, and then all remaining \r with \n. Once you are sure you don't have \r newlines around, you can match the start of a new line AND the start of the string at the same time by using the ^ and the m (multiline) flag.
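A sketch of that normalization, followed by a split that relies on ^ with the m flag. The helper names normalizeNewlines and splitAtX are hypothetical. Unlike getAbcInfo, this simplified split keeps the trailing newline of each piece and yields no empty leading item when the text starts with X:.

```javascript
// Convert \r\n and lone \r (old Mac Classic) newlines to \n.
function normalizeNewlines(text) {
    return text.replace(/\r\n/g, '\n').replace(/\r/g, '\n');
}

// Hypothetical helper: split before every line that starts with X:<digits>.
function splitAtX(abc) {
    const text = normalizeNewlines(abc);
    // With the m flag, ^ matches at the start of the string and after every newline,
    // so no '\n' prefix trick is needed. The lookahead keeps X: in the following piece.
    return text.split(/^(?=[ \t\u00a0]*X[ \t\u00a0]*:[ \t\u00a0]*\d+)/m);
}

const mixed = 'X:1\r\nT:First\r\nX:2\rT:Second';
console.log(splitAtX(mixed).length); // 2
```

If the text has content before the first X:, that content becomes the first piece, so callers that want "the first rendition" should check whether the first piece actually starts with X:.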

Walter Tross
  • I'm testing out the code now, and have arrived at the same conclusion. The issue is that you can't really count on the ordinal number after the 'X:' ... in theory the first rendition is 1, and second 2, but in reality it's a hodgepodge and all you can count on is a sequential set of renditions, each of which has a number after the X: I'm trying variants, for example with an X:132 rendition followed by an X:1, etc. – rpc Sep 18 '21 at 17:48
  • is there some on-line available corpus to test against? – Walter Tross Sep 18 '21 at 18:16
  • ok, I found https://abcnotation.com, and after only a dozen attempts I found, e.g., [this tune](https://abcnotation.com/tunePage?a=ifdo.ca/~seymour/runabc/esac/HAN1/0440) that would make my function, as it is currently, of little use (unless one looped over, say, the 1..1000 range) :-( – Walter Tross Sep 18 '21 at 18:25
  • oh, and of course there's people not following the standard and adding in whitespace, like in [this tune](https://abcnotation.com/tunePage?a=www.folkwiki.se/pub/cache/_LivAntes_Polska_34b173/0001) – Walter Tross Sep 18 '21 at 18:51
  • If you want I can completely rewrite my answer. In that case it would be nice to know more precisely what you would like to achieve in the end. – Walter Tross Sep 18 '21 at 19:01
  • Thanks, Walter. Precisely what I want is to be able to select a rendition from often poorly formatted ABC *most of the time*, and ideally know when I've failed so I can give a message. Kind of imprecise, but as you're seeing, ABC is ... ah ... variable in its real world implementation. I've revised the main post to include three samples I'm using as a set of things I'd like to successfully handle. – rpc Sep 19 '21 at 01:41
  • ok, completely rewritten – Walter Tross Sep 19 '21 at 14:40
  • Thank you so much. Early testing looks great, and returning the entire set of renditions lays the groundwork for future features (rendition choices). It's also helpful to have the ancillary arrays for possible future function. Will finish testing then wrap this up! – rpc Sep 20 '21 at 21:42
  • And many thanks! Incorporated and working great! – rpc Sep 26 '21 at 16:07

If you want the nth part from the data starting with X: or from the start of the string, you can use a capture group to capture what you want to keep and use a quantifier to repeat n parts before it as a match.

JavaScript does not support possessive quantifiers, but you can mimic one by capturing what you want in a group inside a lookahead and then matching it with a backreference, because a lookahead is never backtracked into once it has matched.

Pattern in JavaScript, for example to get the 3rd part with the quantifier {3} (the part ends up in capture group 1):

/^(?:(?=([\s\S]+?(?=X:|$)))\1){3}/g

The pattern matches:

  • ^ Start of string
  • (?: Non capture group
    • (?= Positive lookahead
      • ([\s\S]+?(?=X:|$)) Capture group 1: match one or more of any character, as few as possible, until either X: or the end of the string is directly to the right
    • ) Close lookahead
    • \1 Backreference to match the capture group 1 value
  • ){3} Close the non capture group, and repeat n times, in this case 3 times

Regex demo

If the X: must come right after a newline, you can put a newline before it in the lookahead (this example uses {2} to get the 2nd part):

/^(?:(?=([\s\S]+?(?=\nX:|$)))\1){2}/g

Regex demo
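To pick an arbitrary part, the repeat count can be filled in at run time. This is a sketch built on the newline-anchored pattern above; the helper name nthPart is hypothetical, and stripping the leading newline is a choice made here, not part of the pattern.

```javascript
// Hypothetical helper: return the n-th part (1-based), or null if there are fewer than n parts.
function nthPart(text, n) {
    if (!Number.isInteger(n) || n < 1) {
        throw new RangeError('n must be a positive integer');
    }
    // Same pattern as above, with the repeat count inserted.
    const re = new RegExp('^(?:(?=([\\s\\S]+?(?=\\nX:|$)))\\1){' + n + '}');
    const m = re.exec(text);
    // Group 1 holds the last repetition. Parts after the first begin with the
    // newline that preceded X:, so that single newline is stripped here.
    return m ? m[1].replace(/^\n/, '') : null;
}

const tunes = 'X:1\nabc\nX:2\ndef\nX:3\nghi';
console.log(nthPart(tunes, 2)); // the X:2 part
console.log(nthPart(tunes, 4)); // null
```

If the text has content before the first X: line, that content counts as part 1.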

The fourth bird
  • This looks *very* promising; seems to pass my little test set, with the minor proviso of needing a newline at the end of a singleton. Will pound on it some tomorrow. Thanks - the Regex demo makes testing a breeze! – rpc Sep 19 '21 at 01:47
  • Excellent solution, opted for the javascript code path for features and flexibility but this worked fine! – rpc Sep 26 '21 at 16:07

You tagged abcjs, so I assume you are using that library. If that isn't a correct assumption, please disregard.

There is a function that gives you metadata about your string. You can call:

var tuneBook = new ABCJS.TuneBook(tunebookString);
var arrayOfTunes = tuneBook.tunes;
var firstTune = arrayOfTunes[0];

See https://paulrosen.github.io/abcjs/analysis/tune-book.html#number-of-tunes

Paulie