Project:
Take Wikipedia's list of Roman consuls and put the data in a CSV so I can graph the rise and fall of the various gentes in terms of consulships held.
Example data source:
509,L. Iunius Brutus,L. Tarquinius Collatinus
suff.,Sp. Lucretius Tricipitinus,P. Valerius Poplicola
suff.,M. Horatius Pulvillus,
508,P. Valerius Poplicola II,T. Lucretius Tricipitinus
507,P. Valerius Poplicola III,M. Horatius Pulvillus II
Vim search:
/\v(\d+|suff\.),((\w+\.=) (\w+)(\s\w+)=(\s\w+)=(\s[iv]+)=(\s\(.{-}\))=,=){,2}
So essentially:
- Find the year at the beginning (or indication of suffect consul):
(\d+|suff\.)
- The next grouping (let's call it the outer group) needs to be found up to two times:
(outer group){,2}
- For each of these two outer groups, find:
- Praenomen, with optional period (sometimes this isn't present):
(\w+\.=)
- Nomen:
(\w+)
- Optional cognomen (includes space, as do all below):
(\s\w+)=
- Optional agnomen:
(\s\w+)=
- Optional iteration (indicates the nth time he's been consul). Data source does not have more than 8 iterations (so I and V will suffice):
(\s[iv]+)=
- Optional explanatory note like "Sicinius (Sabinus?)":
(\s\(.{-}\))=
- Trailing comma:
,=
(Last comma is optional since it's the end of the row.)
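For anyone reading along outside Vim, here is a rough translation of the breakdown above into Python's re module, laid out one piece per line. It is only a sketch and not the Vim pattern itself: it assumes Vim's `=` maps to `?`, `.{-}` to `.*?`, and `{,2}` to `{0,2}`, and it accepts both cases for the iteration numerals, since the sample data uses capitals. The name CONSUL_ROW is made up for the example.

```python
import re

# Approximate Python equivalent of the Vim "very magic" pattern.
# Group numbers match the back-references listed in the question.
CONSUL_ROW = re.compile(
    r"""
    (\d+|suff\.),          # 1: year, or the marker for a suffect consul
    (                      # 2: outer group, one consul
        (\w+\.?)[ ]        # 3: praenomen, period optional, then a space
        (\w+)              # 4: nomen
        (\s\w+)?           # 5: optional cognomen
        (\s\w+)?           # 6: optional agnomen
        (\s[IViv]+)?       # 7: optional iteration (assumption: either case)
        (\s\(.*?\))?       # 8: optional explanatory note
        ,?                 # trailing comma, absent at the end of the row
    ){0,2}                 # the outer group may occur up to two times
    """,
    re.VERBOSE,
)

row = "509,L. Iunius Brutus,L. Tarquinius Collatinus"
match = CONSUL_ROW.match(row)
print(match.group(0))  # the whole row is consumed by the pattern
```

Only the overall match and the year are checked here; what groups 2 to 8 end up holding is exactly the question.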
So the back references turn out to be:
\1: year or suffect
\2: the entire second outer group
\3: Praenomen of second outer group (same with all below)
\4: Nomen
\5: Cognomen
\6: Agnomen
\7: Iteration
\8: Explanatory note
The problem is I can't figure out how to capture that first outer group. It's like the \2 and \3-\8 references get overwritten when it sees that second outer group.
Using this replace:
:%s//1:{\1}^I2:{\2}^I3:{\3}^I4:{\4}^I5:{\5}^I6:{\6}^I7:{\7}^I8:{\8}^I9:{\9}
I get this output:
1:{509} 2:{L. Tarquinius Collatinus} 3:{L.} 4:{Tarquinius} 5:{ Collatinus} 6:{} 7:{} 8:{} 9:{}
1:{suff.} 2:{P. Valerius Poplicola} 3:{P.} 4:{Valerius} 5:{ Poplicola} 6:{} 7:{} 8:{} 9:{}
1:{suff.} 2:{M. Horatius Pulvillus,} 3:{M.} 4:{Horatius} 5:{ Pulvillus} 6:{} 7:{} 8:{} 9:{}
1:{508} 2:{T. Lucretius Tricipitinus} 3:{T.} 4:{Lucretius} 5:{ Tricipitinus} 6:{ II} 7:{} 8:{} 9:{}
1:{507} 2:{M. Horatius Pulvillus II} 3:{M.} 4:{Horatius} 5:{ Pulvillus} 6:{ II} 7:{} 8:{} 9:{}
I can't access the groups within the first outer group. Are they being overwritten by the second iteration? If so, is there a way around this?
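A minimal sketch of the behaviour, using Python's re module as a stand-in for Vim (an assumption; the syntax differs, but a repeated group likewise reports only its last iteration). It also shows one possible way around it: write the consul fragment once and repeat it explicitly, so each consul gets its own group numbers. The names CONSUL, repeated and unrolled are invented for the example.

```python
import re

# One consul, written once (Python syntax, standing in for the Vim pattern).
# Seven groups: whole consul, praenomen, nomen, cognomen, agnomen, iteration, note.
CONSUL = r"((\w+\.?) (\w+)( \w+)?( \w+)?( [IV]+)?( \(.*?\))?,?)"

row = "509,L. Iunius Brutus,L. Tarquinius Collatinus"

# Repeated group: each numbered group keeps only what its last iteration matched.
repeated = re.match(r"(\d+|suff\.)," + CONSUL + r"{0,2}", row)
print(repeated.group(2))   # L. Tarquinius Collatinus

# Unrolled: the fragment appears twice, the second copy optional,
# so the first consul is groups 2-8 and the second is groups 9-15.
unrolled = re.match(r"(\d+|suff\.)," + CONSUL + CONSUL + r"?", row)
print(unrolled.group(2))   # L. Iunius Brutus,
print(unrolled.group(9))   # L. Tarquinius Collatinus
```

In Vim itself the unrolled form would run past the nine available back-references (\1 to \9), so some of the inner groups would probably have to become non-capturing, written %(...) in very magic mode.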
Edit: Original title: "Vim regex (or any compatible regex): how to reference a group (within a group) if the outer group is iterated?"