I'm trying to parse a categories file with PEG.js
How can I group categories (set of non-empty lines followed by a blank line)
stopwords:fr:aux,au,de,le,du,la,a,et,avec
synonyms:en:flavoured, flavored
synonyms:en:sorbets, sherbets
en:Artisan products
fr:Produits artisanaux
< en:Artisan products
fr:Gressins artisanaux
en:Baby foods
fr:Aliments pour bébé, aliment pour bébé, alimentation pour bébé, aliment bébé, alimentation bébé, aliments bébé
< en:Baby foods
fr:Céréales pour bébé, céréales bébé
< en:Whisky
fr:Whisky écossais
es:Whiskies escoceses
wikipediacategory:Q8718387
For now I can parse line by line with this code:
start = stopwords* synonyms* category+
language_and_words = l:[^:]+ ":" w:[^\n]+ {return {language: l.join(''), words: w.join('')};}
stopwords = "stopwords:" w:language_and_words "\n"+ {return {stopwords: w};}
synonyms = "synonyms:" w:language_and_words "\n"+ {return {synonyms: w};}
category_line = "< "? w:language_and_words "\n"+ {return w;}
category = c:category_line+ {return c;}
I got:
{
"language": "en",
"words": "Artisan products"
},
{
"language": "fr",
"words": "Produits artisanaux"
}
but I want (for each group):
{
{
"language": "en",
"words": "Artisan products"
},
{
"language": "fr",
"words": "Produits artisanaux"
}
}
I tried this too, but it doesn't group and I got \n at the beginning of some lines.
category_line = "< "? w:language_and_words "\n" {return w;}
category = c:category_line+ "\n" {return c;}