I want to add a new lookup list to the ANNIE Gazetteer. Some of the entries it should match contain colons, and a feature value attached to those entries (defined in the new mylookup.lst) also contains colons. For example:

mylookup.lst:
Star Wars:Episode I:url=http://example.com

So it should find the string Star Wars:Episode I and annotate it with the feature url=http://example.com

I already tried to escape the colons in mylookup.lst with a backslash (\), but that didn't work. Because I also want to use the other default lookup lists (which are all colon-separated), I can't just define another separator. So how can I tell the gazetteer to look up and annotate entries that contain colons?

Munchkin

1 Answer

As far as I know, there is no support for escaping separator characters in .lst files. You have to choose another separator character. I recommend the tab character: \t

In that case you cannot use the default (colon-separated) lookup lists in the same gazetteer PR. You can, however, use two separate gazetteer PRs in your pipeline: one for the default lookup lists and a second one for the new lists with the different separator.
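
To illustrate why a different separator sidesteps the problem, here is a minimal Java sketch. It is a simplified model of how one list line splits into an entry and its features, not GATE's actual parser; the class LstLineSketch and its parse method are made up for this example. In GATE itself the separator is configured per gazetteer PR (the init parameter is called gazetteerFeatureSeparator, if I recall correctly), so please check that against your GATE version.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the GATE API.
final class LstLineSketch {

    /** One parsed list line: the text to match plus its features. */
    static final class Entry {
        final String text;
        final Map<String, String> features = new LinkedHashMap<>();

        Entry(String text) {
            this.text = text;
        }
    }

    /** Splits a line on the given separator; fields after the first are name=value features. */
    static Entry parse(String line, String separator) {
        // Quote the separator so it is treated literally, not as a regex.
        String[] fields = line.split(Pattern.quote(separator), -1);
        Entry entry = new Entry(fields[0]);
        for (int i = 1; i < fields.length; i++) {
            int eq = fields[i].indexOf('=');
            if (eq > 0) {
                // Only the first '=' separates name from value.
                entry.features.put(fields[i].substring(0, eq), fields[i].substring(eq + 1));
            }
            // Fields without '=' are ignored in this simplified model.
        }
        return entry;
    }

    public static void main(String[] args) {
        // Colon separator: the entry is cut at the first colon and the URL is truncated.
        Entry colon = parse("Star Wars:Episode I:url=http://example.com", ":");
        System.out.println(colon.text + " " + colon.features);

        // Tab separator: colons in the entry and in the URL survive untouched.
        Entry tab = parse("Star Wars:Episode I\turl=http://example.com", "\t");
        System.out.println(tab.text + " " + tab.features);
    }
}
```

With the colon separator the sketch yields the entry "Star Wars" and a feature url=http, which is the breakage described in the question. With a tab between the entry and the feature, the entry stays "Star Wars:Episode I" and the full URL is kept. In mylookup.lst that means putting a real tab character between "Star Wars:Episode I" and "url=http://example.com".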

dedek