I'm writing a syntax-highlighting file for all .bed files. The number of columns and their exact content may vary, but the files generally look like this:
chr1 11873 14409 uc001aaa.3 0 + 11873 11873 0 3 354,109,1189, 0,739,1347,
chr21 1000000 1230000 peakValue 200 -
chrX 11873 14409 selection
....
<string> <numeric> <numeric> <string> <numeric 1-1000> <+ or - or .> <numeric> <numeric> <numeric> <numeric> <comma separated list> <comma separated list>
So far I have highlighting of the first column and of the strand working:
bed.lang
<?xml version="1.0" encoding="UTF-8"?>
<language id="bed" _name="Bed" version="2.0" _section="Scientific">
<metadata>
<property name="mimetypes">text/bed</property>
<property name="globs">*.bed</property>
</metadata>
<styles>
<style id="chrom" _name="Chrom" map-to="bed:chr" />
<style id="strand" _name="Strand" map-to="bed:strand" />
</styles>
<definitions>
<context id="bed">
<include>
<context id="1_chr" style-ref="chrom">
<match extended="true">
^\w+
</match>
</context>
<context id="6_strand" style-ref="strand">
<match extended="true">
\t[+\-\.]\t
</match>
</context>
</include>
</context>
</definitions>
</language>
I'd like to extend this so that each column is formatted differently, based on a scheme I can define, e.g. coordinates in one color, names in another, and scores in a third. The problem is that coordinates and scores are both numeric strings, so they cannot be told apart by content alone.
The 'simplest' solution I can see is a regex that selects a column by its position and, if the requested column number is greater than the number of columns on the line, matches nothing (i.e. does not wrap around onto the next line).
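To make the behaviour I'm after concrete, here is a minimal sketch in Python rather than in the .lang file itself. It assumes the columns are tab-separated; the names `column_pattern`, `select_column` and `SAMPLE` are made up for illustration, and Python's `re` module is only a stand-in for whatever regex engine the highlighter uses. Excluding both tab and newline from each field is what keeps a match from spilling onto the next line.

```python
import re


def column_pattern(n: int) -> re.Pattern:
    """Return a pattern whose group 1 is the n-th (1-based) tab-separated column."""
    if n < 1:
        raise ValueError("column numbers start at 1")
    # Skip exactly n-1 fields from the start of the line, then capture one field.
    # [^\t\n] can never consume a line break, so short lines simply do not match.
    return re.compile(r"^(?:[^\t\n]*\t){%d}([^\t\n]+)" % (n - 1), re.MULTILINE)


def select_column(text: str, n: int) -> list:
    """Collect the n-th column of every line that has at least n columns."""
    return [m.group(1) for m in column_pattern(n).finditer(text)]


# Hypothetical sample: lines with 6, 6 and 4 columns.
SAMPLE = "\n".join([
    "chr1\t11873\t14409\tuc001aaa.3\t0\t+",
    "chr21\t1000000\t1230000\tpeakValue\t200\t-",
    "chrX\t11873\t14409\tselection",
])

scores = select_column(SAMPLE, 5)   # only the two lines that have a 5th column
missing = select_column(SAMPLE, 7)  # no line has a 7th column
```

The same pattern shape should in principle carry over to the highlighter, but whether the capture group can be styled on its own there is something I have not verified.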
Backsearching (lookbehind) doesn't seem to work (because of the '>' character in the regex). Some regexes I've tried that don't behave nicely are:
Building up iterative matches and formatting each differently. This doesn't work: multiple selections of the same string cause all syntax highlighting to fail.
^.+?\t
^.+?\t.+?\t
^.+?\t.+?\t.+?\t
...
Selecting 'numeric strings':
Single numeric string
(?<=^\w\t)[0-9]+(?=\t)
Numeric string doublets
(?<=\t)[0-9]+\t[0-9]+(?=\t){1}
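One likely reason the lookbehind attempts misbehave: to get past a multi-character chromosome name the lookbehind would need `\w+` instead of `\w`, and that makes it variable-width, which many regex engines reject. The sketch below shows this in Python, which is an assumption on my part and not the engine the highlighter uses, along with a workaround that consumes the first column and captures the second instead of looking behind. The variable names are illustrative only.

```python
import re

# A variable-width lookbehind is rejected by Python's re module at compile time.
try:
    re.compile(r"(?<=^\w+\t)[0-9]+(?=\t)")
    lookbehind_ok = True
except re.error:
    lookbehind_ok = False

# Workaround: match the first column normally and capture the numeric field.
second_column = re.compile(r"^\w+\t([0-9]+)(?=\t)", re.MULTILINE)

line = "chr1\t11873\t14409\tuc001aaa.3"
match = second_column.search(line)
start_coord = match.group(1)   # text of the captured second column
start_span = match.span(1)     # character offsets of that column within the line
```

The trade-off is that the whole match now includes the first column, so only the capture group, not the full match, marks the text to be coloured.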
I'll keep hacking together an ugly solution, but I was wondering whether there is something more elegant that I'm not thinking of.