I want to impute marker classes (either class A or class B) based on proximity to markers whose class is already known. For example, if I know M1 and M4 are class A, then all markers positioned on the map between M1 and M4 can also be classified as A.
If I know marker M4 is class A with position chr1 13, and marker M7 is class B with position chr1 16, then on the same chromosome all markers with position less than or equal to (13+16)/2 = 14.5 can be classified as A, and everything between 14.5 and 16 as B. So M5 is A and M6 is B.
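The midpoint rule for a single marker lying between two known flanking markers can be sketched in Python like this. The function name `classify_between` is hypothetical, and a marker exactly at the midpoint goes to the left marker's class, matching the "less than or equal to" rule above:

```python
def classify_between(left_pos, left_class, right_pos, right_class, pos):
    """Midpoint rule for one marker between two known markers on the same
    chromosome: positions <= midpoint take the left class, others the right."""
    if not left_pos <= pos <= right_pos:
        raise ValueError("pos must lie between the two flanking markers")
    midpoint = (left_pos + right_pos) / 2
    return left_class if pos <= midpoint else right_class


# M4 (chr1 13, A) and M7 (chr1 16, B): the midpoint is 14.5
print("M5", classify_between(13, "A", 16, "B", 14))
print("M6", classify_between(13, "A", 16, "B", 15))
```

When both flanking markers share a class (M1 and M4 are both A), every marker between them gets that class, which is the first case in the question.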
I have a map of markers sorted by position (columns: marker, chromosome, position):
M0 chr1 9
M1 chr1 10
M2 chr1 11
M3 chr1 12
M4 chr1 13
M5 chr1 14
M6 chr1 15
M7 chr1 16
M8 chr2 1
M9 chr2 2
M10 chr2 3
M11 chr2 4
So, given a simple backbone of known classes:
M1 A
M4 A
M7 B
M8 B
M10 A
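A first step is to attach map positions to the backbone markers, grouped by chromosome, so that only markers on the same chromosome are ever compared. This is a minimal Python sketch; the inline `MAP_TEXT`/`BACKBONE_TEXT` strings and the function `anchors_by_chromosome` are hypothetical stand-ins for reading the two files:

```python
from collections import defaultdict

# Hypothetical inline copies of the two inputs (whitespace-separated columns)
MAP_TEXT = """\
M0 chr1 9
M1 chr1 10
M2 chr1 11
M3 chr1 12
M4 chr1 13
M5 chr1 14
M6 chr1 15
M7 chr1 16
M8 chr2 1
M9 chr2 2
M10 chr2 3
M11 chr2 4
"""

BACKBONE_TEXT = """\
M1 A
M4 A
M7 B
M8 B
M10 A
"""


def anchors_by_chromosome(map_text, backbone_text):
    """Return {chrom: [(marker, position, class), ...]} in map order,
    keeping only the markers whose class is known from the backbone."""
    known = dict(line.split() for line in backbone_text.splitlines() if line.strip())
    anchors = defaultdict(list)
    for line in map_text.splitlines():
        if not line.strip():
            continue
        marker, chrom, pos = line.split()
        if marker in known:
            anchors[chrom].append((marker, float(pos), known[marker]))
    return dict(anchors)


for chrom, rows in anchors_by_chromosome(MAP_TEXT, BACKBONE_TEXT).items():
    print(chrom, rows)
```

Here chr1 gets M1, M4 and M7, and chr2 gets M8 and M10, so M7 and M8 are never treated as a flanking pair.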
I want to impute the rest of the markers on the map, if possible.
So my desired output is:
M1 A
M2 A
M3 A
M4 A
M5 A
M6 B
M7 B
M8 B
M9 B
M10 A
I am a biologist trying to learn a little bit of awk, and I realize this may just be a computational problem, but I'm not sure where to get started. Please help. I have access to a Unix cluster where I can run awk and Perl. Please note that imputation can only be done correctly between markers mapped to the same chromosome.
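Putting the pieces together, here is one possible sketch of the whole imputation in Python, assuming Python 3 is available on the cluster (the same logic could be ported to awk or Perl). The names `rows` and `impute` and the inline input strings are hypothetical. It assumes the map is sorted by position within each chromosome, as stated above; markers before the first or after the last known marker on a chromosome (M0 and M11 here) have no flanking pair and are left out, consistent with the desired output:

```python
from collections import defaultdict

# Hypothetical inline copies of the map and backbone; in practice read from files
MAP_TEXT = """\
M0 chr1 9
M1 chr1 10
M2 chr1 11
M3 chr1 12
M4 chr1 13
M5 chr1 14
M6 chr1 15
M7 chr1 16
M8 chr2 1
M9 chr2 2
M10 chr2 3
M11 chr2 4
"""

BACKBONE_TEXT = """\
M1 A
M4 A
M7 B
M8 B
M10 A
"""


def rows(text):
    """Split non-empty lines into whitespace-separated fields."""
    return [line.split() for line in text.splitlines() if line.strip()]


def impute(map_rows, known):
    """Midpoint-rule imputation, one chromosome at a time.

    map_rows: (marker, chrom, position) tuples, sorted by position within
    each chromosome. known: dict of marker -> class.
    """
    by_chrom = defaultdict(list)
    for marker, chrom, pos in map_rows:
        by_chrom[chrom].append((marker, pos))

    result = {}
    for markers in by_chrom.values():
        # Known markers on this chromosome: (index in map, position, class)
        anchors = [(i, pos, known[m]) for i, (m, pos) in enumerate(markers) if m in known]
        for i, _pos, cls in anchors:
            result[markers[i][0]] = cls
        # Fill the gap between each consecutive pair of known markers; pairs
        # never span chromosomes because anchors are built per chromosome
        for (li, lpos, lcls), (ri, rpos, rcls) in zip(anchors, anchors[1:]):
            midpoint = (lpos + rpos) / 2
            for marker, pos in markers[li + 1:ri]:
                result[marker] = lcls if pos <= midpoint else rcls
    return result


map_rows = [(m, chrom, float(pos)) for m, chrom, pos in rows(MAP_TEXT)]
known = dict(rows(BACKBONE_TEXT))
classes = impute(map_rows, known)

# Print in map order, skipping markers that could not be imputed
for marker, _chrom, _pos in map_rows:
    if marker in classes:
        print(marker, classes[marker])
```

This prints M1 through M10 with the classes listed in the desired output. Because each chromosome's markers are already sorted, filling the slice between two consecutive known markers visits every marker once.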