I have association mapping derived P-values for SNPs that are scattered across thousands of scaffolds in a non-model organism. I would like plot the P-value of each SNP on a Manhattan-style plot. I do not care about the order of the scaffolds, but I would like to retain the relative order and spacing of SNP positions on their respective scaffolds. I simply want to visualize roughly how many genomic regions are significantly associated with a phenotype. For example:
My data looks something like this:
SCAFFOLD POSITION
1 8967
1 8986
1 9002
1 9025
1 9064
2 60995
2 61091
2 61642
2 61898
2 61921
2 62034
2 62133
2 62202
2 62219
2 62220
3 731894
3 731907
3 731962
3 731999
3 732000
3 732050
3 732076
3 732097
I would like to write a perl code to create a third column that retains the distance between SNPs on the same scaffold, while arbitrarily spacing scaffolds by some number (100 in the following example):
SCAFFOLD POSITION CONTINUOUS_AXIS
1 8967 8967
1 8986 8986
1 9002 9002
1 9025 9025
1 9064 9064
2 60995 9164
2 61091 9260
2 61642 9811
2 61898 10067
2 61921 10090
2 62034 10203
2 62133 10302
2 62202 10371
2 62219 10388
2 62220 10389
3 731894 10489
3 731907 10502
3 731962 10557
3 731999 10594
3 732000 10595
3 732050 10645
3 732076 10671
3 732097 10692
Thank you to anyone who might have a good strategy.