How to find the occurrence of k-mers in a list of DNA sequences

Question

I want to scan a list of DNA sequences against a list of given k-mers. Each element of the k-mer list is a set of similar k-mers of equal length, for example:

myKmer1=c("TATGGGTTT", "TAAGGGTTT", ...,"CAAGGGTTT")

...

myKmer10=c("GGATTCCAG","CCATTCTTT",..., "CGATTCCTT")

What software or R scripts are available to find the occurrences of each k-mer set in each sequence? The output should be tables like the following:

k-mer occurrence table 1: counts of each k-mer set in each sequence

         myKmer1  myKmer2  ...  myKmer10
seq1     2        0        ...  3
seq2     1        3        ...  0
...
seq1000  0        1        ...  0

k-mer occurrence table 2: locations of each k-mer set in each sequence

         myKmer1   myKmer2      ...  myKmer10
seq1     111,888   0            ...  123,456,3333
seq2     123       111,223,333  ...  0
...
seq1000  0         1234         ...  0

You might get a better response for this type of question on Biostars — blJOg, Nov 18 '13 at 10:39

score 1 · Answer 1 · answered Feb 05 '14 at 05:06

If the k-mers you are looking for are all the same length, you could use Jellyfish: count all k-mers of length k, then use the dump subcommand to output the counts. You could then parse that output for your specific k-mers. See also the Jellyfish user guide.
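
Since the question asks for per-sequence counts and positions, here is a minimal base R sketch as an alternative to Jellyfish. It is not the method described above, only an illustration. The names seqs, kmerSets, findPositions, countTable and positionTable, as well as the example sequences, are made up for this sketch. It assumes plain character strings containing only DNA letters, equal-length k-mers within each set, and forward-strand matches only (no reverse complement).

```r
# Hypothetical example data: two short sequences and two k-mer sets.
seqs <- c(seq1 = "TATGGGTTTAATATGGGTTT",
          seq2 = "CCGGATTCCAGTT")
kmerSets <- list(
  myKmer1  = c("TATGGGTTT", "TAAGGGTTT"),
  myKmer10 = c("GGATTCCAG", "CGATTCCTT")
)

# Return the 1-based start positions at which any k-mer of one set
# occurs in one sequence. The lookahead "(?=...)" is zero-width, so
# overlapping occurrences are reported as well.
findPositions <- function(kmers, sequence) {
  pattern <- paste0("(?=", paste(kmers, collapse = "|"), ")")
  hits <- gregexpr(pattern, sequence, perl = TRUE)[[1]]
  as.integer(hits[hits > 0])  # gregexpr returns -1 when nothing matches
}

# One row per sequence, one column per k-mer set.
countTable <- matrix(0L, nrow = length(seqs), ncol = length(kmerSets),
                     dimnames = list(names(seqs), names(kmerSets)))
positionTable <- matrix("0", nrow = length(seqs), ncol = length(kmerSets),
                        dimnames = list(names(seqs), names(kmerSets)))

for (set in names(kmerSets)) {
  for (id in names(seqs)) {
    pos <- findPositions(kmerSets[[set]], seqs[[id]])
    countTable[id, set] <- length(pos)
    # Follow the question's layout: "0" means no occurrence.
    if (length(pos) > 0) {
      positionTable[id, set] <- paste(pos, collapse = ",")
    }
  }
}

print(countTable)
print(positionTable)
```

With real data, seqs would be a named character vector read from your FASTA file. The nested loop is fine for around a thousand sequences and ten k-mer sets; for much larger inputs, a dedicated package such as Bioconductor's Biostrings is likely a better fit.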
