1. How to parse it in d3.js?
D3.js is a JavaScript library (note the "js") for manipulating documents based on data. So, at the end of the day, D3 is JavaScript, and it has no "parsing" function for nucleic acid sequences.
As far as D3 (or rather JavaScript) is concerned, you can treat the DNA sequence as a string:
"ACATATTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC..."
or as an array:
["A", "C", "A", "T", "A"...]
Or, in a cumbersome way, as an array of objects:
[{position:1, base:"A"}, {position:2, base:"C"}...]
It's up to you. FASTA is text-based, so here we will treat the data as a string (the first option).
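If you do end up needing one of the other two representations, the string converts easily. A minimal sketch in plain JavaScript, assuming 1-based positions as in the example above (the variable names are only illustrative):

```javascript
// Illustrative sequence; any DNA string works the same way.
const sequence = "ACATA";

// Array of single-character strings, one per base.
const bases = sequence.split("");

// Array of objects with a 1-based position, a shape that is convenient
// to hand to d3's selection.data() if you want one element per base.
const records = bases.map((base, i) => ({ position: i + 1, base: base }));

console.log(bases);
console.log(records[1]); // position 2, base "C"
```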
2. How to extract subsequence at (start, end) location?
As D3 is a JavaScript library, you'll have to deal with your string using plain JavaScript methods.
For instance, to find the position of the start triplet in your sequence (TAC, which corresponds to the AUG start codon), you can use indexOf:
var sequence = "ACATACTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC";
var start = "TAC";
console.log(sequence.indexOf(start)); // logs 3
(Keep in mind that JavaScript, like most programming languages I'm aware of, is zero-based, so the result 3 in the previous snippet means that the start triplet begins at the fourth base of your sequence.)
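indexOf only reports the first match. If you need every occurrence of a triplet, one option is to call it in a loop, using its second argument (the position to start searching from). A minimal sketch; findAll is a hypothetical helper name, not something D3 or JavaScript provides:

```javascript
// Hypothetical helper: returns the zero-based index of every occurrence
// of `motif` in `sequence`, including overlapping ones.
function findAll(sequence, motif) {
  const positions = [];
  if (motif.length === 0) return positions; // nothing sensible to search for
  let i = sequence.indexOf(motif);
  while (i !== -1) {
    positions.push(i);
    // Resume one character after the previous hit.
    i = sequence.indexOf(motif, i + 1);
  }
  return positions;
}

const sequence = "ACATACTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC";
console.log(findAll(sequence, "TAC")); // [3]
console.log(findAll(sequence, "GGC")); // [10, 24]
```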
Or, to extract the sequence from a given start to a given stop, you can use substring and indexOf:
var sequence = "ACATACTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC";
var start = "TAC";
var stop = "GGC";
// +3 so that the stop triplet itself is included; logs "TACTGGAGGC"
console.log(sequence.substring(sequence.indexOf(start), sequence.indexOf(stop) + 3));
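If you already have numeric (start, end) coordinates, slice is enough. The sketch below assumes 1-based, inclusive coordinates, which is a common convention in sequence annotation but may not match your data. It also includes a variant of the motif search that looks for the stop only after the start, since a plain indexOf(stop) searches from the beginning of the string. Both helper names are hypothetical, and neither one checks the reading frame:

```javascript
// Hypothetical helper: 1-based, inclusive coordinates (an assumption;
// adjust if your coordinates are zero-based or half-open).
function subsequence(sequence, start, end) {
  return sequence.slice(start - 1, end);
}

// Hypothetical helper: extracts from the first `startMotif` through the
// first `stopMotif` found after it; returns null if either is missing.
function between(sequence, startMotif, stopMotif) {
  const from = sequence.indexOf(startMotif);
  if (from === -1) return null;
  const to = sequence.indexOf(stopMotif, from + startMotif.length);
  if (to === -1) return null;
  return sequence.slice(from, to + stopMotif.length);
}

const sequence = "ACATACTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC";
console.log(subsequence(sequence, 4, 13));    // "TACTGGAGGC"
console.log(between(sequence, "TAC", "GGC")); // "TACTGGAGGC"
```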
PS: a FASTA file has a header and possibly other content besides the actual nucleotide sequence. If by "parse" you mean extracting just the sequence from a FASTA file, I suggest you post another question, without the D3 tag, with the JavaScript tag, and explaining what a FASTA file is.
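For what it's worth, here is a rough sketch of what that extraction could look like once the file's text is loaded into a string. It assumes the simplest form of FASTA: header lines start with ">" and every other non-empty line is sequence. parseFasta is a hypothetical name and the sample record is made up:

```javascript
// Hypothetical helper: splits FASTA text into { header, sequence } records.
// Assumes header lines start with ">" and all other non-empty lines
// belong to the sequence of the most recent header.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    if (line.startsWith(">")) {
      current = { header: line.slice(1), sequence: "" };
      records.push(current);
    } else if (current !== null) {
      current.sequence += line; // join wrapped sequence lines
    }
  }
  return records;
}

// Made-up example input, not real data.
const fasta = ">example made-up record\nACATACTGGA\nGGCCGAAACA\n";
console.log(parseFasta(fasta));
```

Each record's sequence property is then an ordinary string, so indexOf, substring and slice apply to it as described above.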