How to read FASTA into a dataframe and extract subsequences of a FASTA file in d3.js

Question

I have a small fasta file of DNA sequences which looks like this:

sequence 1 >
ACATATTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC

sequence 2 >
CTAACCTCTCCCAGTGTGGAACCTCTATCTCATGAGAAAGCTGGGATGAG

Question

1. How can I parse it in d3.js? For example, how do I calculate the average sequence from about 100 sequences stored in FASTA format, and how do I hold them as a 2D object in D3?

2. How can I extract a subsequence at a given (start, end) location?

Please describe what you want to achieve, what you tried so far and what are exactly your problems. What does that mean "parse it to d3.js" what kind of chart do you want to show with it and so on... — kabaehr, Dec 13 '16 at 13:16
like calculating the average sequence from like 100 of sequence which stored in fasta format and how to catch it like 2D object in d3.can't understand what to do — Desmond_, Dec 13 '16 at 14:35

Gerardo Furtado · Accepted Answer · 2016-12-14T04:53:15.823

1. How to parse it in d3.js?

D3.js is a JavaScript (look at the "js") library for manipulating documents based on data. So, at the end of the day, D3 is JavaScript, and it has no built-in "parsing" function for nucleic acid sequences.

Regarding D3 (actually regarding JavaScript), you can deal with the DNA sequence as a string:

"ACATATTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC..."

or as an array:

["A", "C", "A", "T", "A"...]

Or, in a cumbersome way, as an array of objects:

[{position:1, base:"A"}, {position:2, base:"C"}...]

It's up to you. FASTA is text-based, so here we will treat the data as a string (the first option).
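As a rough illustration of the three representations above, here is a minimal sketch in plain JavaScript. The variable names and the shortened example sequence are only placeholders, and the 1-based `position` field is an assumption that follows the array-of-objects example.

```javascript
// Hypothetical example: the first ten bases of "sequence 1" from the question.
const sequence = "ACATATTGGA";

// Option 1: keep the sequence as a plain string (nothing to do).

// Option 2: split the string into an array of single-character strings.
const bases = sequence.split("");

// Option 3: an array of objects, one per base.
// Assumption: positions are 1-based, as biologists usually count them.
const records = bases.map((base, i) => ({ position: i + 1, base: base }));

console.log(bases.slice(0, 5));
console.log(records[1]);
```

The array of objects is more verbose, but it is an array like any other, so it can be passed to a D3 data join (`selection.data`) if you later want to draw one element per base.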

2. How to extract a subsequence at a (start, end) location?

As D3 is a JavaScript library, you'll have to deal with your string using JavaScript methods.

For instance, to find the position of the start triplet (TAC on the template strand, corresponding to the AUG start codon in the mRNA) in your sequence, you can use indexOf:

var sequence = "ACATACTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC";

var start = "TAC";

// indexOf returns the zero-based index of the first match, or -1 if there is none
console.log(sequence.indexOf(start)); // 3

(Keep in mind that JavaScript, like most programming languages I'm aware of, is zero-based, meaning that the result 3 in the previous snippet shows that the start sequence begins at the fourth base of your sequence.)

Or, to extract the sequence from a given start to a given stop, you can use substring and indexOf:

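var sequence = "ACATACTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC";
var start = "TAC";
var stop = "GGC";

// Find the start first, then look for the stop only after the start triplet
var startIndex = sequence.indexOf(start);
var stopIndex = sequence.indexOf(stop, startIndex + start.length);

// Adding stop.length (3) includes the stop triplet itself in the result
console.log(sequence.substring(startIndex, stopIndex + stop.length)); // "TACTGGAGGC"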
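If you already know the numeric coordinates rather than a motif, a small helper around `slice` may be enough. This is only a sketch: the function name `subsequence` is hypothetical, and it assumes that `start` and `end` are 1-based and inclusive, which is how sequence coordinates are usually quoted.

```javascript
// Hypothetical helper: return the bases from `start` to `end`.
// Assumption: both coordinates are 1-based and inclusive.
function subsequence(sequence, start, end) {
  const valid =
    Number.isInteger(start) &&
    Number.isInteger(end) &&
    start >= 1 &&
    end >= start &&
    end <= sequence.length;
  if (!valid) {
    throw new RangeError("Invalid range " + start + "-" + end);
  }
  // slice uses zero-based, end-exclusive indices, so only start is shifted.
  return sequence.slice(start - 1, end);
}

const sequence = "ACATACTGGAGGCCGAAACAATGAGGCGTGATCAACTCAGTATATCAC";
console.log(subsequence(sequence, 4, 13)); // "TACTGGAGGC"
```

Throwing on an invalid range is a design choice for the sketch. `slice` on its own would silently clamp out-of-range values, which can hide coordinate mistakes.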

PS: a FASTA file has a header and a bunch of other stuff besides the actual nucleotide sequence. If by "parse" you mean extracting just the sequence from a FASTA file, I suggest you post another question, without the D3 tag, with the JavaScript tag, and explaining what a FASTA file is.
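For reference, separating headers from sequences can be sketched in plain JavaScript along these lines. The function name `parseFasta` is hypothetical, and the sketch assumes the conventional FASTA layout in which each record begins with a line starting with ">" and the sequence may wrap over several lines. The sample text reuses the two sequences from the question.

```javascript
// Hypothetical helper: split FASTA text into { header, sequence } records.
// Assumption: each record starts with a ">" header line.
function parseFasta(text) {
  const records = [];
  let current = null;
  for (const rawLine of text.split(/\r?\n/)) {
    const line = rawLine.trim();
    if (line === "") continue; // skip blank lines
    if (line.startsWith(">")) {
      // A header line opens a new record.
      current = { header: line.slice(1).trim(), sequence: "" };
      records.push(current);
    } else if (current !== null) {
      // Sequence lines are appended, so wrapped sequences are joined.
      current.sequence += line.toUpperCase();
    }
  }
  return records;
}

const fastaText = [
  ">sequence 1",
  "ACATATTGGAGGCCGAAACAATGAGG",
  "CGTGATCAACTCAGTATATCAC",
  ">sequence 2",
  "CTAACCTCTCCCAGTGTGGAACCTCTATCTCATGAGAAAGCTGGGATGAG"
].join("\n");

const records = parseFasta(fastaText);
console.log(records.length, records[0].header);
```

In a D3 page the text would typically be loaded first, for example with `d3.text`, whose calling convention differs between D3 versions, and then handed to a function like this one. The result is an ordinary array of objects.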

http://stackoverflow.com/questions/31265282/how-to-randomly-extract-fasta-sequences-using-python?rq=1 i want to do this just in javascript @Gerardo — Desmond_, Dec 14 '16 at 16:47
