How to convert large bioinformatics files into a database-like format that can be accessed asynchronously with JavaScript

Question

Pros!

I have a visualization project that renders biological data as canvas charts, for which I use a JavaScript framework called igv.js (the doc API) to generate the canvas.

Here’s a simple config demo:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>IGV Data Vis</title>
    <link rel="stylesheet" href="source/jquery-ui.css">
    <link rel="stylesheet" href="source/font-awesome.min.css">
    <link rel="stylesheet" href="source/igv-1.0.1.css">
    <script src="source/jquery.min.js"></script>
    <script src="source/jquery-ui.min.js"></script>
    <script src="source/igv-1.0.1.js"></script>
</head>
<body>
    <div id="container"></div>

    <script>
        let options = {
                palette: ["#00A0B0", "#6A4A3C", "#CC333F", "#EB6841"],
                locus: "7:55,085,725-55,276,031",

                reference: {
                    id: "hg19",
                    fastaURL: "//igv.broadinstitute.org/genomes/seq/1kg_v37/human_g1k_v37_decoy.fasta",
                    cytobandURL: "//igv.broadinstitute.org/genomes/seq/b37/b37_cytoband.txt"
                },

                trackDefaults: {
                    bam: {
                        coverageThreshold: 0.2,
                        coverageQualityWeight: true
                    }
                },

                tracks: [
                    {
                        name: "Genes",
                        url: "//igv.broadinstitute.org/annotations/hg19/genes/gencode.v18.collapsed.bed",
                        index: "//igv.broadinstitute.org/annotations/hg19/genes/gencode.v18.collapsed.bed.idx",
                        displayMode: "EXPANDED",
                        height: 350,
                        color: '#ff0000'
                    }
                ]
            };

        let browser = igv.createBrowser(document.getElementById('container'), options);
    </script>
</body>
</html>

The items in tracks in the code above are bioinformatics data files, which can be either plain-text files or binary files (*.bam).

The problem is that these files are so large that I cannot access them directly myself, let alone serve them to clients. For example:

.bam: approximately 3 GB
.vcf: approximately 1 GB

So, is there any back-end solution that makes those files accessible piece by piece, the way AJAX requests work?

Any suggestions will be appreciated!

What do you have in those files? All the base pairs of the human genome? — Gerardo Furtado, Mar 09 '17 at 09:53

Pierre · Accepted Answer · 2017-03-10T10:29:13.080

It depends on what you mean by 'piece by piece'.

BAM files and bgzip-compressed VCF files use the BGZF block-compression format, which, together with an index file, supports random access. This works even over the web, as long as the hosting server supports HTTP 'Range:' requests (byte ranges).

$ tabix "http://igv.broadinstitute.org/annotations/hg19/genes/gencode.v18.collapsed.bed.gz" "1:40723778-40759856"

1   40723778    40759856    ZMPSTE24    1000.0  +   40723778    40759856    .   17  288,159,156,183,147,72,87,51,117,153,142,185,105,353,144,1740,177,  0,129,132,1243,2732,4727,9679,9679,10312,11868,13787,23236,27818,32538,32747,34338,34338,
1   40728343    40728656    RP1-39G22.4 1000.0  -   40728343    40728656    .   1   313,    0,
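The same byte-range mechanism is available from JavaScript. The following is only a minimal sketch, assuming an environment with a global `fetch` (a modern browser or a recent Node.js) and a server that answers range requests with status 206. The helper names `rangeHeader` and `fetchBytes` are made up for illustration and are not part of igv.js or tabix; a real client would also read the index file first to find out which byte offsets to request.

```javascript
// Build the value of an HTTP Range header for an inclusive byte interval.
// (Hypothetical helper, not part of any library.)
function rangeHeader(start, end) {
  if (!Number.isInteger(start) || !Number.isInteger(end) || start < 0 || end < start) {
    throw new RangeError(`invalid byte range: ${start}-${end}`);
  }
  return `bytes=${start}-${end}`;
}

// Fetch only bytes start..end (inclusive) of a remote file.
// Assumes the server supports range requests; otherwise it would send the whole file.
async function fetchBytes(url, start, end) {
  const response = await fetch(url, {
    headers: { Range: rangeHeader(start, end) }
  });
  // 206 Partial Content means the range was honoured; 200 would mean the full file is coming.
  if (response.status !== 206) {
    throw new Error(`server did not honour the Range request (status ${response.status})`);
  }
  return new Uint8Array(await response.arrayBuffer());
}
```

For example, `fetchBytes(url, 0, 65535)` would ask for only the first 64 KiB of the file, so the client never has to download the full 3 GB.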

For bioinformatics questions, you can also ask on biostars.org

score 0 · Answer 2 · answered Mar 09 '17 at 09:58

The question is too broad. There are many ways to read a file in pieces. PHP has a lot of functions for dealing with files, such as fseek (doc) or fgets. You had better not transfer 3 GB of data to the user; do the necessary calculations on your back end instead.

Using any image library (gd2?), you can build the image from the genome file on your server. There is no need to transfer a huge amount of data to a client.
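The seek-and-read idea behind fseek can be sketched on a Node.js back end as well. This is an illustration under stated assumptions rather than a recommended implementation: `parseRange` and `readSlice` are hypothetical helper names, only a single `bytes=start-end` range is handled, and the synchronous `fs` calls are used for brevity.

```javascript
const fs = require("fs");

// Parse a single-range header such as "bytes=0-99" or "bytes=100-"
// against a known file size. Returns { start, end } (inclusive),
// or null if the header is missing, malformed, or out of bounds.
function parseRange(header, size) {
  const match = /^bytes=(\d+)-(\d*)$/.exec(header || "");
  if (!match) return null;
  const start = Number(match[1]);
  // An open-ended range ("bytes=100-") runs to the last byte of the file.
  const end = match[2] === "" ? size - 1 : Math.min(Number(match[2]), size - 1);
  if (start > end) return null;
  return { start, end };
}

// Read bytes start..end (inclusive) without loading the rest of the file.
function readSlice(path, start, end) {
  const length = end - start + 1;
  const buffer = Buffer.alloc(length);
  const fd = fs.openSync(path, "r");
  try {
    // The last argument is the file position, which plays the role of fseek.
    const bytesRead = fs.readSync(fd, buffer, 0, length, start);
    return buffer.subarray(0, bytesRead);
  } finally {
    fs.closeSync(fd);
  }
}
```

A request handler could call `parseRange` on the incoming `Range` header and reply with `readSlice(...)` and status 206, so that only the requested piece of the file ever leaves the disk.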

score 0 · Answer 3 · answered Mar 09 '17 at 09:58


Yes. The BAM format describes the read alignment details for the whole genome, so it is very large. The VCF format describes whole-genome SNP information and the respective annotations.

answered Mar 09 '17 at 09:58

Fiona Kim
