Questions tagged [genome]

A genome is the entirety of an organism's DNA sequence. The genome includes both the genes and the non-coding sequences, such as repeats, introns and regulatory sequences, with both known and unknown functions.

230 questions
2 votes · 1 answer

Python Regex to Extract Genome Sequence

I’m trying to use a Python Regular Expression to extract a genome sequence from a genome database; I’ve pasted a snippet of the database below. >GSVIVT01031739001 pacid=17837850 polypeptide=GSVIVT01031739001 locus=GSVIVG01031739001…
MojaveAzure
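
For reference, here is a minimal Python sketch of the regex approach. It assumes the database is plain FASTA text (each record is a `>` header line followed by sequence lines) that has already been read into a string. The pattern captures the first whitespace-delimited token of the header as the record ID. The function name `extract_sequences` and the sample records are hypothetical.

```python
import re

# One FASTA record: a '>' header line, then every line up to the next '>' (or end of text).
# re.MULTILINE makes '^' match at the start of each line, not just the start of the string.
RECORD_RE = re.compile(r"^>(\S+)[^\n]*\n([^>]*)", re.MULTILINE)

def extract_sequences(text):
    """Map each record ID (first token after '>') to its sequence, line breaks removed."""
    sequences = {}
    for match in RECORD_RE.finditer(text):
        record_id, body = match.group(1), match.group(2)
        sequences[record_id] = re.sub(r"\s+", "", body)
    return sequences

# Hypothetical input in the same header style as the question.
sample = ">seqA pacid=1 locus=locA\nACGTAC\nGGTA\n>seqB pacid=2 locus=locB\nTTAG\n"
print(extract_sequences(sample))  # {'seqA': 'ACGTACGGTA', 'seqB': 'TTAG'}
```

For whole-genome files, a line-by-line parser avoids holding the entire file and all matches in memory at once.
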
2 votes · 1 answer

What is the syntax to instantiate a structured dtype in numpy?

If I have a dtype like foo = dtype([('chrom1', '…
traeki
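
Below is a minimal sketch of instantiating a structured dtype in NumPy. The question's dtype is cut off after `'chrom1'`, so the fields used here (`chrom`, `start`, `end`) are hypothetical. A single record is written as a tuple with one value per field. Several records are written as a list of such tuples, or allocated with `np.zeros` and then filled field by field.

```python
import numpy as np

# Hypothetical fields: the question's dtype is truncated after 'chrom1'.
foo = np.dtype([("chrom", "U5"), ("start", "i8"), ("end", "i8")])

# A single record: a tuple (not a list) with one value per field -> 0-d structured array.
one = np.array(("chr1", 100, 200), dtype=foo)

# Several records: a list of tuples -> 1-D structured array.
many = np.array([("chr1", 100, 200), ("chr2", 5, 50)], dtype=foo)

# A zero-filled array of a given length, then filled in field by field.
blank = np.zeros(2, dtype=foo)
blank["chrom"] = ["chr3", "chrX"]
blank["start"] = [1, 2]

print(one["start"], many["end"], blank)
```
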
1 vote · 2 answers

Can I parse hg19.2bit with PHP?

I know this is possibly an obscure use for PHP, but I'm working on an idea to navigate the human genome in a rather interesting way. The problem is I need to know whether I can write a PHP script to parse the freely available data, and if so, how I would…
T9b
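
The .2bit layout is described by UCSC and is language-neutral. A PHP script could in principle walk it with byte offsets and `unpack()`. The sketch below is in Python, not PHP, and assumes the whole file is available as a bytes-like object. It reads the header, then the index of names and offsets, then decodes one sequence. Bases T, C, A and G are packed as the 2-bit codes 0 to 3, N blocks are applied, and soft-mask blocks are skipped. Only version-0 files are handled, and `read_2bit_index` and `read_2bit_sequence` are hypothetical names.

```python
import struct

SIGNATURE = 0x1A412743
BASES = "TCAG"  # 2-bit codes 0, 1, 2, 3 per the UCSC .2bit description

def read_2bit_index(data):
    """Return (byte-order prefix, {sequence name: record offset}) from a version-0 .2bit file."""
    for endian in ("<", ">"):
        if struct.unpack_from(endian + "I", data, 0)[0] == SIGNATURE:
            break
    else:
        raise ValueError("not a .2bit file (bad signature)")
    version, count, _reserved = struct.unpack_from(endian + "III", data, 4)
    if version != 0:
        raise ValueError("only version-0 .2bit files are handled here")
    index, pos = {}, 16
    for _ in range(count):
        name_size = data[pos]
        name = bytes(data[pos + 1 : pos + 1 + name_size]).decode("ascii")
        (offset,) = struct.unpack_from(endian + "I", data, pos + 1 + name_size)
        index[name] = offset
        pos += 1 + name_size + 4
    return endian, index

def read_2bit_sequence(data, endian, offset):
    """Decode one sequence record: N blocks become 'N', soft-mask blocks are skipped."""
    dna_size, n_count = struct.unpack_from(endian + "II", data, offset)
    pos = offset + 8
    n_starts = struct.unpack_from(f"{endian}{n_count}I", data, pos)
    n_sizes = struct.unpack_from(f"{endian}{n_count}I", data, pos + 4 * n_count)
    pos += 8 * n_count
    (mask_count,) = struct.unpack_from(endian + "I", data, pos)
    pos += 4 + 8 * mask_count + 4  # mask count, mask starts/sizes, reserved word
    bases = []
    for i in range(dna_size):
        byte = data[pos + i // 4]
        shift = 6 - 2 * (i % 4)  # the first base of each byte sits in the top two bits
        bases.append(BASES[(byte >> shift) & 0b11])
    for start, size in zip(n_starts, n_sizes):
        bases[start : start + size] = "N" * size
    return "".join(bases)
```

For a file the size of hg19.2bit, `data` can be an `mmap.mmap` object rather than bytes, so the file is not read into memory. Decoding a whole chromosome base by base in pure Python will still be slow.
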
1 vote · 1 answer

Nextflow: No such variable: id

I'm trying to write my first Nextflow pipeline. I'm providing 2 paired reads and I want to run bbduk on them, but I don't know why my code doesn't work. I tried the following code: #!/usr/bin/env nextflow /* * Pipeline Metagenomics,…
1 vote · 0 answers

Generate Random Permutations of Genomic Ranges using Nullranges (matchedranges or bootranges)

I want to generate 200 random genomic ranges, each 200 kbp long, that can occur anywhere in the genome. Someone recommended nullranges, but I haven't figured out how to generate only 200 ranges per iteration. I think it takes…
erman
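
nullranges is an R/Bioconductor package, and the sketch below does not reproduce its matching or bootstrapping methods. It is a plain uniform-sampling alternative in Python, assuming a dict of chromosome lengths. Every valid start position in the genome is equally likely, ranges never run off a chromosome end, and one call returns exactly `n` ranges (one "iteration"). Overlaps between sampled ranges and gap or blacklist regions are not handled, and `random_ranges` is a hypothetical name.

```python
import bisect
import random

def random_ranges(chrom_sizes, n=200, width=200_000, seed=None):
    """Draw n fixed-width ranges uniformly over all valid start positions in the genome.

    chrom_sizes maps chromosome name -> length in bp. A chromosome of length L has
    L - width + 1 valid 0-based starts, so longer chromosomes receive more ranges.
    Returns (chrom, start, end) tuples, 0-based and half-open.
    """
    rng = random.Random(seed)
    names, cumulative, total = [], [], 0
    for name, length in chrom_sizes.items():
        valid = length - width + 1
        if valid > 0:  # chromosomes shorter than `width` cannot hold a range
            total += valid
            names.append(name)
            cumulative.append(total)
    if total == 0:
        raise ValueError("no chromosome is long enough for the requested width")
    ranges = []
    for _ in range(n):
        k = rng.randrange(total)  # a global index into all valid starts
        i = bisect.bisect_right(cumulative, k)  # which chromosome that index falls in
        start = k - (cumulative[i - 1] if i else 0)
        ranges.append((names[i], start, start + width))
    return ranges
```

Calling it once per iteration with a different `seed` gives independent sets of 200 ranges.
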
1 vote · 1 answer

Why is SPAdes not working in my Nextflow pipeline?

SPAdes is not working in my Nextflow pipeline for some reason, even though I already have it installed. I used the following code, but it doesn't seem to work. Can anyone please help point out where the problem is? #!/usr/bin/env…
Terra
1 vote · 1 answer

ggplot: Any way to only draw x axis border starting from 0?

I'm trying to add a border to the x axis of this ggplot, but it extends past the 0 Mb mark and I would like it to start there. Is there a way to start it at 0, or to cover it with a white line in the negative direction so that it doesn't show? I…
1 vote · 1 answer

Retrieve mRNA sequence based on DNA coordinates

I have a list of genomic DNA coordinates (hg38), and I want to retrieve the corresponding mRNA sequence 200 bp upstream and downstream of these positions. Any ideas? Thank you. I have tried the Table Browser; it is easy to get all codon sequences based on coordinates,…
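
The coordinate arithmetic for a ±200 bp window can be sketched in Python, assuming the hg38 chromosomes are already loaded into a dict of strings and positions are 1-based. This returns the genomic flank, introns included. Getting the corresponding spliced mRNA sequence would also need transcript annotation, which this sketch does not cover. `flank` is a hypothetical helper name.

```python
# Complement table for reverse-complementing minus-strand windows (case preserved).
COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")

def flank(genome, chrom, pos, strand="+", flank_bp=200):
    """Return positions pos-flank_bp .. pos+flank_bp (1-based, inclusive) on the given strand.

    genome maps chromosome name -> sequence string (string index 0 = position 1).
    The window is clipped at chromosome ends.
    """
    seq = genome[chrom]
    start = max(1, pos - flank_bp)
    end = min(len(seq), pos + flank_bp)
    window = seq[start - 1 : end]
    if strand == "-":
        window = window.translate(COMPLEMENT)[::-1]  # reverse complement
    return window
```
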
1 vote · 1 answer

How do I rewrite this expected depth of (genome) coverage function in R?

I need to draw the probability density for a random position for Length of fragment = 600, Genome size = 3 × 10^9, and Number of reads = 10 million reads: depth_of_coverage <- function(genome = 3E9, fragment_length = 600, reads = 10E6) { depth <- 0 …
ibnadam
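
Under the standard Lander-Waterman assumption (reads start at uniformly random positions), the depth at a random base is approximately Poisson-distributed with mean c = N·L/G. Because depth is discrete, the result is a probability mass function rather than a density. With the question's numbers, c = 10^7 × 600 / (3 × 10^9) = 2, so for example P(depth = 0) = e^-2 ≈ 0.135. The question's R body is truncated, so the Python version below is a rewrite under that assumption, not a line-by-line port.

```python
import math

def depth_of_coverage(genome=3e9, fragment_length=600, reads=10e6, max_depth=10):
    """P(depth = k) for k = 0..max_depth at a random base, assuming depth ~ Poisson(N * L / G)."""
    mean_depth = reads * fragment_length / genome  # c = N * L / G (2.0 with the defaults)
    return [math.exp(-mean_depth) * mean_depth**k / math.factorial(k)
            for k in range(max_depth + 1)]

for depth, p in enumerate(depth_of_coverage()):
    print(f"{depth:2d}  {p:.4f}")
```

In R itself, `dpois(0:10, lambda = reads * fragment_length / genome)` gives the same values, which can then be plotted against depth.
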
1 vote · 1 answer

How to determine characteristics for a genome?

In AI, are there any simple and/or very visual examples of how one could implement a genome into a simulation? Basically, I'm after a simple walkthrough (not a tutorial, but rather something of a summarizing nature) which details how to implement a…
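
Here is a small Python illustration of the usual genetic-algorithm setup. The genome is a fixed-length list of numbers, each gene decodes to one characteristic, and new genomes come from crossover and mutation of the fittest ones. The trait names, the fitness function, and all parameter values are hypothetical choices for illustration.

```python
import random
from dataclasses import dataclass

TRAITS = ("speed", "size", "vision")  # hypothetical characteristics, one gene each

@dataclass
class Genome:
    genes: list  # one float in [0, 1] per trait, in TRAITS order

    @classmethod
    def create(cls, rng):
        return cls([rng.random() for _ in TRAITS])

    def traits(self):
        """Decode the genes into named characteristics (the phenotype)."""
        return dict(zip(TRAITS, self.genes))

    def crossover(self, other, rng):
        """Uniform crossover: each gene is copied from one parent at random."""
        return Genome([a if rng.random() < 0.5 else b
                       for a, b in zip(self.genes, other.genes)])

    def mutate(self, rng, rate=0.1, scale=0.05):
        """With probability `rate`, nudge a gene by Gaussian noise, clamped to [0, 1]."""
        return Genome([min(1.0, max(0.0, g + rng.gauss(0, scale))) if rng.random() < rate else g
                       for g in self.genes])

def evolve(fitness, pop_size=20, generations=50, seed=0):
    """Keep the fitter half each generation and refill with mutated offspring."""
    rng = random.Random(seed)
    population = [Genome.create(rng) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]  # truncation selection
        children = [rng.choice(parents).crossover(rng.choice(parents), rng).mutate(rng)
                    for _ in range(pop_size - len(parents))]
        population = parents + children
    return max(population, key=fitness)
```

For example, `evolve(lambda g: g.traits()["speed"])` breeds toward fast individuals. Printing `traits()` for the best genome each generation is one simple way to make the process visual.
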
1 vote · 0 answers

MuscleCommandLine non-zero return code 1/is not recognized as an internal or external command,

I am trying to align 4 different sequences using MuscleCommandLine. This code works perfectly on Anaconda and Mac, but I am trying to make it work on Windows and I am having several issues. muscle_exe = r'../muscle3.8.31_i86darwin64.exe' in_file =…
1 vote · 1 answer

Tab-delimited file mixes up columns when loading into R

I am trying to load data into R, but some rows are not parsed correctly. I have run into this issue many times, but when I load the same files in Excel, they work fine. Please help me if you know the reason. Thank you very much! library(RCurl) URL <-…
1 vote · 0 answers

How to set up a Seurat object from a gz file?

I am trying to follow the Seurat tutorial found here: https://satijalab.org/seurat/articles/pbmc3k_tutorial.html The PBMC raw data from the tutorial downloads to my computer as: pbmc3k_filtered_gene_bc_matrices.tar.gz I am having trouble uploading…
1 vote · 3 answers

Can anyone tell me how to replace strings with floats in an np.array (of several genotypes) by frequency per column?

I have an np.array matrix (1826 × 5000) where the rows are my samples and the columns are the features. That means each row is a genotype, with the individual nucleotides as strings, like this: [['G' 'G' 'G' ... 'T' 'T' 'A'] ['G' 'G' 'G' ...…
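
One way to do this with NumPy is sketched below. It assumes that "frequency" means the relative frequency of each nucleotide within its own column, that is, its count divided by the number of samples. `np.unique` with `return_inverse=True` and `return_counts=True` maps every entry back to the count of its value. `to_column_frequencies` is a hypothetical name.

```python
import numpy as np

def to_column_frequencies(genotypes):
    """Replace each nucleotide with its relative frequency within its own column."""
    genotypes = np.asarray(genotypes)
    n_rows = genotypes.shape[0]
    result = np.empty(genotypes.shape, dtype=float)
    for j in range(genotypes.shape[1]):
        # inverse[i] is the index (into the unique values) of genotypes[i, j]
        _, inverse, counts = np.unique(genotypes[:, j], return_inverse=True, return_counts=True)
        result[:, j] = counts[inverse] / n_rows
    return result

small = np.array([["G", "A"], ["G", "T"], ["C", "A"], ["G", "A"]])
print(to_column_frequencies(small))
```

Looping over 5000 columns is fine at this size. The per-column call keeps the logic simple at the cost of some speed.
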
1 vote · 1 answer

How to optimize my FASTA parser Python script to make it run faster on Slurm?

I hope I'm posting in the right place. My script runs fine on small genomes, but it takes hours or days when it comes to mammalian genomes. I tried many different things, but I'm out of ideas. Can you tell me what causes this script to be so…
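
Without seeing the script it is hard to say what is slow. One common culprit is building each sequence with repeated string concatenation (for example `seqs[name] += line`), which can copy the growing string on every line. The sketch below is a minimal streaming parser in Python, assuming a plain, uncompressed FASTA file. It collects lines in a list and joins them once per record. It also yields records one at a time, so memory is bounded by the largest single sequence. `read_fasta` is a hypothetical name.

```python
def read_fasta(path):
    """Yield (header, sequence) pairs one record at a time, without loading the whole file."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip()
            if line.startswith(">"):
                if header is not None:
                    yield header, "".join(chunks)  # one join per record, not per line
                header, chunks = line[1:], []
            elif line:  # skip blank lines
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)
```

Usage would look like `for name, seq in read_fasta("genome.fa"): ...`, where the file name is hypothetical. Submitting through Slurm does not by itself make a single-process Python script use more than one core.
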