Questions tagged [genome]

A genome is the entirety of an organism's DNA sequence. It includes both the genes and the non-coding sequences, such as repeats, introns and regulatory sequences, whether their function is known or unknown.

230 questions
3
votes
2 answers

Organizing the output of my shell script into tables within the text file

I am working with a unix shell script that does genome construction then creates a phylogeny. Depending on the genome assembler you use, the final output (the phylogeny) may change. I wish to compare the effects of using various genome assemblers. I…
dnic2693
3
votes
3 answers

Common genomic intervals in R

I would like to infer shared genomic intervals between different samples. My input:

sample  chr  start  end
NE001   1    100    200
NE001   2    100    200
NE002   1    50     150
NE002   2    50     150
NE003   2    250    300

My expected…
user3091668
2
votes
2 answers

Finding genome coverage using random reads

Thank you for looking at my question. I am trying to solve this homework question. Consider the problem of sequencing a genome by random reads. If G is the length of the entire sequence, L is the length of a read and n is the number of reads,…
smandape
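The excerpt above is cut off, so the exact homework question is unknown. As a rough sketch only, the usual Poisson (Lander-Waterman style) approximation for random reads can be written with the same symbols G, L and n. It assumes read start positions are uniform and L is much smaller than G; the function name and the numbers in the example call are hypothetical.

```python
import math

def expected_coverage(G, L, n):
    """Return (mean depth c, expected fraction of the genome covered).

    Assumes uniformly random read starts and L << G, so the number of
    reads covering a given base is approximately Poisson with mean c.
    """
    c = n * L / G                    # average number of reads per base
    covered = 1.0 - math.exp(-c)     # P(a base is covered at least once)
    return c, covered

# Hypothetical numbers, not taken from the question.
c, covered = expected_coverage(G=1_000_000, L=100, n=50_000)
print(f"depth {c:.1f}x, expected covered fraction {covered:.4f}")
```

Under the same approximation, the expected uncovered fraction is exp(-c), so it shrinks exponentially as the depth grows.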
2
votes
1 answer

Interactive Manhattan plot with string chromosome names

I'm trying to generate a ManhattanPlot using the Dash-plotly library for Python: https://dash.plotly.com/dash-bio/manhattanplot I have SNP results data for plants like wheat, which have chromosome names with letters, e.g. 3A, 3B, 3D. Is it possible to…
MLearner
2
votes
1 answer

How do I run this Bismark Bisulfite Sequencing program?

I am very new to coding so I'm not really sure how to approach this. I wanted to look at some data that we got and sequence them using Bismark. I already used Trim Galore to pare the reads, now I wanted to get the data into Bismark. However, I'm not…
2
votes
1 answer

R hypergeometric test between a character vector and a list, calculating p values in a loop

I'm trying to write code myself to run the hypergeometric test in R using phyper. I have a character vector of upregulated genes (these are the "red" balls I pulled out of my urn): gene.up <- c("A", "B", "C", "D") I also have a character vector…
Jen
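The rest of the question is truncated, so the following is only a sketch of the general idea: an over-representation p-value P(X >= k) computed for each gene set in a loop. It is written in Python rather than R. The gene universe and the gene sets are made up for illustration; only gene_up mirrors the gene.up vector quoted above.

```python
from math import comb

def hypergeom_upper_tail(k, N, K, n):
    """P(X >= k) when drawing n items from N, of which K are 'successes'."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

gene_up = {"A", "B", "C", "D"}       # from the question
universe = set("ABCDEFGHIJ")         # hypothetical background of 10 genes
gene_sets = {                        # hypothetical gene sets
    "set1": {"A", "B", "E"},
    "set2": {"F", "G", "H"},
}

p_values = {}
for name, members in gene_sets.items():
    k = len(gene_up & members)       # overlap between the two vectors
    p_values[name] = hypergeom_upper_tail(
        k, N=len(universe), K=len(members), n=len(gene_up)
    )

print(p_values)
```

In R, the same upper tail corresponds to phyper(k - 1, K, N - K, n, lower.tail = FALSE); the k - 1 is needed because lower.tail = FALSE returns P(X > q).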
2
votes
0 answers

How to read a file (not a .csv file) in S3 into rstudio and AWS

The file is about 45 GB and ends with ".gds" (Genomic Data Structure (GDS) file). How do I read it into RStudio and AWS so that I can run statistical analysis on RStudio Cloud? I…
Jason
2
votes
0 answers

How to fix "bowtie2 died with signal 9 (KILL) error"

I am working on assembling a transcriptome of the terminal ganglion of the cricket (Gryllus bimaculatus). We are working on a non-model species and thus did not remove ribosomal contamination during library prep. We want to remove ribosomal…
2
votes
0 answers

Extract and organize automatically KEGG annotation results into Excel

I have launched a query with amino acid sequences on "KAAS - KEGG Automatic Annotation Server". I have then downloaded the results file called "myfile.keg". A small example file that shows what it looks like can be downloaded at:…
SkyR
2
votes
0 answers

Invalid command name "tk_chooseDirectory" error

I am using Bioconductor for a WES pipeline, and I am using tk_choose.dir to select (and store for further use) the directory where the user has stored the input files. Here are the command lines: library(tcltk) dataDir <- dirname(tk_choose.dir(default = " ",…
Lot_to_learn
2
votes
0 answers

Graph Representation of N Genomes

I have n sequences, each of length 3 billion (human genome). I am looking for an efficient way to store/represent these n strings. One natural way that I can think of is graphs, where nodes can store common sub-strings among these sequences and directed…
Raghu
2
votes
2 answers

How to calculate one frequency matrix from an entire genome file?

So, I'm simply trying to calculate single nucleotide frequencies (A, T, C, G) in a HUGE file that contains patterns similar to this: TTTGTATAAGAAAAAATAGG. That would give me one line of output for the entire file, such as: The single nucleotide…
Chelsey W
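One way to approach this, sketched in Python under a few assumptions: the file holds plain sequence text with no FASTA header lines, case does not matter, and any character other than A, T, C or G (newlines, N, and so on) is ignored. The file is read in fixed-size chunks so it never has to fit in memory. The function name and chunk size are illustrative choices.

```python
from collections import Counter
import io

def nucleotide_frequencies(handle, chunk_size=1 << 20):
    """Return the relative frequency of A, T, C and G in a text stream."""
    counts = Counter()
    while True:
        chunk = handle.read(chunk_size)   # read about 1 MiB at a time
        if not chunk:
            break
        counts.update(chunk.upper())      # count every character in the chunk
    total = sum(counts[base] for base in "ATCG")
    if total == 0:
        return {}
    return {base: counts[base] / total for base in "ATCG"}

# Demo on the short pattern from the question; for a real file use
# open(path) (path is whatever your genome file is called).
freqs = nucleotide_frequencies(io.StringIO("TTTGTATAAGAAAAAATAGG\n"))
print(freqs)
```

Because only four running counts are kept, memory use stays constant regardless of the file size.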
2
votes
2 answers

Gnome glib status for Windows/OSX/Unix-like and binaries

I am trying to understand the current situation of glib regarding Windows, Unix-like systems (not necessarily Linux) and OSX. I am analyzing whether I could use glib for a project, and I will need it to work on all of those OSes. I am searching the binaries of…
Mariano Martinez Peck
2
votes
4 answers

How to load GEO methylation (450k) datasets without sample sheet provided?

I downloaded some Illumina 450k methylation datasets from Gene Expression Omnibus (GEO). The R Bioconductor packages minfi and ChAMP seem to require something called a "sample sheet". Most TAR files on GEO do not seem to contain such a sample sheet -…
2
votes
1 answer

Estimating distance difference between rows (genetic markers)

I would like to calculate the distance between the markers (Name) in a given chromosome (Chr). The objects dist1.alldown (distance downstream) and dist1.allup (distance upstream) have exactly what I want. However, the below script is computationally…
user2120870