Questions tagged [genome]

A genome is the entirety of an organism's DNA sequence. It includes both the genes and the non-coding sequences, such as repeats, introns and regulatory sequences, whether their function is known or unknown.

230 questions
0
votes
1 answer

Compiling issue with recursive descent parser (C++)

I am having compiling issues with my C++ program. It is a recursive descent parser, and these are the rules:
slist ::= stmt slist | stmt
stmt ::= decl | assign | print
decl ::= INT ID SC | FLOAT ID SC
print ::= PRINT expr SC
assign ::= ID EQUAL expr…
octain
  • 964
  • 3
  • 10
  • 24
0
votes
1 answer

R Genome Alignment Viewer

Currently, I have read in a GenBank ptt file and used it to plot a genome in R using genoPlotR: plot_gene_map(dna_segs=list(mo),xlims=xlims,annotations=annotMED,annotation_height=5,main="Region",gene_type="side_blocks",dna_seg_scale=TRUE,…
0
votes
2 answers

How to screen genomes for compositional studies?

I am working with more than 2,600 genomes and wish to study the genome, gene and intergenic features among various groups. For taxonomic groups which have very few representatives, there is no issue. For taxonomic groups having…
SRKR
  • 33
  • 8
0
votes
2 answers

How to automatically abbreviate long gene names (microarray data processing)?

Is there any automatic way to convert a list of long gene names (like Cadherin_3453) to their abbreviations, like CDHRN_3453? Is there any naming convention for abbreviations in genomics or bioinformatics? Sorry, no code here.
KvasDub
  • 281
  • 7
  • 16
0
votes
1 answer

How do I shorten a genome sequence to check that my workflow is functioning properly?

I am Moritz from the University of Heidelberg in Germany. For my bachelor thesis I have 20 large (25-30 GB) genome files (.txt.gz) from patients with hepatocellular carcinoma. I have Bpipe installed on my Ubuntu server, which I have got to try out…
moritz
  • 1
  • 1
0
votes
1 answer

Algorithm or requirement behind Bowtie?

Bowtie maps test reads to a reference genome. Basically, it's a string comparison. The reference string could be a million base pairs made up of combinations of A, C, T and G, and so could the test reads. Now, what are the criteria for calling a test read a match, mutation,…
user2458922
  • 1,691
  • 1
  • 17
  • 37
0
votes
1 answer

skipping first half of a 59GB fastq file to process last half: read line-by-line, or fgetpos?

I have 2 ~59GB text files in ".fastq" format. fastq files are genomics read files from a sequencer. Every 4 lines is a new read, but the lines are of variable size. The filesize is roughly 59GB, and there are about 211M reads-- which means, give…
HodorTheCoder
  • 254
  • 2
  • 11
-1
votes
0 answers

How to determine whether shotgun sequences from multiple samples aligned to homologous regions of a reference genome?

I have a pilot dataset of 4 samples. Data were generated using shotgun sequencing of fecal-derived DNA. I have aligned the samples to a reference genome using bwa and samtools. I know what percent of the reads mapped to the reference for each…
-1
votes
4 answers

Renaming variant column

I have a large file with rsIDs in the 2nd field. Some variants are in this format: chr1-97981343:rs55886062-AT. Using bash commands, how can I replace these identifiers with just the rsID (e.g. rs55886062)? Toy data set: 1 rs3918290 110…
Svalf
  • 109
  • 4
-1
votes
1 answer

BWA-mem and sambamba read group line error

This is a two-part question: help interpreting an error; help with coding. I'm trying to run bwa-mem and sambamba to align raw reads to a reference genome and to sort by position. These are the commands I'm using: bwa mem \ -K 100000000 -v 3…
-1
votes
1 answer

removing text part between 2 symbols

I have this type of text file (it also has header lines starting with ##, not shown):
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 69511 rs2691305 A G . PASS …
-1
votes
2 answers

How to filter for only one of many intersecting ranges

As part of a much longer and complex query, I am trying to keep only one entry for overlapping intervals, and all entries which do not overlap. Here is a minimal example: create table protein ( seqid varchar(100), start SMALLINT(5), …
fridaymeetssunday
  • 1,118
  • 1
  • 21
  • 31
-1
votes
1 answer

How to make rownames in two separate data.frames the same?

Please see the picture attached! I am trying to conduct an RDA analysis, but before I proceed I need to make sure my SNP dataset and my environmental dataset have identical rownames. I tried editing the individual datasets in Excel to satisfy this…
-1
votes
2 answers

AnnotationHub Resource | Failed to load Resource | Download Issue in R

When I run the command ah[[1]], I get the following error: retrieving 1 resource Error: failed to load resource name: AH5087 title: ORegAnno reason: 1 resources failed to download In addition: Warning messages: 1: download failed web resource…
-1
votes
1 answer

Use XPATH to obtain value from a large NCBI XML file

I am new to R. I have downloaded the XML with all Bioprojects from the NCBI. The file is 1GB in size. I started with this: setwd("C://Users/USER/Desktop/") xmlfile = xmlParse("bioproject.xml") root = xmlRoot(xmlfile) xmlName(root) [1]…