The General Feature Format (GFF) is a file format for describing genes and other features of DNA, RNA, and protein sequences.
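As a rough illustration of the format, a GFF3 feature line has nine tab-separated columns (seqid, source, type, start, end, score, strand, phase, attributes). The sketch below parses one such line in Python. The names `GffFeature` and `parse_gff3_line` are hypothetical helpers made up for this example, and the parser assumes GFF3-style `key=value` attributes; it does not handle GTF-style attributes or URL-escaped values.

```python
from typing import NamedTuple, Optional


class GffFeature(NamedTuple):
    """One feature line of a GFF3 file (hypothetical helper type)."""
    seqid: str
    source: str
    type: str
    start: int               # 1-based, inclusive
    end: int                 # 1-based, inclusive
    score: Optional[float]   # None when the column is "."
    strand: str
    phase: Optional[int]     # None when the column is "."
    attributes: dict


def parse_gff3_line(line: str) -> Optional[GffFeature]:
    """Parse a single GFF3 line; return None for blank and comment lines."""
    line = line.rstrip("\n")
    if not line or line.startswith("#"):
        return None
    cols = line.split("\t")
    if len(cols) != 9:
        raise ValueError(f"expected 9 tab-separated columns, got {len(cols)}")
    seqid, source, ftype, start, end, score, strand, phase, attrs = cols

    # Column 9: semicolon-separated key=value pairs (assumed GFF3 style).
    attributes = {}
    for pair in attrs.strip().strip(";").split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()

    return GffFeature(
        seqid, source, ftype, int(start), int(end),
        None if score == "." else float(score),
        strand,
        None if phase == "." else int(phase),
        attributes,
    )
```

Called on a tab-separated line such as `contig1 loci mRNA 452050 453069 14 - . ID=dd_g4_1G94.1;Parent=dd_g4_1G94`, this returns a `GffFeature` whose `attributes` field maps `ID` and `Parent` to their values.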
Questions tagged [gff]
26 questions
0 votes, 1 answer
How to convert Ensembl .gff3 to 12-column .bed
I am trying to use the geneBody_coverage.py script from RSeQC, which requires a tab-separated 12-column .bed file as a reference. To create one, I used the gff2bed script to convert a .gff3 file from Ensembl to .bed format. When I run it, I only get errors…

Dysnomia
0 votes, 0 answers
What kinds of errors should a validator check for when validating biological file formats like GFF and FASTA?
I'm working on a project to create a library (in Java) that can validate various biological file formats such as GFF, FASTA, OBO, etc.
But as I'm not from this field, I'm a little confused about what kind of validation should be performed by the…

Deepak Singh
0 votes, 1 answer
How can I find the number of the first base of a gene in a FASTA file?
In order to manually modify a .gff file I have, I need to find the start position of my gene in the FASTA-formatted genome of my animal (i.e. what # base is it in the sequence?). I have the sequence of this gene.
How do I do this as easily as…

kdickson
0 votes, 0 answers
Convert String array to JSONArray
I am trying to parse a GFF file and search for a particular gene ID, and if found, convert the whole row of that gene ID into an element of a JSONArray.
However, when I do the above, the column headers are not present in the array:
try{
…

biocode
0 votes, 1 answer
Using awk to extract a specific pattern
Let me explain my problem.
I have a huge file in GFF format, like this:
scaffold_31 AUGUSTUS CDS 18857 19210 0.63 + 0 transcript_id "g56.t1"; gene_id "g56";
scaffold_32 AUGUSTUS CDS 8973 9290 0.82 - 0 transcript_id "g57.t1";…

Grendel
0 votes, 0 answers
How to replace a string in file A that matches the first column of file B with the corresponding string in the second column of file B?
File A (tab-delimited, 10 columns):
chrI DBVPG6765 gene 7249 9030 . - . ID=01G00030;Name=YAL067W
chrI DBVPG6765 mRNA 7249 9030 . - . ID=01T00030.1;Parent=01G00030
chrI DBVPG6765 exon 7249 9030 . - …

mcbioinfo
0 votes, 1 answer
Replace multiple lines in one file with the same lines at the same line numbers in another file?
I have a modified gff file, and it is missing some lines that are present in the original gff file. I want to add those back in.
i.e.,
original gff file with extra lines "# Fasta ..." and "##sequence-region" included prior to each new…

Emily Giroux
0 votes, 1 answer
Renaming Name ID in a GFF file
I have a GFF file that looks like this:
contig1 loci gene 452050 453069 15 - . ID=dd_g4_1G94;
contig1 loci mRNA 452050 453069 14 - . ID=dd_g4_1G94.1;Parent=dd_g4_1G94
contig1 loci exon 452050 452543 . - . …

Alex Trevylan
-1 votes, 3 answers
sed inside a while loop is very slow
I have a GFF file whose contents look like the following (tab-separated):
# start gene 1Chr.g1
1Chr AUGUSTUS gene 3636 5916 0.1 + . ID=1Chr.g1
1Chr AUGUSTUS transcript 3636 5916 0.1 + . …

Mendel
-1 votes, 1 answer
Parsing using multiple parameters - Awk
I am having trouble parsing a GFF file. I am using the code below as a one-liner. I obtain output filtered on column 1 ($1), but when I add the additional filter of greater than 5000 but less than 150000, awk does not filter out my…

serious
-2 votes, 1 answer
AWK replace full string in TABLE2 according to TABLE1
I have TABLE1, whose first column is a string that should be replaced in TABLE2 and whose second column is the value that should replace it.
TABLE1 looks like this:
g63. MYL9
g5990. PTC7
g6018. POLYUBQ
g17850. NAA50
Table 2 looks…

Martin Kovář