
I'm working on a project to create a library (in Java) that can validate various biological file formats such as GFF, FASTA, and OBO.

As I'm not from this field, I'm a little confused about what kind of validation the validator program should perform.

There are some online tools, like Genome Tools, that can validate the GFF file format. Can anyone help me understand what kind of validation rules should be applied to each of these files?

  • There are two kinds of validation: format validation and data validation. First, find the documentation for each format and make sure the files comply with it (format validation). Second, check the validity of the contents. For that, it is better to ask a bioinformatician or biologist which contents are acceptable (for example, DNA must consist of [ATGC], or exon ranges must fall within the range of the chromosome). The second part is the difficult one. – Poshi Mar 22 '19 at 10:22
  • @Poshi thanks for your response. Yes, I really want information about the second type (validity of content). Does anyone know a resource where I can look for this kind of information, or the correct place to ask this question? – Deepak Singh Mar 22 '19 at 17:25
  • It all depends on the kind of file you are processing. The best source is your local bioinformatician. You can read biology books, but some details are not covered there because they relate to technical issues. – Poshi Mar 22 '19 at 17:27
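
To make the format/content distinction from the comments concrete, here is a minimal Java sketch. The class and method names are hypothetical and do not come from any existing library. It assumes strict DNA (only A, C, G, T, so ambiguity codes such as N are rejected, which a real validator may not want) and 1-based, inclusive coordinates for the range check.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: names are illustrative, not taken from any existing library.
class BioValidationSketch {

    // Content check: DNA restricted to A, C, G, T (case-insensitive).
    // Assumption: ambiguity codes such as N are rejected here.
    static boolean isStrictDna(String sequence) {
        return !sequence.isEmpty() && sequence.matches("[ACGTacgt]+");
    }

    // Content check: a feature (e.g. an exon) must lie inside its chromosome.
    // Assumes 1-based, inclusive coordinates with start <= end.
    static boolean isWithinChromosome(long start, long end, long chromosomeLength) {
        return start >= 1 && start <= end && end <= chromosomeLength;
    }

    // Format check (a '>' header must precede sequence data) combined with
    // the content check above; returns human-readable error messages.
    static List<String> validateFasta(List<String> lines) {
        List<String> errors = new ArrayList<>();
        boolean seenHeader = false;
        for (int i = 0; i < lines.size(); i++) {
            String line = lines.get(i).trim();
            if (line.isEmpty()) {
                continue; // blank lines are ignored in this sketch
            }
            if (line.startsWith(">")) {
                seenHeader = true;
            } else if (!seenHeader) {
                errors.add("line " + (i + 1) + ": sequence data before any '>' header");
            } else if (!isStrictDna(line)) {
                errors.add("line " + (i + 1) + ": characters outside [ACGT]");
            }
        }
        if (!seenHeader) {
            errors.add("no '>' header line found");
        }
        return errors;
    }
}
```

Calling `BioValidationSketch.validateFasta(...)` on the lines of a file returns an empty list when both checks pass. Which content rules are actually appropriate (protein alphabets, ambiguity codes, per-format constraints) is exactly the part the comments suggest confirming with a bioinformatician.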

0 Answers