
I have launched a query with amino acid sequences on "KAAS - KEGG Automatic Annotation Server".

I then downloaded the results file, called "myfile.keg". A small example file that shows what it looks like can be downloaded at: https://www.dropbox.com/s/ixf0091z5q3cx9z/myfile.keg?dl=0

+D  KO
#<h2><a href="/kegg/kegg2.html"><img src="/Fig/bget/kegg3.gif" align="middle" border=0></a> &nbsp; KEGG Orthology (KO)</h2> 75prot_protdiff_GD_5h
!
A<b>Metabolism</b>
B
B  <b>Carbohydrate metabolism</b>
C    00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D      MYGENEACCESSION01; K01623  ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C    00020 Citrate cycle (TCA cycle) [PATH:ko00020]
C    00030 Pentose phosphate pathway [PATH:ko00030]
D      MYGENEACCESSION02; K01623  ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C    00040 Pentose and glucuronate interconversions [PATH:ko00040]
C    00051 Fructose and mannose metabolism [PATH:ko00051]
D      MYGENEACCESSION03; K17497  PMM; phosphomannomutase [EC:5.4.2.8]
D      MYGENEACCESSION04; K01623  ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C    00052 Galactose metabolism [PATH:ko00052]
C    00053 Ascorbate and aldarate metabolism [PATH:ko00053]
C    00500 Starch and sucrose metabolism [PATH:ko00500]
C    00520 Amino sugar and nucleotide sugar metabolism [PATH:ko00520]
D      MYGENEACCESSION05; K01183  E3.2.1.14; chitinase [EC:3.2.1.14]
C    00620 Pyruvate metabolism [PATH:ko00620]
C    00630 Glyoxylate and dicarboxylate metabolism [PATH:ko00630]
C    00640 Propanoate metabolism [PATH:ko00640]
C    00650 Butanoate metabolism [PATH:ko00650]
C    00660 C5-Branched dibasic acid metabolism [PATH:ko00660]
C    00562 Inositol phosphate metabolism [PATH:ko00562]
B

!
#<hr>
#<b>[ <a href="/kegg/ko.html">KO</a> | <a href="/kegg/brite.html">BRITE</a> | <a href="/kegg/kegg2.html">KEGG2</a> | <a href="/kegg/">KEGG</a> ]</b><br>
#Last updated: May 18, 2018
#<br><br><a href="/kegg-bin/get_htext?ko00001_all.keg">&raquo; All categories</a>

(I open it with Notepad++)

In this file, you can see the different KEGG functional categories for each of my genes, the latter being referred to as "MYGENEACCESSION01" (or "02", "03", etc.).

I want to extract and organize all the information from this .keg file into a new file (e.g., Excel) that looks something like this: https://www.dropbox.com/s/xq4714ngesap9dx/annotation.xlsx?dl=0

CSV version here:

accession,kegg.first.level,kegg.second.level,kegg.third.level,kegg.fourth.level,path,KO
MYGENEACCESSION01,metabolism,carbohydrate metabolism,Glycolysis / Gluconeogenesis,"ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]",PATH:ko00010,K01623
MYGENEACCESSION02,metabolism,carbohydrate metabolism,Pentose phosphate pathway,"ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]",PATH:ko00030,K01623
MYGENEACCESSION03,metabolism,carbohydrate metabolism,Fructose and mannose metabolism,PMM; phosphomannomutase [EC:5.4.2.8],PATH:ko00051,K17497
MYGENEACCESSION04,metabolism,carbohydrate metabolism,Fructose and mannose metabolism,"ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]",PATH:ko00051,K01623
MYGENEACCESSION05,metabolism,carbohydrate metabolism,Amino sugar and nucleotide sugar metabolism,chitinase [EC:3.2.1.14],PATH:ko00520,K01183

I have done it manually but it is very tedious and I have a much larger dataset than the provided example.

Any idea how to do this automatically with R or another program? (Do you think an R script could do the job?)
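
For illustration, here is a minimal sketch of how the conversion might be automated, written in Python since another program would be acceptable. It assumes the layout seen in the example file: each hierarchy line starts with A, B, C or D, the A and B names are wrapped in `<b>` tags, C lines end with `[PATH:...]`, and D lines look like `accession; Knnnnn  description`. The names `parse_keg` and `rows_to_csv` are hypothetical helpers made up for this sketch, not part of KAAS or any KEGG tool, and other .keg files may need different patterns.

```python
import csv
import io
import re

# Patterns are assumptions based on the example file shown above.
TAG = re.compile(r"<[^>]+>")                                      # HTML tags such as <b>
C_LINE = re.compile(r"^(\d+)\s+(.*?)(?:\s+\[(PATH:[^\]]+)\])?$")  # "00010 Name [PATH:ko00010]"
D_LINE = re.compile(r"^([^;]+);\s+(K\d{5})\s+(.*)$")              # "ACCESSION; K01623  description"

FIELDS = ["accession", "kegg.first.level", "kegg.second.level",
          "kegg.third.level", "kegg.fourth.level", "path", "KO"]

def parse_keg(lines):
    """Return one dict per D line, carrying the enclosing A/B/C levels."""
    rows = []
    level_a = level_b = level_c = path = ""
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line[0] not in "ABCD":
            continue                      # skip "+D", "#...", "!" and blank lines
        body = TAG.sub("", line[1:]).strip()
        if not body:
            continue                      # empty separator lines such as a bare "B"
        kind = line[0]
        if kind == "A":
            level_a, level_b, level_c, path = body, "", "", ""
        elif kind == "B":
            level_b, level_c, path = body, "", ""
        elif kind == "C":
            m = C_LINE.match(body)
            level_c, path = (m.group(2), m.group(3) or "") if m else (body, "")
        else:                             # kind == "D": one annotated gene
            m = D_LINE.match(body)
            if m is None:
                continue                  # D line not in the assumed format
            accession, ko, description = m.groups()
            rows.append(dict(zip(FIELDS, [accession.strip(), level_a, level_b,
                                          level_c, description.strip(), path, ko])))
    return rows

def rows_to_csv(rows):
    """Serialize the rows as CSV text, using the column names from the question."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()

# A few lines copied from the example file, used only to try the sketch.
SAMPLE = """\
+D  KO
A<b>Metabolism</b>
B
B  <b>Carbohydrate metabolism</b>
C    00010 Glycolysis / Gluconeogenesis [PATH:ko00010]
D      MYGENEACCESSION01; K01623  ALDO; fructose-bisphosphate aldolase, class I [EC:4.1.2.13]
C    00020 Citrate cycle (TCA cycle) [PATH:ko00020]
C    00051 Fructose and mannose metabolism [PATH:ko00051]
D      MYGENEACCESSION03; K17497  PMM; phosphomannomutase [EC:5.4.2.8]
"""

if __name__ == "__main__":
    print(rows_to_csv(parse_keg(SAMPLE.splitlines())), end="")
```

For the real file, the same function could be fed the lines of `open("myfile.keg", encoding="utf-8")` and the returned text written to a .csv file that Excel can open. Unlike the hand-made table, this sketch keeps the category names exactly as they appear in the file (e.g. "Metabolism" rather than "metabolism").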

  • Technically asking for recommendations for external programs to do X is off-topic on SO, but it is certainly possible to deliver an R kegg parser, so I'm going to work on it for a bit and see if there is something simple that can be coded. – IRTFM Jun 15 '18 at 16:27
  • No ideas? I guess I should write a loop to organize the data? – SkyR Jun 18 '18 at 14:47
  • I tried for a while to identify a program that might parse KEGG files. I tried `sos::findFn("parse kegg files")` and got quite a few hits, but my knowledge of the subject area was insufficient for an efficient survey of the methods. I'm also not sure whether that covers BioC packages, so you might consider posting the question on the BioC help page. – IRTFM Jun 18 '18 at 15:24
  • OK, thanks. I tried your command with the sos package and came across the call.MGRAST() function; I am trying to understand how to use it with my file. – SkyR Jun 19 '18 at 07:54

0 Answers