File A (tab-delimited, 10 columns):
chrI DBVPG6765 gene 7249 9030 . - . ID=01G00030;Name=YAL067W
chrI DBVPG6765 mRNA 7249 9030 . - . ID=01T00030.1;Parent=01G00030
chrI DBVPG6765 exon 7249 9030 . - . ID=01T00030.1.exon.1;Parent=01T00030.1
chrI DBVPG6765 CDS 7249 9030 . - . ID=01T00030.1.CDS.1;Parent=01T00030.1
chrI DBVPG6765 gene 11586 11945 . - . ID=01G00040;Name=YAL065C
chrI DBVPG6765 mRNA 11586 11945 . - . ID=01T00040.1;Parent=01G00040
chrI DBVPG6765 exon 11586 11945 . - . ID=01T00040.1.exon.1;Parent=01T00040.1
chrI DBVPG6765 CDS 11586 11945 . - . ID=01T00040.1.CDS.1;Parent=01T00040.1
File B (tab-delimited, 2 columns):
YAL001C TFC3
YAL002W VPS8
YAL003W EFB1
YAL005C SSA1
YAL007C ERP2
YAL008W FUN14
YAL009W SPO7
YAL010C MDM10
YAL011W SWC3
YAL012W CYS3
YAL013W DEP1
...
YAL067W SEO1
YAL066W YAL066W
YAL065C YAL065C
...
The output I would like to get is:
chrI DBVPG6765 gene 7249 9030 . - . ID=01G00030;Name=SEO1
chrI DBVPG6765 mRNA 7249 9030 . - . ID=01T00030.1;Parent=01G00030
chrI DBVPG6765 exon 7249 9030 . - . ID=01T00030.1.exon.1;Parent=01T00030.1
chrI DBVPG6765 CDS 7249 9030 . - . ID=01T00030.1.CDS.1;Parent=01T00030.1
chrI DBVPG6765 gene 11586 11945 . - . ID=01G00040;Name=YAL065C
chrI DBVPG6765 mRNA 11586 11945 . - . ID=01T00040.1;Parent=01G00040
chrI DBVPG6765 exon 11586 11945 . - . ID=01T00040.1.exon.1;Parent=01T00040.1
chrI DBVPG6765 CDS 11586 11945 . - . ID=01T00040.1.CDS.1;Parent=01T00040.1
The attribute string (ID=01G00030;Name=YAL067W on the first line) is column 10 in file A. The script/one-liner should look up YAL067W in the first column of file B and replace YAL067W with the corresponding second column of file B (SEO1 in this case).
Since the genes in file B are not in the same order as the lines in file A, my attempt with awk 'NR==FNR ... is not working.
Does anyone have some advice or a small script showing how I should proceed? I should mention that I'm quite new to scripting/programming.
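
One possible approach, as a sketch rather than a definitive answer: the NR==FNR idiom does not actually depend on the two files being in the same order, because it first loads file B into an array keyed by the systematic name and then looks up each Name= value by key. The example assumes both files are tab-delimited and that the attributes sit in the last column (hence $NF instead of a fixed column number). The file names fileA.gff, fileB.txt and fileA.renamed.gff are placeholders, and the first few lines only recreate part of the sample data so the example can be run as is.

```shell
# Recreate a few lines of the sample data (file names are placeholders).
printf 'YAL001C\tTFC3\nYAL067W\tSEO1\nYAL065C\tYAL065C\n' > fileB.txt
printf 'chrI\tDBVPG6765\tgene\t7249\t9030\t.\t-\t.\tID=01G00030;Name=YAL067W\n' > fileA.gff
printf 'chrI\tDBVPG6765\tmRNA\t7249\t9030\t.\t-\t.\tID=01T00030.1;Parent=01G00030\n' >> fileA.gff
printf 'chrI\tDBVPG6765\tgene\t11586\t11945\t.\t-\t.\tID=01G00040;Name=YAL065C\n' >> fileA.gff

awk '
BEGIN { FS = OFS = "\t" }
# First file (file B): remember the new name for each systematic name.
NR == FNR { name[$1] = $2; next }
# Second file (file A): find Name=... in the last column.
match($NF, /Name=[^;]+/) {
    # "Name=" is 5 characters long, so the value starts at RSTART + 5.
    key = substr($NF, RSTART + 5, RLENGTH - 5)
    # Replace the value only when file B has an entry for it.
    if (key in name)
        $NF = substr($NF, 1, RSTART + 4) name[key] substr($NF, RSTART + RLENGTH)
}
# Print every line of file A, changed or not.
{ print }
' fileB.txt fileA.gff > fileA.renamed.gff
```

Names with no entry in file B are left unchanged, and lines without a Name= attribute (mRNA, exon, CDS) are printed as they are. The result is written to a new file, so file A itself is not modified.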