I wrote a script to retrieve and process information from the Protein Data Bank. I import the MMCIF2Dict class from Bio.PDB.MMCIF2Dict, which parses CIF data into a dictionary. It works well for almost all structures in my list but, for reasons I don't understand, it fails for some. For example, with the PDB ID 4asd, it returns the key instead of the value and the value instead of the key, as if the parser had swapped keys and values.
The only workaround I have found is to check whether the expected key exists in the dictionary generated by MMCIF2Dict. If it does not, I have to search for it among all the values of that dictionary.
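The workaround described above can be sketched roughly as follows. This is only an illustration on plain dictionaries, not a fix for the parser: the helper name `lookup_with_fallback` is hypothetical, and it assumes values may be either a single string or a list of strings.

```python
def lookup_with_fallback(dico, expected_key):
    """Return dico[expected_key]; if the pair was flipped, return the key
    whose value contains expected_key instead (hypothetical helper)."""
    # Normal case: the expected key is present.
    if expected_key in dico:
        return dico[expected_key]
    # Fallback: scan every value for the expected key.
    for key, value in dico.items():
        # Assumption: a value is either one string or a list/tuple of strings.
        values = value if isinstance(value, (list, tuple)) else [value]
        if expected_key in values:
            return key
    raise KeyError(expected_key)


# Usage on a hand-made dictionary that mimics the flipped entry
flipped = {'HOMO SAPIENS': '_entity_src_gen.pdbx_gene_src_scientific_name'}
name = lookup_with_fallback(flipped, '_entity_src_gen.pdbx_gene_src_scientific_name')
```

The scan over all values is linear in the size of the dictionary, which is why this feels like a workaround rather than a solution.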
import urllib.request
from Bio.PDB.MMCIF2Dict import MMCIF2Dict
# set the list of PDB IDs. Here is an example with 4asd
pdb_list = ['4asd']
for pdb in pdb_list:
    # retrieve the data
    cif_webpage = urllib.request.urlopen(f'https://files.rcsb.org/header/{pdb}.cif').read().decode('utf-8').split('\n')
    # create the dictionary
    dico = MMCIF2Dict(cif_webpage)
What I expect:
dico['_entity_src_gen.pdbx_gene_src_scientific_name'] == 'HOMO SAPIENS'
What I have:
KeyError: '_entity_src_gen.pdbx_gene_src_scientific_name'
The expected key is not a key but a value, and the expected value is now the key (I hope I haven't lost you):
dico['HOMO SAPIENS'] == '_entity_src_gen.pdbx_gene_src_scientific_name'
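To see how many entries are affected, one could list every pair that looks swapped. The sketch below relies on the fact that mmCIF data names start with an underscore, so a key without a leading underscore whose value does start with one is suspicious. The helper name `find_flipped_pairs` is hypothetical, and the sample dictionary is hand-made rather than real parser output.

```python
def find_flipped_pairs(dico):
    """Return (key, value) pairs where the value looks like an mmCIF data
    name (leading underscore) but the key does not (hypothetical helper)."""
    flipped = []
    for key, value in dico.items():
        # Assumption: a value is either one string or a list/tuple of strings.
        values = value if isinstance(value, (list, tuple)) else [value]
        key_is_name = str(key).startswith('_')
        value_has_name = any(str(v).startswith('_') for v in values)
        if not key_is_name and value_has_name:
            flipped.append((key, value))
    return flipped


# Hand-made sample mimicking the symptom seen with 4asd
sample = {
    'data_': '4ASD',
    '_entry.id': ['4ASD'],
    'HOMO SAPIENS': '_entity_src_gen.pdbx_gene_src_scientific_name',
}
suspicious = find_flipped_pairs(sample)
```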
Thanks in advance for your help!