I have a 20gb database file (*.db) that contains a lot of information on scientific articles, acquired from Scopus. Most information is stored in the responses table within the database file. I only want to obtain the subject areas from each published article. I have no idea how to get this from the database file.
In each row in the database file a lot of information is stored on the author, the article, and also the subject area. A snippet looks like this:
{"abstracts-retrieval-response":{"coredata":{"prism:url":"http://api.elsevier.com/content/abstract/scopus_id/85012897283","dc:identifier":"SCOPUS_ID:85012897283","eid":"2-s2.0-85012897283","language":{"@xml:lang": "eng"},"authkeywords":{"author-keyword": [{"@_fa": "true", "$" :"austerity policies"},{"@_fa": "true", "$" :"housing policy"},{"@_fa": "true", "$" :"Italy"},{"@_fa": "true", "$" :"Mediterranean welfare regime"},{"@_fa": "true", "$" :"Neoliberalism"},{"@_fa": "true", "$" :"Spain"}]},"idxterms":null,"subject-areas":{"subject-area": [{"@_fa": "true", "@abbrev": "SOCI", "@code": "3303", "$" :"Development"},{"@_fa": "true", "@abbrev": "SOCI", "@code": "3322", "$" :"Urban Studies"},{"@_fa": "true", "@abbrev": "ENVI", "@code": "2308", "$" :"Management, Monitoring, Policy and Law"}]}}
From this large (but in reality even much larger) table, I am only interested in getting the dc-identifier
and the (multiple) subject-areas. Ideally I would get them in a *.csv file.
Is there a straightforward way to obtain this information from the *.db file using Python? Using sqlite3 I seem to be able to get acces to the database using the following code:
import sqlite3
conn = sqlite3.connect('C:/responses.db')
c = conn.cursor()
To me it's not clear how I can now only get the dc-identifier and subject-areas information from the database file.