
I think PubChem has what I need here. For a school project, I want a database that is, or could be converted into, a table mapping each chemical identifier to a series of properties. The issue is that PubChem is too large: the only file format they offer that I know how to decode is XML (they also offer SDF and ASN; here's the link: ftp://ftp.ncbi.nlm.nih.gov/pubchem/Substance/CURRENT-Full/), and I don't have enough RAM to open the XML files in a text editor.

Is there an alternative database I can use?

Is there a way to slice up the XML files into more manageable pieces before loading them?

Once I have the data in any openable form I will be able to parse it with code, so the data being too much to read through is not an issue.

  • Why can't you parse it with code now? You don't have to load the entire file at once. – Dave Newton Jan 12 '21 at 19:16
  • How do I not load the full file at once? – Pablo Ibarz Jan 13 '21 at 15:42
  • To be specific, in Python, when I call "open" it loads the whole file into RAM. Do I need to split the file up on my hard drive, or is there another method? Sorry if I didn't phrase my original question very well. – Pablo Ibarz Jan 13 '21 at 15:44
  • There are many ways to process XML as a stream; if you’re targeting a specific language you should tag the question appropriately so you get specific approaches. – Dave Newton Jan 13 '21 at 18:36
  • I think that's the line of questioning I need for Google to take me the rest of the way. – Pablo Ibarz Jan 23 '21 at 17:42

1 Answer


I think I'm starting to figure out how to do this. "XML stream" was a good keyword to search around with. Sorry my phrasing was a little strange in the question; I wasn't really sure what I was asking from a technical standpoint.
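For anyone landing here later, this is a minimal sketch of the streaming approach in Python, using `xml.etree.ElementTree.iterparse` from the standard library. One note on the comments above: `open` by itself only returns a file handle, and it is calls like `read()` with no size argument that pull the whole file into memory. The helper names `local_name` and `iter_records`, and the `Records`/`Record`/`ID`/`Name` elements in the sample, are made up for illustration. I have not verified the real PubChem layout here, so treat the record tag you pass in as something to check against the actual files.

```python
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip a '{namespace}' prefix from an element tag, if present."""
    return tag.rsplit("}", 1)[-1]


def iter_records(source, record_tag):
    """Yield one flat dict per <record_tag> element, one record at a time.

    `source` may be a filename or a binary file object.
    """
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # the first event is the start of the root element
    for event, elem in context:
        if event == "end" and local_name(elem.tag) == record_tag:
            # Simplification: keep only leaf elements as tag -> text.
            # Repeated leaf tags overwrite each other in this sketch.
            yield {
                local_name(child.tag): (child.text or "").strip()
                for child in elem.iter()
                if len(child) == 0
            }
            # Drop finished records so memory use does not grow with file size.
            elem.clear()
            root.clear()


# Tiny made-up sample standing in for a huge file.
SAMPLE = (
    b"<Records>"
    b"<Record><ID>1</ID><Name>water</Name></Record>"
    b"<Record><ID>2</ID><Name>ethanol</Name></Record>"
    b"</Records>"
)

rows = list(iter_records(io.BytesIO(SAMPLE), "Record"))
for row in rows:
    print(row)
```

If the downloaded files are gzip-compressed, the same function should work on a handle from `gzip.open(path, "rb")` without decompressing to disk first. For a real run, loop over `iter_records(...)` directly and write each row out (for example with `csv.DictWriter`) instead of collecting everything into a list.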