I have a huge XML file containing resumes. The data is available in two formats: as a single master file containing all the resumes, for example:

<Resumes>
  <Resume>
    <Name>ABC</Name>
    ......
    ......
  </Resume>
  <Resume>
    <Name>PQR</Name>
    ......
    ......
  </Resume>
  ......
  ......
</Resumes>

and as multiple files, one resume per file, for example:

File 1:

<Resumes>
  <Resume>
    <Name>ABC</Name>
    ......
    ......
  </Resume>
</Resumes>

File 2:

<Resumes>
  <Resume>
    <Name>PQR</Name>
    ......
    ......
  </Resume>
</Resumes>

and so on.

I want to use BaseX or eXist as the XML database for storing this data. If I want to add more resumes (in XML format) in the future, which of the two formats will be better?

John
  • I've heard that there are some XML databases that cope well with data in a single huge document, but I think that for most products, many small documents work much better. – Michael Kay Apr 27 '12 at 20:50
  • @MichaelKay Yes, well said. Still waiting for more expert views. :) – John Apr 28 '12 at 07:42

1 Answer

For eXist-db, let me quote from a post on exist-open by Wolfgang Meier in response to a similar question:

I don't think there's a theoretical limit, but there are certainly some practical considerations. Storing a very large document can block the db more than storing many small ones. It requires a single transaction (and sufficient disk space for the transaction log).

The dblp bibliography, which I use for some automated performance tests, comes as a single document with more than 600mb. This loads well if you slightly increase the cache size and memory settings. I know other users have to deal with much larger documents (many gigabytes), but if you have a choice, I would definitely recommend to split your data in smaller chunks, which are easier to handle.

Granted, eXist-db has become even more efficient and robust since November 2009 when Wolfgang wrote this post, but I think his advice still applies. Two final notes:

  1. Make sure you use the latest version of eXist, e.g. either 1.4.2 or the 2.0 Tech Preview. These benefit from the advances I spoke about.

  2. To squeeze the most performance out of eXist-db, read the eXist-db documentation article entitled "Performance Tuning".
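
If you only have the single master file, the advice above to split the data into smaller chunks could be sketched roughly as follows. This is a minimal Python illustration, not part of eXist-db or BaseX: the function name `split_resumes` and its parameters are made up for this example, and the element names are taken from the sample in the question. It streams the master file with the standard library's `iterparse`, so the whole document does not have to be built in memory at once, and yields one small `<Resumes>` document per `<Resume>` record.

```python
import io
import xml.etree.ElementTree as ET


def split_resumes(source, record_tag="Resume", root_tag="Resumes"):
    """Yield one small XML document (as a string) per record element.

    `source` is a file name or a binary file-like object holding the
    master file. The names used here are hypothetical helpers for this
    sketch, not an eXist-db or BaseX API.
    """
    for _event, elem in ET.iterparse(source, events=("end",)):
        if elem.tag == record_tag:
            # Wrap the record in its own root so that each output has the
            # same shape as the "multiple files" layout in the question.
            wrapper = ET.Element(root_tag)
            wrapper.append(elem)
            yield ET.tostring(wrapper, encoding="unicode")
            # Drop the record's children and text once it has been
            # serialized, to keep memory use low on a very large input.
            elem.clear()


# Small in-memory stand-in for the master file from the question.
master = (
    b"<Resumes>"
    b"<Resume><Name>ABC</Name></Resume>"
    b"<Resume><Name>PQR</Name></Resume>"
    b"</Resumes>"
)
documents = list(split_resumes(io.BytesIO(master)))
```

Each string in `documents` can then be written to its own file, or stored in the database as a separate document through whichever client interface you use. Note that the emptied `<Resume>` elements stay attached to the original root until parsing finishes, so for inputs of many gigabytes you may want to detach them from the root as well.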

Joe Wicentowski
  • @joewiz- Thanks a lot :) Very good information shared by you... Thanks a lot again. – John Apr 30 '12 at 10:13
  • @John Sure thing! I'd also encourage you to join the [exist-open mailing list](https://lists.sourceforge.net/lists/listinfo/exist-open) - it's a good place to ask more detailed questions, stay abreast of current best practices, report your experience, etc. – Joe Wicentowski May 04 '12 at 23:56