I am a beginner with eXist-db. I am building an XML document from Java: I process the data with JAXB and then add it to an eXist-db resource with "update insert" queries. I am currently testing with around 500 nodes, and after a few dozen inserts have executed, each insert starts taking up to 10 seconds. My XML has the following general structure.

<realestatedata>
<agents>
    <author id="1">
        <name>Author_A</name>
    </author>
    <author id="2">
        <name>Author_B</name>
    </author>
    <portal id="1">
        <name>Portal_A</name>
    </portal>
</agents>
<artifacts>
    <document id="1">            
        <latitude>51.37392</latitude>
        <longitude>-0.00866</longitude>
        <bathroom_number>1</bathroom_number>
        <bedroom_number>3</bedroom_number>
        <price>365000</price>
    </document>
    <theme id="1">
        <name>Garden</name>
    </theme>
    <place id="1">
        <name>BR4</name>
        <location>
            <lat>51.37392</lat>
            <lon>-0.00866</lon>
        </location>
    </place>
</artifacts>
</realestatedata>

To keep the elements in the correct order, I use the following code for each insert, so that a new record is either the first of its type or is appended after the last element of the same type, which is located by its id.

public void saveAuthor(Author author) {
    XQueryService xQueryService = null;
    CompiledExpression compiled = null;
    int currentId = authorIdSequence.get();
    StringWriter authorXml = new StringWriter();
    try {
        xQueryService = Utils.getXQueryService();
        if (getAuthorByName(author.getName()) == null) {
            author.setId(String.valueOf(authorIdSequence.incrementAndGet()));
            marshaller.marshal(author, authorXml);
            if(currentId == 0){
                compiled = xQueryService
                        .compile("update insert " + authorXml.toString()
                                + " into //agents");
            }
            else{
                compiled = xQueryService
                        .compile("update insert " + authorXml.toString()
                                + " following //author[@id = '"+String.valueOf(currentId)+"']");
            }               
            xQueryService.execute(compiled);
        }

    } catch (XMLDBException e) {
        e.printStackTrace();
    } catch (JAXBException e) {
        e.printStackTrace();
    }
}

The same approach is used for the other elements, such as document and place. After a few updates it gets very slow: inserting a single record starts taking up to ten seconds.

The only related threads I could find are unanswered.

http://sourceforge.net/mailarchive/forum.php?thread_name=s2s508bb1471004190430h8b42ee99o3f1835a9bc873d58%40mail.gmail.com&forum_name=exist-development

http://exist.2174344.n4.nabble.com/Slow-xquery-quot-update-insert-quot-performance-tt4657541.html#none

Joe Wicentowski
waqas

1 Answer

A few thoughts:

  • Attribute filters ([@id=…]) can be pretty slow when run on a large set of nodes. Your code as posted requires eXist to check the @id of every previously inserted author before it finds the right place to insert the new one. I can think of a few ways to solve this:
    1. A range index on the @id attributes would speed things up considerably.
    2. Using @xml:id instead of @id would let you use id(…), which would be faster still. This would require making your ids unique across the document, though (e.g. "author_1" and "portal_1").
    3. If you're really always incrementing your @id values, new nodes will always have the largest @id. In that case, following //author[last()] or even into //agents will work just fine.
  • Doing many small inserts will always be slower than doing one big insert. If possible, delay saving new data to eXist until you have a bunch to do at once.
  • Make sure the XQueryServices you're creating are getting released properly after you're done with them. Is Utils.getXQueryService() possibly keeping references it shouldn't?
  • Make sure you're not compounding overhead unnecessarily. Can you reuse XQueryServices between calls? If getAuthorByName() is querying eXist, can it be combined with the update query? Can you provide the node(s) to insert through a variable binding instead of as literals in the query so that you can reuse the same compiled query every time?
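
As a rough illustration of option 3 above, the statement can be built without any @id predicate. This is only a sketch: InsertQueries and insertStatement are hypothetical names, the container and element names are passed in by the caller, and the resulting string would still be handed to xQueryService.compile(...) as in the question's code.

```java
/** Hypothetical helper: builds eXist "update insert" statements without an @id predicate. */
final class InsertQueries {

    private InsertQueries() {}

    /**
     * @param nodeXml     serialized element, e.g. the JAXB output for one author
     * @param elementName type of the element being inserted, e.g. "author"
     * @param parentName  container element, e.g. "agents"
     * @param firstOfKind true when no element of this type exists yet
     */
    static String insertStatement(String nodeXml, String elementName,
                                  String parentName, boolean firstOfKind) {
        if (firstOfKind) {
            // Same as the question's first branch: add the node to the container.
            return "update insert " + nodeXml + " into //" + parentName;
        }
        // Assumes ids only ever grow, so the new node belongs after the last
        // existing element of its type; no attribute comparison is needed.
        return "update insert " + nodeXml + " following //" + elementName + "[last()]";
    }
}
```

In saveAuthor this would replace the two string concatenations, for example insertStatement(authorXml.toString(), "author", "agents", currentId == 0).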

All that being said though, 10s is an awfully long time for a single insert if you only have 500 nodes. A quick test on my machine using the un-indexed "following" syntax to run a batch of updates in a single query can do the whole 500 in half that time. There's quite likely something larger going wrong that's not evident in your question.
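
For reference, a batch along those lines could be assembled roughly as follows. This is a sketch, not the exact test I ran: BatchInsert, batchQuery and escapeBraces are hypothetical names, the fragments are assumed to be well-formed XML without an XML declaration, and at least one element of the given type is assumed to exist already so that [last()] matches something.

```java
import java.util.List;
import java.util.StringJoiner;

/** Hypothetical helper: turns many serialized nodes into one XQuery update. */
final class BatchInsert {

    private BatchInsert() {}

    /** Doubles curly braces so XQuery treats them as literal text, not as enclosed expressions. */
    static String escapeBraces(String xml) {
        return xml.replace("{", "{{").replace("}", "}}");
    }

    /**
     * Builds a single query that inserts every fragment after the last
     * existing element named elementName, in the order given.
     */
    static String batchQuery(List<String> fragments, String elementName) {
        if (fragments.isEmpty()) {
            throw new IllegalArgumentException("nothing to insert");
        }
        // Comma-separated direct element constructors form one XQuery sequence.
        StringJoiner nodes = new StringJoiner(",\n  ", "(\n  ", "\n)");
        for (String fragment : fragments) {
            nodes.add(escapeBraces(fragment));
        }
        return "for $node in " + nodes + "\n"
             + "return update insert $node following //" + elementName + "[last()]";
    }
}
```

The returned string is compiled and executed once for the whole batch, instead of once per record.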

Telic