0

I'm busy creating a simple DBTester program with a Data Access Layer that can Test and Compare multiple (kind of) Databases. Currently I have implemented the Add(Insert) for SQL Server and MarkLogic NoSQL.

To my supprise inserting/adding 1 M Person Entities takes much more time using MarkLogic XCC/.Net XQuery than using SQL Server 2008 R2. SQL Server takes a few minutes of which 11 seconds in 11654 ms in Data Access Layer. MarkLogic 8 is still busy at 15621 entities in 15+ minutes!

I'm new to NoSQL MarkLogic and XCC/XQuery and maybe doing something wrong. My test code for MarkLogic can be found at GitHub: https://github.com/driekus77/DBTester/blob/master/DBTester/DataAccessLayer/Repository/MarkLogic/PersonRepository.cs#L48

The corresponding SQLServer Add code can be found at: https://github.com/driekus77/DBTester/blob/master/DBTester/DataAccessLayer/Repository/SQLServer/PersonRepository.cs#L64

So what am I doing wrong? Should I use MarkLogic RestAPI direct? Should I use JSON in stead off XML? Are there ways to speed up my XQuery Add call?

Thanks for any help!

Henry Roeland
  • 492
  • 5
  • 19
  • 1
    I'm not replying to the answer, as it would take quite some time to dive into your code. However, it is good to note that In MarkLogic, by default, you are doing transactions, you are stemming, you are indexing every element in various ways both for db-like selects as well as some pretty robust search-engine(word query) capabilities. So as a start, I'm not sure that. Furthermore, there is a big difference in how one may load data into one system or another. – David Ennis -CleverLlamas.com Nov 15 '15 at 14:41

2 Answers2

3

See my original comment. IN addition, I also note that you are inserting items into a single 'persons' CML document. That is not what MarkLogic prefers. Each person is meant to be a separate record. Otherwise, - because it is a transactional database, each of your insert-child calls is blocking as it is the same document.

  • When I remove the Persons root node I get the XQuery exception: Document nodes cannot have multiple roots – Henry Roeland Nov 15 '15 at 15:27
  • Aha I've just seen: https://developer.marklogic.com/learn/sql-marklogic-mapping and it is explaining what you mean! – Henry Roeland Nov 15 '15 at 16:03
  • 1
    Great! I hope it helps. But like I said in the original comment under your initial post - You're not really comparing the same thing when you consider indexes, etc. Plus, other items not mentioned are things like multiple forests, etc.. research a bit more and try to get a test that matches a bit closer to each-other. Furthermore, if you really want to dive in and need assistance, let me know and maybe I can have someone spend a bit of time going over stuff with you. We do have a dedicated C#/ML person that may be able to give more specific guidance than myself. – David Ennis -CleverLlamas.com Nov 15 '15 at 21:53
3

Relational and Document and "NoSQL" databases as so fundamentally different that comparisons of this sort are misleading at best. They are different because they focus on different problems and use cases and optimize for those in different ways. A 'classic' example is comparing GC based, vs Reference counting vs explicit memory management languages. E.g. a GC based application even using less efficient algorithms can outperform a lower level manual memory managed language -- simply because the GC can be deferred past the interesting part of the application -- sometimes forever (the app exists before it needs to GC). One could debate both sides of that depending on whats important to you.

I suggest a more useful performance comparison is total application responsiveness, or throughput, or some measure of the 'whole picture' that's important to you and after you optimize the application for the particular use cases and technology. As noted, ML does significantly more work 'up front' then Relational or 'traditional' NoSQL database. If your app is a "WOM" (Write Only Memory) use case then writing to /dev/null will be even faster. When it comes time to do complex queries, document creation, large datasets etc that 'up front' is already done and neither your code nor the server has to work as hard. Similarly with data modeling -- if you start with a data model optimized for a RDBMS, it may not be ideal for a non-RDBMS engine - and visa versa.

I suggest starting with a smaller set of data at first and work through a POC of a common use case for the app as a whole. The data model is fundamental to success (or failure) with any database and application. From your application model perspective, what does the 'business objects' 'look like' ? For a NoSQL type DB try to model that as directly as possible. That will lead you in the right direction for performance as well as development/coding. At that point performance measurements and optimization strategies are much more useful and comparable.

DALDEI
  • 3,722
  • 13
  • 9