2

While using MLCP I encountered a strange issue with the `-batch_size` option given in the options file (options.txt) when copying documents from one database to another. For example, `-batch_size` is 10, the number of documents to be transferred (based on the filtering options provided) is 106, and I execute the command: `mlcp.bat -options_file "options.txt"`
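
An options file for this copy job looks roughly like the following (hosts, ports, credentials, and the collection filter are placeholders, not the actual values; each option name and its value go on separate lines):

    # copy job driven entirely by the options file
    copy
    -input_host
    source-host
    -input_port
    8010
    -input_username
    admin
    -input_password
    admin
    -output_host
    target-host
    -output_port
    8020
    -output_username
    admin
    -output_password
    admin
    -collection_filter
    my-collection
    -batch_size
    10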

The Content Pump stats received are:

 INFO contentpump.LocalJobRunner:com.marklogic.mapreduce.ContentPumpStats:
 INFO contentpump.LocalJobRunner: INPUT_RECORDS: 106
 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS: 106
 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_COMMITTED: 100
 INFO contentpump.LocalJobRunner: OUTPUT_RECORDS_FAILED: 0
 INFO contentpump.LocalJobRunner: Total execution time: 37 sec

It seems that after batching (batch size = 10), i.e. 106 = 10*10 + 6, the remaining 6 documents left over from the last, incomplete batch are not transferred to the destination database.

Thus, somehow, it only transfers the documents that fall into complete batches, and not the leftover documents that cannot form a complete batch.

Can someone please suggest the cause of this behavior and a workaround?

Aacini
  • This should normally not happen. Even without the `-batch_size` flag, you'd have multiple batches, as the default value is `100`. Are you sure this is not caused by none-unique uris somehow? If convinced this is a bug, this should be reported to MarkLogic support. Are you entitled to support? – grtjn Jul 07 '16 at 07:49
  • It could also be worth checking the actual database doc count with, for instance, Query Console. `MLCP` could (theoretically) just be printing the wrong number. – grtjn Jul 07 '16 at 07:52

1 Answer


Did you verify whether or not the "missing" documents were in the destination database?
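
For example, you could run something along these lines in Query Console against the destination database (the URI below is just a placeholder for one of the documents you expect to have been copied):

    (: total document count in the destination database :)
    xdmp:estimate(fn:doc()),
    (: true() if this particular document made it across :)
    fn:doc-available("/example/missing-doc.xml")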

There was a bug for a while (at least with import) in which the statistics were sometimes incorrect, even though the behavior was correct. That problem was fixed in mlcp v8.0-4.

The other thing I would check is that there are no errors logged on the destination server. OUTPUT_RECORDS_COMMITTED < OUTPUT_RECORDS can indicate that a server-side error occurred which made some of the commits fail, as described here:

http://docs.marklogic.com/guide/mlcp/getting-started#id_33299

Lastly, what filters are you using? -query_filter can turn up false positives since it uses unfiltered search. I doubt this is what you're running into, but thought it worth mentioning. It's talked about here:

http://docs.marklogic.com/guide/mlcp/export#id_85989
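
A quick way to check for this is to run the query you pass to -query_filter both unfiltered and filtered in Query Console and compare the counts. A rough sketch, using a placeholder word query:

    let $q := cts:word-query("example")  (: substitute the cts:query used with -query_filter :)
    return (
      (: unfiltered: resolved from the indexes only, may include false positives :)
      xdmp:estimate(cts:search(fn:doc(), $q)),
      (: filtered: each candidate is validated against the document, exact count :)
      fn:count(cts:search(fn:doc(), $q, "filtered"))
    )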

kcoleman
  • I verified it, and the documents are missing in the destination database. Moreover, there are no such errors on the destination server and no commit failures. Also, I don't think the -query_filter option has any impact, as the issue still exists when that option is not used. The issue also occurs when the batch size is set to the default, which is 100. – Rachit Rampal Jul 08 '16 at 12:05
  • In that case, I recommend you contact Support, as @grtjn suggested. – kcoleman Jul 11 '16 at 17:37