I am trying to combine data from multiple sources, such as RDBMSs, XML files, and web services, using MarkLogic. According to the MarkLogic documentation on Metadata Catalog (https://www.marklogic.com/solutions/metadata-catalog/), Data Virtualization (https://www.marklogic.com/solutions/data-virtualization/), and Data Unification, this should be possible. However, I have not been able to find any documentation that describes exactly how to go about it or which tools to use.

Looking for some pointers.

2 Answers


As the second image in the data-virtualization link shows, you need to ingest all data into MarkLogic databases. MarkLogic can then be put in between to become the single entry point for end user applications that need access to that data.

The first link describes MarkLogic's ability to hold all kinds of data. It does so partly by storing the data as-is, partly by extracting text and metadata for searching, and partly by conversion (if your needs go beyond what the original format allows).

MarkLogic provides the general purpose MarkLogic Content Pump (MLCP) tool for this purpose. It allows ingesting zipped or unzipped files, and applying transformations if necessary. If you need to retrieve your data from a different database, you might need a bit more work to get that out. http://developer.marklogic.com holds tutorials, blogs, and tools that should help you get going. Searching the MarkLogic Mailing List through http://marklogic.markmail.org/ can provide answers as well.
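As a rough illustration of the "bit more work" for relational data, here is a minimal sketch that dumps each row of a table to its own XML file, which could then be loaded as ordinary files. It is not an official MarkLogic workflow: `sqlite3` stands in for the real RDBMS (for SQL Server you would swap in a suitable driver), and the `customer` table, the `export_rows_as_xml` function, and the file naming are all hypothetical.

```python
import sqlite3
import tempfile
import xml.etree.ElementTree as ET
from pathlib import Path


def export_rows_as_xml(conn, table, out_dir):
    """Write each row of `table` to its own XML file and return the paths.

    Assumes the table and column names are valid XML element names.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [col[0] for col in cursor.description]
    paths = []
    for index, row in enumerate(cursor, start=1):
        # One root element per row, one child element per column.
        root = ET.Element(table)
        for name, value in zip(columns, row):
            child = ET.SubElement(root, name)
            child.text = "" if value is None else str(value)
        path = out_dir / f"{table}-{index}.xml"
        ET.ElementTree(root).write(str(path), encoding="utf-8", xml_declaration=True)
        paths.append(path)
    return paths


def demo():
    # In-memory SQLite database as a stand-in for the external RDBMS.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customer (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    out_dir = tempfile.mkdtemp()
    return export_rows_as_xml(conn, "customer", out_dir)


if __name__ == "__main__":
    for path in demo():
        print(path)
```

The resulting directory could then be handed to MLCP's import command (its `-input_file_path` option points at the files to load); check the MLCP documentation for the connection options your setup needs.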

HTH!

grtjn
  • According to this video presentation by the MarkLogic team, https://www.youtube.com/watch?v=tiNqlSmM6T4, the data needn't be ingested into the repository. It can be accessed virtually using metadata. If only someone could show how to do it technically. :-( – user3615683 May 12 '14 at 02:08
  • Without viewing the entire presentation (it is a bit long): I think they are talking about handling large binaries. You can filter metadata and text out of over 200 binary formats using xdmp:filter. That requires at least accessing the file from within MarkLogic. After that you can either store only the metadata, with a reference to the original location, or simply store the file itself. In the latter case, if the file is larger than the threshold (default 1 MB), it is automatically stored on the file system but still managed by MarkLogic. In the former case the file stays where it is and MarkLogic does not manage it. – grtjn May 12 '14 at 19:31
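To make the "store only the metadata, with a reference to the original location" option concrete, here is a minimal sketch of what such a catalog record might contain. It does not use xdmp:filter or any MarkLogic API; the `catalog-entry` element names and the `build_catalog_entry` function are hypothetical, and a real record would also carry whatever metadata was extracted from the file.

```python
import hashlib
import os
import tempfile
import xml.etree.ElementTree as ET


def build_catalog_entry(path):
    """Return an XML element describing `path` without embedding its content."""
    entry = ET.Element("catalog-entry")
    # Reference to where the file actually lives; the content is not copied.
    ET.SubElement(entry, "location").text = os.path.abspath(path)
    ET.SubElement(entry, "size-bytes").text = str(os.path.getsize(path))
    # A checksum lets a consumer detect that the external file has changed.
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(65536), b""):
            digest.update(chunk)
    ET.SubElement(entry, "sha256").text = digest.hexdigest()
    return entry


def demo():
    # Create a throwaway file to stand in for an external binary.
    with tempfile.NamedTemporaryFile(delete=False, suffix=".bin") as handle:
        handle.write(b"example binary payload")
        path = handle.name
    try:
        return ET.tostring(build_catalog_entry(path), encoding="unicode")
    finally:
        os.remove(path)


if __name__ == "__main__":
    print(demo())
```

A small XML document like this could be ingested and searched in place of the binary, while the file itself stays at its original location.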

Combining a lot of data is a very broad topic. Can you describe a couple types of data you'd like to integrate, and what services or queries you would like to build on that data?

  • I want to combine XML data that I have loaded into the MarkLogic repository with some data that I have in a SQL Server database, without actually moving that data into the MarkLogic repository (showcasing virtualization). – user3615683 May 12 '14 at 02:00
  • The only tool I got hold of was MLSAM, which allows me to connect to and run queries against a SQL Server (or other RDBMS) database. But I believe that doesn't prove data virtualization. For data virtualization I should be able to create a view in MarkLogic Server of the data present in an external source, and that could be done by creating a metadata catalog for the external source. Please let me know if my understanding is wrong. – user3615683 May 12 '14 at 02:03