Better performance and lower memory usage

Question

I am developing an application that will store complex XML documents in SnappyData for later analysis.

For better analysis performance and lower memory consumption, which do you recommend: storing the data as XML, as JSON, or as objects?

Thanks in advance for your attention.

score 1 · Answer 1 · answered Jan 07 '18 at 02:21

Obtain a DataFrame from your XML source and save it into a row or column table in SnappyData.

If SQL is your preferred choice, something like this should work (refer to the docs for the DataFrame API):

snappy> CREATE EXTERNAL TABLE myXMLTable USING com.databricks.spark.xml
   OPTIONS (path "pathToYourXML.xml", rowTag "Refer to docs link below");

snappy> CREATE TABLE myInMemoryTable USING column AS (SELECT * FROM myXMLTable);

https://github.com/databricks/spark-xml
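
To give a feel for what rowTag selects: each XML element with that tag name becomes one row of the table. The sketch below is only an illustration in plain Python using the standard library, not how spark-xml is implemented. The sample document, the tag name "book", and the helper rows_for_tag are all made up for the example, and spark-xml applies its own column naming and type inference that this sketch does not reproduce.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; "book" plays the role of rowTag here.
SAMPLE_XML = """
<catalog>
  <book id="b1"><title>First</title><price>10.5</price></book>
  <book id="b2"><title>Second</title><price>7.0</price></book>
</catalog>
"""


def rows_for_tag(xml_text, row_tag):
    """Return one flat dict per <row_tag> element (attributes plus child text)."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for elem in root.iter(row_tag):
        row = dict(elem.attrib)          # attributes become columns
        for child in elem:
            row[child.tag] = child.text  # direct child elements become columns
        rows.append(row)
    return rows


rows = rows_for_tag(SAMPLE_XML, "book")
for row in rows:
    print(row)
```

Picking the tag that matches the records you want to analyze keeps the resulting table flat, which is the shape a column table handles best. Deeply nested children would need further flattening before or after loading.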
