
I am testing databases for a new application in which I will have to browse and index millions of XML files and subsequently generate analyses of that data.

I would like to use SnappyData for this project; however, I do not know how it works.

Is it recommended for this type of application?

Is it possible to use it with Spring Data JPA?

In addition to storing the XML files themselves, I would like to store the application's other data (users and system settings) in the same database instead of in PostgreSQL. Is that recommended?

1 Answer


SnappyData is a hybrid distributed database designed primarily to manage data in-memory, so the simple answer is yes. Do you have specific criteria? Postgres should work too.

To load XML you can use the spark-xml project from databricks.
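Conceptually, spark-xml turns each repeated XML element into one row of a DataFrame. A minimal local sketch of that XML-to-rows transformation, using only the Python standard library (the sample data and names below are illustrative, not from the question):

```python
# Local illustration of the flattening that spark-xml performs at scale:
# each element matching a "row tag" becomes one tabular row.
import xml.etree.ElementTree as ET

SAMPLE = """
<docs>
  <doc><id>1</id><title>first</title></doc>
  <doc><id>2</id><title>second</title></doc>
</docs>
"""

def xml_to_rows(xml_text, row_tag="doc"):
    """Flatten each <row_tag> element into a dict of its child tags,
    analogous to spark-xml's rowTag option producing DataFrame rows."""
    root = ET.fromstring(xml_text)
    return [{child.tag: child.text for child in elem}
            for elem in root.iter(row_tag)]

rows = xml_to_rows(SAMPLE)
print(rows)  # [{'id': '1', 'title': 'first'}, {'id': '2', 'title': 'second'}]
```

In Spark itself the equivalent is a single read with the package's row-tag option; the resulting DataFrame can then be queried or written out like any other.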

jagsr
  • Our system is a web application that receives XMLs from various sources (email, filesystem, user upload), then indexes and archives them. Today I have 300 simultaneous users accessing the tool and consulting this data. I need to do more detailed analysis of this data, hence the need for Spark, but I would rather not keep both PostgreSQL and Spark. Do you have a basic connection tutorial for SnappyData, for example a user CRUD? Once again, thank you for your attention. – João Batista de Andrade Aug 14 '17 at 14:26
  • You may find this section of the documentation useful: http://snappydatainc.github.io/snappydata/howto/ – plamb Aug 15 '17 at 17:23
  • You will see examples that show how you can do CRUD operations on row tables. We are just about to release support for CRUD with column tables too. Are you planning to route your XML to Spark using streaming? If so, you can transform that XML to a DataFrame easily using the SparkXML package and simply store it in a SnappyTable. After this, hopefully, your queries run much faster than even spark native caching. – jagsr Aug 16 '17 at 18:10
  • Many thanks for the reply. I am developing a pilot and as I have doubts, I count on your help. – João Batista de Andrade Sep 04 '17 at 18:16
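The row-table CRUD discussed in the comments is plain SQL. As a rough local sketch of the statements a user CRUD involves, using Python's built-in sqlite3 purely as a stand-in engine (against SnappyData the same kind of statements would go over its JDBC driver; the table schema and values here are invented for illustration):

```python
# Sketch of a user CRUD via plain SQL. sqlite3 is only a local stand-in;
# SnappyData row tables accept equivalent statements over JDBC.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT, email TEXT)")

# Create
conn.execute("INSERT INTO users (name, email) VALUES (?, ?)",
             ("joao", "joao@example.com"))

# Read
row = conn.execute("SELECT name, email FROM users WHERE id = 1").fetchone()
print(row)  # ('joao', 'joao@example.com')

# Update
conn.execute("UPDATE users SET email = ? WHERE id = ?",
             ("batista@example.com", 1))

# Delete
conn.execute("DELETE FROM users WHERE id = ?", (1,))
print(conn.execute("SELECT COUNT(*) FROM users").fetchone()[0])  # 0
```

Parameterized statements (the `?` placeholders) are worth keeping whichever engine ends up underneath, since the XML content and user input are untrusted.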