We have a complex finance / portfolio analytics application for which we would like to take advantage of Spark.

Instead of having the application submit isolated jars that perform the computation and then having to retrieve the data out of SQL, how viable would it be to simply have the entire application run as a Spark driver so that the results from Spark can be seamlessly accessed from the main application?

Is this a recommended use case of Spark? What would be the potential disadvantages of this approach? Would there be any performance or latency implications?

noahlz
sturm
  • Not viable, and a waste of what Spark is built for. You should look at Spark Streaming -- you can get data in and out of Spark in near real-time. – David Griffin Apr 22 '16 at 01:22
  • @DavidGriffin, Spark Streaming and Spark serve different purposes. If OP's application needs Spark, why should he look at Spark Streaming? – Aivean Apr 22 '16 at 01:30
  • 1
    Because the difference between them is exactly what the asker is looking for -- the ability to get data in and out of Spark quickly, for backending an app server. Otherwise, you can do exactly the same things with Spark Streaming as you do with Spark. I use `DataFrames` in Streaming, I use `GraphX`, I've even started using `GraphFrames` within my Spark Streaming applications. It's the logical recommendation given what the asker wants. – David Griffin Apr 22 '16 at 01:41
  • @DavidGriffin, I see your point, but I understood the question differently. Say you have 1 TB of log files and you want to run different queries against them using Spark. You can have the driver provide an API for such queries. With Spark Streaming it's a completely different approach; you'll also need some OLAP DB as middleware to run queries against. – Aivean Apr 22 '16 at 03:00
  • 4
    Says specifically "financial application", which implies transactional processing not log file analytics. Well -- could be transaction detail records pulled out of log files, but transactional in nature. "Financial application server" screams OLTP to me, not OLAP. Certainly enough so that it more than warrants at least looking at Spark Streaming. – David Griffin Apr 22 '16 at 03:06
  • @DavidGriffin, I think you're making too many guesses. The question is about working with Spark, and I don't see how Spark Streaming can be a drop-in replacement in the general case. As a side note, mention (`@`) if you're replying. – Aivean Apr 22 '16 at 04:38
  • Typically analytics != OLTP – sourcedelica Apr 22 '16 at 20:44
  • @David Griffin. Please post your second comment as an answer. I just noticed it and it is exactly what I was looking for. Thanks. – Jake Feb 26 '18 at 18:04
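
For reference, a minimal sketch of the Spark Streaming approach discussed in the comments above, assuming a socket text source and a comma-separated "desk,notional" record layout; the host, port, batch interval, and field names are illustrative placeholders, not anything taken from the question:

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

object StreamingSketch {
  def main(args: Array[String]): Unit = {
    // Placeholder settings: local master, 5-second batches, socket source.
    val conf = new SparkConf().setAppName("portfolio-stream").setMaster("local[2]")
    val ssc  = new StreamingContext(conf, Seconds(5))

    // Assume each incoming line is "desk,notional"; aggregate per batch.
    val lines = ssc.socketTextStream("localhost", 9999)
    lines
      .map(_.split(","))
      .map(fields => (fields(0), fields(1).toDouble))
      .reduceByKey(_ + _)
      .print() // results surface batch by batch, in near real-time

    ssc.start()
    ssc.awaitTermination()
  }
}
```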

1 Answer


This should be fine as long as you own the cluster and don't mind holding on to it even when there is nothing to process.

You can programmatically set up your SparkContext and keep it running for as long as you want.
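
A minimal sketch of what that could look like, assuming a standalone cluster and the RDD API; the master URL, executor memory, data path, and record layout are all illustrative placeholders:

```scala
import org.apache.spark.{SparkConf, SparkContext}

object AnalyticsServer {
  // One long-lived context created at application startup.
  val conf = new SparkConf()
    .setAppName("portfolio-analytics")
    .setMaster("spark://cluster-master:7077") // placeholder cluster URL
    .set("spark.executor.memory", "4g")

  val sc = new SparkContext(conf)

  def main(args: Array[String]): Unit = {
    // Results come back into the driver JVM, so the rest of the application
    // can consume them directly instead of round-tripping through SQL.
    val trades = sc.textFile("hdfs:///data/trades.csv") // placeholder path
    val notionalByDesk = trades
      .map(_.split(","))
      .map(fields => (fields(0), fields(1).toDouble)) // (desk, notional)
      .reduceByKey(_ + _)
      .collect()

    notionalByDesk.foreach { case (desk, total) => println(s"$desk: $total") }

    // sc.stop() is deliberately not called: the context (and its executors)
    // stays up for the life of the server, holding those resources the whole time.
  }
}
```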

Everything will be one long-running application that uses a constant amount of resources.

Things to worry about:

  • If Spark dies, how will that affect your server? (See the listener sketch below.)

  • If the driver runs out of memory, it will crash your server.

If you have answers to the above, I don't see anything fundamentally wrong.
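
For the first point, one option is to register a `SparkListener` on the long-running context so the server hears about lost executors or application shutdown instead of discovering it on the next failed job. A hedged sketch; the `SparkHealthListener` name and `onFailure` callback are made up for illustration:

```scala
import org.apache.spark.scheduler.{SparkListener, SparkListenerApplicationEnd, SparkListenerExecutorRemoved}

// Hypothetical hook: tell the rest of the server when Spark loses executors or
// the application ends, so it can degrade gracefully or rebuild the context
// instead of failing on the next query.
class SparkHealthListener(onFailure: String => Unit) extends SparkListener {
  override def onExecutorRemoved(removed: SparkListenerExecutorRemoved): Unit =
    onFailure(s"Executor ${removed.executorId} removed: ${removed.reason}")

  override def onApplicationEnd(end: SparkListenerApplicationEnd): Unit =
    onFailure("Spark application ended")
}

// Registration at startup, against the long-running SparkContext:
// sc.addSparkListener(new SparkHealthListener(msg => println(s"[spark-health] $msg")))
```

A driver OOM is harder to guard against: it takes down the whole JVM, so the realistic mitigations are sizing `spark.driver.memory` appropriately and avoiding large `collect()`s, not an in-process callback.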

marios