I'm going to design a distributed system with Scala and Akka. I want to aggregate tracing messages from a cluster and be able to view them in some kind of UI. Is Zipkin the best solution, or Flume (plus some wrapper?), or something else?
-
What are your criteria for the *best*? – om-nom-nom Sep 26 '13 at 15:52
-
I'm facing a vaguely similar situation but the data being collected is business-critical. We're thinking of sending it straight to HBase. – Randall Schulz Sep 26 '13 at 15:59
-
The best for me = easy to use, minimal dependencies, fewer problems, good docs. I only need to monitor the current system state. – Alexander Chepurnoy Sep 26 '13 at 18:50
1 Answer
Zipkin is the best solution.
--zipkin developer
EDIT - Ok ok, here's a serious answer:
Zipkin is a distributed tracing system developed by Twitter because our service-oriented architecture is so goddamned big that it's often hard to understand WTF is happening in any given request. Seriously, here's a visualization in Zipkin of all the service dependencies at Twitter:
Is your platform this intense? You should use Zipkin. Did I mention it's one of the best scaling systems I've ever seen? It has no problem keeping up with Twitter-level load, and that might be important to you if you're that big.
What's that you say? You're not as big as Twitter? You only have three services: a web frontend, some kind of middleware, and your database backend? Maybe Zipkin is a bit overkill for you. We've done some work to make it a bit easier to set up, but really my job isn't to make Zipkin easy for you, it's to make Zipkin awesome for Twitter.
Still, if you plan on scaling Scala, the Twitter stack with Finagle etc. is insanely good. Don't let all the evangelists from Typesafe fool you. Their stack has some serious deficiencies when you try to deploy it in massive-scale architectures. But again, our job isn't to tell you how good our stack is, or even to help you use it. It's to make our stack awesome.

-
lol what a crazy pic. I will have only ~10 machines running data-processing tasks and want to know the current system state (number of failed tasks, overflowing queues, etc.). That's far from the Twitter case. – Alexander Chepurnoy Sep 26 '13 at 19:00
-
@AlexanderChepurnoy then the usual metrics + Graphite/Ganglia approach should be fine – om-nom-nom Sep 27 '13 at 12:57
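
For a setup of roughly ten machines, the metrics + Graphite suggestion could look something like the sketch below. It is only an illustration, not anything posted in this thread: it assumes a Graphite (Carbon) instance accepting the plaintext protocol, where each line is `<metric.path> <value> <unix-timestamp>`. The object name `GraphiteSketch`, the helper names, and the metric paths are all made up for the example.

```scala
import java.io.OutputStreamWriter
import java.net.Socket
import java.nio.charset.StandardCharsets

// Hypothetical helper, not part of any library.
object GraphiteSketch {

  // Builds one line of the Graphite plaintext protocol:
  // "<metric.path> <value> <unix-timestamp-in-seconds>\n"
  def formatLine(path: String, value: Long, epochSeconds: Long): String =
    s"$path $value $epochSeconds\n"

  // Opens one TCP connection, writes all lines, then closes the socket.
  // Host and port are supplied by the caller (assumed to point at Carbon).
  def send(host: String, port: Int, lines: Seq[String]): Unit = {
    val socket = new Socket(host, port)
    try {
      val out = new OutputStreamWriter(socket.getOutputStream, StandardCharsets.UTF_8)
      lines.foreach(line => out.write(line))
      out.flush()
    } finally {
      socket.close()
    }
  }
}
```

Each worker would then periodically report its counters, for example `GraphiteSketch.send(host, 2003, Seq(GraphiteSketch.formatLine("workers.node1.tasks.failed", 3L, System.currentTimeMillis() / 1000)))`, where 2003 is the usual default port for Carbon's plaintext listener and the host is whatever your Graphite machine is. In a real Akka application you would probably use an existing metrics library rather than raw sockets; this only shows how little is on the wire.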