I want to store real-time tweets that match some filtering criteria in a MySQL database, and I'd like to understand which approach works better on a 16-CPU machine. Since the Streaming API is the better fit for my case, I could easily build a Java application with the Twitter4J library and do the filtering and storing with multithreading. On the other hand, I just discovered Spark, which can do the same in a few lines of code, but the bottleneck remains: everything still runs on a single machine with a single memory.
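For reference, I picture the multithreaded Java version roughly as the minimal sketch below, assuming a producer/consumer design: one thread (in the real application, the Twitter4J stream listener) pushes incoming tweets onto a BlockingQueue, and a fixed pool of worker threads filters them and stores the matches. The class name TweetPipeline, the run method and the POISON end marker are hypothetical; the in-memory list stands in for MySQL inserts, and no actual Twitter4J calls are shown.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.function.Predicate;

public class TweetPipeline {
    // Hypothetical end-of-stream marker; a live Twitter stream would never end on its own.
    private static final String POISON = "\u0000EOF";

    // The calling thread acts as the stream listener (producer); nWorkers threads
    // filter tweets and "store" the matches (consumers).
    public static List<String> run(List<String> incoming, Predicate<String> filter, int nWorkers)
            throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(1_000);
        // Stand-in for the MySQL table; synchronized because several workers write to it.
        List<String> stored = Collections.synchronizedList(new ArrayList<>());
        ExecutorService pool = Executors.newFixedThreadPool(nWorkers);

        for (int i = 0; i < nWorkers; i++) {
            pool.execute(() -> {
                try {
                    while (true) {
                        String tweet = queue.take(); // blocks until a tweet is available
                        if (tweet.equals(POISON)) {
                            break; // this worker is done
                        }
                        if (filter.test(tweet)) {
                            stored.add(tweet); // hypothetical stand-in for a JDBC INSERT
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        for (String tweet : incoming) {
            queue.put(tweet); // blocks if the workers fall behind (back-pressure)
        }
        for (int i = 0; i < nWorkers; i++) {
            queue.put(POISON); // one end marker per worker
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.MINUTES);
        return stored;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> tweets = List.of("I love #java", "spark is fast", "#java streams", "hello");
        int cores = Runtime.getRuntime().availableProcessors(); // 16 on my machine
        List<String> kept = run(tweets, t -> t.contains("#java"), cores);
        System.out.println(kept.size()); // prints 2 (order of stored tweets may vary)
    }
}
```

With a real MySQL backend I would probably batch the inserts (for example with JDBC's addBatch/executeBatch), since on a single machine the database writes, rather than the filtering, are likely to be the limiting factor.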
I want to understand whether Spark would be a real improvement, given that it's quite difficult to reach the Twitter rate limit and that I can't take advantage of a distributed cluster.
Thanks for helping.