As far as I know, Storm is used to analyze Twitter tweets to find trending topics, but can it also be used to analyze data from a government census? And since that data is structured, is Storm suitable for it?
2 Answers
Storm is generally used for processing unending streams of data, e.g. logs, the Twitter stream, or, in my case, the output of a web crawler.
I believe census-type data would come in the form of a fixed report. That could be treated as a stream, but it would probably lend itself better to processing via something like MapReduce on Hadoop (possibly with Cascading or Scalding as layers of abstraction over the details).
The structured nature of the data wouldn't prevent the use of any of these technologies. The choice depends more on the problem you are trying to solve.
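To make the batch idea concrete, here is a minimal sketch of the map, shuffle, and reduce phases in plain Python, run over a fixed set of records. The records, field names, and function names are all made up for illustration; this is not Hadoop's API, only the shape of the computation.

```python
from collections import defaultdict
from itertools import chain

# Hypothetical census records: (region, household_size).
RECORDS = [
    ("North", 3),
    ("South", 5),
    ("North", 2),
    ("East", 4),
    ("South", 1),
]

def map_record(record):
    """Map phase: emit (key, value) pairs for one input record."""
    region, household_size = record
    yield region, household_size

def shuffle(pairs):
    """Shuffle phase: group all emitted values by key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_group(key, values):
    """Reduce phase: collapse one key's values into a single result."""
    return key, sum(values)

def run_job(records):
    """Run the three phases over a complete, finite input."""
    pairs = chain.from_iterable(map_record(r) for r in records)
    grouped = shuffle(pairs)
    return dict(reduce_group(k, v) for k, v in grouped.items())

population_by_region = run_job(RECORDS)
print(population_by_region)
```

The job only makes sense because the input is finite: the reduce step needs every value for a key before it can finish, which is exactly what an unending stream cannot give you without windowing.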

Storm is designed for streaming data processing, where the data arrives continuously. Your application already has all the data it needs to process, so batch processing is a better fit. If the data is structured, you can use R or other tools for the analysis, or write scripts to convert the data so that it can be fed to R as input. Only if it is a huge dataset and you want to process it faster should you think about getting into Hadoop and writing your program around the analysis you have to do. Suggesting an architecture is only possible if you provide more details about the data size and the sort of analysis you want to run on it. If it is a smaller dataset, both Hadoop and Storm can be overkill for the problem at hand. --gtaank
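As an illustration of the "write a script to convert the data for R" route, here is a small sketch using only Python's standard library. The pipe-delimited input format and the column names are assumptions made up for the example; a real census export will look different.

```python
import csv
import io

# Hypothetical raw export: pipe-delimited lines, one field left empty.
RAW = "\n".join([
    "region | age | income",
    "North | 34 | 52000",
    "South | 51 |",
    "East | 27 | 38000",
])

def to_csv(raw_text):
    """Normalize pipe-delimited text into CSV, writing NA for empty fields."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for line in raw_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        fields = [field.strip() for field in line.split("|")]
        writer.writerow([field if field else "NA" for field in fields])
    return out.getvalue()

csv_text = to_csv(RAW)
print(csv_text)
```

Writing `NA` for empty fields means R's `read.csv` will treat them as missing values with its default settings. For a dataset that fits in memory, a script like this plus R is usually all the "architecture" you need.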
