I am using the Spark–Kafka integration for my project, which is to find the top trending hashtags on Twitter. I use Kafka to push tweets from a Tweepy stream, and on the consumer side I use Spark Streaming with DStream and RDD transformations to compute the counts.
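To show what I mean by the counting part, here is a simplified, Kafka-free sketch of the hashtag-counting logic I run inside the DStream transformations (the function names are mine, just for illustration):

```python
from collections import Counter

def extract_hashtags(tweet_text):
    # Pull out "#..." tokens; lowercase them so counts merge ("#Spark" == "#spark").
    return [w.lower() for w in tweet_text.split() if w.startswith("#") and len(w) > 1]

def top_hashtags(tweets, n=3):
    # Count hashtags across a batch of tweets and return the n most common.
    counts = Counter(tag for tweet in tweets for tag in extract_hashtags(tweet))
    return counts.most_common(n)

tweets = [
    "Learning #Spark with #Kafka today",
    "#spark streaming is fun",
    "Nothing trending here",
]
print(top_hashtags(tweets, n=2))  # [('#spark', 2), ('#kafka', 1)]
```

In the real job the same per-batch counting happens over the DStream, not over a Python list.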
My question is whether running this streaming process through Kafka for an extended period could lead to storage issues, since I am running both the producer and the consumer on my local machine. How long can I safely keep the producer running? (I need it to run for a while to get accurate trending counts.)
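For reference, I have not changed the broker's retention settings, so as far as I know the defaults in `server.properties` apply, which I believe look roughly like this:

```
# Kafka broker retention defaults (as I understand them):
log.retention.hours=168        # time-based cleanup: delete segments older than 7 days
log.retention.bytes=-1         # no size-based cap per partition by default
log.segment.bytes=1073741824   # roll log segments at ~1 GiB
```

So my understanding is that disk usage grows with tweet volume until the retention window kicks in, which is why I am worried about long runs on a laptop.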
Also, would it be better to run this on a cloud platform such as AWS?