I have data in a Cassandra database with one continuous dependent variable and around 100 discrete independent variables. Data is added to the database from various servers, and I expect to get millions of data points each day.
On any given day, I plan to predict the dependent variable from the independent variables using the last 3 days of data. I did some research and concluded that linear regression is the best choice for me (is it?). I am thinking of using Python or R as the programming tool, since both have existing implementations.
My questions are:
- I will have around 3 million samples to train the model on every day. What is the best way to retrieve the data from the database and train the model? What are my possible options in terms of implementation?
- Can I reuse the previously trained model's weights for the next day's training? If yes, what are my options?
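To make the second question concrete, here is a minimal sketch of what I mean by reusing weights. It is only an illustration under assumptions: it is plain Python with stochastic gradient descent, the class `OnlineLinearRegression` and the helper `make_day` are hypothetical names, and the synthetic rows stand in for data that would really be read from Cassandra in batches. As far as I can tell, scikit-learn's `SGDRegressor.partial_fit` offers the same kind of incremental update.

```python
import random


class OnlineLinearRegression:
    """Hypothetical SGD linear regression whose weights persist between calls."""

    def __init__(self, n_features, learning_rate=0.01):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.bias + sum(w * xi for w, xi in zip(self.weights, x))

    def partial_fit(self, rows, targets):
        # One SGD pass over a batch, starting from the current weights
        # rather than from zero, so earlier training is not thrown away.
        for x, y in zip(rows, targets):
            error = self.predict(x) - y
            self.bias -= self.learning_rate * error
            for j, xj in enumerate(x):
                self.weights[j] -= self.learning_rate * error * xj
        return self


def make_day(rng, n_rows, true_weights, true_bias):
    """Synthetic stand-in (assumption) for one day's rows from the database."""
    rows = [[rng.randint(0, 3) for _ in true_weights] for _ in range(n_rows)]
    targets = [
        true_bias + sum(w * xi for w, xi in zip(true_weights, x)) + rng.gauss(0, 0.1)
        for x in rows
    ]
    return rows, targets


rng = random.Random(0)
true_weights, true_bias = [2.0, -1.0, 0.5], 4.0
model = OnlineLinearRegression(n_features=3)

# Each "day" continues from the weights learned on the previous days.
for day in range(3):
    rows, targets = make_day(rng, 2000, true_weights, true_bias)
    model.partial_fit(rows, targets)
    print(day, [round(w, 2) for w in model.weights], round(model.bias, 2))
```

With this approach the full 3 million rows never need to be in memory at once, since each batch can be fetched, used for an update, and discarded. What I am unsure about is whether this is a sensible approach at my data volume, and how old data should be "forgotten" when the window is only 3 days.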
Thanks in advance.