Linear Regression - Big Training Dataset from Database

Asked Jul 07 '16 at 16:21

Active Sep 20 '17 at 07:57


I have data in a Cassandra database with one dependent variable (continuous) and around 100 independent variables (discrete). Data will be added to the database from various servers, and I will get millions of data points each day.

On any given day, I plan to predict the value of the dependent variable from the values of the independent variables, using the data from the last 3 days. I did some research and concluded that linear regression is the best choice for me (is it?). I am thinking of using Python or R, since both have existing implementations.
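Because the independent variables are discrete, a linear regression needs them encoded as numbers first. A common choice is dummy (one-hot) encoding with one reference level per variable, followed by ordinary least squares. The sketch below assumes each discrete variable is already integer-coded as 0..k-1; the helper names `dummy_encode` and `fit_ols` are hypothetical, and with ~100 variables the number of dummy columns grows with the number of levels per variable.

```python
import numpy as np


def dummy_encode(X_discrete, n_levels):
    """Turn integer-coded discrete columns into 0/1 dummy columns plus an intercept.

    Assumes column j takes values 0..n_levels[j]-1. Level 0 is the reference
    level and gets no column, which avoids perfect collinearity with the intercept.
    """
    n_rows, n_vars = X_discrete.shape
    columns = [np.ones(n_rows)]  # intercept column
    for j in range(n_vars):
        for level in range(1, n_levels[j]):
            columns.append((X_discrete[:, j] == level).astype(float))
    return np.column_stack(columns)


def fit_ols(X_discrete, y, n_levels):
    """Ordinary least squares on the dummy-encoded design matrix."""
    A = dummy_encode(X_discrete, n_levels)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


# Toy example: two discrete variables with three levels each
rng = np.random.default_rng(0)
X = rng.integers(0, 3, size=(200, 2))
y = 2.0 + 3.0 * (X[:, 0] == 1) - 1.0 * (X[:, 1] == 2)
beta = fit_ols(X, y, n_levels=[3, 3])
# beta is ordered as [intercept, x0==1, x0==2, x1==1, x1==2]
```

Each coefficient is then read as the shift in the prediction relative to the reference level of that variable.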

My questions are:

1. I will have around 3 million samples to train the model every day. What is the best way to retrieve the data from the database and train the model? What are my implementation options?
2. Can I reuse the previously trained model weights for the next day's training? If so, what are my options?
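For question 1, one option that avoids loading all 3 million rows at once is to read the data in chunks and accumulate the normal-equation terms X^T X and X^T y, which only take O(p^2) memory for p encoded features. The sketch below assumes the design matrix is already numeric (e.g. dummy-encoded); `fake_chunks` is a hypothetical stand-in for paged reads from the database, not a Cassandra API. Note that solving the normal equations is less numerically stable than `lstsq` on the full matrix when features are nearly collinear.

```python
import numpy as np


class StreamingOLS:
    """Least squares from chunks: accumulates X^T X and X^T y, then solves the normal equations."""

    def __init__(self, n_features):
        self.xtx = np.zeros((n_features, n_features))
        self.xty = np.zeros(n_features)
        self.n_rows = 0

    def update(self, X_chunk, y_chunk):
        # Each chunk only contributes a p x p matrix and a length-p vector
        self.xtx += X_chunk.T @ X_chunk
        self.xty += X_chunk.T @ y_chunk
        self.n_rows += len(y_chunk)

    def solve(self):
        return np.linalg.solve(self.xtx, self.xty)


def fake_chunks(n_chunks, chunk_size, beta, rng):
    """Hypothetical stand-in for paged database reads; yields (X, y) chunks with an intercept column."""
    for _ in range(n_chunks):
        X = np.column_stack([np.ones(chunk_size), rng.normal(size=(chunk_size, len(beta) - 1))])
        yield X, X @ beta


rng = np.random.default_rng(1)
true_beta = np.array([0.5, -2.0, 4.0])
model = StreamingOLS(n_features=3)
for X_chunk, y_chunk in fake_chunks(n_chunks=10, chunk_size=1000, beta=true_beta, rng=rng):
    model.update(X_chunk, y_chunk)
beta_hat = model.solve()
```

In practice the chunks would come from whatever paging mechanism the database client provides; only the `update` calls depend on the chunk contents.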
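For question 2, instead of warm-starting from yesterday's weights, one exact alternative for a "last 3 days" model is to keep per-day summaries (X^T X and X^T y) and re-solve each day from the three most recent summaries. Only the new day's rows then need to be read. This is a swapped-in technique, sketched under the assumption that each day's data fits through the same numeric encoding; the class name is hypothetical.

```python
from collections import deque

import numpy as np


class SlidingWindowOLS:
    """Keeps per-day X^T X and X^T y; the fit uses only the most recent `window_days` days."""

    def __init__(self, window_days=3):
        # A deque with maxlen drops the oldest day automatically when a new one arrives
        self.days = deque(maxlen=window_days)

    def add_day(self, X_day, y_day):
        self.days.append((X_day.T @ X_day, X_day.T @ y_day))

    def coefficients(self):
        if not self.days:
            raise ValueError("no days added yet")
        xtx = sum(stats[0] for stats in self.days)
        xty = sum(stats[1] for stats in self.days)
        return np.linalg.solve(xtx, xty)


rng = np.random.default_rng(2)
window = SlidingWindowOLS(window_days=3)
daily_betas = [np.array([10.0, 10.0, 10.0])] + [np.array([1.0, 2.0, 3.0])] * 3
for beta in daily_betas:
    X = np.column_stack([np.ones(500), rng.normal(size=(500, 2))])
    window.add_day(X, X @ beta)
# Day 1 has fallen out of the window, so only the last three days count
coef = window.coefficients()
```

The per-day summaries are small (p x p plus p values), so they could be persisted between runs; the result equals a fresh fit on the last three days, which warm-started iterative methods only approximate.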

Thanks in advance.

edited Sep 22 '17 at 17:48 by Community

asked by Varun Kumar Reddy B

Your question is quite broad. If you have a specific thing in R or Python you are struggling with, feel free to post another question. – Paul Hiemstra Jul 07 '16 at 16:26
I suggest checking the lm function in R; it is R's built-in function for linear regression. For fetching the data, please check this link: http://stackoverflow.com/questions/21994077/how-to-read-data-from-cassandra-with-r – Sahil Desai Sep 15 '16 at 08:32


0 Answers