I have historical rating data on users, and I would like to fit an ordinary least squares regression to find the trend for each user.

My data looks like this:

user_id    rating   item_id   date
12            3       19     2010-03-17
13            4       20     2010-03-18
1             3       123    2010-03-19
12            3.5     340    2010-03-17
19            2       19     2010-04-17

Here is my function:

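import numpy as np

def coef(y):
    # Slope of the least-squares line through (0, y[0]), ..., (s-1, y[s-1]).
    s = y.shape[0]
    # Design matrix: a column of row positions and a column of ones (intercept).
    A = np.vstack([np.arange(s), np.ones(s)]).T
    m, c = np.linalg.lstsq(A, y, rcond=None)[0]
    return m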

I was hoping to do something like the following

mydt[:, coef(dt.f.rating), dt.by(dt.f.user_id)]

or somehow run this function against each user_id. Unfortunately, the data is too big for pandas, so I would really appreciate hearing about alternatives.

Areza
  • 1
    at the moment custom functions are not yet supported in datatable. It is on the to-do list; since it is an open source library, it is dependent on volunteers to contribute. – sammywemmy Oct 12 '21 at 20:20
  • @sammywemmy I only know a bit of Python and don't see myself as qualified to contribute to one of my favorite packages, unless perhaps someone home-schools me and breaks down all the required steps :D – Areza Oct 13 '21 at 13:50
  • @sammywemmy Is there any alternative approach you would suggest? Spark or something else? – Areza Oct 13 '21 at 13:51
  • You could have a look at [polars](https://github.com/pola-rs/polars), which has a Python version – sammywemmy Oct 14 '21 at 21:25
  • for reference, the github issue: https://github.com/h2oai/datatable/issues/1960 – topchef Nov 16 '21 at 22:37
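
One possible workaround that needs no custom group-by function: the slope that `coef` returns depends only on a few running sums per user (the count, and the sums of x, y, x*y and x*x, where x is the row's position 0, 1, 2, ... within that user). Those sums can be accumulated in a single pass over the file, so the full dataset never has to sit in memory. The sketch below is plain Python, not a datatable API; `slopes_by_user` is a hypothetical name, and it assumes the rows arrive in the order you want per user (for example, sorted by date). It returns `None` for a user with a single rating, which is a choice of this sketch and not what `np.linalg.lstsq` does for that case.

```python
import csv
import io
from collections import defaultdict


def slopes_by_user(rows):
    """Return {user_id: OLS slope of rating against 0..n-1} in one pass.

    `rows` is any iterable of (user_id, rating) pairs (hypothetical helper).
    """
    # Per user: [n, sum_x, sum_y, sum_xy, sum_xx], where x is the row's
    # position (0, 1, 2, ...) within that user's ratings.
    sums = defaultdict(lambda: [0, 0.0, 0.0, 0.0, 0.0])
    for user_id, rating in rows:
        s = sums[user_id]
        x, y = float(s[0]), float(rating)
        s[0] += 1
        s[1] += x
        s[2] += y
        s[3] += x * y
        s[4] += x * x

    slopes = {}
    for user_id, (n, sx, sy, sxy, sxx) in sums.items():
        if n > 1:
            # Closed-form simple-regression slope from the running sums.
            slopes[user_id] = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        else:
            # A single rating has no trend; avoid dividing by zero.
            slopes[user_id] = None
    return slopes


# The sample rows from the question, written as CSV for illustration.
sample = """user_id,rating,item_id,date
12,3,19,2010-03-17
13,4,20,2010-03-18
1,3,123,2010-03-19
12,3.5,340,2010-03-17
19,2,19,2010-04-17
"""
reader = csv.DictReader(io.StringIO(sample))
result = slopes_by_user((r["user_id"], r["rating"]) for r in reader)
print(result)
```

Memory use grows with the number of distinct users, not the number of rows, since only five numbers are kept per user. For very long per-user histories the running sums can lose some floating-point precision, so it may be worth spot-checking a few users against `coef`.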

0 Answers