Statsmodels supports R-style formulas for model fitting through patsy and statsmodels.formula.api. I would like to fit a specific function using columns in a pandas DataFrame, but I can only seem to get close. For example, suppose I have a DataFrame with columns ['A', 'B', 'C', 'D'] and want to fit an equation of the form:

y = (A + B) / D

I can write the formula string as y ~ (A + B):D-1 which results in two coefficients: A:D and B:D. I can then do some algebra and get rid of the coefficient in front of one of them, but not both.
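As a rough sketch of what I am seeing, here is the formula applied to synthetic data. The sample size, noise level, and the "true" value D = 4 are made up purely for illustration; D is a column of ones, so the two coefficients each end up estimating 1/D independently.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumed for illustration only): y = (A + B) / 4 plus noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.uniform(1, 10, 200),
    "B": rng.uniform(1, 10, 200),
})
df["D"] = 1.0  # dummy column of ones standing in for the fit parameter
true_D = 4.0
df["y"] = (df["A"] + df["B"]) / true_D + rng.normal(0, 0.05, 200)

# (A + B):D expands to the two interaction terms A:D and B:D;
# "- 1" drops the intercept.
fit = smf.ols("y ~ (A + B):D - 1", data=df).fit()
print(fit.params)  # two separate coefficients, one per interaction term
```

Nothing in this formula ties the two coefficients together, which is why a single value for D does not come out directly.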

Is there a simple way to fit a custom function of this form, without switching to scipy curve_fit or lmfit?

Edits

To clarify, my goal is to obtain a fitted value for D, with A and B being values stored in the DataFrame. To get this to work, I generated a dummy column of 1's called D. So the known values I have are y, A, and B, with D being my fit parameter. As of now, I obtain a result that looks like y = A:D*A + B:D*B, which I can then rearrange to get something like y = (A + (B:D/A:D)*B)/(1/A:D). This works well, except that I would like to not have the coefficient in front of B.

David Hagan
  • What do you mean by "the coefficient in front of one of them"? In your particular example, there can't be a unique solution, since you could, e.g., multiply all of A, B, and D by two (or any nonzero constant) and get an equivalent expression. Also, it's not clear what you want to "fit", since your equation doesn't seem to leave any "room" for coefficients; you're just doing algebra directly on the data columns. – BrenBarn Feb 09 '17 at 19:38
  • Note that `:` in patsy and R specifies interaction terms, not division. Can you clarify where/what the parameters are? If you have an explanatory variable `x = (A + B) / D`, then you can just create a new variable or column with pandas. If it's a nonlinear function in parameters, then lmfit or curve_fit are currently better choices than statsmodels. – Josef Feb 09 '17 at 19:38
  • Edited for clarification @BrenBarn . Maybe curve_fit would be a better choice here. – David Hagan Feb 09 '17 at 20:19
  • @DavidHagan: As user333700 mentioned, you can just create an `A_plus_B` column and then fit against that. – BrenBarn Feb 10 '17 at 03:20
  • @BrenBarn Ahh. I see what you're saying. Yea, for this example that would definitely work. It might get messy as the equation becomes more complicated though. Thanks! – David Hagan Feb 10 '17 at 20:09
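
The suggestion in the comments can be sketched roughly as follows. This is a minimal example on synthetic data (the sample size, noise level, and "true" D = 4 are assumptions for illustration): summing A and B into one column leaves a single slope, and since y = (A + B)/D is linear in 1/D, the estimate of D is the reciprocal of that slope.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic data (assumed for illustration only): y = (A + B) / 4 plus noise.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "A": rng.uniform(1, 10, 200),
    "B": rng.uniform(1, 10, 200),
})
true_D = 4.0
df["y"] = (df["A"] + df["B"]) / true_D + rng.normal(0, 0.05, 200)

# Precompute the sum so the model has exactly one regressor.
df["A_plus_B"] = df["A"] + df["B"]

# No intercept: y = slope * (A + B), with slope = 1 / D.
fit = smf.ols("y ~ A_plus_B - 1", data=df).fit()
slope = fit.params["A_plus_B"]
D_hat = 1.0 / slope
print(f"slope = {slope:.4f}, estimated D = {D_hat:.3f}")
```

Patsy's `I(...)` wrapper, as in `y ~ I(A + B) - 1`, should avoid the extra column by treating `+` as arithmetic inside the formula. As noted in the comments, a model that is genuinely nonlinear in its parameters would still be a better fit for curve_fit or lmfit.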

0 Answers