I want to fit a model to data that are assumed to be related in the form y = alpha*x^beta. My data look like this:
And can be reproduced with this dput:
df <- structure(list(y = c(15.8999997973442, 34.4999990463257, 60.0000017285347,
234.099998548627, 15.3000003099442, 89.8999990224838, 30, 28.9999990463257,
370.600006774068, 80.2999995946884, 91.3000009059906, 39.9000015258789,
71.0999984741211, 6.20000004768372, 234.099998548627, 8.99999995529652,
38.0000007152557, 17.5000001490116, 29.400000333786, 125.399999916553,
4.80000007152557, 0.899999976158142, 40.0999994277954, 2.5, 45.8000001907349,
0.899999976158142, 133.599999904633, 6.09999990463257, 70.7999984622002,
17.5, 38.2999992370605, 33.4000001698732, 44.3000001907349, 45.8000001907349,
0.800000011920929, 90.7999993562698, 29.5, 0.5, 130.800000190735,
195.300004005432, 0.300000011920929, 27.8999991416931, 3.70000004768372,
1, 4.79999995231628, 14.4999996423721, 46.599998831749, 3.3999999165535,
7.40000009536743, 370.600006774068, 18.5, 37.6999998092651, 24.800000667572,
34.9000000953674, 89.8999990224838, 92.7000005245209, 13.1999998092651,
21.400000333786, 110.799999713898, 0.699999988079071, 44.3999996185303,
20.8999996185303, 73.0000009536743, 86.5000005364418, 101.599999248981,
32.3000005036592, 4.1000000834465, 167.699998855591, 65.4999992847443,
15.0999998152256, 0.200000002980232, 30.0999995470047, 30.5,
37.6999995708466, 92.7999982833862, 33.4000001698732, 83.5999986678362,
24.7000007629395, 127.699999332428, 25, 27.8000001907349, 29.6999999582767,
62.800000667572, 0.300000011920929, 37.9999990463257, 1, 9.10000009834766,
33.8000000119209, 40.0999994277954, 15.5000000298023, 292.299997776747,
15.9999995231628, 33.4000001698732, 0.899999976158142, 68.3000026345253,
28, 30.3999996185303, 20, 30.3999996185303, 5), x = c(3L, 2L,
6L, 22L, 4L, 6L, 2L, 2L, 13L, 7L, 5L, 1L, 2L, 3L, 22L, 3L, 2L,
3L, 3L, 9L, 2L, 1L, 2L, 1L, 2L, 1L, 6L, 2L, 2L, 1L, 1L, 7L, 2L,
2L, 1L, 11L, 1L, 1L, 5L, 4L, 1L, 3L, 1L, 1L, 2L, 2L, 3L, 2L,
1L, 13L, 2L, 5L, 2L, 2L, 6L, 8L, 1L, 4L, 5L, 1L, 3L, 1L, 5L,
8L, 3L, 7L, 2L, 7L, 3L, 2L, 1L, 5L, 1L, 4L, 5L, 7L, 3L, 1L, 5L,
1L, 2L, 5L, 4L, 1L, 3L, 1L, 3L, 2L, 2L, 6L, 16L, 4L, 7L, 1L,
6L, 2L, 2L, 1L, 2L, 1L)), row.names = c("494", "7", "476", "478",
"462", "68", "357", "397", "105", "216", "53", "248", "366",
"338", "478.1", "190", "119", "147", "371", "418", "231", "208",
"19", "337", "408", "90", "44", "488", "435", "13", "249", "434",
"419", "408.1", "209", "120", "47", "526", "82", "84", "3", "1",
"485", "278", "15", "414", "467", "459", "137", "105.1", "425",
"492", "532", "170", "68.1", "429", "347", "491", "29", "215",
"151", "316", "352", "116", "465", "237", "376", "513", "472",
"186", "453", "504", "157", "261", "403", "434.1", "469", "333",
"83", "417", "301", "242", "46", "234", "487", "278.1", "134",
"183", "19.1", "288", "98", "411", "434.2", "117", "375", "5",
"356", "313", "356.1", "359"), class = "data.frame")
I know there are many (really good!!) answers to similar questions, such as:
https://stats.stackexchange.com/questions/61747/linear-vs-nonlinear-regression?rq=1
Fitting logarithmic curve in R, or
Exponential curve fitting in R
However, I cannot get my head around it for some reason.
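For reference, here is the algebra behind the log-log approach, written out as a small sketch. It only restates the assumed relationship y = alpha*x^beta from the top of the post. The symbol Q_tau for the conditional tau-quantile is notation introduced here, not something from the linked answers. The last line uses the fact that quantiles are preserved under monotone increasing transformations such as exp.

```latex
\[
\begin{aligned}
y &= \alpha x^{\beta}, \qquad \alpha > 0,\ x > 0 \\
\log y &= \log \alpha + \beta \log x \\
Q_\tau(y \mid x) &= \exp\bigl(Q_\tau(\log y \mid x)\bigr)
\end{aligned}
\]
```

So a straight line with intercept log(alpha) and slope beta on log-log axes corresponds to a power curve on the original axes.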
What I thought about doing is the following. I want to fit a linear model in the log-transformed space of both variables, because a linear model in the log-transformed space is like an exponential model in the untransformed space?! I know there are many assumptions about the distribution of the errors. Let's put them aside for the moment, as this is really more about understanding the fitting mechanism. I also want to make sure that only n% of the data is below the fitted line. This seems like a perfect case for quantile regression. So I did the following:
plot(df$x, df$y)
# fit a linear quantile regression to the data
library(quantreg)
lm = rq(log(y) ~ log(x), data = df, tau = .05)
pr = predict(lm)
lines(exp(pr))
But what I get out is the following:
While I expected something like:
I am really sorry for these bad examples and the complete misunderstanding of basic topics. But maybe someone has an idea on what I'm not getting here.
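One possible explanation for the unexpected plot, offered as a guess rather than a diagnosis: lines(exp(pr)) receives a single vector, so R draws it against the index 1, 2, 3, ... instead of against x, and the predictions are in row order rather than sorted by x. Below is a minimal sketch of the same kind of fit with explicit x-coordinates. The data frame toy, the prediction grid, and the column y_q05 are made up for illustration and are not the data from this post.

```r
library(quantreg)  # provides rq()

# Made-up data for illustration only: y = 5 * x^1.2 with multiplicative noise
set.seed(1)
toy <- data.frame(x = sample(1:22, 100, replace = TRUE))
toy$y <- 5 * toy$x^1.2 * exp(rnorm(100, sd = 0.8))

# 5% quantile regression in log-log space
fit <- rq(log(y) ~ log(x), data = toy, tau = 0.05)

# Predict on a sorted grid of x values and back-transform with exp()
grid <- data.frame(x = seq(min(toy$x), max(toy$x), by = 0.5))
grid$y_q05 <- exp(predict(fit, newdata = grid))

plot(toy$x, toy$y)
# Pass x explicitly; lines(v) alone would plot v against 1, 2, 3, ...
lines(grid$x, grid$y_q05, col = "blue")

# Share of points strictly below the fitted curve (small tolerance for ties)
below <- mean(log(toy$y) < predict(fit, newdata = toy) - 1e-8)
```

Because exp() is monotone increasing, the back-transformed line is still the 5% quantile curve on the original scale, so below should come out at or under 0.05.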
Update
I mean something like this with the mammals data in R:
library(MASS)      # provides the mammals data set
library(quantreg)  # provides rq()
# log transformed data
hist(log(mammals$body))
plot(log(brain) ~ log(body), mammals)
lm_log = lm(log(brain) ~ (log(body)), mammals)
qr_log = rq(log(brain) ~ (log(body)), mammals, tau = .05)
abline(lm_log)
abline(qr_log)
# using the linear model fitted on the log-transformed variables to predict and plot
# in the untransformed plot
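new_data = data.frame(body = seq(min(mammals$body), max(mammals$body), by = .5))
pr = predict(lm_log, newdata = new_data)
pr_qr = predict(qr_log, newdata = new_data)
plot(brain ~ body, mammals)
# pass the x-coordinates explicitly so the curves are drawn against body
lines(new_data$body, exp(pr), col = "green")
lines(new_data$body, exp(pr_qr), col = "blue")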
Which gives this plot
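To connect the mammals example back to the form y = alpha*x^beta, the coefficients of the log-log fit can be back-transformed directly: an intercept a and slope b on the log scale correspond to alpha = exp(a) and beta = b. The following is a minimal sketch of that idea, assuming the MASS and quantreg packages are installed. The names alpha, beta, pred, and max_diff are introduced here for illustration.

```r
library(MASS)      # provides the mammals data set
library(quantreg)  # provides rq()

# 5% quantile regression on the log-log scale
qr_log <- rq(log(brain) ~ log(body), data = mammals, tau = 0.05)

# log(brain) = a + b * log(body)  <=>  brain = exp(a) * body^b
alpha <- exp(coef(qr_log)[[1]])
beta  <- coef(qr_log)[[2]]

plot(brain ~ body, data = mammals)
# Draw the power curve on the original scale
curve(alpha * x^beta, from = min(mammals$body), to = max(mammals$body),
      add = TRUE, col = "blue")

# Check that the closed form agrees with exp(predict(...))
pred <- exp(predict(qr_log, newdata = mammals))
max_diff <- max(abs(pred - alpha * mammals$body^beta) / pred)
```

If the two routes agree, max_diff is zero up to floating-point error, and curve() avoids having to build a prediction grid by hand.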