
I have around 80 curves with this general form produced from gam smoothing using the package mgcv (this is dput for two curves):

curve1 = structure(c(7.49350131435014, 9.20913921518434, 10.897558273626, 12.5315396472817, 14.0838644937566, 15.5273139706588, 16.8354309019618, 17.9992764380826, 19.0274300558767, 19.9292328985738, 20.714026109397, 21.3911508315718, 21.9755738920627, 22.5047648527841, 23.0218189593889, 23.5698314575296, 24.1918975928601, 24.9307185773568, 25.8199328484249, 26.8841160688986, 28.1474498679369, 29.6341158746979, 31.3682957183393, 33.3653189702521, 35.6051069707551, 38.0587290024005, 40.6972543477403, 43.4917522893258, 46.412585160685, 49.4138554677925, 52.4334058890734, 55.4083721539251, 58.2758899917481, 60.973095131941, 63.4387734705847, 65.6183115704873, 67.4587461611377, 68.9071139720254, 69.9104517326394, 70.4166212524078, 70.3924611793416, 69.8237870000362, 68.6972392810251, 66.9994585888416, 64.7170854900195, 61.8447094036925, 58.4087151593964, 54.4434364392668, 49.9832069254401, 45.0623603000519, 39.7160061164762, 33.9970989665565, 27.9764384806052, 21.7256001601735, 15.3161595068118, 8.81969202207108, 2.30818949037147, -4.14469117238674, -10.4648767674334, -16.5782940959985, -22.4108699593122, -27.8894639845452, -32.9623907955166, -37.5994200126921, -41.7712540824789, -45.4485954512844, -48.6021465655158, -51.2130051654573, -53.3038501669034, -54.9077557795256, -56.057796212995, -56.7870456769835, -57.1291276417828, -57.130298571955, -56.8494479263324, -56.3460144243683, -55.6794367855146, -54.9091537292243, -54.0902712864655, -53.2605647342642, -52.4534766611619, -51.702449655701, -51.0409263064234, -50.5015466282473, -50.0984914427865, -49.8274823783449, -49.6834384896055, -49.6612788312509, -49.7559224579641, -49.9605546528188, -50.2614256124518, -50.6430517618898, -51.0899495261606, -51.5866353302916, -52.1178475575856, -52.6734296317039, -53.2483300166574, -53.8377191347404, -54.4367674082406, -55.0406452594513), .Dim = c(100L, 1L), .Dimnames = list(NULL, "pd_2"))
curve2 = structure(c(-4.50299508184076, -3.70453890848835,-2.91058337080674, -2.12562910446626, -1.35417674513703, -0.600726928489934, 0.130318651354656, 0.836833955940232, 1.51896860247409, 2.1769711497127, 2.81109015641316, 3.42157418133328, 4.00979189203588, 4.58159239130783, 5.1439448907422, 5.70381860193193, 6.26818273646995, 6.84402482790598, 7.4387538147967, 8.06020004070592, 8.71621217115462, 9.41463887166305, 10.163328807752, 10.9689152542477, 11.8331699231994, 12.7566491359618, 13.7399092138897, 14.783506478338, 15.8877756023413, 17.0479533475697, 18.254178564329, 19.4963684546043, 20.7644402203811, 22.0483110636448, 23.3368154616899, 24.6144569930464, 25.8646565115534, 27.07083487105, 28.2164129253751, 29.2848459145735, 30.2603799614208, 31.1280520714228, 31.8729336362912, 32.4800960477379, 32.9346106974744, 33.2229344326633, 33.33706592227, 33.2703892907107, 33.0162886624017, 32.5681481617592, 31.9195773795413, 31.0693716323692, 30.0215119627273, 28.7802048794415, 27.3496568913384, 25.7340745072439, 23.939275491331, 21.9775226291594, 19.8626899616355, 17.6086515296656, 15.2292813741561, 12.7385709973978, 10.153213513519, 7.49260364848593, 4.77625358964919, 2.02367552435912, -0.745618360033754, -3.5122785079423, -6.25760589083342, -8.96306411193715, -11.6101167744838, -14.1802274817035, -16.6550294974636, -19.0200582802881, -21.2647514833568, -23.3787164204875, -25.3515604054971, -27.1728907522033, -28.8342453001256, -30.3348839915929, -31.6759972946368, -32.8587756772887, -33.8844096075799, -34.7542687297092, -35.4738437397274, -36.0527463855374, -36.5007675912098, -36.8276982808148, -37.0433293784224, -37.157717968495, -37.1819857770595, -37.1275206905347, -37.0057105953392, -36.8279433778919, -36.6055169318738, -36.3476593180057, -36.0615287640475, -35.7541935050218, -35.4327217759511, -35.1041818118577), 
    .Dim = c(100L, 1L), .Dimnames = list(NULL, "pd_2"))

They are all centered around zero, but how far each curve extends below zero differs: the lowest value varies from curve to curve, so it is not as simple as, say, adding 55 to every value, because ideally the lowest value of each curve would sit at zero. The absolute values of the curves do not matter; they are only interesting relative to one another. How can I batch-shift all the curves to be above zero while retaining their shapes relative to one another?

So the goal is to move each curve so that its most negative value is at zero, given that the most negative value varies for every curve.

Edit: Gavin proposed this solution:

Say we have a model `m <- gam(y ~ s(x), data = foo)`, then we can predict from the model at a set of new values of *x*, *x'*, over the range of *x*. The new values go in `newd`: `newd <- with(foo, data.frame(x = seq(min(x), max(x), length = 200)))`. And we predict using `predict()`, adding the predicted values to `newd`: `newd <- transform(newd, fitted = predict(m, newdata = newd, type = "response"))`. Now you can plot with `plot(fitted ~ x, data = newd, type = "l")` and you'll see the curves on the appropriate scale.

This works now that Gavin has also corrected my formula (see comments):

test.gam <- gam(ch1 ~ s(row_id), data = test)

newd <- with(test, data.frame(row_id = seq(min(row_id), max(row_id), length.out = 1197)))

newd <- transform(newd, fitted = predict(test.gam, newdata = newd, type = "response"))

Thanks Gavin!!
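For the batch case (80 separate models), the same `predict()` recipe can be wrapped in a function and applied over a list. The following is only a minimal sketch under stated assumptions: `samples`, `make_sample()` and `fit_and_predict()` are hypothetical names, and the simulated data merely stand in for the real samples, which are assumed to have columns `row_id` and `ch1`.

```r
library(mgcv)

set.seed(1)

# Hypothetical stand-in for the real samples: each data frame has the
# columns row_id and ch1, with a different baseline level per sample.
make_sample <- function(offset) {
  row_id <- 1:200
  data.frame(row_id = row_id,
             ch1 = offset + 10 * sin(row_id / 30) + rnorm(200))
}
samples <- list(a = make_sample(50), b = make_sample(120))

# Fit one GAM per sample and predict on an evenly spaced grid.
# The formula uses bare column names plus `data =`, so predict()
# can find row_id in newdata.
fit_and_predict <- function(d, n = 100) {
  m <- gam(ch1 ~ s(row_id), data = d)
  newd <- data.frame(row_id = seq(min(d$row_id), max(d$row_id), length.out = n))
  newd$fitted <- as.numeric(predict(m, newdata = newd, type = "response"))
  newd
}

curves <- lapply(samples, fit_and_predict)
```

Each element of `curves` is then a data frame with the grid and the fitted values on the scale of the response, so the curves keep their own baselines instead of all being centered on zero.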

user2472414
  • You need to describe how these curves were derived; from a single model or 80 separate ones, or from n < 80 models. – Gavin Simpson May 18 '17 at 01:33
  • 80 separate models, from 80 separate samples. – user2472414 May 18 '17 at 01:40
  • geom_smooth does the same gam smoothing, but does not have this behavior of relocating the curves to negative values. If I knew what was under the hood, I would apply that method, but as I don't, I am using gam from mgcv. – user2472414 May 18 '17 at 01:42
  • This is the function used to generate each curve: `gam_data <- function(curves) { out <- gam(curves[,19] ~ s(curves[,5])); pd <- plot(out); return(pd) }` – user2472414 May 18 '17 at 01:46
  • That's because they are `predict()`ing from the model and you are looking at the smooths with identifiability constraints applied. You really ought to add each model's intercept to the spline. You can do this via the `shift` argument to `plot.gam()`, or just `predict()` from each of the models. – Gavin Simpson May 18 '17 at 03:09
  • I apologize for my ignorance, but could you provide more details about how to do that with the predict() function? The man page is difficult to interpret... – user2472414 May 18 '17 at 16:31
  • Say we have a model `m <- gam(y ~ s(x), data = foo)`, then we can predict from the model at a set of new values of *x*, *x'*, over the range of *x*. The new values go in `newd`: `newd <- with(foo, data.frame(x = seq(min(x), max(x), length = 200)))`. And we predict using `predict()`, adding the predicted values to `newd`: `newd <- transform(newd, fitted = predict(m, newdata = newd, type = "response"))`. Now you can plot with `plot(fitted ~ x, data = newd, type = "l")` and you'll see the curves on the appropriate scale. – Gavin Simpson May 18 '17 at 16:40
  • This is giving me an error, I'm not sure why. Edited as part of the main discussion. Thank you for your assistance :) – user2472414 May 18 '17 at 18:17
  • **Don't** use `foo$` in formulas. This breaks `predict()` because it is looking for variables with names `foo$bar`, but of course `newd` only contains `bar`. – Gavin Simpson May 18 '17 at 18:22
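The point made in the comments about identifiability constraints can be checked directly. This is a small sketch on simulated data (`foo`, `x` and `y` are made up for illustration): the smooth returned with `type = "terms"` is centered on zero, and adding the model intercept recovers the ordinary predictions.

```r
library(mgcv)

set.seed(42)

# Simulated data with a baseline of roughly 30 (an assumption for illustration)
foo <- data.frame(x = seq(0, 10, length.out = 200))
foo$y <- 30 + 5 * sin(foo$x) + rnorm(200, sd = 0.5)

m <- gam(y ~ s(x), data = foo)

# The smooth on its own: centered, so it takes values on both sides of zero
smooth_only <- predict(m, type = "terms")

# predict.gam() stores the intercept as the "constant" attribute
intercept <- unname(attr(smooth_only, "constant"))

# Smooth plus intercept equals the full linear predictor
on_model_scale <- unname(rowSums(smooth_only)) + intercept
full <- as.numeric(predict(m, type = "link"))

# The same shift applied when plotting the smooth
plot(m, shift = coef(m)[1])
```

With the default Gaussian family and identity link, the linear predictor and the response scale coincide, so `on_model_scale` matches what `predict(m, type = "response")` returns.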

1 Answer


If your curves are in a list:

l = list(a = curve1, b = curve2) # ...etc...

use `lapply` to turn this into a list of curves whose minimum is zero:

l.min.zero = lapply(l, function(x) x - min(x))
lebelinoz
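As a quick check of the `x - min(x)` shift from the answer, here is a sketch on toy data. The two short matrices are invented stand-ins that only mimic the one-column structure of the `dput` output in the question; `shifted` and `shifted_mat` are names chosen for illustration.

```r
# Toy stand-ins for the smoothed curves: one-column matrices named "pd_2"
curves <- list(
  a = matrix(c(7.5, 20, -55, -50), ncol = 1, dimnames = list(NULL, "pd_2")),
  b = matrix(c(-4.5, 33, -37, -35), ncol = 1, dimnames = list(NULL, "pd_2"))
)

# Shift every curve so that its own minimum sits at zero
shifted <- lapply(curves, function(x) x - min(x))

# The shape of each curve is unchanged: successive differences are identical
same_shape <- all.equal(diff(shifted$a[, 1]), diff(curves$a[, 1]))

# Optionally bind the shifted curves into one matrix, one column per curve
shifted_mat <- vapply(shifted, function(x) x[, 1], numeric(4))
```

Note that this shift discards each curve's own baseline, so it suits cases where only the shape matters; if the levels of the original data are of interest, predicting from each model, as described in the comments, retains them.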