0

I would like to fit a bi-gaussian curve to my data, which is basically a normal distribution with two different standard deviations (two different sigma values). How can I do that with NLS?

enter image description here

What I am mainly interested in is the centre of these curves, i.e. the xc value in the graph.

They use the following function in originlabs enter image description here

df<-structure(list(group = c(673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 673.494, 
673.494, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 682.53, 
682.53, 682.53, 682.53, 682.53, 682.53, 682.53), x = c(260.042, 
260.173, 260.305, 260.436, 260.567, 260.699, 260.83, 260.962, 
261.093, 261.224, 261.356, 261.487, 261.618, 261.75, 261.881, 
262.012, 262.144, 262.275, 262.406, 262.538, 262.669, 262.801, 
262.932, 263.063, 263.195, 263.326, 263.457, 263.588, 263.72, 
263.851, 263.982, 264.114, 264.245, 264.376, 264.508, 264.639, 
264.77, 264.902, 265.033, 265.164, 265.295, 265.427, 265.558, 
265.689, 265.821, 265.952, 266.083, 266.214, 266.346, 266.477, 
266.608, 266.739, 266.871, 267.002, 267.133, 267.264, 267.396, 
267.527, 267.658, 267.789, 267.921, 268.052, 268.183, 268.314, 
268.446, 268.577, 268.708, 268.839, 268.97, 269.102, 269.233, 
269.364, 269.495, 269.626, 269.758, 269.889, 270.02, 270.151, 
270.282, 270.414, 270.545, 270.676, 270.807, 270.938, 271.069, 
271.201, 271.332, 271.463, 271.594, 271.725, 271.856, 271.987, 
272.119, 272.25, 272.381, 272.512, 272.643, 272.774, 272.905, 
273.037, 273.168, 273.299, 273.43, 273.561, 273.692, 273.823, 
273.954, 274.085, 274.217, 274.348, 274.479, 274.61, 274.741, 
274.872, 275.003, 275.134, 275.265, 275.396, 275.527, 275.658, 
275.79, 275.921, 276.052, 276.183, 276.314, 276.445, 276.576, 
276.707, 276.838, 276.969, 277.1, 277.231, 277.362, 277.493, 
277.624, 277.755, 277.886, 278.017, 278.148, 278.279, 278.41, 
278.541, 278.672, 278.803, 278.934, 279.065, 279.196, 279.327, 
279.458, 279.589, 279.72, 279.851, 279.982, 280.113, 280.244, 
280.375, 280.506, 280.637, 280.768, 280.899, 281.03, 281.161, 
281.292, 281.423, 281.554, 281.684, 281.815, 281.946, 282.077, 
282.208, 282.339, 282.47, 282.601, 282.732, 282.863, 282.994, 
283.125, 283.255, 283.386, 283.517, 283.648, 283.779, 283.91, 
284.041, 284.172, 284.302, 284.433, 284.564, 284.695, 284.826, 
284.957, 285.088, 285.218, 285.349, 285.48, 285.611, 285.742, 
285.873, 286.004, 286.134, 286.265, 286.396, 286.527, 286.658, 
286.788, 286.919, 287.05, 287.181, 287.312, 287.442, 287.573, 
287.704, 287.835, 287.966, 288.096, 288.227, 288.358, 288.489, 
288.62, 288.75, 288.881, 289.012, 289.143, 289.273, 289.404, 
289.535, 289.666, 289.796, 289.927, 260.042, 260.173, 260.305, 
260.436, 260.567, 260.699, 260.83, 260.962, 261.093, 261.224, 
261.356, 261.487, 261.618, 261.75, 261.881, 262.012, 262.144, 
262.275, 262.406, 262.538, 262.669, 262.801, 262.932, 263.063, 
263.195, 263.326, 263.457, 263.588, 263.72, 263.851, 263.982, 
264.114, 264.245, 264.376, 264.508, 264.639, 264.77, 264.902, 
265.033, 265.164, 265.295, 265.427, 265.558, 265.689, 265.821, 
265.952, 266.083, 266.214, 266.346, 266.477, 266.608, 266.739, 
266.871, 267.002, 267.133, 267.264, 267.396, 267.527, 267.658, 
267.789, 267.921, 268.052, 268.183, 268.314, 268.446, 268.577, 
268.708, 268.839, 268.97, 269.102, 269.233, 269.364, 269.495, 
269.626, 269.758, 269.889, 270.02, 270.151, 270.282, 270.414, 
270.545, 270.676, 270.807, 270.938, 271.069, 271.201, 271.332, 
271.463, 271.594, 271.725, 271.856, 271.987, 272.119, 272.25, 
272.381, 272.512, 272.643, 272.774, 272.905, 273.037, 273.168, 
273.299, 273.43, 273.561, 273.692, 273.823, 273.954, 274.085, 
274.217, 274.348, 274.479, 274.61, 274.741, 274.872, 275.003, 
275.134, 275.265, 275.396, 275.527, 275.658, 275.79, 275.921, 
276.052, 276.183, 276.314, 276.445, 276.576, 276.707, 276.838, 
276.969, 277.1, 277.231, 277.362, 277.493, 277.624, 277.755, 
277.886, 278.017, 278.148, 278.279, 278.41, 278.541, 278.672, 
278.803, 278.934, 279.065, 279.196, 279.327, 279.458, 279.589, 
279.72, 279.851, 279.982, 280.113, 280.244, 280.375, 280.506, 
280.637, 280.768, 280.899, 281.03, 281.161, 281.292, 281.423, 
281.554, 281.684, 281.815, 281.946, 282.077, 282.208, 282.339, 
282.47, 282.601, 282.732, 282.863, 282.994, 283.125, 283.255, 
283.386, 283.517, 283.648, 283.779, 283.91, 284.041, 284.172, 
284.302, 284.433, 284.564, 284.695, 284.826, 284.957, 285.088, 
285.218, 285.349, 285.48, 285.611, 285.742, 285.873, 286.004, 
286.134, 286.265, 286.396, 286.527, 286.658, 286.788, 286.919, 
287.05, 287.181, 287.312, 287.442, 287.573, 287.704, 287.835, 
287.966, 288.096, 288.227, 288.358, 288.489, 288.62, 288.75, 
288.881, 289.012, 289.143, 289.273, 289.404, 289.535, 289.666, 
289.796, 289.927), y = c(216, 226, 224, 206, 232, 256, 210, 226, 
230, 212, 228, 216, 224, 244, 240, 228, 270, 278, 274, 302, 324, 
302, 326, 334, 322, 372, 372, 376, 370, 428, 414, 466, 468, 524, 
552, 536, 604, 640, 666, 678, 684, 682, 770, 786, 816, 846, 920, 
980, 1020, 1028, 1086, 1134, 1288, 1326, 1330, 1392, 1488, 1558, 
1658, 1702, 1652, 1840, 1828, 2032, 1974, 2222, 2346, 2394, 2544, 
2764, 2958, 3022, 3104, 3402, 3586, 3722, 3878, 4016, 4270, 4372, 
4688, 4942, 5308, 5578, 5964, 6484, 6976, 7432, 7982, 8638, 9464, 
10542, 11448, 12450, 13524, 14498, 15306, 15962, 16746, 17432, 
18126, 18776, 19072, 19504, 19528, 19622, 19204, 18610, 18096, 
17372, 16834, 15838, 15248, 14442, 13996, 13326, 12552, 11896, 
11328, 11000, 10224, 9722, 9242, 8824, 8628, 8188, 7956, 7626, 
7432, 7108, 6786, 6668, 6494, 6326, 5998, 5816, 5668, 5512, 5438, 
5104, 5086, 4922, 4734, 4578, 4464, 4402, 4168, 4058, 3868, 3858, 
3776, 3598, 3450, 3434, 3258, 3218, 3132, 3044, 3056, 2824, 2942, 
2828, 2708, 2672, 2638, 2620, 2526, 2508, 2434, 2288, 2260, 2300, 
2184, 2144, 2112, 2042, 2036, 1968, 1922, 1896, 1822, 1776, 1746, 
1688, 1616, 1618, 1666, 1602, 1542, 1538, 1504, 1424, 1422, 1342, 
1338, 1328, 1292, 1248, 1248, 1242, 1160, 1176, 1108, 1076, 1144, 
1078, 1096, 1030, 1012, 1000, 1008, 984, 994, 936, 894, 876, 
822, 826, 806, 820, 826, 794, 778, 778, 758, 772, 772, 732, 740, 
238, 222, 210, 234, 188, 214, 232, 236, 234, 212, 212, 236, 232, 
236, 266, 246, 264, 276, 266, 278, 310, 316, 340, 342, 326, 316, 
366, 382, 404, 420, 430, 482, 512, 528, 582, 572, 568, 628, 596, 
690, 710, 698, 774, 782, 816, 870, 906, 964, 1048, 1078, 1106, 
1180, 1242, 1296, 1352, 1460, 1464, 1530, 1586, 1672, 1784, 1718, 
1814, 1936, 2074, 2152, 2342, 2484, 2580, 2722, 2898, 3046, 3172, 
3320, 3506, 3638, 3908, 4090, 4288, 4452, 4758, 4936, 5316, 5476, 
5904, 6450, 6876, 7494, 8014, 8720, 9714, 10568, 11628, 12724, 
13492, 14646, 15330, 16542, 16942, 17556, 18342, 18786, 19372, 
19762, 19766, 19904, 19648, 18964, 18528, 17908, 17060, 16226, 
15568, 15006, 14276, 13578, 12938, 12262, 11622, 11058, 10478, 
10018, 9510, 9116, 8758, 8396, 8122, 7842, 7558, 7254, 7022, 
6768, 6552, 6298, 6198, 5866, 5726, 5488, 5446, 5268, 5042, 4940, 
4842, 4630, 4544, 4360, 4212, 4108, 3982, 3904, 3738, 3600, 3536, 
3368, 3384, 3234, 3100, 3042, 2962, 2980, 2932, 2878, 2764, 2666, 
2666, 2594, 2506, 2492, 2454, 2334, 2330, 2198, 2204, 2118, 2114, 
2000, 1956, 1966, 1872, 1900, 1786, 1746, 1722, 1738, 1712, 1670, 
1574, 1598, 1538, 1516, 1502, 1370, 1424, 1386, 1310, 1400, 1310, 
1280, 1266, 1230, 1148, 1210, 1126, 1100, 1108, 1046, 1062, 1032, 
1054, 1074, 1008, 990, 938, 938, 890, 872, 874, 854, 880, 832, 
836, 796, 814, 736, 716, 712, 762, 714, 706)), row.names = c(NA, 
-458L), class = c("tbl_df", "tbl", "data.frame"))

This is a fit with the normal distribution

                                                                                                              
fit_parameters<-df%>%
  group_by(group) %>% 
  nls_table(y ~ y0 +A*dnorm(x, mean=xc, sd = sd) ,
            mod_start = c( A=1000, xc=275, sd =10, y0=0), output = "table" )

reload
  • 21
  • 3
  • 1
    I'm not familiar with this distribution. Do you have a reference? Or an equation for the density function? I don't think you are talking about a mixture distribution, which means there is only one standard deviation in the distribution. Maybe this is a skew-normal distribution? See https://search.r-project.org/CRAN/refmans/sn/html/dsn.html – Roland Jun 28 '23 at 13:18
  • @Roland It is not a distribution, just a curve. That's why we can choose its height H. – Stéphane Laurent Jun 28 '23 at 13:23
  • I have added the function they use in origin (different software) to do such fit. – reload Jun 28 '23 at 13:26
  • This may be a somewhat tricky fit (as I discovered when using `nls` to do it) because the objective function is nondifferentiable ... – Ben Bolker Jun 28 '23 at 14:25
  • What do you mean by objective function? I saw the answer that you posted earlier and tried it. I have just evaluated the maximum values of the fits but it seemed to work just fine. So thank you for your help! However, I don't really know yet how your defined functions work and if the fitted curve agrees well with the initial data, I will have to test that first. – reload Jun 28 '23 at 14:46
  • Why do you want, or need, to use a pair of half-gaussians? It's almost always a mistake to pick some overly complicated equation to fit data unless you are trying to prove some point. I agree with @Roland that using a single formula (skew-normal or similar) will be easier and more robust than trying to merge a pair of gaussians to subsets of your data. – Carl Witthoft Jun 28 '23 at 16:26
  • One more comment - as the DataMunger Guru says, don't tells **how** you want to do it (i.e. `nls`); tell us **what you want to do** – Carl Witthoft Jun 28 '23 at 16:27
  • To fit a skewed Gaussian, see here: https://stackoverflow.com/a/25965881/20513099 – I_O Jun 28 '23 at 16:56
  • I want to extract the centre of the peaks by fitting them. Up until now, I have been using a normal distribution, which does work to some extent, but I would also like to consider that the peaks are not symmetrical. I have looked into skewed normal distributions from the sn package in the first place, but I did not manage to make it work with nls_table. I guess a skewed normal distribution would give me similar results for the peak centre as fitting it with "two different standard deviations". – reload Jun 28 '23 at 17:18
  • I deleted my posted answer because the results *looked* sensible but plotting the fitted curve suggested something was wrong - I suspect that `nls()` got stuck because of the non-differentiability. "Objective function" means the sum-of-squares function that we're trying to minimize in the fit ... – Ben Bolker Jun 28 '23 at 20:24

1 Answers1

2

I want to extract the centre of the peaks by fitting them. Up until now, I have been using a normal distribution, which does work to some extent, but I would also like to consider that the peaks are not symmetrical.

Why do you need to fit a parametric function? It will be difficult to find a distribution that always fits the peaks well. Instead I suggest a GAM:

library(mgcv)
library(ggplot2)

df <- as.data.frame(df)
df$group <- factor(df$group)
fit <- gam(y ~ s(x, bs = "cr", k = 100, by = group), data = df)
summary(fit)
plot(fit, pages = 1)
gam.check(fit)
    
df$yhat <- predict(fit)

ggplot(df, aes(x, y)) +
  facet_wrap(~ group) +
  geom_line(aes(color = "data")) +
  geom_line(aes(y = yhat, color = "fit"), linetype = 2)

resulting plot showing data and fits

#zooming in
ggplot(df, aes(x, y)) +
  facet_wrap(~ group) +
  geom_line(aes(color = "data")) +
  geom_line(aes(y = yhat, color = "fit"), linetype = 2) +
  coord_cartesian(ylim = c(1.9e4, 2e4), xlim = c(273, 274))

same plot zoomed in

Note that we have only plotted the fitted values. We can predict for arbitrary resolution, which we use for finding the maximum.

#find maximum from smoothed curve
optimize(\(x) predict(fit, newdata = data.frame(x = x, group = factor(673.494))),
         interval = range(df[df$group == "673.494", "x"]), maximum = TRUE)
#$maximum
#[1] 273.6945
#
#$objective
#       1 
#19565.78 

#since more noisy data could result in local maxima after smoothing
#an exhaustive search for the global maximum would be safer
newdata <- data.frame(x = seq(from = min(df[df$group == "673.494", "x"]),
                          to = max(df[df$group == "673.494", "x"]),
                          by = 0.001),
                      group = "673.494")
newdata$y <- predict(fit, newdata)
newdata[newdata$y == max(newdata$y), "x"]
#[1] 273.695

#maximum in data
df1 <- df[df$group == "673.494",]
df1[df1$y == max(df1$y), "x"]
#[1] 273.823
Roland
  • 127,288
  • 10
  • 191
  • 288
  • Thank you for your answer. Correct me if I'm wrong, I'm still trying to understand this approach, So by smoothing the curve and creating artificial data to that smoothed curve, you give it a higher resolution around the maximum to then fetch the x-value of such maximum? It seems like this takes up a lot of processing power and when I try to fit a larger dataset it fails because of that. – reload Jun 28 '23 at 22:11
  • It could probably be optimized further. Since it looks like you are dealing with data from a spectrometer or from chromatography, you won't find a distribution model that is guaranteed to fit well (because you don't model a distribution). But a parametric model constrains the shape pretty hard and might distort the result. However, you might be better served with dedicated algorithms for peak fitting in chromatography. I can't help with that because I've always relied on commercial software there. – Roland Jun 29 '23 at 07:36