Questions tagged [least-squares]

Refers to a general estimation technique that selects the parameter value minimizing the squared difference between two quantities, such as the observed value of a variable and the expected value of that observation conditioned on the parameter value. Questions about the theory behind least squares belong on the Cross Validated (https://stats.stackexchange.com/questions) Stack Exchange site.

Overview

From the "Least squares" article on Wikipedia:

The method of least squares is a standard approach to the approximate solution of overdetermined systems, i.e., sets of equations in which there are more equations than unknowns. "Least squares" means that the overall solution minimizes the sum of the squares of the errors made in the results of every single equation.
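As a rough illustration of the definition above, the following Python sketch fits a straight line to four points, which gives four equations in two unknowns. The data values and the helper name `sum_of_squares` are made up for this example; `numpy.linalg.lstsq` is used as one common way to obtain the minimizer, not the only one.

```python
import numpy as np

# Made-up data for illustration: four observations, two unknowns
# (intercept and slope), so the system A @ beta = y is overdetermined.
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([1.1, 2.9, 5.2, 6.8])

# Design matrix: a column of ones for the intercept, then x for the slope.
A = np.column_stack([np.ones_like(x), x])

# lstsq returns the coefficients minimising ||A @ beta - y||^2,
# along with the residual sum of squares, the rank, and singular values.
beta, residual_ss, rank, singular_values = np.linalg.lstsq(A, y, rcond=None)


def sum_of_squares(b):
    """Sum of squared errors over all four equations (hypothetical helper)."""
    return float(np.sum((A @ b - y) ** 2))


print("intercept, slope:", beta)
print("sum of squared errors:", sum_of_squares(beta))
```

No single line passes through all four points, so the errors cannot all be zero; moving `beta` in any direction should only increase `sum_of_squares`.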

Least squares problems fall into two categories: linear or ordinary least squares and non-linear least squares, depending on whether or not the residuals are linear in all unknowns. The linear least-squares problem occurs in statistical regression analysis; it has a closed-form solution. A closed-form solution (or closed-form expression) is any formula that can be evaluated in a finite number of standard operations. The non-linear problem has no closed-form solution and is usually solved by iterative refinement; at each iteration the system is approximated by a linear one, and thus the core calculation is similar in both cases.
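The contrast between the two categories can be sketched in Python with NumPy. The function names `linear_least_squares` and `gauss_newton`, the model y = a * exp(b * x), and the noise-free data are all assumptions made for illustration. The loop is a plain Gauss-Newton iteration with a fixed iteration count and no damping or convergence test, so it is a minimal sketch rather than how any particular library implements non-linear fitting.

```python
import numpy as np


def linear_least_squares(A, y):
    """Closed-form linear solution via the normal equations (A^T A) beta = A^T y."""
    return np.linalg.solve(A.T @ A, A.T @ y)


def gauss_newton(residual, jacobian, p0, n_iter=20):
    """Plain Gauss-Newton: solve a linear least-squares problem at each iteration."""
    p = np.asarray(p0, dtype=float)
    for _ in range(n_iter):
        r = residual(p)
        J = jacobian(p)
        # Linearised subproblem: minimise ||r + J @ step||^2 over step.
        step = linear_least_squares(J, -r)
        p = p + step
    return p


# Hypothetical non-linear model y = a * exp(b * x), with made-up noise-free data
# generated from a = 2.0 and b = 1.5.
x = np.linspace(0.0, 1.0, 8)
y = 2.0 * np.exp(1.5 * x)


def residual(p):
    a, b = p
    return a * np.exp(b * x) - y


def jacobian(p):
    a, b = p
    e = np.exp(b * x)
    # Columns: d(residual)/da and d(residual)/db.
    return np.column_stack([e, a * x * e])


p_hat = gauss_newton(residual, jacobian, p0=[1.0, 1.0])
print("estimated a, b:", p_hat)
```

Note that `gauss_newton` calls `linear_least_squares` inside its loop, which is the sense in which the core calculation is shared. In practice a library routine such as `scipy.optimize.least_squares` is usually a safer choice, since it adds safeguards this sketch omits.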

Other References

Least squares methods are treated in many introductory statistics resources and textbooks, but there are also advanced resources dedicated only to the subject, for example:

Tag usage

Questions with this tag should be about implementation and programming problems, not about the statistical or theoretical properties of the technique. Consider whether your question might be better suited to Cross Validated, the Stack Exchange site for statistics, machine learning and data analysis.

1013 questions
13 votes, 2 answers

k-means return value in R

I am using the kmeans() function in R and I was curious what is the difference between the totss and tot.withinss attributes of the returned object. From the documentation they seem to be returning the same thing, but applied on my dataset the value…
Marius
13 votes, 1 answer

trying to display original and fitted data (nls + dnorm) with ggplot2's geom_smooth()

I am exploring some data, so the first thing I wanted to do was try to fit a normal (Gaussian) distribution to it. This is my first time trying this in R, so I'm taking it one step at a time. First I pre-binned my data: myhist = data.frame(size =…
13 votes, 1 answer

How to solve a least squares (underdetermined system) quickly?

I have a program in R that is computing a large amount of least squares solutions (>10,000: typically 100,000+) and, after profiling, these are the current bottlenecks for the program. I have a matrix A with column vectors that correspond to…
Mark
13 votes, 4 answers

How to do linear regression, taking errorbars into account?

I am doing a computer simulation for some physical system of finite size, and after this I am doing extrapolation to the infinity (Thermodynamic limit). Some theory says that data should scale linearly with system size, so I am doing linear…
Vladimir
12 votes, 1 answer

Scipy optimize raises ValueError despite x0 being within bounds

I'm trying to fit a sigmoid curve onto a small set of points, basically generating a probability curve from a set of observations. I'm using scipy.optimize.curve_fit, with a slightly modified logistic function (so as to be bound completely within…
Iago
12 votes, 2 answers

Calculating the null space of a matrix

I'm attempting to solve a set of equations of the form Ax = 0. A is known 6x6 matrix and I've written the below code using SVD to get the vector x which works to a certain extent. The answer is approximately correct but not good enough to be useful…
Ainsworth
12 votes, 3 answers

How can I calculate a trend line in PHP?

So I've read the two related questions for calculating a trend line for a graph, but I'm still lost. I have an array of xy coordinates, and I want to come up with another array of xy coordinates (can be fewer coordinates) that represent a…
Stephen
12 votes, 3 answers

How to use leastsq function from scipy.optimize in python to fit both a straight line and a quadratic line to data sets x and y

How would i fit a straight line and a quadratic to the data set below using the leastsq function from scipy.optimize? I know how to use polyfit to do it. But i need to use leastsq function. Here are the x and y data sets: x:…
user2956673
11 votes, 1 answer

How can I perform a least-squares fitting over multiple data sets fast?

I am trying to make a gaussian fit over many data points. E.g. I have a 256 x 262144 array of data. Where the 256 points need to be fitted to a gaussian distribution, and I need 262144 of them. Sometimes the peak of the gaussian distribution is…
Michael
11 votes, 2 answers

Chi square numpy.polyfit (numpy)

Could someone explain how to get Chi^2/doF using numpy.polyfit?
casper
11 votes, 1 answer

Method signature for Jacobian of a least squares function in scipy

Can anyone provide an example of providing a Jacobian to a least squares function in scipy? I can't figure out the method signature they want - they say it should be a function, yet it's very hard to figure out what input parameters in what order…
George Karpenkov
11 votes, 1 answer

Quantifying the quality of curve fit using Python SciPy

I'm using Scipy curve_fit to fit a Gaussian curve to data, and am interested in analysing the quality of the fit. I know curve_fit returns a useful pcov matrix, from which the standard deviation of each fitting parameter can be computed as…
IanRoberts
11 votes, 2 answers

Fit points to a plane algorithms, how to interpret results?

Update: I have modified the Optimize and Eigen and Solve methods to reflect changes. All now return the "same" vector allowing for machine precision. I am still stumped on the Eigen method. Specifically How/Why I select slice of the eigenvector…
Michael
11 votes, 2 answers

Weighted trendline

Excel produces scatter diagrams for sets of pair values. It also gives the option of producing a best fit trendline and formula for the trendline. It also produces bubble diagrams which take into consideration a weight provided with each value.…
Tams
11 votes, 3 answers

trying to get reasonable values from scipy powerlaw fit

I'm trying to fit some data from a simulation code I've been running in order to figure out a power law dependence. When I plot a linear fit, the data does not fit very well. Here's the python script I'm using to fit the data: #!/usr/bin/env…
zje