Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not about theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always in combination with the tag for a specific programming language.

16319 questions
62
votes
2 answers

R tick data : merging date and time into a single object

I'm currently working with tick data in R and I would like to merge date and time into a single object, as I need a precise time object to compute some statistics on my data. Here is what my data looks like: date time …
marino89
61
votes
1 answer

Plotting a 3D surface plot with contour map overlay, using R

I have a 3-tuple data set (X,Y,Z points) that I want to plot using R. I want to create a surface plot from the data, and superimpose a contour map on the surface plot, so as to create the impression of the contour map being the "shadow" or…
Stick it to THE MAN
61
votes
7 answers

confidence and prediction intervals with StatsModels

I do this linear regression with StatsModels: import numpy as np import statsmodels.api as sm from statsmodels.sandbox.regression.predstd import wls_prediction_std n = 100 x = np.linspace(0, 10, n) e = np.random.normal(size=n) y = 1 + 0.5*x +…
F.N.B
60
votes
9 answers

Error in contrasts when defining a linear model in R

When I try to define my linear model in R as follows: lm1 <- lm(predictorvariable ~ x1+x2+x3, data=dataframe.df) I get the following error message: Error in `contrasts<-`(`*tmp*`, value = contr.funs[1 + isOF[nn]]) : contrasts can be applied only…
REnthusiast
59
votes
8 answers

Cumulative Normal Distribution Function in C/C++

I was wondering if there were statistics functions built into math libraries that are part of the standard C++ libraries like cmath. If not, can you guys recommend a good stats library that would have a cumulative normal distribution function? More…
Tyler Brock
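
As a hedged illustration of the question above: the cumulative normal distribution function can be written in terms of the complementary error function, which C++11 exposes as `std::erfc` in `<cmath>`. The sketch below uses Python's `math.erfc` to show the same formula; the function name `normal_cdf` and its parameters are made up for this example.

```python
import math


def normal_cdf(x, mu=0.0, sigma=1.0):
    """Cumulative distribution function of a normal distribution.

    Uses the identity Phi(z) = 0.5 * erfc(-z / sqrt(2)), which stays
    accurate in the lower tail where 1 - erf(...) would lose precision.
    """
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * math.erfc(-z)


if __name__ == "__main__":
    # Probability that a standard normal variable is at most 1.0
    print(normal_cdf(1.0))
```

The same one-line formula carries over to C or C++ by replacing `math.erfc` with `erfc` from the standard math header.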
59
votes
1 answer

How to calculate the 95% confidence interval for the slope in a linear regression model in R

Here is an exercise from Introductory Statistics with R: With the rmr data set, plot metabolic rate versus body weight. Fit a linear regression model to the relation. According to the fitted model, what is the predicted metabolic rate for a body…
Yu Fu
58
votes
10 answers

Fitting polynomials to data

Is there a way, given a set of values (x,f(x)), to find the polynomial of a given degree that best fits the data? I know polynomial interpolation, which is for finding a polynomial of degree n given n+1 data points, but here there are a large…
ShreevatsaR
58
votes
7 answers

git find fat commit

Is it possible to get info about how much space is wasted by changes in every commit, so I can find commits that added big files or a lot of files? This is all to try to reduce git repo size (rebasing and maybe filtering commits).
tig
55
votes
18 answers

What is a good solution for calculating an average where the sum of all values exceeds a double's limits?

I have a requirement to calculate the average of a very large set of doubles (10^9 values). The sum of the values exceeds the upper bound of a double, so does anyone know any neat little tricks for calculating an average that doesn't require also…
Simon
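
One common way to approach the problem in the question above is an incremental (running) mean, which never forms the full sum. This is only a minimal sketch, not necessarily what the answers recommend; the name `running_mean` is hypothetical.

```python
def running_mean(values):
    """Mean of an iterable without ever computing the total sum.

    After k values the update is: mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k.
    Caveat: the difference x_k - mean_{k-1} can still overflow if the data
    mixes huge positive and huge negative values.
    """
    mean = 0.0
    count = 0
    for x in values:
        count += 1
        mean += (x - mean) / count
    if count == 0:
        raise ValueError("running_mean() requires at least one value")
    return mean


if __name__ == "__main__":
    big = [1e308, 1e308, 1e308]
    print(sum(big))           # the naive sum overflows to inf
    print(running_mean(big))  # the running mean stays finite
```

Because it accepts any iterable, the function can consume a generator that streams values from disk, so the 10^9 values never need to be held in memory at once.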
54
votes
2 answers

What do all the distributions available in scipy.stats look like?

Visualizing scipy.stats distributions A histogram can be made of the scipy.stats normal random variable to see what the distribution looks like. % matplotlib inline import pandas as pd import scipy.stats as stats d = stats.norm() rv =…
tmthydvnprt
54
votes
6 answers

How to perform two-sample one-tailed t-test with numpy/scipy

In R, it is possible to perform two-sample one-tailed t-test simply by using > A = c(0.19826790, 1.36836629, 1.37950911, 1.46951540, 1.48197798, 0.07532846) > B = c(0.6383447, 0.5271385, 1.7721380, 1.7817880) > t.test(A, B, alternative="greater") …
Timo
53
votes
5 answers

Calculating percentile of dataset column

A quick one for you, dearest R gurus: I'm doing an assignment and I've been asked, in this exercise, to get basic statistics out of the infert dataset (it's in-built), and specifically one of its columns, infert$age. For anyone not familiar with the…
Dimitris Sfounis
52
votes
6 answers

How to find probability distribution and parameters for real data? (Python 3)

I have a dataset from sklearn and I plotted the distribution of the load_diabetes.target data (i.e. the values of the regression that the load_diabetes.data are used to predict). I used this because it has the fewest number of variables/attributes…
O.rka
50
votes
7 answers

P-value from Chi sq test statistic in Python

I have computed a test statistic that is distributed as a chi square with 1 degree of freedom, and want to find out what P-value this corresponds to using python. I'm a python and maths/stats newbie so I think what I want here is the probability…
Davy Kavanagh
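
For the special case in the question above (1 degree of freedom), the upper-tail probability has a closed form that needs only the standard library: a chi-square variable with 1 degree of freedom is the square of a standard normal, so P(X > x) = erfc(sqrt(x / 2)). The sketch below assumes exactly that case; the function name `chi2_pvalue_1df` is made up for illustration, and a statistics library would be the usual route for other degrees of freedom.

```python
import math


def chi2_pvalue_1df(statistic):
    """Upper-tail p-value for a chi-square statistic with 1 degree of freedom.

    Valid only for df = 1: if Z is standard normal, Z**2 is chi-square(1),
    so P(Z**2 > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if statistic < 0:
        raise ValueError("a chi-square statistic cannot be negative")
    return math.erfc(math.sqrt(statistic / 2.0))


if __name__ == "__main__":
    # 3.84 is close to the familiar 5% critical value for 1 degree of freedom
    print(chi2_pvalue_1df(3.84))
```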
50
votes
11 answers

how to calculate the Euclidean norm of a vector in R?

I tried norm, but I think it gives the wrong result (the norm of c(1, 2, 3) is sqrt(1*1+2*2+3*3), but it returns 6). x1 <- 1:3 norm(x1) # Error in norm(x1) : 'A' must be a numeric matrix norm(as.matrix(x1)) # [1] 6 as.matrix(x1) # [,1] # [1,] …
Hanfei Sun
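
The value 6 in the question above is consistent with a one norm: R's `norm` defaults to type "O", the maximum absolute column sum, which for a single-column matrix is just the sum of absolute values. The Python sketch below only illustrates the difference between the two norms; the function names are hypothetical.

```python
import math


def one_norm(vector):
    """Sum of absolute values (the one norm of a single column)."""
    return sum(abs(x) for x in vector)


def euclidean_norm(vector):
    """Square root of the sum of squares (the Euclidean, or two, norm)."""
    return math.sqrt(sum(x * x for x in vector))


if __name__ == "__main__":
    v = [1, 2, 3]
    print(one_norm(v))        # 6, matching the result reported in the question
    print(euclidean_norm(v))  # sqrt(14), the value the asker expected
```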