Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not about theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always in combination with a tag for a specific programming language.

16319 questions
49
votes
12 answers

How do I determine the standard deviation (stddev) of a set of values?

I need to know whether a number, compared to a set of numbers, is more than 1 stddev from the mean, etc.
dead and bloated
  • 493
  • 1
  • 5
  • 5
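
The question above does not name a language, so here is a minimal Python sketch of that check, using only the standard library. The helper name `outside_k_stddev` and the sample data are made up for illustration. It assumes the sample standard deviation (n - 1 denominator) is what is wanted; swap in `statistics.pstdev` if the values are the whole population.

```python
import statistics

def outside_k_stddev(value, values, k=1.0):
    """Return True if `value` is more than k standard deviations from the mean of `values`."""
    mean = statistics.mean(values)
    # statistics.stdev is the sample standard deviation (n - 1 denominator);
    # statistics.pstdev would give the population standard deviation instead.
    sd = statistics.stdev(values)
    return abs(value - mean) > k * sd

# Hypothetical data, chosen only to illustrate the call.
data = [2, 4, 4, 4, 5, 5, 7, 9]
print(statistics.pstdev(data))      # population standard deviation of the data
print(outside_k_stddev(10, data))   # 10 is far from the mean of 5
print(outside_k_stddev(5, data))    # 5 is exactly the mean
```

Note that `statistics.stdev` raises `StatisticsError` when given fewer than two values, so very small sets need separate handling.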
49
votes
6 answers

Calculate the Cumulative Distribution Function (CDF) in Python

How can I calculate the Cumulative Distribution Function (CDF) in Python? I want to calculate it from an array of points I have (a discrete distribution), not from the continuous distributions that, for example, scipy has.
wizbcn
  • 1,064
  • 1
  • 12
  • 19
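
One way to read the question above is as a request for the empirical CDF of the points, that is, for each x the fraction of points less than or equal to x. The sketch below takes that reading and uses only the standard library. The function name `make_ecdf` and the sample values are hypothetical, and this is not necessarily the approach taken in the answers to that question.

```python
from bisect import bisect_right

def make_ecdf(points):
    """Return a function F where F(x) is the fraction of points <= x."""
    ordered = sorted(points)
    n = len(ordered)
    if n == 0:
        raise ValueError("need at least one point")

    def ecdf(x):
        # bisect_right gives the number of sorted values that are <= x.
        return bisect_right(ordered, x) / n

    return ecdf

# Hypothetical sample, chosen only to illustrate the call.
samples = [3, 1, 2, 2, 5]
F = make_ecdf(samples)
print(F(0), F(2), F(4), F(5))
```

Sorting once and using binary search keeps each lookup at O(log n), which matters if the CDF is evaluated at many points.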
49
votes
3 answers

Multivariate time series modelling in R

I want to fit some sort of multivariate time series model using R. Here is a sample of my data: u cci bci cpi gdp dum1 dum2 dum3 dx 16.50 14.00 53.00 45.70 80.63 0 0 1 6.39 17.45 16.00 64.00 …
Karl
  • 5,573
  • 8
  • 50
  • 73
48
votes
8 answers

How to generate distributions given mean, SD, skew and kurtosis in R?

Is it possible to generate distributions in R for which the Mean, SD, skew and kurtosis are known? So far it appears the best route would be to create random numbers and transform them accordingly. If there is a package tailored to generating…
Aaron B
  • 583
  • 1
  • 5
  • 5
48
votes
2 answers

Why does scikit-learn say the F1 score is ill-defined with FN bigger than 0?

I run a python program that calls sklearn.metrics's methods to calculate precision and F1 score. Here is the output when there is no predicted sample: /xxx/py2-scikit-learn/0.15.2-comp6/lib/python2.6/site-packages/sklearn/metr\ ics/metrics.py:1771:…
Tim
  • 1
  • 141
  • 372
  • 590
48
votes
2 answers

Python pandas returns empty correlation matrix

I am running Python 2.7.6, pandas 0.13.1. I am unable to compute a correlation matrix from a DataFrame, and I'm not sure why. Here is my example DataFrame (foo): A B C 2011-10-12 0.006204908…
Max
  • 1,670
  • 1
  • 12
  • 17
47
votes
6 answers

Function to calculate R2 (R-squared) in R

I have a dataframe with observed and modelled data, and I would like to calculate the R2 value. I expected there to be a function I could call for this, but can't locate one. I know I can write my own and apply it, but am I missing something…
Esme_
  • 1,360
  • 3
  • 18
  • 30
47
votes
5 answers

Meaning of X = X[:, 1] in Python

I am studying this snippet of Python code. What does X = X[:, 1] mean in the last line? def linreg(X,Y): # Running the linear regression X = sm.add_constant(X) model = regression.linear_model.OLS(Y, X).fit() a = model.params[0] b…
Taewan
  • 1,167
  • 4
  • 15
  • 25
47
votes
3 answers

predict.lm() in a loop. warning: prediction from a rank-deficient fit may be misleading

This R code throws a warning # Fit regression model to each cluster y <- list() length(y) <- k vars <- list() length(vars) <- k f <- list() length(f) <- k for (i in 1:k) { vars[[i]] <- names(corc[[i]][corc[[i]]!= "1"]) f[[i]] <-…
Mahsa
  • 531
  • 1
  • 5
  • 9
46
votes
3 answers

R Random Forests Variable Importance

I am trying to use the random forests package for classification in R. The Variable Importance Measures listed are: mean raw importance score of variable x for class 0 mean raw importance score of variable x for class…
thirsty93
  • 2,602
  • 6
  • 26
  • 26
46
votes
4 answers

Computing cross-correlation function?

In R, I am using ccf or acf to compute the pair-wise cross-correlation function so that I can find out which shift gives me the maximum value. From the looks of it, R gives me a normalized sequence of values. Is there something similar in Python's…
Legend
  • 113,822
  • 119
  • 272
  • 400
46
votes
5 answers

Plotting a histogram on a log scale with Matplotlib

I have a Pandas DataFrame that has the following values in a Series x = [2, 1, 76, 140, 286, 267, 60, 271, 5, 13, 9, 76, 77, 6, 2, 27, 22, 1, 12, 7, 19, 81, 11, 173, 13, 7, 16, 19, 23, 197, 167, 1] I was instructed to plot two histograms in a…
Tommy
  • 695
  • 2
  • 10
  • 15
46
votes
5 answers

How do I do an F-test in Python

How do I do an F-test to check if the variance is equivalent in two vectors in Python? For example, if I have a = [1,2,1,2,1,2,1,2,1,2] b = [1,3,-1,2,1,5,-1,6,-1,2], is there something similar to scipy.stats.ttest_ind(a, b)? I found sp.stats.f(a,…
DrewH
  • 1,657
  • 3
  • 14
  • 10
45
votes
2 answers

Python p-value from t-statistic

I have some t-values and degrees of freedom and want to find the p-values from them (it's two-tailed). In the real world I would use a t-test table in the back of a Statistics textbook; how do I do the equivalent in Python? e.g. t-lookup(5, 7) =…
Andrew Latham
  • 5,982
  • 14
  • 47
  • 87
45
votes
5 answers

How to get GitHub Clone stats?

There used to be a "Clones" sub-tab in the "Stats & Graphs" tab of GitHub (for example https://github.com/TeamMentor/TeamMentor-Documentation/graphs/impact) but that is gone. Is there another way to get these stats? It would be great if we could get…
Dinis Cruz
  • 4,161
  • 2
  • 31
  • 49