Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not about theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always in combination with the tag for a specific programming language.

16319 questions
141
votes
20 answers

Statistics: combinations in Python

I need to compute combinatorials (nCr) in Python but cannot find the function to do that in math, numpy or stat libraries. Something like a function of the type: comb = calculate_combinations(n, r) I need the number of possible combinations, not…
Morlock
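For the nCr question above, a minimal sketch that counts combinations without building them. The helper name `n_choose_r` is made up for illustration; on Python 3.8 or later, `math.comb` in the standard library does the same job.

```python
import math


def n_choose_r(n: int, r: int) -> int:
    """Number of ways to choose r items from n, ignoring order."""
    if r < 0 or r > n:
        return 0
    r = min(r, n - r)  # C(n, r) == C(n, n - r); pick the shorter loop
    result = 1
    for i in range(1, r + 1):
        # After this step result equals C(n - r + i, i), so the
        # integer division is always exact.
        result = result * (n - r + i) // i
    return result


print(n_choose_r(5, 2))   # 10
print(math.comb(5, 2))    # 10 (requires Python 3.8+)
```

The multiplicative loop avoids computing large factorials, which keeps intermediate values small for big `n`.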
138
votes
12 answers

How do I calculate r-squared using Python and Numpy?

I'm using Python and Numpy to calculate a best fit polynomial of arbitrary degree. I pass a list of x values, y values, and the degree of the polynomial I want to fit (linear, quadratic, etc.). This much works, but I also want to calculate r…
Travis Beale
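For the r-squared question above, one common definition is R² = 1 - SS_res / SS_tot. The sketch below assumes the fitted coefficients are ordered highest degree first (the order `numpy.polyfit` returns); the helper names `polyval` and `r_squared` are illustrative and use only the standard library.

```python
def polyval(coeffs, x):
    """Evaluate a polynomial by Horner's rule; coeffs are highest degree first."""
    result = 0.0
    for c in coeffs:
        result = result * x + c
    return result


def r_squared(xs, ys, coeffs):
    """Coefficient of determination of a fitted polynomial on (xs, ys)."""
    mean_y = sum(ys) / len(ys)
    # Residual sum of squares: observed minus fitted values.
    ss_res = sum((y - polyval(coeffs, x)) ** 2 for x, y in zip(xs, ys))
    # Total sum of squares: observed values around their mean.
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


# A perfect linear fit y = 2x + 1 gives R² of 1.
print(r_squared([0, 1, 2, 3], [1, 3, 5, 7], [2.0, 1.0]))  # 1.0
```

This form works for any polynomial degree, whereas squaring the correlation coefficient only matches it in the simple linear case.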
136
votes
8 answers

How to calculate cumulative normal distribution?

I am looking for a function in Numpy or Scipy (or any rigorous Python library) that will give me the cumulative normal distribution function in Python.
toma
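For the cumulative normal question above, the CDF can be written with the error function as Φ(x) = ½·(1 + erf((x − μ) / (σ√2))). A minimal standard-library sketch, with `normal_cdf` as an illustrative name; `scipy.stats.norm.cdf` provides the same quantity if SciPy is available.

```python
import math


def normal_cdf(x, mean=0.0, std=1.0):
    """P(X <= x) for X ~ Normal(mean, std**2), via the error function."""
    if std <= 0:
        raise ValueError("std must be positive")
    z = (x - mean) / (std * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))


print(normal_cdf(0.0))  # 0.5
```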
136
votes
3 answers

What exactly does numpy.exp() do?

I'm very confused as to what np.exp() actually does. In the documentation it says that it: "Calculates the exponential of all elements in the input array." I'm confused as to what exactly this means. Could someone give me more information to what it…
bugsyb
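To illustrate what "the exponential of all elements" means in the question above: each element x is replaced by e**x. The sketch below mimics that with a plain list and `math.exp`; the helper `elementwise_exp` is hypothetical and is not how NumPy is implemented.

```python
import math


def elementwise_exp(values):
    """Return a new list with e**x computed for every x in values."""
    return [math.exp(v) for v in values]


result = elementwise_exp([0.0, 1.0, 2.0])
print(result[0])  # 1.0, because e**0 == 1
```

`np.exp` does the same thing over a whole array at once, without an explicit Python loop.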
127
votes
13 answers

Rolling median algorithm in C

I am currently working on an algorithm to implement a rolling median filter (analogous to a rolling mean filter) in C. From my search of the literature, there appear to be two reasonably efficient ways to do it. The first is to sort the initial…
AWB
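One way to sketch a rolling median is to keep the current window in sorted order, inserting each new value and removing the oldest one by binary search. The example below is in Python rather than C and is only an illustration of that idea, not necessarily either of the approaches the question refers to; `rolling_median` is an assumed name.

```python
from bisect import bisect_left, insort
from collections import deque


def rolling_median(values, window):
    """Yield the median of each full window of `window` consecutive values."""
    if window < 1:
        raise ValueError("window must be at least 1")
    recent = deque()   # window contents in arrival order
    ordered = []       # the same contents, kept sorted
    for value in values:
        recent.append(value)
        insort(ordered, value)
        if len(recent) > window:
            oldest = recent.popleft()
            # Binary search locates one copy of the value leaving the window.
            del ordered[bisect_left(ordered, oldest)]
        if len(recent) == window:
            mid = window // 2
            if window % 2:
                yield ordered[mid]
            else:
                yield (ordered[mid - 1] + ordered[mid]) / 2


print(list(rolling_median([1, 3, 2, 5, 4], 3)))  # [2, 3, 4]
```

Each step costs a binary search plus a list shift, so it is O(window) per sample in the worst case; tree or two-heap structures can reduce that at the cost of more code.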
124
votes
6 answers

Find percentile stats of a given column

I have a pandas data frame my_df, where I can find the mean(), median(), mode() of a given column: my_df['field_A'].mean() my_df['field_A'].median() my_df['field_A'].mode() I am wondering whether it is possible to find more detailed stats such as 90…
Edamame
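In pandas, a column's percentile is available through `Series.quantile`, for example `my_df['field_A'].quantile(0.9)`. To show what that computes, here is a standard-library sketch of a percentile with linear interpolation between the two nearest ranks, which is assumed here to match the pandas default; `percentile` is an illustrative helper.

```python
def percentile(data, q):
    """Return the q-th quantile (0 <= q <= 1) using linear interpolation."""
    if not data:
        raise ValueError("data must not be empty")
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be between 0 and 1")
    ordered = sorted(data)
    position = q * (len(ordered) - 1)   # fractional index into the sorted data
    lower = int(position)
    upper = min(lower + 1, len(ordered) - 1)
    fraction = position - lower
    return ordered[lower] + fraction * (ordered[upper] - ordered[lower])


print(percentile([1, 2, 3, 4, 5], 0.5))  # 3.0
```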
122
votes
18 answers

How to plot ROC curve in Python

I am trying to plot a ROC curve to evaluate the accuracy of a prediction model I developed in Python using logistic regression packages. I have computed the true positive rate as well as the false positive rate; however, I am unable to figure out…
user3847447
122
votes
10 answers

How to calculate probability in a normal distribution given mean & standard deviation?

How to calculate probability in normal distribution given mean, std in Python? I can always explicitly code my own function according to the definition like the OP in this question did: Calculating Probability of a Random Variable in a Distribution…
clwen
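For a continuous normal variable, the probability of landing in an interval is the difference of two CDF values. A minimal sketch using `statistics.NormalDist` from the standard library (Python 3.8+); the function name `prob_between` is an assumption for illustration.

```python
from statistics import NormalDist


def prob_between(low, high, mean, std):
    """P(low < X < high) for X ~ Normal(mean, std**2)."""
    dist = NormalDist(mu=mean, sigma=std)
    return dist.cdf(high) - dist.cdf(low)


# Roughly 68% of the mass lies within one standard deviation of the mean.
print(round(prob_between(90, 110, mean=100, std=10), 4))  # 0.6827
```

With SciPy installed, `scipy.stats.norm(mean, std).cdf` can be used the same way.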
120
votes
5 answers

How to use the 'sweep' function

When I look at the source of R packages, I see the function sweep used quite often. Sometimes it's used when a simpler function would have sufficed (e.g., apply), other times, it's impossible to know exactly what it's doing without spending a…
doug
119
votes
9 answers

Geometric Mean: is there a built-in?

I tried to find a built-in for geometric mean but couldn't. (Obviously a built-in isn't going to save me any time while working in the shell, nor do I suspect there's any difference in accuracy; for scripts I try to use built-ins as often as…
doug
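The excerpt above does not name its language, so the following is only a Python illustration of the usual formula: the geometric mean is the exponential of the mean of the logarithms. The helper `geometric_mean` is written out by hand here; Python 3.8+ also ships `statistics.geometric_mean`.

```python
import math


def geometric_mean(values):
    """Geometric mean of strictly positive numbers, computed in log space."""
    if not values:
        raise ValueError("values must not be empty")
    if any(v <= 0 for v in values):
        raise ValueError("all values must be positive")
    # Summing logs avoids overflow from multiplying many large numbers.
    return math.exp(sum(math.log(v) for v in values) / len(values))


print(round(geometric_mean([2, 8]), 6))  # 4.0
```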
116
votes
4 answers

How can I compute a histogram (frequency table) for a single Series?

How can I generate a frequency table (or histogram) for a single Series? For example, if I have my_series = pandas.Series([1,2,2,3,3,3]), how can I get a result like {1: 1, 2: 2, 3: 3} - that is, a count of how many times each value appears in the…
Abe
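For a pandas Series, `my_series.value_counts()` returns these counts directly. The same frequency table can be sketched with the standard library alone, as below; `frequency_table` is an illustrative name.

```python
from collections import Counter


def frequency_table(values):
    """Map each distinct value to the number of times it appears."""
    return dict(Counter(values))


print(frequency_table([1, 2, 2, 3, 3, 3]))  # {1: 1, 2: 2, 3: 3}
```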
111
votes
9 answers

Quantile-Quantile Plot using SciPy

How would you create a qq-plot using Python? Assuming that you have a large set of measurements and are using some plotting function that takes XY-values as input. The function should plot the quantiles of the measurements against the corresponding…
John
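A Q-Q plot against a normal distribution needs XY pairs of (theoretical quantile, sorted measurement). The sketch below produces such pairs with the standard library, assuming plotting positions of (i + 0.5) / n, which is one of several conventions; `normal_qq_points` is a hypothetical helper, and the result can be passed to any XY plotting function.

```python
from statistics import NormalDist


def normal_qq_points(measurements):
    """Pair each sorted measurement with a standard normal quantile."""
    ordered = sorted(measurements)
    n = len(ordered)
    standard_normal = NormalDist()  # mean 0, standard deviation 1
    # Plotting positions (i + 0.5) / n stay strictly inside (0, 1).
    theoretical = [standard_normal.inv_cdf((i + 0.5) / n) for i in range(n)]
    return list(zip(theoretical, ordered))


for x, y in normal_qq_points([3.1, 0.4, 2.2, 1.7, 2.9]):
    print(f"{x:+.3f}  {y}")
```

If the measurements are roughly normal, the points fall close to a straight line. `scipy.stats.probplot` offers a ready-made version when SciPy is available.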
110
votes
3 answers

Two-sample Kolmogorov-Smirnov Test in Python Scipy

I can't figure out how to do a Two-sample KS test in Scipy. After reading the documentation of scipy kstest, I can see how to test whether a distribution is identical to standard normal distribution from scipy.stats import kstest import numpy as…
Akavall
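SciPy provides `scipy.stats.ks_2samp` for the two-sample test, which returns both the statistic and a p-value. To show what the statistic measures, here is a standard-library sketch that computes only the maximum gap between the two empirical CDFs (no p-value); `ks_2samp_statistic` is an illustrative name.

```python
from bisect import bisect_right


def ks_2samp_statistic(sample_a, sample_b):
    """Largest absolute difference between the two empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a_sorted, b_sorted = sorted(sample_a), sorted(sample_b)
    largest_gap = 0.0
    # The empirical CDFs only change at observed values, so checking
    # every observation from both samples is sufficient.
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        largest_gap = max(largest_gap, abs(cdf_a - cdf_b))
    return largest_gap


print(ks_2samp_statistic([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```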
108
votes
6 answers

Browser statistics on JavaScript disabled

I am having a hard time collecting publicly available statistics on the percentage of web users that browse with JavaScript disabled. Yahoo has published data from 2010 and R. Reid published data from 2009 (picked from a site he had access to).…
Jesper Rønn-Jensen
104
votes
10 answers

Calculate mean and standard deviation from a vector of samples in C++ using Boost

Is there a way to calculate mean and standard deviation for a vector containing samples using Boost? Or do I have to create an accumulator and feed the vector into it?
user393144