Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not about theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always in combination with a specific programming language (for example r, python, spss, sas, matlab).

16319 questions

9 answers

Pandas - Compute z-score for all columns

I have a dataframe containing a single column of IDs and all other columns are numerical values for which I want to compute z-scores. Here's a subsection of it: ID Age BMI Risk Factor PT 6 48 19.3 4 PT 8 43 20.9 NaN PT…

python pandas dataframe indexing statistics

asked Jul 15 '14 at 15:18

Slavatron

14 answers

Rolling variance algorithm

I'm trying to find an efficient, numerically stable algorithm to calculate a rolling variance (for instance, a variance over a 20-period rolling window). I'm aware of the Welford algorithm that efficiently computes the running variance for a stream…

algorithm statistics variance

asked Feb 28 '11 at 20:46

Abiel
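One way to approach the rolling variance described in this question is to extend the Welford update so that, once the window is full, the oldest value is swapped out for the newest in constant time. The sketch below is an illustration under stated assumptions, not an answer taken from the thread: the class name `RollingVariance` and its `update` method are hypothetical, and it returns the sample variance (dividing by n - 1).

```python
from collections import deque


class RollingVariance:
    """Sample variance over the last `window` values, O(1) work per update."""

    def __init__(self, window):
        if window < 2:
            raise ValueError("window must be at least 2")
        self.window = window
        self.values = deque()
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Add x and return the variance of the current window (None if n < 2)."""
        if len(self.values) < self.window:
            # Window still filling: standard Welford update.
            self.values.append(x)
            delta = x - self.mean
            self.mean += delta / len(self.values)
            self.m2 += delta * (x - self.mean)
        else:
            # Window full: replace the oldest value with the new one.
            old = self.values.popleft()
            self.values.append(x)
            old_mean = self.mean
            self.mean += (x - old) / self.window
            self.m2 += (x - old) * ((x - self.mean) + (old - old_mean))
        n = len(self.values)
        if n < 2:
            return None
        # Clamp at zero because rounding can push m2 slightly negative.
        return max(self.m2, 0.0) / (n - 1)
```

For example, `rv = RollingVariance(20)` followed by `rv.update(x)` for each new observation yields the variance of the most recent 20 values. Over very long streams rounding error can still accumulate in `m2`, so periodically recomputing from the stored window may be worthwhile.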

3 answers

T-test in Pandas

If I want to calculate the mean of two categories in Pandas, I can do it like this: data = {'Category': ['cat2','cat1','cat2','cat1','cat2','cat1','cat2','cat1','cat1','cat1','cat2'], 'values': [1,2,3,1,2,3,1,2,3,5,1]} my_data =…

python pandas scipy statistics hypothesis-test

asked Nov 15 '12 at 19:11

hirolau

14 answers

Select k random elements from a list whose elements have weights

Selecting without any weights (equal probabilities) is beautifully described here. I was wondering if there is a way to convert this approach to a weighted one. I am also interested in other approaches as well. Update: Sampling without replacement

algorithm math random statistics probability

asked Jan 26 '10 at 16:26

nimcap
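For weighted sampling without replacement, one known technique is the Efraimidis-Spirakis method: give each item the key u^(1/w), where u is uniform on [0, 1) and w is the item's weight, then keep the k items with the largest keys. The sketch below is a minimal illustration, not the approach from the thread; the function name and the optional `rng` parameter are assumptions made for the example, and items with non-positive weight are treated as never selectable.

```python
import heapq
import random


def weighted_sample_without_replacement(items, weights, k, rng=random):
    """Pick k distinct items, favouring larger weights (Efraimidis-Spirakis keys)."""
    if len(items) != len(weights):
        raise ValueError("items and weights must have the same length")
    # Items with non-positive weight can never be chosen.
    candidates = [(item, w) for item, w in zip(items, weights) if w > 0]
    if k < 0 or k > len(candidates):
        raise ValueError("k must be between 0 and the number of positive weights")
    # Key u ** (1 / w); the index breaks ties so items are never compared.
    keyed = (
        (rng.random() ** (1.0 / w), index, item)
        for index, (item, w) in enumerate(candidates)
    )
    return [item for _, _, item in heapq.nlargest(k, keyed)]
```

A call such as `weighted_sample_without_replacement(["a", "b", "c"], [5, 1, 1], 2)` returns two distinct items, with `"a"` the most likely to be included.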

7 answers

Convert Z-score (Z-value, standard score) to p-value for normal distribution in Python

How does one convert a Z-score from the Z-distribution (standard normal distribution, Gaussian distribution) to a p-value? I have yet to find the magical function in Scipy's stats module to do this, but one must be there.

python statistics scipy

asked Aug 16 '10 at 19:35

gotgenes
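The conversion asked about here can be sketched with the standard library alone, since the tail probability of the standard normal distribution can be written in terms of the complementary error function. The helper name `p_value_from_z` and its `two_sided` flag are hypothetical, introduced only for this example; in SciPy, `scipy.stats.norm.sf` provides the corresponding upper-tail probability.

```python
import math


def p_value_from_z(z, two_sided=True):
    """Convert a standard normal z-score to a p-value."""
    if two_sided:
        # P(|Z| >= |z|) = erfc(|z| / sqrt(2))
        return math.erfc(abs(z) / math.sqrt(2.0))
    # Upper tail only: P(Z >= z) = 0.5 * erfc(z / sqrt(2))
    return 0.5 * math.erfc(z / math.sqrt(2.0))
```

Using `erfc` directly, rather than computing one minus the CDF, avoids losing precision for large z, where the tail probability is very small.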

2 answers

Confidence intervals for predictions from logistic regression

In R predict.lm computes predictions based on the results from linear regression and also offers to compute confidence intervals for these predictions. According to the manual, these intervals are based on the error variance of fitting, but not on…

r statistics glm confidence-interval

asked Jan 20 '13 at 09:45

unique2

1 answer

Statistical performance of purely functional maps and sets

Given a data structure specification such as a purely functional map with known complexity bounds, one has to pick between several implementations. There is some folklore on how to pick the right one, for example Red-Black trees are considered to be…

data-structures functional-programming statistics avl-tree red-black-tree

asked Apr 05 '13 at 16:44

t0yv0

4 answers

Standard deviation of generic list?

I need to calculate the standard deviation of a generic list. I will try to include my code. It's a generic list with data in it. The data is mostly floats and ints. Here is my code that is relevant to it without getting into too much detail:…

c# math statistics standard-deviation

asked Jun 29 '10 at 14:35

Tom Hangler

5 answers

Simple statistics - Java packages for calculating mean, standard deviation, etc

Could you please suggest any simple Java statistics packages? I don't necessarily need any of the advanced stuff. I was quite surprised that there does not appear to be a function to calculate the Mean in the java.lang.Math package... What are you…

java math statistics package

asked Nov 14 '09 at 22:43

Peter Perháč

4 answers

Constructing a co-occurrence matrix in python pandas

I know how to do this in R. But, is there any function in pandas that transforms a dataframe to an nxn co-occurrence matrix containing the counts of two aspects co-occurring. For example a matrix df: import pandas as pd df = pd.DataFrame({'TFD' :…

python pandas statistics

asked Dec 13 '13 at 19:15

user3084006

4 answers

Warning: non-integer #successes in a binomial glm! (survey packages)

I am using the twang package to create propensity scores, which are used as weights in a binomial glm using survey::svyglm. The code looks something like this: pscore <- ps(ppci ~ var1+var2+.........., data=dt....) dt$w <- get.weights(pscore,…

r statistics glm

asked Oct 18 '12 at 10:57

Robert Long

5 answers

Screening (multi)collinearity in a regression model

I hope that this one is not going to be an "ask-and-answer" question... here goes: (multi)collinearity refers to extremely high correlations between predictors in the regression model. How to cure them... well, sometimes you don't need to "cure"…

r statistics regression

asked Jun 15 '10 at 02:10

aL3xa

5 answers

Pythonic way of detecting outliers in one dimensional observation data

For the given data, I want to set the outlier values (defined by 95% confidence level or 95% quantile function or anything that is required) as nan values. Following is my data and the code that I am using right now. I would be glad if someone could…

python numpy matplotlib statistics statsmodels

asked Mar 12 '14 at 14:07

user3410943

8 answers

Sorting algorithms for data of known statistical distribution?

It just occurred to me, if you know something about the distribution (in the statistical sense) of the data to sort, the performance of a sorting algorithm might benefit if you take that information into account. So my question is, are there any…

algorithm performance sorting statistics complexity-theory

asked May 29 '11 at 07:46

static_rtti

9 answers

Variance Inflation Factor in Python

I'm trying to calculate the variance inflation factor (VIF) for each column in a simple dataset in python: a b c d 1 2 4 4 1 2 6 3 2 3 7 4 3 2 8 5 4 1 9 4 I have already done this in R using the vif function from the usdm library which gives the…

python r numpy statistics statsmodels

asked Mar 07 '17 at 21:09

Nizag
