Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

CrossValidated (https://stats.stackexchange.com) is the Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always together with the tag for a specific programming language.

16319 questions
103 votes, 8 answers

How to find the standard error of the mean?

Is there any command to find the standard error of the mean in R?
alex
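In base R this is usually written as sd(x) / sqrt(length(x)), since sd() uses the n - 1 denominator. A minimal Python sketch of the same formula, assuming the sample standard deviation is wanted; the function name is hypothetical.

```python
import math
import statistics


def standard_error_of_mean(values):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values for a sample standard deviation")
    return statistics.stdev(values) / math.sqrt(n)


sample = [2, 4, 4, 4, 5, 5, 7, 9]
print(standard_error_of_mean(sample))
```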
102 votes, 12 answers

Which Git commit stats are easy to pull

Previously I have enjoyed TortoiseSvn's ability to generate simple commit stats for a given SVN repository. I wonder what is available in Git and am particularly interested in: number of commits per user, number of lines changed per user, activity…
Jesper Rønn-Jensen
101 votes, 8 answers

Is Python faster and lighter than C++?

I've always thought that Python's advantages are code readability and development speed, but time and memory usage were not as good as those of C++. These stats struck me really hard. What does your experience tell you about Python vs C++ time and…
Alex
101 votes, 4 answers

Probability to z-score and vice versa

How do I calculate the z-score of a p-value and vice versa? For example if I have a p-value of 0.95 I should get 1.96 in return. I saw some functions in scipy but they only run a z-test on an array. I have access to numpy, statsmodels, pandas, and…
user3084006
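Assuming "p-value" here means a cumulative (left-tail) probability, the conversion is the standard normal quantile function, and the reverse direction is the CDF. Note that 1.96 corresponds to a cumulative probability of 0.975 (a two-sided 95% level), while a one-sided 0.95 gives about 1.645. A minimal sketch using only the standard library's statistics.NormalDist (Python 3.8+); SciPy's scipy.stats.norm.ppf and scipy.stats.norm.cdf do the same job. The helper names are hypothetical.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist(mu=0.0, sigma=1.0)


def z_from_cumulative_p(p):
    """Inverse CDF (quantile): the z with P(Z <= z) = p."""
    return STANDARD_NORMAL.inv_cdf(p)


def cumulative_p_from_z(z):
    """CDF: returns P(Z <= z)."""
    return STANDARD_NORMAL.cdf(z)


def z_from_two_sided_p(p):
    """Critical |z| for a two-sided test at significance level p."""
    return STANDARD_NORMAL.inv_cdf(1 - p / 2)


print(round(z_from_cumulative_p(0.975), 4))  # 1.96
print(round(z_from_two_sided_p(0.05), 4))    # 1.96
print(round(z_from_cumulative_p(0.95), 4))   # 1.6449
print(round(cumulative_p_from_z(1.96), 4))   # 0.975
```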
96 votes, 6 answers

Why is the Fibonacci series used in agile planning poker?

When estimating the relative size of user stories in agile software development the members of the team are supposed to estimate the size of a user story as being 1, 2, 3, 5, 8, 13, ... . So the estimated values should resemble the Fibonacci series.…
asmaier
96 votes, 6 answers

Is a Markov chain the same as a finite state machine?

Is a finite state machine just an implementation of a Markov chain? What are the differences between the two?
Carson
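One way to see the difference, sketched under simplifying assumptions: a deterministic finite state machine picks its next state from the current state and an input symbol, with exactly one possible outcome, whereas a Markov chain takes no input and picks the next state at random from a probability distribution that depends only on the current state. The turnstile and weather tables below are made-up examples, and the function names are hypothetical.

```python
import random

# Deterministic FSM: (state, input symbol) -> exactly one next state.
FSM_TRANSITIONS = {
    ("locked", "coin"): "unlocked",
    ("locked", "push"): "locked",
    ("unlocked", "coin"): "unlocked",
    ("unlocked", "push"): "locked",
}

# Markov chain: state -> probability distribution over next states (no input).
MARKOV_TRANSITIONS = {
    "sunny": {"sunny": 0.9, "rainy": 0.1},
    "rainy": {"sunny": 0.5, "rainy": 0.5},
}


def run_fsm(start, symbols):
    """Feed input symbols to the FSM; the result is fully determined."""
    state = start
    for symbol in symbols:
        state = FSM_TRANSITIONS[(state, symbol)]
    return state


def walk_markov(start, steps, rng):
    """Sample a random path of the given length from the Markov chain."""
    state = start
    path = [state]
    for _ in range(steps):
        next_states = list(MARKOV_TRANSITIONS[state])
        weights = [MARKOV_TRANSITIONS[state][s] for s in next_states]
        state = rng.choices(next_states, weights=weights, k=1)[0]
        path.append(state)
    return path


print(run_fsm("locked", ["coin", "push", "push"]))  # locked
print(walk_markov("sunny", 5, random.Random(42)))
```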
96 votes, 17 answers

How to efficiently calculate a running standard deviation

I have an array of lists of numbers, e.g.: [0] (0.01, 0.01, 0.02, 0.04, 0.03) [1] (0.00, 0.02, 0.02, 0.03, 0.02) [2] (0.01, 0.02, 0.02, 0.03, 0.02) ... [n] (0.01, 0.00, 0.01, 0.05, 0.03) I would like to efficiently calculate the mean and…
Alex Reynolds
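A minimal sketch of Welford's online algorithm, which updates the mean and variance one value at a time without storing the data. It assumes the goal is per-position statistics across the lists (the question text is cut off) and reports the sample (n - 1) standard deviation; the class name is hypothetical.

```python
import math


class RunningStats:
    """Welford's online algorithm for mean and sample variance."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def push(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)  # uses the updated mean

    def variance(self):
        # sample variance (n - 1 denominator); 0.0 until two values are seen
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def stdev(self):
        return math.sqrt(self.variance())


rows = [
    (0.01, 0.01, 0.02, 0.04, 0.03),
    (0.00, 0.02, 0.02, 0.03, 0.02),
    (0.01, 0.02, 0.02, 0.03, 0.02),
]
columns = [RunningStats() for _ in rows[0]]  # one accumulator per position
for row in rows:
    for stats, value in zip(columns, row):
        stats.push(value)

print([round(s.mean, 4) for s in columns])
print([round(s.stdev(), 4) for s in columns])
```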
95 votes, 2 answers

How to highlight specific x-value ranges

I'm making a visualization of historical stock data for a project, and I'd like to highlight regions of drops. For instance, when the stock is experiencing significant drawdown, I would like to highlight it with a red region. Can I do this…
alexgolec
90 votes, 3 answers

Is there a good math/stats library for Scala?

I'm looking for a good open source library for Scala for math and statistics. Hopefully something like Apache Math or Colt, but implemented in Scala. Can anyone point me in the right direction?
dave
88 votes, 14 answers

"On-line" (iterator) algorithms for estimating statistical median, mode, skewness, kurtosis?

Is there an algorithm to estimate the median, mode, skewness, and/or kurtosis of a set of values, but that does NOT require storing all the values in memory at once? I'd like to calculate the basic statistics: mean: arithmetic average; variance: …
Ryan B. Lynch
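For skewness and kurtosis, Welford's update extends to the third and fourth central moments, so a single pass with constant memory suffices. The sketch below uses the widely used higher-order extension of Welford's update and returns population (biased) estimates; it does not cover the median or mode, whose exact values need the data (approximations such as the P-square algorithm exist). Class and method names are hypothetical.

```python
import math


class OnlineMoments:
    """Single-pass mean, variance, skewness and excess kurtosis (population)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sums of 2nd, 3rd and 4th powers of deviations
        self.m3 = 0.0
        self.m4 = 0.0

    def push(self, x):
        n1 = self.n
        self.n += 1
        n = self.n
        delta = x - self.mean
        delta_n = delta / n
        delta_n2 = delta_n * delta_n
        term1 = delta * delta_n * n1
        self.mean += delta_n
        # Order matters: m4 uses the old m2 and m3, m3 uses the old m2.
        self.m4 += (term1 * delta_n2 * (n * n - 3 * n + 3)
                    + 6 * delta_n2 * self.m2 - 4 * delta_n * self.m3)
        self.m3 += term1 * delta_n * (n - 2) - 3 * delta_n * self.m2
        self.m2 += term1

    def variance(self):
        return self.m2 / self.n  # population variance

    def skewness(self):
        return math.sqrt(self.n) * self.m3 / self.m2 ** 1.5

    def excess_kurtosis(self):
        return self.n * self.m4 / (self.m2 * self.m2) - 3.0


moments = OnlineMoments()
for value in [2, 4, 4, 4, 5, 5, 7, 9]:
    moments.push(value)
print(moments.mean, moments.variance(), moments.skewness(), moments.excess_kurtosis())
```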
83 votes, 17 answers

Command-line utility to print statistics of numbers in Linux

I often find myself with a file that has one number per line. I end up importing it in Excel to view things like median, standard deviation and so forth. Is there a command line utility in Linux to do the same? I usually need to find the average,…
MK.
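If no dedicated tool is installed, a short Python script can stand in for one. This sketch assumes Python 3.8+ (for statistics.fmean), reads one number per line from a file given as the first argument, skips blank lines, and reports the sample standard deviation; the script name numstats.py is hypothetical.

```python
import statistics
import sys


def summarize(lines):
    """Parse one number per line (blank lines are skipped) and compute summary stats."""
    values = [float(line) for line in lines if line.strip()]
    if not values:
        return {"count": 0}
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        # sample standard deviation (n - 1); reported as 0.0 for a single value
        "stdev": statistics.stdev(values) if len(values) > 1 else 0.0,
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        for name, value in summarize(handle).items():
            print(f"{name}\t{value}")
```

Usage: python3 numstats.py numbers.txt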
83 votes, 6 answers

Boxplots in matplotlib: Markers and outliers

I have some questions about boxplots in matplotlib: Question A. What do the markers that I highlighted below with Q1, Q2, and Q3 represent? I believe Q1 is maximum and Q3 are outliers, but what is Q2? Question B. How does…
Amelio Vazquez-Reina
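For reference, matplotlib's default box plot follows the Tukey convention: the box spans the first to third quartile with a line at the median, the whiskers extend to the most extreme data points within 1.5 × IQR of the box (the whis parameter), and anything beyond is drawn as an individual flier (outlier) marker. The sketch below recomputes those elements with the standard library; it assumes linear-interpolation quartiles, which statistics.quantiles(..., method="inclusive") shares with NumPy's default percentile. Function and key names are hypothetical.

```python
import statistics


def boxplot_summary(data, whis=1.5):
    """Recompute Tukey-style box plot elements for a list of numbers."""
    ordered = sorted(data)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - whis * iqr
    high_fence = q3 + whis * iqr
    # Whiskers stop at the most extreme points that are still inside the fences.
    inside = [x for x in ordered if low_fence <= x <= high_fence]
    return {
        "first_quartile": q1,
        "median": median,
        "third_quartile": q3,
        "whisker_low": inside[0],
        "whisker_high": inside[-1],
        "fliers": [x for x in ordered if x < low_fence or x > high_fence],
    }


print(boxplot_summary([1, 2, 3, 4, 5, 6, 7, 8, 100]))
```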
83 votes, 6 answers

Getting statistics from Google Play Developers with an API

I am in charge of developing a website which should be able to show statistics from both Apple's app store and Google Play Store to clients, so they can easily see what's going on. I have figured out some ways to get App Store's data, but the Google…
Cécile Fecherolle
81 votes, 10 answers

What is a better way to sort by a 5 star rating?

I'm trying to sort a bunch of products by customer ratings using a 5 star system. The site I'm setting this up for does not have a lot of ratings and continues to add new products, so it will usually have a few products with a low number of ratings. I…
Vizjerai
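One common approach (not the only one) is a Bayesian, or damped, average: treat every product as if it started with a fixed number of "virtual" ratings at some prior mean, so items with few ratings are pulled toward that mean until real ratings accumulate. In this sketch PRIOR_MEAN and PRIOR_WEIGHT are tuning assumptions (a site-wide average rating is a typical choice for the prior mean), and the product data is invented.

```python
def bayesian_average(ratings, prior_mean, prior_weight):
    """Mean rating shrunk toward prior_mean by prior_weight virtual ratings."""
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + len(ratings))


# Invented example data: product name -> list of 1..5 star ratings.
products = {
    "A": [5],
    "B": [5, 5, 5, 5, 4, 5, 5, 5, 5, 5],
    "C": [4, 4, 4],
}

PRIOR_MEAN = 3.0   # assumption: e.g. the site-wide average rating
PRIOR_WEIGHT = 5   # assumption: how many ratings before a product's own mean dominates

ranked = sorted(
    products,
    key=lambda name: bayesian_average(products[name], PRIOR_MEAN, PRIOR_WEIGHT),
    reverse=True,
)
print(ranked)  # ['B', 'C', 'A']
```

With a plain mean, the single 5-star rating would put A first; with the damped average it drops to last until it collects more ratings.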
80 votes, 3 answers

Explain the quantile() function in R

I've been mystified by the R quantile function all day. I have an intuitive notion of how quantiles work, and an M.S. in stats, but boy oh boy, the documentation for it is confusing to me. From the docs: Q[i](p) = (1 - gamma) x[j] + gamma …
Gregg Lind
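R's default is type = 7. In the general formula quoted from the docs, j = floor(np + m) and gamma = np + m - j; type 7 uses m = 1 - p, which gives j = floor(1 + (n - 1)p), i.e. linear interpolation between neighbouring order statistics. A Python sketch of just that default type, with a hypothetical function name; as a check, quantile(1:10, 0.25) in R returns 3.25.

```python
import math


def r_quantile_type7(values, p):
    """Sketch of R's default quantile(x, p), type = 7.

    Q(p) = (1 - gamma) * x[j] + gamma * x[j + 1], with 1-based
    j = floor(n*p + m), gamma = n*p + m - j and m = 1 - p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be in [0, 1]")
    x = sorted(values)
    n = len(x)
    h = n * p + (1 - p)          # = 1 + (n - 1) * p, a 1-based position
    j = math.floor(h)
    gamma = h - j
    lower = x[j - 1]             # x[j] in R's 1-based notation
    upper = x[min(j, n - 1)]     # x[j + 1], clamped to the last element when p = 1
    return (1 - gamma) * lower + gamma * upper


print([r_quantile_type7(range(1, 11), p) for p in (0.0, 0.25, 0.5, 0.75, 1.0)])
```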