Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always in combination with a specific programming language tag.

16319 questions
5 votes, 1 answer

Profiling SVM (e1071) in R

I am new to R and SVMs and I am trying to profile the svm function from the e1071 package. However, I can't find any large dataset that lets me get a good range of profiling results by varying the size of the input data. Does anyone know how to work svm…
Manolete • 3,431 • 7 • 54 • 92
5 votes, 1 answer

weighted standard deviation in sql server without aggregation error

Redoing the weighted mean (which is already in another column) in working out the weighted-Sum-Of-Squared-Deviations, results in the error "Cannot perform an aggregate function on an expression containing an aggregate or a…
user1444275 • 51 • 1 • 2
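
The pattern behind that error can be illustrated outside SQL: the weighted mean has to be computed in its own pass before the squared deviations are aggregated. A minimal sketch in Python, assuming the population form of the weighted standard deviation (square root of the weighted sum of squared deviations divided by the sum of the weights); the function name `weighted_std` is hypothetical and is not taken from the question:

```python
import math

def weighted_std(values, weights):
    """Population-style weighted standard deviation (assumed definition)."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight <= 0:
        raise ValueError("sum of weights must be positive")
    # Pass 1: the weighted mean is computed on its own first.
    mean = sum(w * x for x, w in zip(values, weights)) / total_weight
    # Pass 2: the squared deviations use the finished mean, so no
    # aggregate is nested inside another aggregate.
    ss = sum(w * (x - mean) ** 2 for x, w in zip(values, weights))
    return math.sqrt(ss / total_weight)
```

In SQL the same two-pass idea is usually expressed by computing the mean in a subquery or common table expression and joining it back before the outer aggregate.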
5 votes, 1 answer

How to compute CDF probability of normal distribution in C++?

Is there any function that allows me to compute the CDF probability of a normal distribution, given a mean and sigma? For example, P( X < x ) given the normal distribution with $\bar{x}$ and $\sigma$. I think Boost has this, but I think that it…
shn • 5,116 • 9 • 34 • 62
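
The quantity asked about, P(X < x) for a normal distribution with mean mu and standard deviation sigma, can be written with the error function as 0.5 * (1 + erf((x - mu) / (sigma * sqrt(2)))). A minimal sketch in Python rather than C++ (the function name `normal_cdf` is hypothetical, and this is not the Boost-based approach the question mentions):

```python
import math

def normal_cdf(x, mu=0.0, sigma=1.0):
    """Return P(X < x) for X ~ Normal(mu, sigma**2)."""
    if sigma <= 0:
        raise ValueError("sigma must be positive")
    # Standardize, then apply the erf identity for the normal CDF.
    z = (x - mu) / (sigma * math.sqrt(2.0))
    return 0.5 * (1.0 + math.erf(z))
```

The same formula should carry over to C++ directly, since `std::erf` is available in `<cmath>` from C++11 onward.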
5 votes, 1 answer

Calculate variogram of raster data with NAs in R

Summary: I have a raster dataset which contains NA values, and want to calculate a variogram of it, ignoring the NAs. How can I do this? I have an image which I have loaded into R using the readGDAL function, stored as im. To make this reproducible,…
robintw • 27,571 • 51 • 138 • 205
5 votes, 2 answers

Fitting a distribution to data - MATLAB

I am trying to fit a distribution to some data I've collected from microscopy images. We know that the peak at about 152 is due to a Poisson process. I'd like to fit a distribution to the large density in the center of the image, while ignoring the…
kelvin_11 • 153 • 1 • 1 • 4
5 votes, 1 answer

Looking for a simple machine learning approach to predict final exam score from training set

I am trying to predict test results based on known previous scores. The test is made up of three subjects, each contributing to the final exam score. For all students I have their previous scores for mini-tests in each of the three subjects, and I…
5 votes, 4 answers

How to see contents of /proc/[pid]/status after process finishes execution?

I want to see statistics for a small C program, but it is a short-lived program that begins and ends (not a long-running one). I want to improve this program in terms of access to memory, cache hits, context switches, and that sort of…
jperelli • 6,988 • 5 • 50 • 85
5 votes, 2 answers

Is there a Java library that implements one of the tests for the normality of a sample distribution?

I have a dataset and I want to test how close it is to a normal (Gaussian) distribution. I know there are a variety of algorithms for doing this, e.g. the Jarque-Bera test, the Anderson–Darling test and many others. I'm hoping to find an…
sanity • 35,347 • 40 • 135 • 226
5 votes, 2 answers

ORDER BY RAND() seems to be less than random

I have a fairly simple SQL (MySQL): SELECT foo FROM bar ORDER BY rank, RAND() I notice that when I refresh the results, the randomness is suspiciously weak. In the sample data at the moment there are six results with equal rank (integer zero).…
spraff • 32,570 • 22 • 121 • 229
4 votes, 1 answer

multivariate skew normal in R

I'm trying to generate random numbers with a multivariate skew normal distribution using the rmsn command from the sn package in R. I would like, ideally, to be able to get three columns of numbers with specified variances and covariances, while…
Malcolm • 41 • 2
4 votes, 2 answers

Algorithm to "smooth out" data values for visualization

I'm reading some data for countries around the world and am playing with Google's visualization gadgets, in particular the map visualizations. The problem is that the US always comes out way in front. While most countries have values between 1 and…
deceze • 510,633 • 85 • 743 • 889
4 votes, 4 answers

random variable from skewed distribution with scipy

I'm trying to draw a random number from a distribution in SciPy, just like you would with stats.norm.rvs. However, I want to take the number from an empirical distribution I have - it's a skewed dataset and I want to incorporate the skew and…
eric p • 235 • 5 • 13
4 votes, 3 answers

Finding PI digits using Monte Carlo

I have tried many algorithms for finding π using Monte Carlo. One of the solutions (in Python) is this:

    def calc_PI():
        n_points = 1000000
        hits = 0
        for i in range(1, n_points):
            x, y = uniform(0.0, 1.0), uniform(0.0, 1.0)
            …
Jon Romero • 4,062 • 6 • 36 • 34
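
The excerpt above is cut off before its hit test, so the following is only a sketch of the usual quarter-circle variant of the technique, not a reconstruction of the asker's code. It assumes points are drawn uniformly in the unit square and counted as hits when they fall inside the unit quarter circle; the name `estimate_pi` and the `seed` parameter are hypothetical:

```python
import random

def estimate_pi(n_points=1_000_000, seed=None):
    """Monte Carlo estimate of pi from uniform points in the unit square."""
    rng = random.Random(seed)  # local generator so runs can be reproduced
    hits = 0
    for _ in range(n_points):
        x, y = rng.uniform(0.0, 1.0), rng.uniform(0.0, 1.0)
        # Assumed hit test: the point lies inside the unit quarter circle.
        if x * x + y * y <= 1.0:
            hits += 1
    # The quarter circle covers pi/4 of the square, so scale by 4.
    return 4.0 * hits / n_points
```

The statistical error of this estimator shrinks roughly in proportion to 1/sqrt(n_points), so each additional correct digit needs about 100 times more samples, which makes the method a slow way to obtain many digits of π.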
4 votes, 2 answers

Gaussian Kernel Density Estimation (KDE) of large numbers in Python

I have 1000 large numbers, randomly distributed in range 37231 to 56661. I am trying to use the stats.gaussian_kde but something does not work. (maybe because of my poor knowledge of statistics?). Here is the code: from scipy import…
Proteos • 83 • 1 • 8
4 votes, 1 answer

Android installed base dropped significantly in a span of 2 days. How to find out what happened?

I am checking stats for my app on the Android Marketplace (ahem Google Play) and inexplicably the numbers for my app dropped like a rock between the dates of Feb 12 and 14. I did not release new versions or anything. And nothing like this happened…
AngryHacker • 59,598 • 102 • 325 • 594