Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not about theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always in combination with a specific programming language tag.

16319 questions
5
votes
1 answer

Approximate the distribution of a sum of binomial random variables in R

My goal is to approximate the distribution of a sum of binomial variables. I use the following paper: The Distribution of a Sum of Binomial Random Variables by Ken Butler and Michael Stephens. I want to write an R script to find the Pearson approximation to…
5
votes
2 answers

Simple linear regression for data set

I am looking to create a trend function in C# for a set of data, and it seems like using a big math library is a bit overkill for my needs. Given a list of values such as 6,13,7,9,12,4,2,2,1, I would like to get the slope of the simple linear…
Justin
  • 6,373
  • 9
  • 46
  • 72
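A minimal sketch of the least-squares slope this question asks about, written in Python rather than the asker's C#. It assumes the x-coordinates are simply the indices 0, 1, 2, ... of the values, which the excerpt does not state. The function name `linear_trend` is hypothetical.

```python
def linear_trend(values):
    """Return (slope, intercept) of the least-squares line through
    the points (0, values[0]), (1, values[1]), ..."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two values to fit a line")
    mean_x = (n - 1) / 2          # mean of 0, 1, ..., n-1
    mean_y = sum(values) / n
    # Sum of cross-deviations and sum of squared x-deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(values))
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


slope, intercept = linear_trend([6, 13, 7, 9, 12, 4, 2, 2, 1])
print(slope, intercept)
```

A negative slope here means the values trend downward over the index; no external math library is needed for this single-predictor case.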
5
votes
1 answer

Calculate urgency of a task from two variables

I am trying to figure out a formula to calculate the urgency of a set of arbitrary tasks, based on the number of days until a 'deadline' and the % completion of the task already completed. So far I have a 'function' which gives the represents: U =…
rStyles
  • 111
  • 6
5
votes
2 answers

manipulate data to better fit a Gaussian Distribution

I have a question concerning the normal distribution (with mu = 0 and sigma = 1). Let's say that I first call randn or normrnd this way: x = normrnd(0,1,[4096,1]); % x = randn(4096,1) Now, to assess how well the x values fit the normal distribution, I…
fpe
  • 2,700
  • 2
  • 23
  • 47
5
votes
1 answer

Kurtosis of a normal distribution

According to what I read from here, the kurtosis of a normal distribution should be around 3. However, when I use the kurtosis function provided by MATLAB, I could not verify it: data1 = randn(1,20000); v1 = kurtosis(data1) It seems that the…
feelfree
  • 11,175
  • 20
  • 96
  • 167
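For readers who want to check the "around 3" figure numerically, here is a small Python sketch (not the asker's MATLAB code). It assumes the non-excess (Pearson) definition, the fourth central moment divided by the squared variance. Some libraries instead report excess kurtosis, which subtracts 3, so it is worth checking which convention a given function uses. The helper name `kurtosis` below is a local, hypothetical function, not a library call.

```python
import random


def kurtosis(xs):
    """Non-excess (Pearson) kurtosis m4 / m2**2, using biased central moments."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n   # second central moment
    m4 = sum((x - mean) ** 4 for x in xs) / n   # fourth central moment
    return m4 / m2 ** 2


rng = random.Random(0)  # fixed seed, only so the run is repeatable
sample = [rng.gauss(0.0, 1.0) for _ in range(20000)]

k = kurtosis(sample)
print("kurtosis:", k)            # should be close to 3 for normal data
print("excess kurtosis:", k - 3.0)
```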
5
votes
1 answer

statistics bootstrap library in Python?

Is there a statistics bootstrap library in Python? I would like to have functionality similar to what is offered in R bootstrap: http://statistics.ats.ucla.edu/stat/r/library/bootstrap.htm Searching I…
gliptak
  • 3,592
  • 2
  • 29
  • 61
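As a rough illustration of what such a library would do, here is a minimal percentile-bootstrap sketch using only the Python standard library. It is not the API of any particular package. The function name `bootstrap_ci`, its parameters, and the sample data are all hypothetical.

```python
import random
import statistics


def bootstrap_ci(data, stat=statistics.mean, n_resamples=2000,
                 alpha=0.05, seed=None):
    """Percentile bootstrap confidence interval for stat(data)."""
    rng = random.Random(seed)
    n = len(data)
    # Recompute the statistic on n_resamples samples drawn with replacement
    estimates = sorted(
        stat(rng.choices(data, k=n)) for _ in range(n_resamples)
    )
    lo_idx = max(0, int((alpha / 2) * n_resamples))
    hi_idx = min(n_resamples - 1, int((1 - alpha / 2) * n_resamples) - 1)
    return estimates[lo_idx], estimates[hi_idx]


# Made-up sample values, for illustration only
data = [2.1, 3.4, 1.9, 5.6, 4.2, 3.8, 2.7, 4.9, 3.1, 3.3]
low, high = bootstrap_ci(data, seed=42)
print(low, statistics.mean(data), high)
```

The percentile method is the simplest interval. Variants such as BCa intervals correct for bias and skew but need more bookkeeping.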
5
votes
2 answers

How can I take multiple vectors and recode their datatypes in R?

I'm looking for an elegant way to change multiple vectors' datatypes in R. I'm working with an educational dataset: 426 students' answers to eight multiple choice questions (1 = correct, 0 = incorrect), plus a column indicating which instructor (1,…
briandk
  • 6,749
  • 8
  • 36
  • 46
5
votes
1 answer

Evaluating the likelihood function in linear mixed models (lme4)

I am currently writing a script to evaluate the (restricted) log-likelihood function for use in linear mixed models. I need it to calculate the likelihood of a model with some parameters fixed to arbitrary values. Maybe this script is helpful to…
SimonG
  • 4,701
  • 3
  • 20
  • 31
5
votes
3 answers

Parallelizing on a supercomputer and then combining the parallel results (R)

I've got access to a big, powerful cluster. I'm a halfway decent R programmer, but totally new to shell commands (and terminal commands in general, besides the basic things that one needs to do to use Ubuntu). I want to use this cluster to run a bunch…
generic_user
  • 3,430
  • 3
  • 32
  • 56
5
votes
2 answers

Best way to extract Mean Square Values from aov object in R

I'm trying to write a function to automate doing a variance analysis, part of which involves doing some further calculations. The method I've been using isn't very robust: if variable names change, it stops working. For this dummy data >…
PaulHurleyuk
  • 8,009
  • 15
  • 54
  • 78
5
votes
1 answer

How can I find out how many people are subscribed to an RSS feed I'm serving?

We have a site that is serving some RSS feeds, and we'd like to know how many people are subscribed to each one, without using a system like FeedBurner to serve them. The original approach to figuring this out was basically logging requests, and then…
Daniel Magliola
  • 30,898
  • 61
  • 164
  • 243
5
votes
2 answers

Fastest approximate counting algorithm

What's the fastest way to get an approximate count of the number of rows of an input file or stdout data stream? FYI, this is a probabilistic algorithm; I can't find many examples online. The data could just be one or 2 columns coming from an awk…
5
votes
2 answers

Making a more efficient monte carlo simulation

So, I've written this code that should effectively estimate the area under the curve of the function defined as h(x). My problem is that I need to be able to estimate the area to within 6 decimal places, but the algorithm I've defined in estimateN…
Khodeir
  • 465
  • 4
  • 15
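A hedged sketch of plain Monte Carlo area estimation, for context. It is not the asker's code, and since the real h(x) and estimateN are not shown in the excerpt, the integrand h(x) = x^2 on [0, 1] and the name `mc_area` are stand-ins. The standard error of this estimator shrinks like 1/sqrt(N), so for this stand-in integrand six-decimal accuracy would need roughly 10^11 samples. In practice, variance reduction or a deterministic quadrature rule is usually a better fit.

```python
import random


def h(x):
    # Hypothetical integrand; the question's actual h(x) is not shown
    return x * x


def mc_area(f, a, b, n, seed=None):
    """Estimate the integral of f over [a, b] as (b - a) * mean(f(U)),
    with U drawn uniformly from [a, b]."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        total += f(rng.uniform(a, b))
    return (b - a) * total / n


estimate = mc_area(h, 0.0, 1.0, 200_000, seed=1)
print(estimate)  # exact area for this stand-in integrand is 1/3
```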
5
votes
1 answer

Why is my implementation of the parking lot test for random number generators producing bad results?

I'm trying to write an implementation of the parking lot test for random number generators. Here are the sources that I'm getting my information about the test from: Intel math library documentation and Page 4 of this paper along with the phi…
Chris Dibble
  • 388
  • 1
  • 5
  • 16
5
votes
3 answers

How do I compute the Inverse gaussian distribution from given CDF?

I want to compute the parameters mu and lambda for the Inverse Gaussian Distribution given the CDF. By 'given the CDF' I mean that I have been given the data AND the (estimated) quantile for the data, i.e. Quantile - Value 0.01 - 10 0.5 - 12 0.7 -…
user1141785
  • 431
  • 7
  • 21