Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not about theoretical discussions of statistics or research design. Therefore, this tag should never be used alone; always combine it with the tag for a specific programming language.

16319 questions
5
votes
2 answers

Random identifier in Java

I would like to generate a random identifier in Java. The identifier should have a fixed size, and the probability of generating the same identifier twice should be very low (the system has about 500,000 users). In addition, the identifier should be so…
magnarwium
  • 235
  • 2
  • 14
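A language-neutral sketch in Python (the question itself is about Java, where java.security.SecureRandom plays the same role): draw a fixed number of bytes from a cryptographically secure generator and hex-encode them, then use the birthday-bound approximation to check how likely a duplicate is among 500,000 users. The 16-byte (128-bit) default is an assumption for illustration, not a requirement from the question.

```python
import math
import secrets


def random_identifier(num_bytes: int = 16) -> str:
    """Fixed-length identifier: num_bytes bytes from the OS CSPRNG, hex-encoded (2 chars per byte)."""
    return secrets.token_hex(num_bytes)


def collision_probability(n: int, bits: int) -> float:
    """Birthday-bound approximation of P(at least one duplicate) among n uniform `bits`-bit IDs:
    1 - exp(-n(n-1) / 2^(bits+1)), computed with expm1 so tiny values do not round to 0."""
    return -math.expm1(-n * (n - 1) / 2.0 ** (bits + 1))


if __name__ == "__main__":
    print(random_identifier())                  # 32 hex characters
    print(collision_probability(500_000, 128))  # vanishingly small
    print(collision_probability(500_000, 32))   # essentially 1
```

The comparison is the point of the design: with 500,000 users a 32-bit identifier almost certainly collides somewhere, while a 128-bit one makes a collision negligible. If a duplicate would be harmful, a unique constraint on the stored column is still a sensible backstop.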
5
votes
2 answers

Correlation between numeric and boolean variables

I am creating a plot in R using plot(IQ, isAtheist) and abline(lm(isAtheist ~ IQ)). IQ is numeric and isAtheist is logical (TRUE or FALSE). I have tried cor(IQ, isAtheist), but it gives me an error: Error in cor(IQ, isAtheist)…
Edward Ruchevits
  • 6,411
  • 12
  • 51
  • 86
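The correlation between a numeric variable and a boolean one is usually computed as the point-biserial correlation, which is simply the Pearson correlation with the boolean coded as 0/1. In R that amounts to coercing the logical vector first, e.g. cor(IQ, as.numeric(isAtheist)). A minimal Python sketch of the same computation, with hypothetical data:

```python
import math
from typing import Sequence


def point_biserial(values: Sequence[float], flags: Sequence[bool]) -> float:
    """Pearson correlation between a numeric variable and a boolean coded as 1 (True) / 0 (False)."""
    if len(values) != len(flags):
        raise ValueError("inputs must have the same length")
    y = [1.0 if f else 0.0 for f in flags]
    n = len(values)
    mean_x = sum(values) / n
    mean_y = sum(y) / n
    s_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(values, y))
    s_xx = sum((a - mean_x) ** 2 for a in values)
    s_yy = sum((b - mean_y) ** 2 for b in y)
    if s_xx == 0 or s_yy == 0:
        raise ValueError("correlation is undefined for a constant input")
    return s_xy / math.sqrt(s_xx * s_yy)


# Hypothetical scores: the two highest are flagged True
print(point_biserial([1, 2, 3, 4], [False, False, True, True]))
```

Swapping which level counts as True only flips the sign of the result.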
5
votes
2 answers

SQL Statistical sampling

I'm looking for some genius SQL help with a tricky statistical problem I'm having. What I'm looking to do is pull a statistically balanced sample out of an unbalanced group of user profiles. Doing this for a single profile attribute (e.g. gender)…
tbacos
  • 733
  • 2
  • 7
  • 12
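One common reading of a "balanced sample" is stratified sampling with equal counts: group the profiles by the attribute, then draw the same number of rows from every group, capped at the size of the smallest group. A minimal Python sketch of that idea, assuming each profile is a dict and the stratifying attribute is supplied by a key function (both hypothetical):

```python
import random
from collections import defaultdict


def balanced_sample(rows, key, per_group, seed=None):
    """Draw the same number of rows (without replacement) from every stratum defined by key(row).

    The per-stratum count is capped at the size of the smallest stratum so that
    every group is equally represented.
    """
    if not rows:
        return []
    rng = random.Random(seed)
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)
    size = min(per_group, min(len(members) for members in strata.values()))
    sample = []
    for value in sorted(strata):  # deterministic stratum order for a given seed
        sample.extend(rng.sample(strata[value], size))
    return sample
```

In SQL engines with window functions (PostgreSQL, for instance), the same idea is often expressed with ROW_NUMBER() OVER (PARTITION BY gender ORDER BY random()) and a filter on that row number; treat this as a pointer rather than a tested query for any particular database.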
5
votes
6 answers

What is a "good" R value when comparing 2 signals using cross correlation?

I apologize in advance for being a bit verbose; if you want to skip the background, my question is below. This is pretty much a follow-up to a question I previously posted on how to compare two 1D (time dependent)…
oort
  • 1,840
  • 2
  • 20
  • 29
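When cross correlation is reported as an "R value", it is usually the normalized form: at each lag, the Pearson correlation between one signal and the shifted other, which keeps the score in [-1, 1]. A minimal Python sketch, assuming two equally sampled signals and a hypothetical max_lag search window:

```python
import math


def pearson(pairs):
    """Pearson correlation coefficient of a list of (a, b) pairs."""
    n = len(pairs)
    if n < 2:
        raise ValueError("need at least two overlapping samples")
    mean_a = sum(a for a, _ in pairs) / n
    mean_b = sum(b for _, b in pairs) / n
    s_ab = sum((a - mean_a) * (b - mean_b) for a, b in pairs)
    s_aa = sum((a - mean_a) ** 2 for a, _ in pairs)
    s_bb = sum((b - mean_b) ** 2 for _, b in pairs)
    if s_aa == 0 or s_bb == 0:
        raise ValueError("correlation is undefined for a constant segment")
    return s_ab / math.sqrt(s_aa * s_bb)


def lagged_correlations(x, y, max_lag):
    """r(lag) = Pearson r of x[t] versus y[t + lag], using only the overlapping samples."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(x[t], y[t + lag]) for t in range(len(x)) if 0 <= t + lag < len(y)]
        scores[lag] = pearson(pairs)
    return scores


def best_lag(x, y, max_lag):
    """Lag with the highest correlation, and that correlation."""
    scores = lagged_correlations(x, y, max_lag)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]
```

The sketch only explains the scale of the number. Whether a given r is "good" depends on the application, and its statistical significance depends on the number of overlapping samples and on autocorrelation within each signal, which tends to make large values appear by chance more often than for independent samples.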
5
votes
1 answer

How to calculate error for polynomial fitting (in slope and intercept)

I want to calculate the errors in the slope and intercept computed by the scipy.polyfit function. I have (±) uncertainties for ydata, so how can I include them when calculating the uncertainty of the slope and intercept? My code is: from scipy import…
physics_for_all
  • 2,193
  • 4
  • 19
  • 20
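For a straight-line fit with known per-point uncertainties σᵢ on y, the textbook approach is weighted least squares with weights 1/σᵢ², which also gives closed-form standard errors for the slope and intercept. A minimal Python sketch, assuming the σᵢ are absolute one-standard-deviation uncertainties:

```python
import math


def weighted_line_fit(x, y, sigma):
    """Fit y = intercept + slope * x by weighted least squares (weights 1 / sigma_i^2).

    Returns (slope, intercept, slope_err, intercept_err); the errors treat sigma_i
    as known absolute uncertainties and are not rescaled by the residual scatter.
    """
    if not (len(x) == len(y) == len(sigma)):
        raise ValueError("x, y and sigma must have the same length")
    w = [1.0 / s ** 2 for s in sigma]
    S = sum(w)
    Sx = sum(wi * xi for wi, xi in zip(w, x))
    Sy = sum(wi * yi for wi, yi in zip(w, y))
    Sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    Sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    delta = S * Sxx - Sx ** 2
    if delta == 0:
        raise ValueError("need at least two distinct x values")
    slope = (S * Sxy - Sx * Sy) / delta
    intercept = (Sxx * Sy - Sx * Sxy) / delta
    return slope, intercept, math.sqrt(S / delta), math.sqrt(Sxx / delta)
```

NumPy's polyfit also accepts weights (w) and can return a covariance matrix (cov); its documentation describes how the weights should relate to σ and how that matrix is scaled, which is worth checking before comparing numbers with this sketch.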
5
votes
3 answers

How to group time intervals in php/mysql and get statistics based on those time groups?

I am new to PHP and I have a problem with date/time manipulation. I need to compute daily, monthly and yearly visit statistics for a store. There is a MySQL database with a table "statistics" and fields: "statistic_id" (integer, primary key),…
offline
  • 1,589
  • 1
  • 21
  • 42
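Grouping visits by day, month or year boils down to mapping each timestamp to a bucket key and counting per key. A minimal Python sketch of that idea (the question uses PHP/MySQL; the timestamps below are hypothetical):

```python
from collections import Counter
from datetime import datetime

# strftime patterns that map a timestamp to its daily, monthly or yearly bucket
PERIOD_FORMATS = {"day": "%Y-%m-%d", "month": "%Y-%m", "year": "%Y"}


def visits_per_period(timestamps, period):
    """Count visits per calendar day, month or year."""
    try:
        fmt = PERIOD_FORMATS[period]
    except KeyError:
        raise ValueError(f"unknown period {period!r}; use one of {sorted(PERIOD_FORMATS)}") from None
    return Counter(ts.strftime(fmt) for ts in timestamps)


if __name__ == "__main__":
    visits = [datetime(2012, 5, 1, 9, 30), datetime(2012, 5, 1, 17, 5), datetime(2012, 6, 3, 12, 0)]
    print(visits_per_period(visits, "month"))
```

In MySQL the same bucketing can be pushed into the query with GROUP BY DATE_FORMAT(visit_time, '%Y-%m') and COUNT(*), where visit_time is a hypothetical name for the timestamp column.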
5
votes
2 answers

Trying to compare two distributions

I found this code on the internet that compares a normal distribution to Student's t distributions with different degrees of freedom: x <- seq(-4, 4, length=100) hx <- dnorm(x) degf <- c(1, 3, 8, 30) colors <- c("red", "blue", "darkgreen", "gold", "black") labels <- c("df=1",…
jeremy.staub
  • 369
  • 4
  • 12
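The comparison in that snippet rests on one fact: Student's t density with ν degrees of freedom approaches the standard normal density as ν grows (R computes these densities with dt and dnorm). A minimal Python sketch that measures the largest gap between the two curves on the same grid as seq(-4, 4, length=100):

```python
import math


def normal_pdf(x):
    """Standard normal density."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)


def student_t_pdf(x, df):
    """Student's t density: Gamma((df+1)/2) / (sqrt(df*pi) * Gamma(df/2)) * (1 + x^2/df)^(-(df+1)/2),
    evaluated through log-gamma for numerical stability."""
    log_norm = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def max_gap_to_normal(df, grid):
    """Largest absolute difference between the t and normal densities on the grid."""
    return max(abs(student_t_pdf(x, df) - normal_pdf(x)) for x in grid)


# Same grid as the R snippet's seq(-4, 4, length=100)
GRID = [-4 + 8 * i / 99 for i in range(100)]

if __name__ == "__main__":
    for df in (1, 3, 8, 30):
        print(df, round(max_gap_to_normal(df, GRID), 4))
```

For these degrees of freedom the largest gap sits at the peak, where the t density is lower than the normal one because it puts more mass in the tails.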
5
votes
1 answer

Difference between regression tree and model tree

I need some help understanding the difference between regression trees and linear model trees.
Shahzad
  • 1,999
  • 6
  • 35
  • 44
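The core difference is what a leaf predicts: in a regression tree each leaf predicts a constant (typically the mean of its training responses), while in a model tree (for example Quinlan's M5) each leaf holds a linear regression model. A minimal Python sketch with a single split makes the contrast visible on piecewise-linear data; it omits the pruning and smoothing that real model-tree learners perform, and the data in the usage below is hypothetical.

```python
def fit_mean(xs, ys):
    """Regression-tree leaf: predict the mean response (a constant)."""
    mean = sum(ys) / len(ys)
    return lambda x: mean


def fit_line(xs, ys):
    """Model-tree leaf: least-squares line y = a + b*x (falls back to the mean if x is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    if sxx == 0:
        return fit_mean(xs, ys)
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    a = my - b * mx
    return lambda x: a + b * x


def fit_stump(xs, ys, fit_leaf):
    """One-split tree: try every threshold, fit a leaf on each side, keep the lowest SSE.
    Returns (threshold, sse, predict)."""
    best = None
    for t in sorted(set(xs))[1:]:  # skip the minimum so both sides are non-empty
        lx, ly = zip(*[(x, y) for x, y in zip(xs, ys) if x < t])
        rx, ry = zip(*[(x, y) for x, y in zip(xs, ys) if x >= t])
        f_left, f_right = fit_leaf(lx, ly), fit_leaf(rx, ry)
        sse = sum((y - (f_left(x) if x < t else f_right(x))) ** 2 for x, y in zip(xs, ys))
        if best is None or sse < best[1]:
            best = (t, sse, f_left, f_right)
    if best is None:
        raise ValueError("need at least two distinct x values")
    t, sse, f_left, f_right = best
    return t, sse, lambda x: f_left(x) if x < t else f_right(x)


if __name__ == "__main__":
    xs = list(range(10))
    ys = [x if x < 5 else 20 - 2 * x for x in xs]  # two linear pieces
    print(fit_stump(xs, ys, fit_mean)[:2])  # constant leaves leave residual error
    print(fit_stump(xs, ys, fit_line)[:2])  # linear leaves fit each piece
```

With one split, the model-tree version recovers both linear pieces exactly, while the regression-tree version would need many more splits to approximate the same shape with constants.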
5
votes
5 answers

How to create graphs in Rails?

Does anyone know how to create graphs in Rails? I need to read and present statistics in Rails, and a graph is the best way to present them. Can anyone tell me how to do this? Thanks!
user1480797
  • 185
  • 1
  • 4
  • 16
5
votes
3 answers

Truncating SciPy random distributions

Does anyone have suggestions for efficiently truncating SciPy random distributions? For example, if I generate random values like so: import scipy.stats as stats print stats.logistic.rvs(loc=0, scale=1, size=1000) How would I go about…
TimY
  • 5,256
  • 5
  • 44
  • 57
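One efficient, general way to truncate a continuous distribution is inverse-CDF sampling: draw u uniformly between F(lower) and F(upper) and return F⁻¹(u), so no draws are wasted on rejection. A minimal pure-Python sketch for the logistic distribution from the question, assuming finite bounds; scipy.stats itself ships dedicated truncated distributions such as truncnorm for the normal case.

```python
import math
import random


def logistic_cdf(x, loc=0.0, scale=1.0):
    """Logistic CDF, written to avoid overflow in exp for large |z|."""
    z = (x - loc) / scale
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def logistic_ppf(p, loc=0.0, scale=1.0):
    """Inverse CDF (quantile function) of the logistic distribution."""
    return loc + scale * math.log(p / (1.0 - p))


def truncated_logistic(lower, upper, size, loc=0.0, scale=1.0, seed=None):
    """Sample the logistic distribution truncated to finite [lower, upper] via inverse-CDF sampling."""
    if not lower < upper:
        raise ValueError("lower must be smaller than upper")
    rng = random.Random(seed)
    lo, hi = logistic_cdf(lower, loc, scale), logistic_cdf(upper, loc, scale)
    out = []
    for _ in range(size):
        value = logistic_ppf(rng.uniform(lo, hi), loc, scale)
        out.append(min(max(value, lower), upper))  # clamp tiny floating-point overshoot at the bounds
    return out
```

The same recipe works for any distribution whose CDF and quantile function are available; with SciPy those are the cdf and ppf methods of the frozen distribution.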
5
votes
1 answer

Error in fitting a model with gee(): NA/NaN/Inf in foreign function call (arg 3)

I'm fitting a gee model on a dataset including 13,500 observations (here students). Students are grouped into 52 different schools. I know that there is evidence that students are nested within schools (low ICC) and therefore I should adjust this…
Sam
  • 4,357
  • 6
  • 36
  • 60
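The excerpt does not show the cause of the error, but two data issues are worth ruling out before refitting: missing or non-finite values in the model variables, and cluster rows that are not contiguous (the gee package documentation asks for observations within a cluster to appear in contiguous rows). A minimal Python sketch of both checks, assuming the data is available as rows of numbers plus a list of cluster ids (hypothetical layout):

```python
import math


def nonfinite_rows(rows):
    """Indices of rows containing a missing (None) or non-finite (NaN or infinite) value."""
    return [i for i, row in enumerate(rows)
            if any(v is None or not math.isfinite(v) for v in row)]


def clusters_contiguous(cluster_ids):
    """True if all rows of each cluster form one contiguous block."""
    seen = set()
    previous = object()  # sentinel that equals no real cluster id
    for cid in cluster_ids:
        if cid != previous:
            if cid in seen:  # this cluster already appeared earlier
                return False
            seen.add(cid)
            previous = cid
    return True
```

In R, the equivalents are typically complete.cases() for the first check and sorting the data frame by the cluster id for the second.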
5
votes
2 answers

Is there a scientific library for generating probability distributions in JavaScript?

Is there a scientific library in JavaScript that can generate probability distributions like this library in Ruby? http://rb-gsl.rubyforge.org/ For more details on the use cases see this related question: Generate Array of Numbers that fit to a…
Lance
  • 75,200
  • 93
  • 289
  • 503
5
votes
1 answer

Naive Bayes row classification

How do you classify a row of separate cells in MATLAB? At the moment I can classify single columns like so: training = [1;0;-1;-2;4;0;1]; % this is the sample data. target_class = ['posi';'zero';'negi';'negi';'posi';'zero';'posi']; % target_class…
G Gr
  • 6,030
  • 20
  • 91
  • 184
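Classifying whole rows instead of single columns means treating each cell as a separate feature. A minimal sketch of Gaussian naive Bayes in Python (the question uses MATLAB) that assumes each column is normally distributed within a class and independent of the other columns given the class; the training rows below are hypothetical.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(rows, labels, var_floor=1e-9):
    """Per class: log prior plus a mean and variance for every column (the naive independence assumption)."""
    by_class = defaultdict(list)
    for row, label in zip(rows, labels):
        by_class[label].append(row)
    n = len(rows)
    model = {}
    for label, members in by_class.items():
        columns = list(zip(*members))
        means = [sum(col) / len(col) for col in columns]
        # variance floor keeps constant columns from causing division by zero
        variances = [max(sum((v - m) ** 2 for v in col) / len(col), var_floor)
                     for col, m in zip(columns, means)]
        model[label] = (math.log(len(members) / n), means, variances)
    return model


def predict(model, row):
    """Return the class with the highest log posterior for one row."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (x - m) ** 2 / (2 * var)
            for x, m, var in zip(row, means, variances))
    return max(model, key=log_posterior)


if __name__ == "__main__":
    rows = [(4, 5), (5, 6), (6, 4), (-4, -5), (-5, -6), (-6, -4)]
    labels = ["posi"] * 3 + ["negi"] * 3
    print(predict(fit_gaussian_nb(rows, labels), (4, 6)))
```

Working with log probabilities avoids underflow when a row has many cells.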
5
votes
1 answer

ADF test in statsmodels in Python

I am trying to run an Augmented Dickey-Fuller test in statsmodels in Python, but I seem to be missing something. This is the code that I am trying: import numpy as np import statsmodels.tsa.stattools as ts x = np.array([1,2,3,4,3,4,2,3]) result =…
Akavall
  • 82,592
  • 51
  • 207
  • 251
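To see what the test statistic measures, it can help to compute the plain Dickey-Fuller version by hand: regress the first difference Δyₜ on a constant and yₜ₋₁, and take the t-ratio of the coefficient γ on yₜ₋₁. The augmented test additionally includes lagged differences (statsmodels' adfuller chooses their number via its autolag option by default) and reports MacKinnon p-values. A minimal sketch with no augmentation lags and simulated demo data:

```python
import math
import random


def dickey_fuller_stat(y):
    """Plain Dickey-Fuller regression with a constant and no lagged differences:
    dy_t = alpha + gamma * y_{t-1} + e_t. Returns (gamma_hat, t-ratio of gamma_hat)."""
    if len(y) < 4:
        raise ValueError("series too short")
    x = y[:-1]                                   # y_{t-1}
    dy = [b - a for a, b in zip(y[:-1], y[1:])]  # y_t - y_{t-1}
    n = len(dy)
    mx, md = sum(x) / n, sum(dy) / n
    sxx = sum((v - mx) ** 2 for v in x)
    if sxx == 0:
        raise ValueError("series is constant")
    gamma = sum((v - mx) * (d - md) for v, d in zip(x, dy)) / sxx
    alpha = md - gamma * mx
    ssr = sum((d - alpha - gamma * v) ** 2 for v, d in zip(x, dy))
    se = math.sqrt(ssr / (n - 2) / sxx)
    return gamma, gamma / se


def simulate_ar1(phi, n, seed=0):
    """Hypothetical demo data: y_t = phi * y_{t-1} + N(0, 1) noise, starting at 0."""
    rng = random.Random(seed)
    y = [0.0]
    for _ in range(n - 1):
        y.append(phi * y[-1] + rng.gauss(0.0, 1.0))
    return y
```

A strongly negative statistic points toward stationarity, but it must be compared with Dickey-Fuller critical values rather than Student's t tables, because under the unit-root null the statistic does not follow a t distribution. With very short series, such as eight observations, few degrees of freedom remain after the regression.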
5
votes
2 answers

Postgres STDDEV aggregate behavior when n < 2

My Postgres query calculates statistical aggregates from a set of sensor readings: SELECT to_char(ipstimestamp, 'YYYYMMDDHH24') As row_name, to_char(ipstimestamp, 'FMDD mon FMHH24h') As hour_row_name, varid As category,…
aag
  • 680
  • 2
  • 12
  • 33
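In PostgreSQL, stddev is an alias for stddev_samp, the sample standard deviation, which returns NULL when a group has a single non-NULL input; stddev_pop returns 0 in that case. A minimal Python sketch that mirrors those semantics with the standard statistics module, useful for checking expectations outside the database:

```python
import statistics


def stddev_samp(values):
    """Sample standard deviation like PostgreSQL's stddev/stddev_samp:
    NULL (None) inputs are ignored, and fewer than two inputs give None."""
    xs = [v for v in values if v is not None]
    if len(xs) < 2:
        return None
    return statistics.stdev(xs)


def stddev_pop(values):
    """Population standard deviation like stddev_pop: 0 for one input, None for no inputs."""
    xs = [v for v in values if v is not None]
    if not xs:
        return None
    return statistics.pstdev(xs)
```

If a group with a single reading should report 0 instead of NULL, wrapping the aggregate as COALESCE(stddev(value), 0) is one option. Whether 0 is the right answer for one reading is a modelling choice, since a single sample carries no information about spread.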