Questions tagged [statistics]

Consider whether your question would be better asked at https://stats.stackexchange.com. Statistics is the mathematical study of using probability to infer characteristics of a population from a limited number of samples or observations.

Statistics is the scientific study of the collection, analysis, interpretation, presentation, and organization of data. Numerous programming languages provide support for implementing statistical techniques.

Consider whether your question would be better asked at Cross Validated, a Stack Exchange site for probability, statistics, data analysis, data mining, experimental design, and machine learning. Stack Overflow questions on statistics should be about implementation and programming problems, not theoretical discussions of statistics or research design. Therefore, this tag should never be used alone but always in combination with a specific programming language tag.

16319 questions
4 votes, 1 answer

Data mining for significant variables (numerical): Where to start?

I have a trading strategy on the foreign exchange market that I am attempting to improve upon. I have a huge table (100k+ rows) that represents every possible trade in the market, the type of trade (buy or sell), the profit/loss after that trade…
Mike Furlender
4 votes, 1 answer

Partial Least Squares Implementation for C/C++?

Does anyone know of an open-source implementation of a partial least squares algorithm in C or C++?
Paul
4 votes, 3 answers

How to check for unit root in Panel Data using Python?

I am working on time series analysis and I have sales data (let's call it df_panel, as we have a panel data structure) for 700 individual areas for each month of 2021, e.g. Area Month Sales Area 1 January 1000 Area 1 February 2000 Area…
smallbirds
4 votes, 1 answer

pdf of a particular distribution

I am new to Matlab. I would like to check the so-called "logarithmic law" for the determinant of random matrices with Matlab, but still do not know how. Logarithmic law: Let A be a random Bernoulli matrix (entries are iid, taking value ±1 with prob.…
H. H.
4 votes, 2 answers

Need to fix Stan code for the generalized pareto distribution to take in real arguments rather than vectors

I am using the functions defined here: Extreme value analysis and user defined probability functions in Stan for modeling the data with a generalized pareto distribution, but my problem is that my model is in a for-loop and expects three real valued…
John Smith
4 votes, 2 answers

How to generate random normal distribution without numpy? (Google interview)

So I have a data science interview at Google, and I'm trying to prepare. One of the questions I see a lot (on Glassdoor) from people who have interviewed there before has been: "Write code to generate random normal distribution." While this is…
Kelsey
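
For the prompt quoted above, one well-known way to draw normal samples with only the Python standard library is the Box-Muller transform. The sketch below is an illustration under that assumption; it is not taken from the question or its answers, and the names `box_muller_pair` and `normal_samples` are hypothetical.

```python
import math
import random


def box_muller_pair(rng):
    """Return two independent standard normal draws via the Box-Muller transform."""
    # rng.random() lies in [0, 1); subtracting from 1 gives (0, 1] and avoids log(0).
    u1 = 1.0 - rng.random()
    u2 = rng.random()
    radius = math.sqrt(-2.0 * math.log(u1))
    angle = 2.0 * math.pi * u2
    return radius * math.cos(angle), radius * math.sin(angle)


def normal_samples(n, mu=0.0, sigma=1.0, seed=None):
    """Return a list of n draws from a normal distribution with mean mu and std sigma."""
    rng = random.Random(seed)  # hypothetical helper: a private generator for reproducibility
    samples = []
    while len(samples) < n:
        z0, z1 = box_muller_pair(rng)
        samples.append(mu + sigma * z0)
        samples.append(mu + sigma * z1)
    return samples[:n]  # drop the extra draw when n is odd
```

Calling `normal_samples(5, seed=0)` returns five floats; with a large `n` the sample mean and variance should approach `mu` and `sigma ** 2`.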
4 votes, 0 answers

My Julia is broken???? When I use RCall it pops up and gives me the intro, then I can't type into the console

The title says it all. I'm not sure what to do. I tried to download a package in R through Julia and I think it's all broken now (when I try to use RCall). Is there anything I can do? I've found nothing on the internet about it. _ _…
Jeremy S
4 votes, 2 answers

How can I identify which factor group a value belongs to?

I'm using the cut function to split my data into groups using the max/min range. Here is an example of the code that I am using: # sample data frame - used to identify initial groups testdf <- data.frame(a = c(1:100), b = rnorm(100)) # split into…
djq
4 votes, 0 answers

How to perform Principal Component Analysis for different discharge stations?

This is rather a conceptual question for me. I have station discharge data. A reproducible example is shown below: > A <- rnorm(n = 10) > B <- rnorm(n = 10) > C <- rnorm(n = 10) > D <- rnorm(n = 10) > Year <- seq(1981, 1990,1) > df <-…
Sayantan4796
4 votes, 2 answers

How to associate point on a curve with points in an array of objects?

I have a bunch of names from the web (first name, last name, of people in different countries). Some of the countries have statistics on how many people have each last name, as shown in some places like here. Well, that Japanese surname list only…
Lance
4 votes, 1 answer

Normalizing a Phylogenetic Tree in R

When working with phylogenetic tree data in R (specifically when working with "phylo" or "phylo4" objects) it would be useful to normalize branch lengths so that certain taxa (the ones that evolve faster) do not contribute a disproportionate amount…
Krisrs1128
4 votes, 2 answers

What's a good technique to store a time-dependent metric in Redis?

I have some metrics (like counts of logged in users, or SQL queries, or whatever), and I want to gather some time-dependent stats on a regular basis. For example I want to know how many users were registered in some particular year, month, week, day…
Valentin Golev
4 votes, 1 answer

Numeric precision for log(1-exp(x))

I'm doing some math with really big numbers (I'm using Python, but this question isn't Python specific). For one value, I have a formula that gives me f(t) = Pr(X < t). I want to use this formula to get Pr(X >= t) = 1 - f(t). Because f(t) returns…
user
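
For the quantity named in the title, a commonly used stable evaluation of log(1 - exp(x)) for x < 0 switches between `expm1` and `log1p` at x = -ln 2. The sketch below assumes that approach; it is not taken from the question or its answer, and the function name `log1mexp` is a hypothetical choice.

```python
import math


def log1mexp(x):
    """Return log(1 - exp(x)) for x < 0, avoiding cancellation in 1 - exp(x)."""
    if x >= 0:
        raise ValueError("log(1 - exp(x)) is only defined for x < 0")
    if x > -math.log(2.0):
        # exp(x) is close to 1 here, so compute 1 - exp(x) accurately as -expm1(x).
        return math.log(-math.expm1(x))
    # exp(x) is small here, so log1p keeps precision for log(1 - exp(x)).
    return math.log1p(-math.exp(x))
```

With x holding a log-probability log f(t), `log1mexp(x)` gives log(1 - f(t)) without forming 1 - f(t) directly. For x very close to 0, the naive `math.log(1 - math.exp(x))` can fail because `1 - math.exp(x)` rounds to 0.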
4 votes, 1 answer

statsmodels.tsa._stl.STL "Unable to determine period from endog"

I want to decompose my time series with the statsmodels STL method. My time series data looks like below: success.rate Date 2020-09-11 24.735701 2020-09-14 24.616301 2020-09-15 24.695900 2020-09-16 24.467051 2020-09-17 24.118799 when I put it into…
yunfei
4 votes, 2 answers

Random.choices not returning uniform distribution

I am trying to simulate a uniform distribution of discrete values using random.choices. Each time that a new set is generated, a key representing the unique counts is incremented. Why is the uniform outcome ([2,2]) less likely to occur than…
Daniel Scott