Questions tagged [statistical-sampling]

33 questions
1
vote
1 answer

Sampling on an aggregated dataset

Input is a dataset where every row contains a member ID and the count for an event, say clicks. The member ID is unique. Sample data: M1,100 M2,100 M3,50 M4,50. The goal is to sample 1% of the clicks, where the total number of clicks is given by summing the clicks across all…
Duckling
  • 923
  • 7
  • 12
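For the aggregated-clicks question above, one possible approach is to treat each member's count as that many individual clicks and draw without replacement. This is only a sketch under assumptions: the function name `sample_clicks` and the dictionary layout are hypothetical, and it relies on the `counts=` argument of `random.sample`, available in Python 3.9 and later.

```python
import random
from collections import Counter

def sample_clicks(click_counts, fraction, seed=None):
    """Sample a fraction of individual clicks, without replacement,
    from per-member click totals. Returns sampled clicks per member."""
    rng = random.Random(seed)
    members = list(click_counts)
    counts = [click_counts[m] for m in members]
    k = round(sum(counts) * fraction)  # number of clicks to draw
    # counts= makes each member ID behave as if repeated `count` times
    drawn = rng.sample(members, k, counts=counts)
    return Counter(drawn)

# Sample data from the question: 300 clicks in total, so 1% is 3 clicks
clicks = {"M1": 100, "M2": 100, "M3": 50, "M4": 50}
sampled = sample_clicks(clicks, 0.01, seed=0)
```

Members with more clicks are proportionally more likely to appear in the result, and no member can contribute more clicks than it actually has.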
1
vote
4 answers

Recognize the levels of 1D data knowing only the number of levels

I have a sensor whose output data consists of one attribute (a single value). An example of a bunch of sequenced data is as…
1
vote
1 answer

Why divide the sample standard deviation by sqrt(sample size) when calculating a z-score

I have been following Khan Academy videos to gain understanding of hypothesis testing, and I must confess that all my understanding thus far is based on that source. Now, the following videos talk about z-score/hypothesis testing: Hypothesis…
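The division asked about in this question comes from the standard error of the mean: the sample mean of n observations varies less than a single observation, by a factor of sqrt(n). A minimal sketch, assuming a one-sample test of a mean against a hypothesized value (the function name `z_score_of_mean` and the data are illustrative, not taken from the question):

```python
import math
import statistics

def z_score_of_mean(sample, mu0):
    """z-score of the sample mean against a hypothesized mean mu0."""
    n = len(sample)
    xbar = statistics.mean(sample)
    s = statistics.stdev(sample)          # sample standard deviation (n - 1 denominator)
    standard_error = s / math.sqrt(n)     # spread of the sample mean, not of one observation
    return (xbar - mu0) / standard_error

data = [2, 4, 4, 4, 5, 5, 7, 9]           # illustrative values, mean is 5
z = z_score_of_mean(data, mu0=4)
```

Dividing by `s` alone would measure how far one observation lies from `mu0`; dividing by `s / sqrt(n)` measures how far the mean of the whole sample lies from it.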
1
vote
1 answer

SAS - proc cusum not found

I have the following SAS code: data vis; input v; datalines; 3169 3173 3162 3154 3139 3145 3160 3172 3175 3205 3203 3209 3208 3211 3214 3215 3209 3203 3185 3187 3192 3199 3197 3193 3190 3183 3197 3188 3183 3175 3174 3171 3180 3179 3175 3174 ; proc…
Stoner
  • 846
  • 1
  • 10
  • 30
1
vote
1 answer

Music genre classification with sklearn: how to accurately evaluate different models

I'm working on a project to classify 30 second samples of audio from 5 different genres (rock, electronic, rap, country, jazz). My dataset consists of 600 songs, exactly 120 for each genre. The features are a 1D array of 13 mfccs for each song and…
ohbrobig
  • 939
  • 2
  • 13
  • 34
1
vote
2 answers

How to remove a percentage from a dataset in Weka but keep the class balance?

I have a data set with 50% instances from class A and 50% instances of class B. I want to split my data set into a training set and a test set. I know the RemovePercentage filter exists but it doesn't care about the class balance. How do I remove…
Stanko
  • 4,275
  • 3
  • 23
  • 51
0
votes
0 answers

Error associated with using NumPyro to create a linear regression model

I'm using NumPyro to create a simple linear regression model with two variables. The aim is to obtain a graph similar to https://num.pyro.ai/en/latest/tutorials/bayesian_regression.html (3rd graph). I have used NumPyro to generate 2000…
0
votes
1 answer

(R) Finding proportion of population defectives at probability 0.1 acceptance

I'm using the following R code: library(AcceptanceSampling) x <- OC2c(50, 2, type="hypergeom", N=4000) plot(x, xlim=c(0,0.2)) which generates a plot. I would like to find the proportion when P(accept) (Y-axis) is 0.1. Is there a way to do this…
Stoner
  • 846
  • 1
  • 10
  • 30
0
votes
0 answers

Is there a way to handle "cannot allocate vector of size" issue without dropping data?

This case differs from a previous question on the topic, which is why I'm asking. I have an already cleaned dataset containing 120 000 observations of 25 variables, and I am supposed to analyze it all through logistic regression and…
Aite97
  • 155
  • 1
  • 9
0
votes
0 answers

What do you do if the sample size for an A/B test is larger than the population?

I have a list of 7337 customers (selected because they only had one booking from March-August 2018). We are going to contact them and are trying to test the impact of these activities on their sales. The idea is that contacting them will cause them…
0
votes
1 answer

Generate n samples, Rejection sampling in R

Rejection Sampling: I'm working on rejection sampling from a truncated normal distribution; see the R code below. How can I make the sampling stop at a specific n, for example 1000 observations? I.e. I want to stop the sampling when the number of…
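The usual way to stop at a fixed n is to loop on the number of accepted draws rather than on the number of proposals. The question's R code is not shown here, so the following is a Python sketch under assumptions: proposals come from the untruncated normal and are accepted when they fall inside the bounds, and the name `truncated_normal_rejection` and its parameters are hypothetical.

```python
import random

def truncated_normal_rejection(n, lower, upper, mu=0.0, sigma=1.0, seed=None):
    """Draw exactly n values from a normal(mu, sigma) truncated to [lower, upper]."""
    rng = random.Random(seed)
    accepted = []
    # Loop on accepted draws, not proposals, so the result has exactly n values
    while len(accepted) < n:
        x = rng.gauss(mu, sigma)       # proposal from the untruncated normal
        if lower <= x <= upper:        # accept only inside the truncation bounds
            accepted.append(x)
    return accepted

draws = truncated_normal_rejection(1000, lower=-1.0, upper=2.0, seed=1)
```

This version gets slow when the interval holds little of the normal's probability mass, since most proposals are then rejected.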
0
votes
2 answers

What's wrong with this simple method to sample from multinomial in C#?

I wanted to implement a simple method to sample from a multinomial distribution in C# (the first argument is an array of integers we want to sample and the second one is the probabilities of selecting each of those integers). When I do this with…
Rohit Pandey
  • 2,443
  • 7
  • 31
  • 54
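The C# method from this question is not visible in the excerpt, so the following is not a fix for it. It is a Python sketch of one common technique for the same task, inverse-CDF sampling over cumulative probabilities. It draws a single value (a categorical draw); the name `sample_categorical` is an assumption.

```python
import random
from bisect import bisect_right
from itertools import accumulate

def sample_categorical(values, probs, rng=random):
    """Return one element of values, chosen with the given probabilities."""
    cumulative = list(accumulate(probs))      # running totals, e.g. [0.2, 0.7, 1.0]
    u = rng.random() * cumulative[-1]         # scale so probs need not sum to exactly 1
    idx = bisect_right(cumulative, u)         # first bucket whose upper edge exceeds u
    return values[min(idx, len(values) - 1)]  # guard against floating-point edge cases

rng = random.Random(42)
picks = [sample_categorical([10, 20, 30], [0.2, 0.5, 0.3], rng) for _ in range(1000)]
```

Repeating the draw and counting the outcomes gives a multinomial sample. Entries with probability zero are never selected, because `bisect_right` skips buckets of zero width.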
0
votes
1 answer

Fit a line to small multiples

I want to fit a line that goes through the mean of sampling distributions on a shared plot. This code creates a similar data set to the one I am using. It creates a sampling distribution and plots the distributions on the same graphs. Then, I draw a…
0
votes
2 answers

Simple random sampling while pulling data from a warehouse (Oracle engine) using PROC SQL in SAS

I need to pull a huge amount of data, say 600-700 variables, from different tables in a data warehouse. The dataset in its raw form will easily reach 150 GB (79 MM rows), and for my analysis I need only a million rows. How can I…
Rohan
  • 93
  • 1
  • 8
0
votes
1 answer

Creating a stratified sample in SAS with known strata

I have a target population with some characteristics and I have been asked to select an appropriate control based on these characteristics. I am trying to do a stratified sample using Base SAS but I need to be able to define my 4 strata percentages from my…
Annita
  • 1
  • 1