Questions tagged [scipy.stats]

297 questions
0 votes, 0 answers

Fitting a lognormal distribution to negative values with scipy

I have a 40-year time series of ocean surge levels to which I'm trying to fit a lognormal distribution using scipy.stats. However, as far as I know (and have read), a lognormal distribution cannot take negative values by definition. The scipy…
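A minimal sketch, assuming the goal is simply to let the support extend below zero: scipy.stats.lognorm has a loc parameter, so its support is x > loc rather than x > 0, and leaving loc free in fit() lets the fitted distribution cover negative observations. The data below is synthetic, not the surge record from the question.

```python
import numpy as np
from scipy import stats

# Hypothetical data: a lognormal sample shifted so that part of it is negative,
# standing in for the surge series described in the question.
rng = np.random.default_rng(0)
data = rng.lognormal(mean=0.0, sigma=0.5, size=2000) - 1.5

# With loc left free, scipy fits a three-parameter lognormal whose support
# is x > loc, so negative observations are allowed as long as loc < min(data).
shape, loc, scale = stats.lognorm.fit(data)
print(f"s={shape:.3f}, loc={loc:.3f}, scale={scale:.3f}")
```

Whether a shifted lognormal is physically sensible for surge levels is a separate modelling question, and free-loc fits can be sensitive to the smallest observations.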
0 votes, 0 answers

Time consumption of SciPy's bootstrap as a function of the number of resamples

I have a large dataset, with on the order of 2^15 entries, and I calculate the confidence interval of the mean of the entries with scipy.stats.bootstrap. For a dataset of this size, this takes about 6 seconds on my laptop. I have a lot of datasets, so…
Georg
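A rough timing sketch, assuming the statistic is the mean and that a percentile interval is acceptable. The resampling work should grow roughly linearly with n_resamples; the default method='BCa' additionally runs a jackknife over the original sample, whose cost depends on the sample size rather than on n_resamples. The helper name time_bootstrap and the batch size are illustrative choices; batch only bounds memory use.

```python
import time

import numpy as np
from scipy import stats


def time_bootstrap(x, resample_counts, method="percentile", batch=500):
    """Return {n_resamples: (seconds, (low, high))} for a bootstrap CI of the mean of x."""
    results = {}
    for n in resample_counts:
        start = time.perf_counter()
        res = stats.bootstrap((x,), np.mean, n_resamples=n,
                              method=method, batch=batch)
        elapsed = time.perf_counter() - start
        ci = res.confidence_interval
        results[n] = (elapsed, (ci.low, ci.high))
    return results


rng = np.random.default_rng(0)
x = rng.normal(size=2**12)  # smaller than 2**15 to keep the demo quick
for n, (secs, (low, high)) in time_bootstrap(x, [500, 1000, 2000]).items():
    print(f"{n:5d} resamples: {secs:.3f} s, CI = ({low:.4f}, {high:.4f})")
```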
0 votes, 0 answers

How does scipy.stats.shapiro calculate its test statistic?

I know that scipy.stats.shapiro is used to test for normality. I also know that the calculation of a test statistic always involves the mean m0 or variance of the population (as explained here). My question then: What type of test statistic (t,…
Nemo
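For reference, the Shapiro-Wilk statistic W is not a t or chi-square statistic; it is a ratio of two estimates of spread, compared against its own null distribution:

```latex
W = \frac{\left(\sum_{i=1}^{n} a_i \, x_{(i)}\right)^2}{\sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2}
```

Here x_(i) is the i-th smallest observation, x̄ is the sample mean, and the a_i are constants derived from the expected values and covariance matrix of the order statistics of a standard normal sample. W lies between 0 and 1, and values well below 1 indicate departure from normality. The SciPy documentation cites Royston's algorithm (AS R94) for computing the coefficients and the p-value; scipy.stats.shapiro returns W as its statistic together with that p-value.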
0 votes, 0 answers

p-value from scipy.stats.ttest_1samp

I am running the one-sample t-test using the following Python code: import scipy import numpy as np mu, sigma = 0.67, 0.11 s = np.random.normal(mu, sigma, 10000) scipy.stats.ttest_1samp(s, popmean=0.60, alternative='greater') which…
Giulia B.
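A sketch that reproduces the setup above with a seeded generator and recomputes the result by hand, assuming the question concerns how the p-value arises. With a true mean of 0.67, popmean=0.60 and 10,000 samples, the t statistic is around 60, so the upper-tail p-value is extremely small and may display as 0.0. The sketch imports scipy.stats explicitly, since a bare import scipy does not load the stats submodule in older SciPy releases.

```python
import numpy as np
from scipy import stats

# Same setup as in the question, with a seeded generator for repeatability.
rng = np.random.default_rng(0)
mu, sigma = 0.67, 0.11
s = rng.normal(mu, sigma, 10000)

res = stats.ttest_1samp(s, popmean=0.60, alternative="greater")

# Recompute the t statistic by hand: (sample mean - popmean) / standard error.
n = s.size
t_manual = (s.mean() - 0.60) / (s.std(ddof=1) / np.sqrt(n))
# For alternative="greater", the p-value is the upper-tail area of a
# t distribution with n - 1 degrees of freedom.
p_manual = stats.t.sf(t_manual, df=n - 1)

print(res.statistic, t_manual)
print(res.pvalue, p_manual)
```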
0 votes, 1 answer

Fitting & scaling a probability density function correctly to a histogram with a logarithmic x-axis?

I am trying to fit a gilbrat PDF to a dataset (that I have in the form of a list). I want to show the data in a histogram with a logarithmic x-scale and add the fitted curve. However, the curve seems too flat compared to the histogram, like in this…
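A sketch of one common scaling pitfall, assuming the histogram uses density=True. With log-spaced bins, each bar is count / (N * bin width), which is already on the same scale as the pdf in x. If the histogram is instead built from log10(data), its density refers to u = log10(x), and the pdf must be multiplied by the Jacobian x * ln(10) before the two match. The data is synthetic, and the sketch fits the general lognorm with loc fixed at 0 (Gibrat's distribution is the special case with shape s = 1).

```python
import numpy as np
from scipy import stats

# Hypothetical positive data standing in for the list in the question.
rng = np.random.default_rng(1)
data = rng.lognormal(mean=0.0, sigma=1.0, size=5000)

# Fit a lognormal with loc fixed at 0 (Gibrat's distribution is the case s = 1).
s, loc, scale = stats.lognorm.fit(data, floc=0)

# Option 1: log-spaced bins on x. density=True divides each count by
# (N * bin width), so bar heights are directly comparable to the pdf in x.
edges = np.logspace(np.log10(data.min()), np.log10(data.max()), 41)
heights, _ = np.histogram(data, bins=edges, density=True)
centers = np.sqrt(edges[:-1] * edges[1:])  # geometric bin centers
pdf_at_centers = stats.lognorm.pdf(centers, s, loc=loc, scale=scale)

# Option 2: histogram of u = log10(x). Its density is per unit of u, so the
# pdf in x must be multiplied by the Jacobian dx/du = x * ln(10).
u_heights, u_edges = np.histogram(np.log10(data), bins=40, density=True)
x_mid = 10 ** (0.5 * (u_edges[:-1] + u_edges[1:]))
pdf_in_u = stats.lognorm.pdf(x_mid, s, loc=loc, scale=scale) * x_mid * np.log(10)
```

Either (centers, pdf_at_centers) over the Option 1 bars or (u centers, pdf_in_u) over the Option 2 bars can then be plotted together.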
0 votes, 0 answers

Can't run Student's t-test using combinations

I've been trying to run a Student's t-test on a DataFrame, but I always end up with an error like raise KeyError(key) KeyError: 'patid'. This is my DataFrame: df = pd.DataFrame.from_records(data=[ dict(id=1, rd=True, drk=True, hn=True,…
0 votes, 0 answers

Fast binning of geographical data with negative values

I'm trying to bin some geolocated data using scipy.stats.binned_statistic_2d, but it seems the data cannot contain any negative values. Is there a way to do this accurately and fast? import numpy as np ilats = np.linspace(90,-90, 4000) ilons =…
Shejo284
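A small sketch with made-up points, assuming the problem lies in how the bins are specified rather than in the sign of the data: scipy.stats.binned_statistic_2d accepts negative coordinates as long as the bin edges increase monotonically. Note that np.linspace(90, -90, 4000) in the question produces a descending array; if it is used as bin edges, that alone may cause trouble.

```python
import numpy as np
from scipy import stats

# Hypothetical points: latitudes and longitudes may be negative.
lats = np.array([-45.0, -45.0, 45.0])
lons = np.array([-90.0, -90.0, 90.0])
vals = np.array([1.0, 3.0, 10.0])

# Bin edges must increase monotonically; negative values are fine.
lat_edges = np.array([-90.0, 0.0, 90.0])
lon_edges = np.array([-180.0, 0.0, 180.0])

stat, x_edge, y_edge, binnumber = stats.binned_statistic_2d(
    lats, lons, vals, statistic="mean", bins=[lat_edges, lon_edges]
)
# Rows are latitude bins, columns are longitude bins; empty bins are NaN.
print(stat)
```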
0 votes, 0 answers

Drawing sample and calculating sample probability from multivariate normal distribution using scipy.stats.multivariate_normal

I would like to do something that is likely very simple, but it is giving me difficulty. I'm trying to draw N samples from a multivariate normal distribution and calculate the probability of each of those randomly drawn samples. Here I attempt to use…
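A minimal sketch with an assumed 2-D mean and covariance. For a continuous distribution, pdf returns a probability density at each sample, not a probability, and logpdf is usually safer numerically. Freezing the distribution with seed makes the draws repeatable.

```python
import numpy as np
from scipy import stats

# Assumed parameters for illustration.
mean = np.array([0.0, 1.0])
cov = np.array([[1.0, 0.3],
                [0.3, 2.0]])

# A frozen distribution with a fixed seed so the draws are repeatable.
mvn = stats.multivariate_normal(mean=mean, cov=cov, seed=0)

N = 5
samples = mvn.rvs(size=N)          # shape (N, 2)
density = mvn.pdf(samples)         # shape (N,): densities, not probabilities
log_density = mvn.logpdf(samples)  # same values on a log scale
print(samples)
print(density)
```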
0 votes, 1 answer

Check the result of a chi-square test on pandas column data

I wrote the test according to an approach I found. When searching Stack Overflow I saw another approach (can be seen here) which was a little more complicated, and made me wonder if I chose the right one. I'm looking for ways to check if my…
Ziv
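One way to check a result, sketched with hypothetical column names group and outcome: build the contingency table with pd.crosstab, run scipy.stats.chi2_contingency, and recompute the statistic by hand from the expected counts. For a 2x2 table SciPy applies Yates' continuity correction by default, so the manual sum is compared against the uncorrected call. The counts here are tiny and purely illustrative.

```python
import numpy as np
import pandas as pd
from scipy import stats

# Hypothetical data: two categorical columns.
df = pd.DataFrame({
    "group":   ["a", "a", "a", "b", "b", "b", "a", "b"],
    "outcome": ["yes", "no", "yes", "no", "no", "yes", "yes", "no"],
})

# Observed counts per (group, outcome) pair, then the test of independence.
table = pd.crosstab(df["group"], df["outcome"])
chi2, p, dof, expected = stats.chi2_contingency(table)

# Manual check: sum of (observed - expected)^2 / expected over all cells.
observed = np.asarray(table)
chi2_manual = ((observed - expected) ** 2 / expected).sum()
chi2_uncorrected, *_ = stats.chi2_contingency(table, correction=False)
print(chi2, p, dof)
print(chi2_manual, chi2_uncorrected)
```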
0 votes, 0 answers

How to fit data with a log-normal distribution using norm.fit() in SciPy

I am trying to use scipy.stats norm.fit() with some modifications to fit data with a log-normal distribution, and I want to verify the result by fitting the data with scipy.stats lognorm.fit(). The results are similar, but it…
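A sketch of why the two approaches can agree, assuming the intent is a two-parameter lognormal: fitting norm to log(data) gives the mean mu and the (ddof=0) standard deviation sigma of log(data), and lognorm.fit(data, floc=0) should return s = sigma and scale = exp(mu). If loc is not fixed, lognorm.fit also estimates a shift, which is a likely reason for results that are only similar. The data is synthetic.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
data = rng.lognormal(mean=1.0, sigma=0.4, size=1000)  # hypothetical positive data

# Fit a normal to log(data): maximum-likelihood mean and std of log(data).
mu, sigma = stats.norm.fit(np.log(data))

# Two-parameter lognormal: fix loc at 0, then s should equal sigma
# and scale should equal exp(mu).
s, loc, scale = stats.lognorm.fit(data, floc=0)
print(mu, sigma)
print(s, loc, scale)
```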
0 votes, 1 answer

scipy.stats.multivariate_normal.pdf: "The input matrix must be symmetric positive semidefinite."

I have the following code: L = np.array([1,2,3]) M = np.array([1,2,3]) Q = np.random.uniform(0,10,size=(3,3)) S = Q.T*Q print(sp.stats.multivariate_normal.pdf(L,M,S)) Clearly S is a symmetric positive semidefinite matrix. I can prove it…
Kookie
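A sketch, assuming sp is scipy and np is NumPy as in the question: in NumPy, Q.T*Q is an element-by-element product. It is symmetric, but in general not positive semidefinite. The matrix product Q.T @ Q is positive semidefinite for any real Q. The seeded generator is only there for repeatability.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)
L = np.array([1, 2, 3])
M = np.array([1, 2, 3])
Q = rng.uniform(0, 10, size=(3, 3))

S_elementwise = Q.T * Q  # element-by-element product: symmetric, not PSD in general
S = Q.T @ Q              # matrix product Q^T Q: symmetric and PSD for any real Q

print(np.linalg.eigvalsh(S_elementwise))  # may contain negative eigenvalues
print(np.linalg.eigvalsh(S))              # all >= 0 up to rounding
print(stats.multivariate_normal.pdf(L, M, S))
```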
0 votes, 2 answers

Excel vs. scikit-learn Linear Regression or scipy.stats Provide Different Slopes, Intercepts, R2 Values

I cannot figure out why I get different values for slope, intercept, and R2 from Excel vs. scikit-learn (or scipy.stats!). This is a very simple linear regression, literally six "x" values and six "y" values. I use Excel all the time for…
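A sketch with six made-up points showing that scipy.stats.linregress and numpy.polyfit give the same ordinary least-squares fit of y on x. Frequent causes of mismatches with Excel or scikit-learn, offered as things to check rather than a diagnosis, are swapped x and y, an intercept forced to zero, or comparing R2 values from different models.

```python
import numpy as np
from scipy import stats

# Hypothetical six (x, y) pairs.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1, 12.2])

res = stats.linregress(x, y)                  # ordinary least squares, y on x
slope_np, intercept_np = np.polyfit(x, y, 1)  # same fit via NumPy

r_squared = res.rvalue ** 2
print(res.slope, res.intercept, r_squared)
print(slope_np, intercept_np)
```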
0 votes, 0 answers

Python: How to discretize continuous probability distributions for Kullback-Leibler Divergence

I want to find out the minimum number of samples needed to fit a probability distribution more or less correctly (in my case, the generalized extreme value distribution from scipy.stats). In order to evaluate the fitted function, I want to compute…
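One possible discretization, sketched under assumptions: each continuous distribution is turned into bin probabilities by differencing its cdf at shared bin edges, and scipy.stats.entropy(p, q) then returns the KL divergence sum(p * log(p / q)) after normalizing both vectors. The GEV parameters and the bin range are illustrative. The range must cover regions where both distributions have mass, otherwise a bin with q = 0 and p > 0 makes the divergence infinite; mass outside the edges is ignored.

```python
import numpy as np
from scipy import stats


def discretized_kl(p_dist, q_dist, edges):
    """KL(P || Q) between two distributions reduced to probabilities on shared bins."""
    p = np.diff(p_dist.cdf(edges))  # probability mass of P in each bin
    q = np.diff(q_dist.cdf(edges))  # probability mass of Q in each bin
    # stats.entropy(p, q) normalizes both vectors and returns sum(p * log(p / q)).
    return stats.entropy(p, q)


# Illustrative GEV parameters; in SciPy's convention c < 0 gives a heavy upper tail.
true_gev = stats.genextreme(c=-0.1, loc=0.0, scale=1.0)
fitted_gev = stats.genextreme(c=-0.15, loc=0.1, scale=1.1)  # hypothetical fit
edges = np.linspace(-4.0, 20.0, 241)

kl_self = discretized_kl(true_gev, true_gev, edges)  # identical distributions
kl_fit = discretized_kl(true_gev, fitted_gev, edges)
print(kl_self, kl_fit)
```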
0 votes, 0 answers

Using kstest from scipy.stats within python - it seems I'm calling the cdf() wrong?

I've been using kstest to try to see if a distribution fits my data, going through the discrete distributions from this link. I've managed to get to logser (logarithmic discrete random variable), but I can't figure out how to make this work. I've…
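A minimal sketch with an assumed shape parameter p = 0.6. kstest accepts either a distribution name plus args or a callable cdf, such as the cdf of a frozen distribution. Keep in mind that the Kolmogorov-Smirnov p-value assumes a continuous distribution; for discrete data such as logser it is generally conservative.

```python
from scipy import stats

p = 0.6  # hypothetical shape parameter of the logarithmic distribution
data = stats.logser(p).rvs(size=500, random_state=0)

# Two equivalent ways to pass the reference CDF to kstest:
res_by_name = stats.kstest(data, "logser", args=(p,))
res_by_callable = stats.kstest(data, stats.logser(p).cdf)
print(res_by_name)
print(res_by_callable)
```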
0 votes, 0 answers

How to compute the slope of a random dataset using the bootstrap method?

Somehow, scipy.stats.bootstrap is not working in my Jupyter notebook, so I decided to write my own bootstrap estimation function. Here is what I did. def bootstrap(x, Nboot, statfun): '''Bootstrap code''' x = np.array(x) …
Adnan
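A sketch of a pairs bootstrap for the slope, assuming x and y are paired observations. Each replicate resamples whole (x, y) rows with replacement and refits a line with np.polyfit. The helper bootstrap_slope is hypothetical and is not the function from the question.

```python
import numpy as np


def bootstrap_slope(x, y, n_boot, rng):
    """Bootstrap distribution of the least-squares slope, resampling (x, y) pairs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    slopes = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)  # resample row indices with replacement
        slopes[b] = np.polyfit(x[idx], y[idx], 1)[0]
    return slopes


rng = np.random.default_rng(4)
x = np.linspace(0, 10, 50)
y = 2.0 * x + 1.0 + rng.normal(scale=0.1, size=x.size)  # hypothetical data

slopes = bootstrap_slope(x, y, n_boot=1000, rng=rng)
low, high = np.percentile(slopes, [2.5, 97.5])  # percentile 95% interval
print(slopes.mean(), (low, high))
```

Where scipy.stats.bootstrap does work, passing (x, y) with paired=True, vectorized=False and a statistic such as lambda x, y: stats.linregress(x, y).slope should give a comparable interval.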