
I would like to apply the statistical bootstrapping method with scipy.stats.bootstrap.

In my code below, I load two distinct .txt files into Python. Each file contains a single column of numeric values (floats). I would like to calculate the coefficient of variation (CV) for each file and test whether the difference between the two CVs is statistically significant. That is why I use bootstrapping.

Here is the full code:

import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import scipy as sp
from scipy import stats


# Coefficient of variation
Core_values = np.loadtxt(f"pathtofile/file1.txt", comments=None, delimiter=None, converters=None,
skiprows=0, usecols=0,unpack=False, ndmin=0, encoding=None, max_rows=None, like=None)

Periphery_values = np.loadtxt(f"pathtofile/file2.txt", comments=None, delimiter=None, converters=None,
skiprows=0, usecols=0, unpack=False, ndmin=0, encoding=None, max_rows=None, like=None)

Results = sp.stats.bootstrap((Core_values, Periphery_values), sp.stats.variation((Core_values, Periphery_values), axis=None), vectorized=False, paired=True, confidence_level=0.95, n_resamples=20000)

print(Results)

When I only apply the code that calculates the CV of both files as follows:

Results =  sp.stats.variation((Core_values, Periphery_values), axis=None)
print(Results)

Python gives me the correct result, i.e., one CV value computed from the values of both input files. However, when I put sp.stats.variation((Core_values, Periphery_values), axis=None) into the bootstrapping call as shown in the full code, I receive the following error message: TypeError: 'numpy.float64' object is not callable

I therefore assume that my mistake lies in how I pass both samples (Core_values and Periphery_values) to

Results = sp.stats.bootstrap((Core_values, Periphery_values), sp.stats.variation...

I cannot figure out the correct way to tell Python that I want to use both samples for bootstrapping without triggering this error.

Philipp

1 Answer

The answer is in the documentation for the second argument of scipy.stats.bootstrap, statistic: "statistic must be a callable [emphasis added] that accepts len(data) samples as separate arguments [where data is the first argument to bootstrap] and returns the resulting statistic. If vectorized is set True, statistic must also accept a keyword argument axis and be vectorized to compute the statistic along the provided axis."

A "callable" is a function or something that acts like a function. What you provided as the statistic argument, sp.stats.variation((Core_values, Periphery_values), axis=None), is neither. It is the result of calling the function, which is just a number (specifically a numpy.float64 value), hence the error "TypeError: 'numpy.float64' object is not callable".
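To make the distinction concrete, here is a small sketch. The array sample is made-up data for illustration only, not your files:

```python
import numpy as np
from scipy import stats

# Made-up data, purely for illustration
sample = np.array([1.0, 2.0, 3.0, 4.0])

# Calling the function produces a number, not a function
cv_value = stats.variation(sample)

print(callable(stats.variation))  # True: the function object itself
print(callable(cv_value))         # False: the number it returned

# bootstrap tries to call whatever it receives as `statistic`,
# so handing it the number fails in the same way as this call does
try:
    cv_value(sample)
except TypeError as exc:
    print("TypeError:", exc)
```

In other words, bootstrap needs stats.variation (or a function wrapping it), not the value obtained by calling it.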

What you might want to pass as the statistic argument is something like this: lambda c, v: sp.stats.variation((c, v), axis=None). A lambda is a callable. Note that, following the documentation for statistic, the number of arguments of the lambda equals the length of the first argument of bootstrap, i.e., the tuple (Core_values, Periphery_values). However, this particular lambda may not do what you really want: it computes a single CV over the pooled values rather than comparing the two CVs. (Whether that is what you want is out of scope for this particular question.)
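If what you are after is the difference between the two CVs, one possible sketch looks like the following. Several things here are my assumptions rather than anything from your post: the synthetic arrays stand in for your two files, the helper cv_difference is a hypothetical name, paired=False treats the two files as independent samples, and method="percentile" is chosen only to keep the example simple.

```python
import numpy as np
from scipy import stats

# Synthetic stand-ins for the two files (assumption: 1-D float arrays)
data_rng = np.random.default_rng(0)
core_values = data_rng.normal(loc=10.0, scale=2.0, size=200)
periphery_values = data_rng.normal(loc=10.0, scale=3.0, size=150)


def cv_difference(core, periphery):
    """Hypothetical helper: CV of the first sample minus CV of the second."""
    return stats.variation(core) - stats.variation(periphery)


result = stats.bootstrap(
    (core_values, periphery_values),  # two samples -> statistic takes two arguments
    cv_difference,                    # pass the function itself, do not call it
    vectorized=False,
    paired=False,                     # assumption: the samples are independent
    confidence_level=0.95,
    n_resamples=2000,                 # kept small so the sketch runs quickly
    method="percentile",
)

ci = result.confidence_interval
print("observed difference:", cv_difference(core_values, periphery_values))
print("95% CI:", ci.low, ci.high)
```

A confidence interval for the difference that excludes zero would suggest that the two CVs differ, but whether this is the appropriate test for your data is a separate statistical question. Note also that paired=True, as in your original call, resamples the two arrays with the same indices and therefore requires them to have equal length.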

jjramsey