How to pass values from pandas column into scipy.stats.binomtest in a vectorized way and store answers in new column?

Question

I have the dataframe births:

	year	sex	num_births	total_births
1	1880	M	110491	201484
3	1881	M	100743	192696
5	1882	M	113686	221533
7	1883	M	104627	216946
9	1884	M	114442	243462
11	1885	M	107799	240854
13	1886	M	110784	255317
15	1887	M	101413	247394
17	1888	M	120851	299473
19	1889	M	110580	288946

I want to use binomtest from scipy and add the p-value to a new column:

(births
 .assign(binom_pvalue=lambda x: stats.binomtest(x.num_births, x.total_births).pvalue)
)

but I get the error TypeError: k must be an integer.

It looks like I am passing the whole Series instead of the value for each row. However, the same approach works when I do something like:

(births
 .assign(ratio=lambda x: x.num_births / x.total_births)
)

output:

	year	sex	num_births	total_births	ratio
1	1880	M	110491	201484	0.548386
3	1881	M	100743	192696	0.522808
5	1882	M	113686	221533	0.513179
7	1883	M	104627	216946	0.482272
9	1884	M	114442	243462	0.470061

In this scenario, it uses the value for each row in a vectorized fashion.
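A minimal sketch of the difference, using the first two rows of `births`. Arithmetic between two Series is applied element-wise, but `binomtest` checks that `k` and `n` are single integers, so a whole Series is rejected. Calling it once with one row's scalar values works. The variable names are illustrative only.

```python
import pandas as pd
from scipy.stats import binomtest

# First two rows of the births frame from the question.
births = pd.DataFrame(
    {"num_births": [110491, 100743], "total_births": [201484, 192696]}
)

# Element-wise: pandas divides the two Series row by row.
ratio = births.num_births / births.total_births

# binomtest expects scalar integers for k and n, so whole Series fail.
try:
    binomtest(births.num_births, births.total_births)
    message = "no error"
except TypeError as err:
    message = str(err)  # the error reported in the question

# One call with a single row's scalar values works.
single = binomtest(int(births.num_births.iloc[0]), int(births.total_births.iloc[0]))
```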

How can I use binomtest in the same assign/lambda style as above, so that it is applied to each row?

Thanks in advance for any answers!

William

blunova · Answer 1 · 2022-09-18T12:59:38.087

You can try this using lambda and assign:

import pandas as pd
from scipy.stats import binomtest

births = pd.DataFrame(
    data={
        "year": [1880, 1881],
        "sex": ["M", "M"],
        "num_births": [110491, 100743],
        "total_births": [201484, 192696],
    }
)

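# Use the frame passed to the lambda (df) rather than the outer variable,
# so this also works inside a method chain.
births.assign(
    p_value=lambda df: [
        binomtest(k, n).pvalue for k, n in zip(df.num_births, df.total_births)
    ]
)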

The following is the output:

   year sex  num_births  total_births       p_value
0  1880   M      110491        201484  0.000000e+00
1  1881   M      100743        192696  3.317445e-89
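The list comprehension still calls `binomtest` once per row. If a loop-free computation is the goal, one option (a substitute for `binomtest`, not a vectorized mode of it) is to compute the p-value from `scipy.stats.binom`, whose `cdf` and `sf` methods broadcast over arrays. The sketch below assumes the default null proportion `p=0.5`. For that value the binomial distribution is symmetric, so the two-sided p-value reduces to twice the smaller tail probability, capped at 1, and it should agree with `binomtest(k, n).pvalue` up to floating-point rounding. For `p != 0.5`, `binomtest` uses a different two-sided rule and this shortcut does not apply. The helper name `two_sided_binom_pvalue` is hypothetical.

```python
import numpy as np
import pandas as pd
from scipy.stats import binom


def two_sided_binom_pvalue(k, n):
    """Hypothetical helper: two-sided binomial p-value for H0: p = 0.5.

    Vectorized over k and n. Valid only for p = 0.5, where the distribution
    is symmetric and the two-sided p-value is twice the smaller tail.
    """
    k = np.asarray(k)
    n = np.asarray(n)
    lower = binom.cdf(k, n, 0.5)     # P(X <= k)
    upper = binom.sf(k - 1, n, 0.5)  # P(X >= k)
    return np.minimum(1.0, 2.0 * np.minimum(lower, upper))


births = pd.DataFrame(
    data={
        "year": [1880, 1881],
        "sex": ["M", "M"],
        "num_births": [110491, 100743],
        "total_births": [201484, 192696],
    }
)

# Same assign/lambda chaining style as the question, with no Python-level loop.
result = births.assign(
    p_value=lambda df: two_sided_binom_pvalue(df.num_births, df.total_births)
)
```

For one-sided tests the same idea works for any `p`: `binom.sf(k - 1, n, p)` corresponds to `alternative="greater"` and `binom.cdf(k, n, p)` to `alternative="less"`.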

edited Sep 18 '22 at 12:59

answered Sep 17 '22 at 21:28

blunova

Sorry, but that approach does not follow the style that I want to use. – William Rosenbaum Sep 18 '22 at 07:22
I have updated my answer to strictly follow your style guideline. Hope it helps. – blunova Sep 18 '22 at 13:00
Sorry, it relies on a for loop, I want a vectorized solution. I will have to continue the search. Thanks anyway. – William Rosenbaum Sep 18 '22 at 18:19
