
I have the dataframe births:

    year sex  num_births  total_births
1   1880   M      110491        201484
3   1881   M      100743        192696
5   1882   M      113686        221533
7   1883   M      104627        216946
9   1884   M      114442        243462
11  1885   M      107799        240854
13  1886   M      110784        255317
15  1887   M      101413        247394
17  1888   M      120851        299473
19  1889   M      110580        288946

I want to run scipy's binomtest on each row and add the p-value as a new column:

(births
 .assign(binom_pvalue=lambda x: stats.binomtest(x.num_births, x.total_births).pvalue)
)

but I get the error TypeError: k must be an integer.

It looks like I am passing the whole Series instead of the value for each row. However, this method works when doing something like:

(births
 .assign(ratio=lambda x: x.num_births / x.total_births)
)

output:

   year sex  num_births  total_births     ratio
1  1880   M      110491        201484  0.548386
3  1881   M      100743        192696  0.522808
5  1882   M      113686        221533  0.513179
7  1883   M      104627        216946  0.482272
9  1884   M      114442        243462  0.470061

In this scenario, it uses the value for each row in a vectorized fashion.
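
The difference can be checked directly. The following is a minimal sketch, assuming a two-row copy of the data above (the names `seen`, `ratio`, and `raised` are only illustrative). The callable passed to assign is called once with the whole DataFrame, so x.num_births is a Series. Division is defined elementwise on a Series, while binomtest expects a single integer for k and n and rejects a Series.

```python
import pandas as pd
from scipy import stats

# Two-row stand-in for the births dataframe (assumption for illustration)
births = pd.DataFrame(
    {"num_births": [110491, 100743], "total_births": [201484, 192696]}
)

seen = []  # records the type of object the callable receives


def ratio(x):
    # assign passes the whole DataFrame, so x.num_births is a Series
    seen.append(type(x.num_births).__name__)
    return x.num_births / x.total_births  # elementwise Series division


out = births.assign(ratio=ratio)

# binomtest is not vectorized: passing a Series raises a TypeError
try:
    stats.binomtest(births.num_births, births.total_births)
    raised = False
except TypeError:
    raised = True

print(seen, raised)
```

So the ratio example works because Series arithmetic is vectorized, not because assign iterates over rows.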

How can I use the binomtest function, using the same style as I am trying above?

Thankful for answers!

William


1 Answer


binomtest only accepts scalar integers, so you can call it once per row inside the lambda passed to assign:

import pandas as pd
from scipy.stats import binomtest

births = pd.DataFrame(
    data={
        "year": [1880, 1881],
        "sex": ["M", "M"],
        "num_births": [110491, 100743],
        "total_births": [201484, 192696],
    }
)

births.assign(
    # df is the dataframe that assign passes to the lambda
    p_value=lambda df: [
        binomtest(k, n).pvalue for k, n in zip(df.num_births, df.total_births)
    ]
)

The following is the output:

   year sex  num_births  total_births       p_value
0  1880   M      110491        201484  0.000000e+00
1  1881   M      100743        192696  3.317445e-89
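
A row-wise variant with DataFrame.apply and axis=1 should give the same p-values and stays closer to the method-chaining style in the question. This is a sketch under assumptions: the helper name `row_pvalue` is made up for illustration, and binomtest is left at its defaults (p=0.5, two-sided alternative), as in the code above.

```python
import pandas as pd
from scipy.stats import binomtest

births = pd.DataFrame(
    data={
        "year": [1880, 1881],
        "sex": ["M", "M"],
        "num_births": [110491, 100743],
        "total_births": [201484, 192696],
    }
)


def row_pvalue(row):
    # row is a single row as a Series; int() ensures plain Python integers
    k = int(row["num_births"])
    n = int(row["total_births"])
    return binomtest(k, n).pvalue  # defaults: p=0.5, two-sided


# apply with axis=1 calls row_pvalue once per row
result = births.assign(p_value=lambda df: df.apply(row_pvalue, axis=1))
print(result)
```

Both versions call binomtest once per row in Python, so neither is vectorized. On large frames the zip-based list comprehension is likely the faster of the two, because apply with axis=1 builds a Series for every row.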
blunova