I recently wrote some code that substantially improves the scipy.stats.binom_test function. The function creates an array the size of its inputs, which causes memory errors when the inputs are on the order of 100 million. Creating these arrays is unnecessary; it is an artifact of porting the method from R. I changed this logic in the following PR: https://github.com/ryu577/scipy/pull/1/files.

To see how this unnecessary creation of arrays causes issues, run the following code:

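from scipy.stats import binom_test
# 100,000,000 successes in 100,000,001 trials with p = 0.5; inputs this large can exhaust memory
binom_test(100000000, 100000001, 0.5)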

In the PR, I replaced the search for the value in that array with a binary search, which makes the method far more memory- and time-efficient. It goes from being unusable for inputs in the hundreds of millions to running almost instantly with negligible memory overhead.
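To illustrate the idea (this is a minimal sketch, not the code from the PR), consider counting how many values i in the right tail, from ceil(n*p) to n, have a binomial pmf at or below some threshold. The pmf is non-increasing on that range, so the first qualifying index can be found by bisection instead of materialising the whole range. The function names `count_right_tail` and `count_right_tail_bruteforce` are hypothetical, and the brute-force version only approximates what an array-based approach does.

```python
import math

import numpy as np
from scipy.stats import binom


def count_right_tail_bruteforce(n, p, threshold):
    """Array-based count: allocates one pmf value per candidate index."""
    i = np.arange(math.ceil(n * p), n + 1)
    return int(np.sum(binom.pmf(i, n, p) <= threshold))


def count_right_tail(n, p, threshold):
    """Same count via binary search, assuming the pmf is non-increasing
    on [ceil(n*p), n], so the qualifying indices form a suffix of the range."""
    lo = math.ceil(n * p)  # first candidate index
    hi = n + 1             # one past the last candidate index
    while lo < hi:
        mid = (lo + hi) // 2
        if binom.pmf(mid, n, p) <= threshold:
            hi = mid       # mid qualifies; the first qualifying index is <= mid
        else:
            lo = mid + 1   # mid does not qualify; look further right
    # lo is now the first qualifying index (or n + 1 if none qualifies)
    return n + 1 - lo
```

The bisection evaluates the pmf only about log2(n) times and never allocates an array, which is why inputs in the hundreds of millions stop being a problem in this sketch.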

I tested the new version and it produces the same output as the original in a variety of cases.

However, this PR has not been getting any attention. I even sent an email about it to the scipy mailing list and got no response.

I'm committed to doing whatever it takes to get this change into scipy, but I'm lost as to the next steps. Is there anyone who has contributed to scipy who can guide me?

Rohit Pandey
  • There seem to be 268 open PRs at the minute. Maybe it's just a matter of time (and patience) :-) – NomadMonad Apr 18 '20 at 21:34
  • I'm not an expert at using `github`, but it looks to me like you've submitted the pull request to your own fork, rather than to the main `scipy` repository. No one is watching your fork. – hpaulj Apr 18 '20 at 21:44
  • Shoot, that was what I was afraid of. How do I submit it to the main scipy repo? – Rohit Pandey Apr 18 '20 at 22:17
  • This should help. You basically have to select upstream as the base: https://help.github.com/en/github/collaborating-with-issues-and-pull-requests/creating-a-pull-request-from-a-fork – Joey Dumont Apr 20 '20 at 16:34
