
Following the official docs, I prepared the following URL: https://api.github.com/repos/gooddata/gooddata-python-sdk/pulls?per_page=10&page=1&sort=updated&direction=asc&state=all&q=updated:%3E=2022-06-01

It returns the first 10 pull requests in this (public) repo, the first of which was updated at 2021-08-05T12:55:43Z. It seems that the following part of the query has no effect: &q=updated:%3E=2022-06-01

I also tried to use the existing Python library for integrating with the GitHub API, https://github.com/PyGithub/PyGithub, but it has no support for filtering by the updated field.
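
Here is a minimal sketch (not part of the original question) using the requests library to reproduce the call above; it assumes an unauthenticated request against the public repo and simply prints what the endpoint returns, illustrating that the q parameter is ignored:

import requests

# reproduce the request from the question; per the observation above, the
# q=updated:>=2022-06-01 part has no effect on the "list pull requests" endpoint
url = "https://api.github.com/repos/gooddata/gooddata-python-sdk/pulls"
params = {
    "per_page": 10,
    "page": 1,
    "sort": "updated",
    "direction": "asc",
    "state": "all",
    "q": "updated:>=2022-06-01",
}
resp = requests.get(url, params=params)
resp.raise_for_status()

for pr in resp.json():
    print(pr["number"], pr["updated_at"])  # first entry is from 2021, not 2022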

1 Answer


I think with PyGitHub my approach would be to get all open PRs and sort them by updated, in whichever of ascending or descending order you prefer.

In this example: https://pygithub.readthedocs.io/en/latest/examples/PullRequest.html?highlight=pr#get-pull-requests-by-query

I'd change the arguments to sort="updated" and direction="desc", so:

repo.get_pulls(state="open", sort="updated", direction="desc")
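
A slightly fuller, self-contained sketch of this call (the access token placeholder is an assumption, the repository name comes from the question, and the printed fields are just for illustration):

from github import Github

# for a public repo an unauthenticated Github() also works, with stricter rate limits
g = Github("<your-access-token>")
repo = g.get_repo("gooddata/gooddata-python-sdk")

# open PRs, most recently updated first
for pr in repo.get_pulls(state="open", sort="updated", direction="desc"):
    print(pr.number, pr.updated_at, pr.title)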

I'm unfamiliar with GoodData, so I can't help there, sorry.

Leslie Alldridge
  • Thanks for your answer. Sort and direction work OK. But I want to crawl all pull requests, not only open ones, and I want to crawl them incrementally: the first time I collect them all and save the current date-time somewhere; on the next run I want to download only the pull requests updated after the date-time saved during the previous run. – Jan Soubusta Sep 05 '22 at 06:40
  • You could use `state="all"` for the first part, so you look at open and closed PRs alike. If you are doing this for many repositories, you could build one big list. That list can then be iterated over with your date-condition logic, and you'd work with the slimmed-down result. On future runs you could request specific pages and compare each page against the timestamp you saved from the first run (see the sketch after this comment): https://pygithub.readthedocs.io/en/latest/utilities.html#pagination – Leslie Alldridge Sep 05 '22 at 23:29
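
Below is a hedged sketch of the incremental crawl discussed in these comments. The state file name, the token placeholder, and the timezone handling are illustrative assumptions, not from the comments; the idea is that with PRs sorted by updated in descending order, iteration can stop as soon as a PR older than the previously saved timestamp appears.

from datetime import datetime, timezone
from pathlib import Path

from github import Github

# hypothetical location for the saved date-time
STATE_FILE = Path("last_run.txt")

g = Github("<your-access-token>")  # or Github() for unauthenticated, rate-limited access
repo = g.get_repo("gooddata/gooddata-python-sdk")

# load the timestamp saved by the previous run, if any
last_run = None
if STATE_FILE.exists():
    last_run = datetime.fromisoformat(STATE_FILE.read_text().strip())

started_at = datetime.now(timezone.utc)

# all PRs (open and closed), most recently updated first
for pr in repo.get_pulls(state="all", sort="updated", direction="desc"):
    updated_at = pr.updated_at
    if updated_at.tzinfo is None:
        # older PyGithub versions return naive UTC datetimes
        updated_at = updated_at.replace(tzinfo=timezone.utc)
    if last_run is not None and updated_at < last_run:
        break  # anything older was already crawled in the previous run
    print(pr.number, updated_at, pr.title)

# remember when this run started, for the next incremental crawl
STATE_FILE.write_text(started_at.isoformat())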