I am trying to fetch a list of all repositories from GitHub to do some analysis on them. I started with the v3 API, which is RESTful, and when I needed more information such as the star count, I migrated to v4, which is provided as GraphQL. I now request 100 records at a time and repeat this recursively to fetch all records.
The problem is with pagination. To paginate, I have to take the endCursor from each response and pass it as the after argument of the next request. However, the data is not paginated properly. For example:
- Requesting the first page (without any cursor) returns different records from one request to the next.
- Requesting a page with the same cursor multiple times also returns different results.
- If I ignore this and simply fetch one page after another, each page of 100 records contains many duplicates of records from previous requests, which means pagination does not work correctly.
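A minimal sketch of how the duplicates described above could be counted, assuming each page has the shape of the search result in the query (an edges array whose nodes carry an id). The name countDuplicatesPerPage is hypothetical and used only for illustration.

```javascript
// Hypothetical helper (not part of any GitHub library).
// For each page, counts how many node ids were already seen earlier,
// either on a previous page or earlier on the same page.
function countDuplicatesPerPage(pages) {
  const seen = new Set();
  return pages.map((page) => {
    let duplicates = 0;
    for (const edge of page.edges) {
      const id = edge.node.id;
      if (seen.has(id)) {
        duplicates += 1;
      } else {
        seen.add(id);
      }
    }
    return duplicates;
  });
}
```

With correct pagination over a stable result set, every entry of the returned array would be expected to be 0.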
The query that I am sending (from a Node.js app) is as follows:
{
  search(query: "is:public", type: REPOSITORY, first: 100, after: "Y3Vyc29yOjEwMA==") {
    repositoryCount
    userCount
    wikiCount
    pageInfo {
      startCursor
      endCursor
      hasNextPage
      hasPreviousPage
    }
    edges {
      node {
        ... on Repository {
          databaseId
          id
          name
          description
          forkCount
          isFork
          issues {
            totalCount
          }
          labels(first: 100) {
            nodes {
              name
            }
          }
          languages(first: 100) {
            nodes {
              name
            }
          }
          licenseInfo {
            name
          }
          nameWithOwner
          primaryLanguage {
            name
          }
          pullRequests {
            totalCount
          }
          watchers {
            totalCount
          }
          stargazers {
            totalCount
          }
        }
      }
    }
  }
}
As I said above, for the first request I omit the after argument from the search arguments, and for every following request I use the endCursor of the previous response as the after argument.
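The loop described above can be sketched roughly as follows. This is a simplified illustration, not the exact code from my app: runQuery is a hypothetical function that sends a GraphQL query with variables and resolves to the data object of the response, maxPages is an assumed safety limit, and the query is trimmed to a few fields. It also assumes that passing null for the after variable behaves the same as omitting the argument.

```javascript
// Sketch of cursor-based pagination over the search connection.
// `runQuery(query, variables)` is a hypothetical transport function
// that resolves to the `data` part of the GraphQL response.
async function fetchAllPages(runQuery, maxPages = 10) {
  const query = `
    query ($after: String) {
      search(query: "is:public", type: REPOSITORY, first: 100, after: $after) {
        pageInfo { endCursor hasNextPage }
        edges { node { ... on Repository { id nameWithOwner } } }
      }
    }`;
  const pages = [];
  let after = null; // first request: no cursor
  for (let i = 0; i < maxPages; i++) {
    const data = await runQuery(query, { after });
    pages.push(data.search);
    if (!data.search.pageInfo.hasNextPage) break;
    // next request starts after the last record of this page
    after = data.search.pageInfo.endCursor;
  }
  return pages;
}
```

Injecting runQuery keeps the loop independent of the HTTP client, so the cursor handling can be checked against a fake response sequence before it is pointed at the real API.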
Am I misunderstanding the purpose and usage of the cursor, or is this a bug (intended or unintended) on GitHub's side?