3

I'm currently using the Bing Web Search API v7 to query Bing for search results. As per the API docs, the count and offset parameters are used to paginate through the results, and the total number of available results is reported in the response as totalEstimatedMatches.

From the documentation:

totalEstimatedMatches: The estimated number of webpages that are relevant to the query. Use this number along with the count and offset query parameters to page the results.

This seems to work up to a point, after which the API just continues to return the exact same results over and over, regardless of the values of count and offset.

In my specific case, the totalEstimatedMatches was set at 330,000. With a count of 50 (i.e. 50 results per request) the results begin repeating at around offset 700 i.e. 3,500 results into the estimated 330,000.
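
For illustration, the paging pattern described above looks roughly like this (a minimal sketch; the endpoint URL, header name, and response fields follow the v7 docs, and the key and query are placeholders):

import requests

ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"  # may differ per Azure resource
HEADERS = {"Ocp-Apim-Subscription-Key": "<YOUR_KEY>"}     # placeholder key

def fetch_page(query, offset, count=50):
    resp = requests.get(ENDPOINT, headers=HEADERS,
                        params={"q": query, "count": count, "offset": offset})
    resp.raise_for_status()
    web_pages = resp.json().get("webPages", {})
    return web_pages.get("totalEstimatedMatches", 0), [p["url"] for p in web_pages.get("value", [])]

total, urls = fetch_page("example query", offset=0)       # total is e.g. ~330,000 in the case above
offset = 50
while offset < total:
    _, urls = fetch_page("example query", offset=offset)  # beyond a certain offset the same URLs repeat
    offset += 50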

Playing with the Bing front end, I have noticed similar behaviour once the page count gets sufficiently high.

Am I using the API incorrectly, or is this just some sort of limitation or bug where the totalEstimatedMatches value is just way off?

user783836
  • Hitting this bug today, see https://stackoverflow.com/questions/76097614/how-to-get-more-than-100-results-with-the-bing-search-api-v7/76097615. Does anybody know whether the Google Search API is better? – Martin Monperrus Apr 25 '23 at 04:23

2 Answers

3

totalEstimatedMatches reports the total number of matches for that query across the web, which includes duplicate results and near-duplicate content.

To optimize their indexes, all search engines restrict the retrievable results to the top N webpages. This is what you are seeing. The behavior is consistent across all search engines, since nearly all users refine their query, select a webpage, or abandon the search within the first 2-3 pages of results.

In short, this is not a bug or an incorrect implementation; it is an index optimization that restricts how many results you can page through. If you really need more results, you can run the related searches returned by the API and append the unique webpages from those queries.
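
For example, an illustrative sketch of that related-searches approach in Python (the endpoint URL, the Ocp-Apim-Subscription-Key header, and the webPages/relatedSearches response fields are assumptions taken from the v7 docs; key and query are placeholders):

import requests

ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"  # may differ per Azure resource
HEADERS = {"Ocp-Apim-Subscription-Key": "<YOUR_KEY>"}     # placeholder key

def search(query, count=50, offset=0):
    resp = requests.get(ENDPOINT, headers=HEADERS,
                        params={"q": query, "count": count, "offset": offset})
    resp.raise_for_status()
    return resp.json()

seen_urls = set()

def collect(query):
    # Add this query's webpages to the pool and return Bing's related queries.
    data = search(query)
    seen_urls.update(p["url"] for p in data.get("webPages", {}).get("value", []))
    return [r["text"] for r in data.get("relatedSearches", {}).get("value", [])]

# Start from the original query, then widen the pool with its related searches.
for related_query in collect("original query"):
    collect(related_query)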

Ronak
  • I would have thought the API would allow full access to the index though even if the site doesn't? – user783836 May 21 '18 at 19:23
  • The API is actually a manifestation of the site, with the convenience of TPS, filters, and sorting. So, unfortunately, the APIs won't provide all the results either. This is in fact true across all the search engines and APIs. – Ronak May 22 '18 at 18:11
0

Technically this isn't a direct answer to the question as asked, but hopefully it's helpful to have a way to paginate efficiently through Bing's API without relying on the totalEstimatedMatches return value, which, as the other answer explains, can behave unpredictably. Here's some Python:

class ApiWorker(object):
    def __init__(self, q):
        self.q = q
        self.offset = 0
        self.result_hashes = set()
        self.finished = False

    def calc_next_offset(self, resp_urls):
        before_adding = len(self.result_hashes)
        self.result_hashes.update(hash(i) for i in resp_urls)  # <== set semantics silently drop URLs we've already seen.
        after_adding = len(self.result_hashes)
        if after_adding == before_adding:  # <== nothing new: we either got a page of duplicates or an empty response.
            self.finished = True
        else:
            self.offset += len(resp_urls)  # <== advance the offset by the size of the page just consumed.

    def page_through_results(self, *args, **kwargs):
        while not self.finished:
            new_resp_urls = ...<call_logic>...
            self.calc_next_offset(new_resp_urls)
            ...<save logic>...
        print(f'All unique results for q={self.q} have been obtained.')

This^ will stop paginating as soon as a full response of duplicates has been returned.
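
For completeness, here is one hypothetical way to fill in the <call_logic> and <save logic> placeholders (the endpoint URL, header name, and webPages response field are assumptions taken from the v7 docs, not part of the class above):

import requests

ENDPOINT = "https://api.bing.microsoft.com/v7.0/search"  # may differ per Azure resource
HEADERS = {"Ocp-Apim-Subscription-Key": "<YOUR_KEY>"}     # placeholder key

def bing_page(query, offset, count=50):
    # Fetch one page of web results and return its URLs.
    resp = requests.get(ENDPOINT, headers=HEADERS,
                        params={"q": query, "count": count, "offset": offset})
    resp.raise_for_status()
    return [p["url"] for p in resp.json().get("webPages", {}).get("value", [])]

worker = ApiWorker("my query")
all_urls = []
while not worker.finished:
    urls = bing_page(worker.q, worker.offset)  # <call_logic>
    worker.calc_next_offset(urls)
    all_urls.extend(urls)                      # <save logic>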

Rob Truxal