
I have a file containing links to 12,000 Stack Overflow pages (posts).

I need the number of accepted answers for each of these pages. Is there an API, or another efficient approach (such as one that Stack Exchange provides), that I can give this file of links to and get back their number of answers?

double-beep
maria
  • The question is rather unclear to me. You have 12k links to some questions (no answers, right?). Would you like to get the total number of **answers** these questions have or the number of **accepted answers**? There's a difference – double-beep Jul 19 '20 at 10:44
  • Yes, you got it right. I need the number of their accepted answers – maria Jul 20 '20 at 17:49



First of all, since you have around 12k posts, you may want to get a key by registering an app on StackApps. Without a key, the API allows 300 requests per day per IP address; with one, it allows 10,000.
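To see why the quota matters, here is a rough estimate in JavaScript. It is only a sketch: `estimateRequests` is a hypothetical helper, and the average of 2.5 answers per question is an assumed figure you would replace with your own. With at most 100 ids per request and `pagesize=100`, each chunk of 100 questions needs about one page per 100 answers.

```javascript
// Daily per-IP quotas of the Stack Exchange API (without / with an app key).
const QUOTA_WITHOUT_KEY = 300;
const QUOTA_WITH_KEY = 10000;

// Hypothetical helper: rough number of requests needed to fetch every answer of
// `questionCount` questions via /questions/{ids}/answers.
// Assumes up to `idsPerRequest` ids per call, `pageSize` answers per page and
// an average of `avgAnswersPerQuestion` answers per question (a guess you supply).
function estimateRequests(questionCount, avgAnswersPerQuestion, idsPerRequest = 100, pageSize = 100) {
  const chunks = Math.ceil(questionCount / idsPerRequest);
  // Even a chunk whose questions have no answers costs one request.
  const pagesPerChunk = Math.max(1, Math.ceil((idsPerRequest * avgAnswersPerQuestion) / pageSize));
  return chunks * pagesPerChunk;
}

const needed = estimateRequests(12000, 2.5);
console.log(`About ${needed} requests`); // About 360 requests
console.log(`Within the keyless quota: ${needed <= QUOTA_WITHOUT_KEY}`); // false
console.log(`Within the quota with a key: ${needed <= QUOTA_WITH_KEY}`); // true
```

Because the result depends on the average you plug in, treat it as an order of magnitude rather than an exact count.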

Then, whatever language you use, follow these steps:

  1. Extract the question id from each link. Make sure all of the ids are question ids, that they are unique and that they all belong to the same site (this saves quota).
  2. Store the ids in an array.
  3. Split the array into smaller arrays of up to 100 ids each (the API accepts at most 100 ids per request).
  4. Create a variable acceptedAnswerCount and set it to 0. Then, for each of the sub-arrays:
    • Join the ids with ; as a separator.
    • Create a variable hasMore and set it to true.
    • Create a variable page and set it to 0.
    • While hasMore is true:
      • Increase page by 1.
      • GET /questions/{ids}/answers (replace {ids} with the semicolon-separated ids above) with the following parameters: pagesize=100, page=<the page variable>, order=desc, sort=votes and site=stackoverflow (or whichever site you want). Add key=<your key> if you registered one.
      • Loop through the items array of the JSON the API returns. For each item, read its is_accepted property; if it is true, increase acceptedAnswerCount.
      • Set hasMore to the value of the has_more property of the response.
      • If the response has a backoff field, wait that many seconds before the next request. Otherwise, wait a short time (at least 100 ms) to make a backoff less likely.
  5. Once every sub-array has been processed, acceptedAnswerCount holds the number you want.
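Steps 1 to 3 could look like the following sketch. It assumes the file's links have the form `https://stackoverflow.com/questions/<id>/<slug>` or `https://stackoverflow.com/q/<id>`; `extractQuestionIds` and `chunk` are hypothetical helper names, and other link shapes (answer links such as `/a/<id>`, other hosts) are simply skipped.

```javascript
// Hypothetical helper for steps 1 and 2: pull unique question ids out of links.
// Assumes links like https://stackoverflow.com/questions/<id>/<slug> or /q/<id>.
function extractQuestionIds(links, host = "stackoverflow.com") {
  const ids = new Set(); // a Set drops duplicate ids
  for (const link of links) {
    let url;
    try {
      url = new URL(link.trim());
    } catch {
      continue; // skip lines that are not valid URLs
    }
    if (url.hostname !== host) continue; // keep a single site to save quota
    const match = url.pathname.match(/^\/(?:questions|q)\/(\d+)/);
    if (match) ids.add(match[1]);
  }
  return [...ids];
}

// Hypothetical helper for step 3: split an array into chunks of at most `size` items.
function chunk(array, size = 100) {
  const chunks = [];
  for (let i = 0; i < array.length; i += size) {
    chunks.push(array.slice(i, i + size));
  }
  return chunks;
}

const ids = extractQuestionIds([
  "https://stackoverflow.com/questions/12345/some-title",
  "https://stackoverflow.com/q/12345", // duplicate, dropped
  "https://superuser.com/questions/999/other-site", // other site, dropped
]);
console.log(ids); // [ '12345' ]
console.log(chunk(ids).map(c => c.join(";"))); // [ '12345' ]
```

Each joined chunk is what goes in place of `{ids}` in `/questions/{ids}/answers`.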

Here's a JavaScript example (using 200 question ids):

```javascript
async function getAcceptedAnswerCount() {
  const submitButton = document.querySelector('#submit-ids');
  document.querySelector('textarea').value = ''; // clear logs
  submitButton.disabled = true; // disable button while requests run

  const postIds = document.querySelector('#ids-input').value;
  const key = 'U4DMV*8nvpm3EOpvf69Rxw((';
  const sitename = 'stackoverflow';

  // ids are joined with semicolons, e.g. /questions/1;2;3/answers
  const buildApiUrl = ids => `https://api.stackexchange.com/2.3/questions/${ids}/answers`;
  const delay = seconds => new Promise(resolve => setTimeout(resolve, seconds * 1e3));

  // the filter must include is_accepted and has_more
  const filter = '!bN4iJfRmwXY5VE';
  let acceptedAnswerCount = 0;
  // accept semicolon-, comma- or whitespace-separated ids and drop duplicates
  const arrIds = [...new Set(postIds.split(/[;,\s]+/).filter(Boolean))];

  // Fetches one page of answers for one chunk of ids. Returns the number of
  // accepted answers on that page and whether more pages exist.
  async function callApi(page, ids) {
    const url = `${buildApiUrl(ids.join(';'))}?site=${sitename}&filter=${filter}&key=${key}&page=${page}&pagesize=100`;
    const apiCall = await fetch(url);
    const apiResponse = await apiCall.json();

    if (apiResponse.error_id) {
      throw new Error(`${apiResponse.error_name}: ${apiResponse.error_message}`);
    }

    const { backoff, quota_remaining, has_more } = apiResponse;
    appendToLogs(`INFO: Fetched page ${page}. Quota remaining is ${quota_remaining}`);

    if (backoff) {
      appendToLogs(`WARNING: BACKOFF received. Waiting for ${backoff} seconds.`);
      await delay(backoff);
    } else {
      await delay(0.1); // short pause between requests to make a backoff less likely
    }

    const count = apiResponse.items.filter(item => item.is_accepted).length;
    return { count, hasMore: Boolean(has_more) };
  }

  try {
    for (let i = 0; i < arrIds.length; i += 100) {
      const chunk = arrIds.slice(i, i + 100);
      let page = 0;
      let hasMore = true;

      // every chunk starts at page 1 and is paged until has_more is false
      while (hasMore) {
        page++;
        const result = await callApi(page, chunk);
        acceptedAnswerCount += result.count;
        hasMore = result.hasMore;
      }
    }

    appendToLogs(`INFO: The total number of accepted answers is ${acceptedAnswerCount}`);
  } catch (error) {
    appendToLogs(`ERROR: ${error.message}`);
  } finally {
    submitButton.disabled = false; // re-enable button
  }
}

function appendToLogs(textToAppend) {
  document.querySelector('textarea').value += textToAppend + '\n';
}

document.querySelector('#submit-ids').addEventListener('click', getAcceptedAnswerCount);
```
```html
<link rel="stylesheet" href="https://unpkg.com/@stackoverflow/stacks/dist/css/stacks.min.css">

<div class="m12">
  <!-- from https://stackoverflow.design/product/components/inputs/#appended-inputs -->
  <div class="d-flex gs4 gsy fd-column">
    <label class="flex--item s-label">Please enter semicolon-separated post ids</label>
    <div class="d-flex">
      <div class="d-flex ai-center order-last s-input-fill">
        <div class="d-flex gs4 gsx ai-center">
          <button class="s-btn s-btn__primary s-btn__sm flex--item" id="submit-ids" type="button">Submit</button>
        </div>
      </div>
      <div class="d-flex fl-grow1 ps-relative">
        <input class="flex--item s-input brr0" id="ids-input" type="text" placeholder="Enter ids here" />
      </div>
    </div>
  </div>
  <br/>
  <div class="grid ff-column-nowrap gs4 gsy">
    <label class="grid--cell s-label">Logs</label>
    <textarea class="grid--cell s-textarea" readonly style="resize: vertical;" rows="5"></textarea>
  </div>
</div>
```
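The snippet reports a single total. Since the question asks about each page, a per-question tally may be more useful; a minimal sketch follows. It assumes you collect the `items` arrays returned by `/questions/{ids}/answers` and that each item carries `question_id` and `is_accepted` (check that the filter you request includes both). `acceptedPerQuestion` is a hypothetical helper name.

```javascript
// Hypothetical helper: tally accepted answers per question from collected
// answer items. Assumes each item has `question_id` and `is_accepted`.
function acceptedPerQuestion(items, questionIds) {
  // start every requested question at 0 so unanswered ones still appear
  const counts = new Map(questionIds.map(id => [String(id), 0]));
  for (const item of items) {
    if (!item.is_accepted) continue;
    const id = String(item.question_id);
    counts.set(id, (counts.get(id) ?? 0) + 1);
  }
  return counts;
}

const tally = acceptedPerQuestion(
  [
    { question_id: 1, is_accepted: true },
    { question_id: 1, is_accepted: false },
    { question_id: 2, is_accepted: false },
  ],
  [1, 2, 3]
);
console.log(tally); // Map(3) { '1' => 1, '2' => 0, '3' => 0 }
```

A question can have at most one accepted answer, so each value is 0 or 1; summing the values gives the same total the snippet logs.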


double-beep