
I have a PDF file and want to search it for a particular text, then show the matches in a list, each with its page number. I looked online but could not find a solution that fits.

Adobe Reader has a similar feature, called "comments", where the user can view all the search results in a list along with their page numbers.

An answer would be really helpful, and if possible please provide an example too.

Thank you in advance.

Abhishek Solanki
  • You mentioned PDF.js in the tags, but did not describe why. There is an example that prints the text of each page (https://github.com/mozilla/pdf.js/blob/master/examples/node/getinfo.js), which can be adapted to do what you just asked. – async5 May 11 '17 at 16:09
  • Thank you @async5. I am using the PDF.js library by default, and the client's requirement is to display all occurrences of the searched text, so that is what I am looking for. I am new to PDF.js, so could you please help me with a working version of the example you mentioned, maybe as a fiddle? It would be really helpful. Thank you in advance. – Abhishek Solanki May 12 '17 at 05:57

1 Answer


Here is an example that might help you display the found text grouped by page using PDF.js.

var searchText = "JavaScript";
function searchPage(doc, pageNumber) {
  return doc.getPage(pageNumber).then(function (page) {
    return page.getTextContent();
  }).then(function (content) {
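    // Combine the page's text items into one string, then search it with a
    // case-insensitive regular expression that also captures up to 20
    // characters of context on each side of the match
    var text = content.items.map(function (i) { return i.str; }).join('');
    // Escape regex metacharacters so the search text is matched literally
    var escaped = searchText.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    var re = new RegExp("(.{0,20})" + escaped + "(.{0,20})", "gi"), m;
    var lines = [];
    while ((m = re.exec(text)) !== null) {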
      var line = (m[1] ? "..." : "") + m[0] + (m[2] ? "..." : "");
      lines.push(line);
    }
    return {page: pageNumber, items: lines};
  });
}

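// pdf.js 1.x exposes the global PDFJS; 2.0 and later expose pdfjsLib instead
var pdfjs = window.pdfjsLib || window.PDFJS;
var loading = pdfjs.getDocument("//cdn.mozilla.net/pdfjs/tracemonkey.pdf");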
loading.promise.then(function (doc) {
  var results = [];
  for (var i = 1; i <= doc.numPages; i++)
    results.push(searchPage(doc, i));
  return Promise.all(results);
}).then(function (searchResults) {
  // Display results using divs
  searchResults.forEach(function (result) {
    var div = document.createElement('div');
    div.className = "pr";
    div.textContent = 'Page ' + result.page + ':';
    document.body.appendChild(div);
    result.items.forEach(function (s) {
      var div2 = document.createElement('div');
      div2.className = "prl";
      div2.textContent = s;
      div.appendChild(div2);
    });
  });
}).catch(console.error);

.pr { font-family: sans-serif; font-weight: bold; }
.prl { font-style: italic; font-weight: normal; }

<script src="//npmcdn.com/pdfjs-dist/build/pdf.js"></script>
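
If a flat, numbered list of every occurrence with its page number is needed (as the question asks), the per-page text could be searched along the following lines. This is only a sketch under assumptions: `escapeRegExp` and `listOccurrences` are hypothetical helpers, not part of PDF.js, and `pageTexts` stands for the combined text of each page, built the same way as `text` in the example above. The sample strings are made up.

```javascript
// Hypothetical helper (not part of PDF.js): escape regex metacharacters so
// the query is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper: pageTexts[0] is the text of page 1, and so on.
// Returns one entry per occurrence with a running index, the page number,
// the character offset within the page text, and a snippet of context.
function listOccurrences(pageTexts, query, context) {
  var results = [];
  if (!query) return results; // an empty query would match at every position
  var re = new RegExp(escapeRegExp(query), 'gi');
  pageTexts.forEach(function (text, i) {
    var m;
    re.lastIndex = 0; // restart the global regex for each page
    while ((m = re.exec(text)) !== null) {
      var start = Math.max(0, m.index - context);
      var end = Math.min(text.length, m.index + m[0].length + context);
      results.push({
        index: results.length + 1,
        page: i + 1,
        offset: m.index,
        snippet: text.slice(start, end)
      });
    }
  });
  return results;
}

// Made-up page texts standing in for text extracted with getTextContent()
var occurrences = listOccurrences(
  ['Hello world. hello again.', 'No match here.', 'Say HELLO.'],
  'hello',
  5
);
occurrences.forEach(function (o) {
  console.log(o.index + '. page ' + o.page + ': ' + o.snippet);
});
```

Each entry carries its own `page`, so a click handler on a list item can use that number to navigate. How the jump itself is done depends on the viewer being used and is not shown here.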
async5
  • Thank you very much for your answer. It works, but the output is not what I was looking for. Let me describe what I need: the user searches for a word, say "hello", and gets a list of every "hello" in the PDF file, each with its index number and the page number it is on. If there are 4 occurrences of "hello" in the document, I want to list all 4, and clicking the 3rd should jump to the 3rd "hello" on the page where it appears. – Abhishek Solanki May 13 '17 at 08:14
  • Heya @AbhishekSolanki, did you find a solution? – Chris Tarasovs Aug 23 '19 at 13:15
  • Have to say this is really slick and clever; the regex is also so concise, thanks! – Edmond Tamas Apr 10 '21 at 07:07
  • Hey, this works like a charm! It takes about 3.8 seconds to parse a 1000-page file on my system. Is there any way to make it faster? @EdmondTamas – Parth Kapadia Aug 11 '22 at 02:27