
I have about 3 million documents (PDFs, Word documents, and images) stored in HBase. I have built a website, and when a user searches from the website interface, I have to serve the matching documents from HBase.

How can I do it?

Is HBase a good choice for serving documents to the web, given that the number of documents will keep growing?

My Hadoop version is 1.2.1 and my HBase version is 0.94.

Hafiz Muhammad Shafiq

1 Answer


In this case I would prefer a search server that indexes the data, with the website integrating with that search server's API. Solr, for example, is an open-source search server.
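As a rough illustration of that integration, here is a minimal sketch of how the website's backend might talk to Solr over HTTP. It assumes a Solr core named `documents` running on `localhost:8983`, and it assumes that each indexed document's `id` field holds the HBase row key of the stored file. Those names, and the helper functions `build_search_url` and `extract_row_keys`, are hypothetical and not part of the original setup.

```python
import json
from urllib.parse import urlencode

# Hypothetical Solr location and core name; adjust to your deployment.
SOLR_BASE = "http://localhost:8983/solr/documents"


def build_search_url(user_query, rows=10, base=SOLR_BASE):
    """Build a URL for Solr's /select handler for the user's search text."""
    params = {
        "q": user_query,  # text typed into the website's search box
        "rows": rows,     # maximum number of hits to return
        "wt": "json",     # ask Solr for a JSON response
        "fl": "id",       # only return the id field (assumed to be the HBase row key)
    }
    return base + "/select?" + urlencode(params)


def extract_row_keys(response_body):
    """Pull the id of every hit out of a Solr JSON response body."""
    payload = json.loads(response_body)
    return [doc["id"] for doc in payload["response"]["docs"]]
```

The website would fetch the URL from `build_search_url`, pass the response body to `extract_row_keys`, and then read each returned row key from HBase to serve the file itself. In this arrangement Solr only answers the search, while HBase remains the store for the document bytes.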

Hope this helps.

Mostafa
  • Can we index images in Solr, and can they be retrieved from Solr? – Hafiz Muhammad Shafiq Dec 07 '15 at 07:03
  • Solr is a search product that can index any document type including images. – Mostafa Dec 08 '15 at 14:54
  • Can you provide a link or reference to a Solr example similar to my case? – Hafiz Muhammad Shafiq Dec 09 '15 at 03:45
  • This is the Apache Solr reference on how to configure Solr and index content: https://cwiki.apache.org/confluence/display/solr/Indexing+and+Basic+Data+Operations . It is very useful for getting started: it covers setting up and configuring Solr, ingesting data, and goes all the way to advanced settings. Hope this helps. – Mostafa Dec 09 '15 at 17:17
  • Also, if you have scanned images that contain text, this reference shows step by step how to configure Solr to extract the text within images: https://hortonworks.com/hadoop-tutorial/indexing-and-searching-text-within-images-with-apache-solr/ . Hope this helps. – Mostafa Dec 09 '15 at 17:19