Scenario:

Blob storage: contains PDF, Word, and image files (about 70 files)

I used the default fields and predefined skills to create an Azure Search instance through the Azure Portal.

But the results for querying any text in these files are not very good. I made the content and key phrases fields searchable and retrievable. I tried using Lucene analyzers, but that was not much help.

My main concern is that if I type even a single letter, for example "u", in the Search explorer, it returns a file. As far as I understand, there is no such word in my files, so what is it matching?

How can I refine the search, and how can I manipulate the results?

I am not an expert in document processing, so I am using the unstructured documents in blob storage instead of JSON-formatted documents.

Another thing: how do I define a field in the index, say chapter-name or title-name, that corresponds to the chapter or title names in the PDFs?

Please suggest some ideas or example links. I am using .NET Core to develop this.

Vivek Jain
  • 1
    You could refer to this [article](https://devnet.kentico.com/articles/customizing-azure-search-fields) on customizing Azure Search fields. – Joey Cai Feb 14 '19 at 09:26
  • 1
    Vivek, about your first set of questions about composing different search queries and manipulating the results, please refer to the following 3 documentation pages which should have detailed information: https://learn.microsoft.com/en-us/azure/search/search-query-overview https://learn.microsoft.com/en-us/azure/search/query-lucene-syntax https://learn.microsoft.com/en-us/azure/search/search-lucene-query-architecture – Arvind - MSFT Feb 14 '19 at 19:01
  • 1
    Here's documentation about how to create and define index fields from the official docs: https://learn.microsoft.com/en-us/azure/search/search-what-is-an-index – Arvind - MSFT Feb 14 '19 at 19:02
  • 1
    Having pointed you to all those links, I am just curious: what is your application's scenario? Document extraction via indexers does not have the capability of extracting specific content from PDF files (like chapter names and titles). Indexers extract the entire content of the PDF in one shot. What are you trying to do with your document collection? This will help me propose a path forward (if one exists at all). – Arvind - MSFT Feb 14 '19 at 19:04

1 Answer

Use a custom skillset to extract the fields you require, and make sure those fields are defined in the index.
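To make the idea concrete, here is a minimal sketch of the logic behind a custom Web API skill, written in Python only for brevity (the request/response JSON contract is the same whichever language hosts the endpoint, including .NET Core). It follows the custom skill format of a `values` array whose records carry a `recordId` and a `data` object. The input name `content`, the output name `title`, and the "first non-empty line is the title" heuristic are all assumptions made for illustration, not something the indexer provides out of the box.

```python
import json


def extract_title(content: str) -> str:
    """Hypothetical heuristic: treat the first non-empty line as the title."""
    for line in content.splitlines():
        stripped = line.strip()
        if stripped:
            return stripped
    return ""


def run_skill(request_body: dict) -> dict:
    """Build a custom-skill response for every record in the request.

    Each output record echoes the incoming recordId so the enrichment
    pipeline can match results back to the right document.
    """
    results = []
    for record in request_body.get("values", []):
        content = record.get("data", {}).get("content")
        if isinstance(content, str):
            results.append({
                "recordId": record["recordId"],
                "data": {"title": extract_title(content)},  # assumed output name
                "errors": [],
                "warnings": [],
            })
        else:
            results.append({
                "recordId": record["recordId"],
                "data": {},
                "errors": [{"message": "Missing or non-string 'content' input."}],
                "warnings": [],
            })
    return {"values": results}


if __name__ == "__main__":
    # Sample request shaped like what the skillset would send to the endpoint.
    sample = {
        "values": [
            {"recordId": "1", "data": {"content": "\nChapter 1: Overview\nBody text..."}}
        ]
    }
    print(json.dumps(run_skill(sample), indent=2))
```

For the extracted value to be searchable, the index would also need a matching field (here assumed to be a searchable, retrievable string field named `title`), and the indexer would need an output field mapping from the skill output to that field.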

Muni Chittem