Indexing CSV blobs does not work in Azure Search

Question

I have a number of TSV files stored as Azure blobs. Their first four tab-separated columns are:

metadata_path, document_url, access_date, content_type

I want to index them as described here: https://learn.microsoft.com/en-us/azure/search/search-howto-index-csv-blobs

My request for creating an indexer has the following body:

{   
    "name" : "webdata",
    "dataSourceName" : "webdata",  
    "targetIndexName" : "webdata",  
    "schedule" : { "interval" : "PT1H", "startTime" : "2017-01-09T11:00:00Z" }, 
    "parameters" : { "configuration" : { "parsingMode" : "delimitedText", "delimitedTextHeaders" : "metadata_path,document_url,access_date,content_type" , "firstLineContainsHeaders" : true, "delimitedTextDelimiter" : "\t" } }, 
    "fieldMappings" : [     { "sourceFieldName" : "document_url", "targetFieldName" : "id", "mappingFunction" : { "name" : "base64Encode", "parameters" : "useHttpServerUtilityUrlTokenEncode" : false } }   }, { "sourceFieldName" : "document_url", "targetFieldName" : "url" },   { "sourceFieldName" : "content_type", "targetFieldName" : "content_type" }  ]
}

I am receiving an error:

{
  "error": {
    "code": "",
    "message": "Data source does not contain column 'document_url', which is required because it maps to the document key field 'id' in the index 'webdata'. Ensure that the 'document_url' column is present in the data source, or add a field mapping that maps one of the existing column names to 'id'."
  }
}

What am I doing wrong?

score 0 · Answer 1 · answered Jan 10 '18 at 07:29

What am I doing wrong?

In your case, the JSON you supplied is invalid. The following is the request format for creating an indexer. For more detail, refer to this document.

{   
        "name" : "Required for POST, optional for PUT. The name of the indexer",  
        "description" : "Optional. Anything you want, or null",  
        "dataSourceName" : "Required. The name of an existing data source",  
        "targetIndexName" : "Required. The name of an existing index",  
        "schedule" : { Optional. See Indexing Schedule below. },  
        "parameters" : { Optional. See Indexing Parameters below. },  
        "fieldMappings" : { Optional. See Field Mappings below. },
        "disabled" : Optional boolean value indicating whether the indexer is disabled. False by default.
 }
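One way to confirm that a request body is malformed before sending it is to parse it locally. The sketch below is only an illustration: `check_json` is a hypothetical helper name, and the two fragments are adapted from the `mappingFunction` in the question, where `"parameters"` is followed by a bare key/value pair instead of an object.

```python
import json


def check_json(body: str):
    """Return None if body parses as JSON, else a short description of the first syntax error."""
    try:
        json.loads(body)
    except json.JSONDecodeError as err:
        return f"line {err.lineno}, column {err.colno}: {err.msg}"
    return None


# Shaped like the question's mapping function: the value of "parameters"
# is missing its opening brace, so a second colon appears where JSON
# expects a comma or a closing brace.
broken = '{ "name" : "base64Encode", "parameters" : "useHttpServerUtilityUrlTokenEncode" : false }'

# Same fragment with the parameters wrapped in an object.
fixed = '{ "name" : "base64Encode", "parameters" : { "useHttpServerUtilityUrlTokenEncode" : false } }'

print("broken:", check_json(broken))
print("fixed: ", check_json(fixed))
```

A local parse only catches syntax errors. It says nothing about whether the service accepts the field names or values.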

Creating an indexer with the REST API takes three steps, which I demonstrate below. If using the Azure Search SDK is acceptable, you could also refer to another SO thread.

1. Create a data source.

POST https://[service name].search.windows.net/datasources?api-version=2015-02-28-Preview
Content-Type: application/json
api-key: [admin key]

{
    "name" : "my-blob-datasource",
    "type" : "azureblob",
    "credentials" : { "connectionString" : "DefaultEndpointsProtocol=https;AccountName=<account name>;AccountKey=<account key>;" },
    "container" : { "name" : "my-container", "query" : "<optional, my-folder>" }
}

2. Create an index.

{
      "name" : "my-target-index",
      "fields": [
        { "name": "metadata_path","type": "Edm.String", "key": true, "searchable": true },
        { "name": "document_url", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "access_date",  "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false },
        { "name": "content_type", "type": "Edm.String", "searchable": true, "filterable": false, "sortable": false, "facetable": false }
      ]
}

3. Create an indexer.
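The three steps above follow the same pattern: a POST with a JSON body and an `api-key` header. A minimal sketch of that pattern, using only the Python standard library, might look like this. It assumes the `indexes` and `indexers` collections take the same URL shape and `api-version` as the `datasources` request in step 1; `build_request`, `my-service`, `<admin key>` and the indexer name `my-indexer` are placeholders made up for illustration. The requests are built but not sent.

```python
import json
import urllib.request

# API version copied from the data source request in step 1;
# newer services may need a different value.
API_VERSION = "2015-02-28-Preview"


def build_request(service, admin_key, collection, body):
    """Build (but do not send) a POST request that creates a resource.

    collection is one of "datasources", "indexes" or "indexers".
    """
    url = f"https://{service}.search.windows.net/{collection}?api-version={API_VERSION}"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "api-key": admin_key},
        method="POST",
    )


# Abbreviated bodies; the full data source and index definitions are shown above.
steps = [
    ("datasources", {"name": "my-blob-datasource", "type": "azureblob"}),
    ("indexes", {"name": "my-target-index"}),
    ("indexers", {
        "name": "my-indexer",  # hypothetical indexer name
        "dataSourceName": "my-blob-datasource",
        "targetIndexName": "my-target-index",
    }),
]

for collection, body in steps:
    req = build_request("my-service", "<admin key>", collection, body)
    print(req.get_method(), req.full_url)
```

To actually send a request, pass it to `urllib.request.urlopen(req)` with a real service name and admin key.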

I do have an index and a data source. Thus their absence is not the cause of the error. — Ekaterina Ermilova, Jan 10 '18 at 11:10
As you don't mention you have created data source and index, I just give a working demo. I also mentioned that `you supplied the json format is invalid` — Tom Sun - MSFT, Jan 10 '18 at 12:16

score 0 · Accepted Answer · answered Jan 10 '18 at 11:49

Below is the request body that works:

{   
    "name" : "webdata",
    "dataSourceName" : "webdata",  
    "targetIndexName" : "webdata",  
    "schedule" : 
    { 
        "interval" : "PT1H", 
        "startTime" : "2017-01-09T11:00:00Z" 
    }, 
    "parameters" : 
    { 
        "configuration" :
        { 
            "parsingMode" : "delimitedText", 
            "delimitedTextHeaders" : "document_url,content_type,link_text" , 
            "firstLineContainsHeaders" : true, 
            "delimitedTextDelimiter" : "\t",
            "indexedFileNameExtensions" : ".tsv"
        } 
    },
    "fieldMappings" : 
    [
        { 
            "sourceFieldName" : "document_url", 
            "targetFieldName" : "id", 
            "mappingFunction" : { 
                "name" : "base64Encode", 
                "parameters" : { 
                    "useHttpServerUtilityUrlTokenEncode" : false 
                }
            }
        },
        { 
            "sourceFieldName" : "document_url", 
            "targetFieldName" : "document_url" 
        },   
        { 
            "sourceFieldName" : "content_type", 
            "targetFieldName" : "content_type" 
        },   
        { 
            "sourceFieldName" : "link_text", 
            "targetFieldName" : "link_text" 
        }       
    ]
}
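The answer does not say which change resolved the error. To see what differs between the `configuration` block in the question and the one above, a small comparison can help. This is only a sketch: `diff_config` is a hypothetical helper, and the two dictionaries are transcribed by hand from the request bodies in this thread.

```python
def diff_config(old: dict, new: dict) -> dict:
    """Map each key whose value differs to an (old, new) pair; None marks a missing key."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# "configuration" from the request in the question.
question = {
    "parsingMode": "delimitedText",
    "delimitedTextHeaders": "metadata_path,document_url,access_date,content_type",
    "firstLineContainsHeaders": True,
    "delimitedTextDelimiter": "\t",
}

# "configuration" from the working request.
working = {
    "parsingMode": "delimitedText",
    "delimitedTextHeaders": "document_url,content_type,link_text",
    "firstLineContainsHeaders": True,
    "delimitedTextDelimiter": "\t",
    "indexedFileNameExtensions": ".tsv",
}

for key, (before, after) in diff_config(question, working).items():
    print(f"{key}: {before!r} -> {after!r}")
```

The comparison reports two keys: `delimitedTextHeaders` has a different column list, and `indexedFileNameExtensions` is new. Outside the `configuration` block, the working body also wraps the `mappingFunction` parameters in an object and maps `document_url` to a target field named `document_url` rather than `url`.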
