
I'd like to copy data from an S3 directory to the Amazon Elasticsearch Service. I've tried following the guide, but unfortunately the part I'm looking for is missing. I don't know what the Lambda function itself should look like (the only information about this in the guide is: "Place your application source code in the eslambda folder."). I'd like ES to auto-index the files.

Currently I'm trying:

for record in event['Records']:
    bucket = record['s3']['bucket']['name']
    key = urllib.unquote_plus(record['s3']['object']['key'])
    index_name = event.get('index_name', key.split('/')[0])
    obj = s3_client.Object(bucket, key)
    data = obj.get()['Body'].read()
    helpers.bulk(es, data, chunk_size=100)

But I get a massive error: elasticsearch.exceptions.RequestError: TransportError(400, u'action_request_validation_exception', u'Validation Failed: 1: index is missing;2: type is missing;3: index is missing;4: type is missing;5: index is missing;6: type is missing;7: ...

Could anyone explain how I can set things up so that my data gets moved from S3 to ES, where it gets auto-mapped and auto-indexed? Apparently it's possible, as mentioned here and here.

– lte__ (edited by Mark B)

2 Answers


While mappings can be assigned automatically in Elasticsearch, index names cannot: you have to specify the index name and type in the POST request. If that index does not exist, Elasticsearch will then create it automatically.

Based on your error, it looks like you're not passing an index and type.

For example, here's a simple POST request that adds a record to the index MyIndex and type MyType, creating both first if they did not already exist.

curl -XPOST 'example.com:9200/MyIndex/MyType/' \
    -d '{"name":"john", "tags" : ["red", "blue"]}'
– John Veldboom

I wrote a script to download a CSV file from S3 and then transfer the data to ES.

  1. Made an S3 client using boto3 and downloaded the file from S3.
  2. Made an ES client to connect to Elasticsearch.
  3. Opened the CSV file and used the helpers module from elasticsearch to insert the file's contents into Elasticsearch.

main.py

import boto3
from elasticsearch import helpers, Elasticsearch
import csv
import os
from config import *


# S3: download the file at Prefix from the bucket
Downloaded_Filename = os.path.basename(Prefix)
s3 = boto3.client('s3', aws_access_key_id=awsaccesskey,
                  aws_secret_access_key=awssecretkey, region_name=awsregion)
s3.download_file(Bucket, Prefix, Downloaded_Filename)

# ES: derive the index name from the file name and connect to the cluster
ES_index = Downloaded_Filename.split(".")[0]
ES_client = Elasticsearch([ES_host], http_auth=(ES_user, ES_password), port=ES_port)

# S3 to ES: stream each CSV row into Elasticsearch as one document
with open(Downloaded_Filename) as f:
    reader = csv.DictReader(f)
    helpers.bulk(ES_client, reader, index=ES_index, doc_type='my-type')

config.py

awsaccesskey = ""
awssecretkey = ""
awsregion = "us-east-1"
Bucket=""
Prefix=''
ES_host = "localhost"
ES_port = "9200"
ES_user = "elastic"
ES_password = "changeme"
– Arshil
  • Thanks for helping out. Consider editing your post to highlight/explain important bits of your solution & how/why it solves the OP's issue. SO discourages code only Answers. Quality answers are upvoted over time as future visitors learn something from your post that gives them insight to their own coding issues. Code only dumps cater to "free coding service" vibes, which SO is not. People are less likely to read code dumps looking for insights, without you pointing out what they should focus on & why. – SherylHohman Jan 06 '21 at 15:16