extract text from field arrays

Question

One of the fields, called "resources", contains the following 2 inner documents.

  {
  "type": "AWS::S3::Object",
  "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"
},
{
  "accountId": "934331768510612",
  "type": "AWS::S3::Bucket",
  "ARN": "arn:aws:s3:::sms_vild"
}

I need to split the ARN field and get the last part of it, i.e. "reports_201706.schema", preferably using a scripted field.

What I have tried:

1) I checked the fields list and found only 2 entries: resources.accountId and resources.type.

2) I tried with a date-time field and it worked correctly with the scripted field option (expression).

doc['eventTime'].value

3) But the same does not work with other text fields, e.g.

doc['eventType'].value

Getting this error:

"caused_by":{"type":"script_exception","reason":"link error","script_stack":["doc['eventType'].value","^---- HERE"],"script":"doc['eventType'].value","lang":"expression","caused_by":{"type":"illegal_argument_exception","reason":"Fielddata is disabled on text fields by default. Set fielddata=true on [eventType] in order to load fielddata in memory by uninverting the inverted index. Note that this can however use significant memory."}}},"status":500}

It means I need to change the mapping. Is there any other way to extract text from nested arrays in an object?

Update:

Please visit sample kibana here...

https://search-accountact-phhofxr23bjev4uscghwda4y7m.us-east-1.es.amazonaws.com/_plugin/kibana/

Search for "ebs_attach.png" and then check the resources field. You will see 2 nested entries like this:

 {
  "type": "AWS::S3::Object",
  "ARN": "arn:aws:s3:::datameetgeo/ebs_attach.png"
},
{
  "accountId": "513469704633",
  "type": "AWS::S3::Bucket",
  "ARN": "arn:aws:s3:::datameetgeo"
}

I need to split the ARN field and extract the last part, which in this case is "ebs_attach.png".

If I can somehow display it as a scripted field, then I can see the bucket name and the file name side by side on the Discover tab.

Update 2

In other words, I am trying to extract the text shown in this image as a new field on the Discover tab.

score 2 · Accepted Answer · answered Jul 19 '17 at 12:53

While you can use scripting for this, I strongly encourage you to extract this kind of information at index time. I have provided two examples here. They are far from failsafe (you need to test with different paths, or with this field missing altogether), but they should provide a base to start with.

PUT foo/bar/1
{
  "resources": [
    {
      "type": "AWS::S3::Object",
      "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"
    },
    {
      "accountId": "934331768510612",
      "type": "AWS::S3::Bucket",
      "ARN": "arn:aws:s3:::sms_vild"
    }
  ]
}

# Search-time script field: this is slow, since _source is loaded and parsed for every hit
GET foo/_search
{
  "script_fields": {
    "document": {
      "script": {
        "inline": "return params._source.resources.stream().filter(r -> 'AWS::S3::Object'.equals(r.type)).map(r -> r.ARN.substring(r.ARN.lastIndexOf('/') + 1)).findFirst().orElse('NONE')"
      }
    }
  }
}

# Better: do this at index time by adding an ingest pipeline
PUT _ingest/pipeline/my-pipeline-id
{
  "description" : "describe pipeline",
  "processors" : [
    {
      "script" : {
        "inline": "ctx.filename = ctx.resources.stream().filter(r -> 'AWS::S3::Object'.equals(r.type)).map(r -> r.ARN.substring(r.ARN.lastIndexOf('/') + 1)).findFirst().orElse('NONE')"
      }
    }
  ]
}

# Store the document, specify the pipeline
PUT foo/bar/1?pipeline=my-pipeline-id
{
  "resources": [
    {
      "type": "AWS::S3::Object",
      "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema"
    },
    {
      "accountId": "934331768510612",
      "type": "AWS::S3::Bucket",
      "ARN": "arn:aws:s3:::sms_vild"
    }
  ]
}

# Let's check the filename field of the indexed document by getting it
GET foo/bar/1

# We can even search for this file now
GET foo/_search
{
  "query": {
    "match": {
      "filename": "reports_201706.schema"
    }
  }
}
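
Since both scripts above share the same extraction logic, it may help to try that logic outside Elasticsearch first, especially for the edge cases mentioned earlier (a different path, or the field missing altogether). The following is only a rough Python sketch of what the Painless one-liner does, not something Elasticsearch runs. The function name `extract_filename` and the `default` parameter are made up for illustration. Unlike the Painless version, it skips entries without an ARN instead of failing on them, which is an assumption about the behavior you would want.

```python
def extract_filename(doc, default="NONE"):
    """Return the last path segment of the first S3 object ARN in doc['resources'].

    Hypothetical helper mirroring the Painless script: filter on the
    'AWS::S3::Object' type, cut the ARN after the last '/', and fall back
    to a default when nothing matches.
    """
    # A missing or null "resources" field is treated as an empty list.
    for resource in doc.get("resources") or []:
        if resource.get("type") != "AWS::S3::Object":
            continue
        arn = resource.get("ARN")
        if not arn:
            # Assumption: skip entries without an ARN rather than raise.
            continue
        # rpartition yields the whole string as the last element when
        # there is no '/', matching substring(lastIndexOf('/') + 1).
        return arn.rpartition("/")[2]
    return default


sample = {
    "resources": [
        {
            "type": "AWS::S3::Object",
            "ARN": "arn:aws:s3:::sms_vild/servers_backup/db_1246/db/reports_201706.schema",
        },
        {
            "accountId": "934331768510612",
            "type": "AWS::S3::Bucket",
            "ARN": "arn:aws:s3:::sms_vild",
        },
    ]
}

print(extract_filename(sample))  # reports_201706.schema
print(extract_filename({}))      # NONE
```

If the edge cases behave the way you expect here, the same checks (null field, missing ARN) would still need to be written into the Painless script itself before relying on the pipeline.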

score 0 · Answer 2 · answered Jul 17 '17 at 05:24

Note: This assumes "resources" is some kind of array.

NSArray *array_ARN_Values = [resources valueForKey:@"ARN"];

Hope it will work for you!!!

Sandip Patel - SM

How will I know if resources is a kind of array? I do not see "resources" in the fields list. However, the type, ARN and accountId parameters from resources are indexed. – shantanuo Jul 18 '17 at 06:38
