I have the following Apache Atlas audit log:

[INFO] 2020-06-29 15:14:31,732 AUDIT logJSON - {"repoType":15,"repo":"atlas","reqUser":"varun","evtTime":"2020-06-29 15:14:29.967","access":"entity-read","resource":"AtlanColumn/[]/glue/78975568964/flights/default/flightsgdelt_100m_test_partition/c_11","resType":"entity","action":"entity-read","result":1,"agent":"atlas","policy":6,"enforcer":"ranger-acl","cliIP":"10.9.2.76","agentHost":"atlas-7d9dcdd6c5-lmfzj","logType":"RangerAudit","id":"87c9e862-910b-4ee2-86f8-cb174f4e7b76-863129","seq_num":1701441,"event_count":1,"event_dur_ms":0,"tags":[],"cluster_name":"","policy_version":54}

Right now I have the following parse config:

        <parse>
          @type regexp
          expression ^\[(?<Level>.[^ ]*)\] (?<datetime>[0-9]{4}-[0-9]{2}-[0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2},[0-9]{3}) (?<Type>.[^ ]*) (?<Action>.[^ ]*) \- \{"repoType":(?<repoType>.[^ ]*)\,"repo":"(?<repo>.[^ ]*)\","reqUser":"(?<reqUser>.[^ ]*)\","evtTime":"(?<evtTime>.[^ ].*)\","access":"(?<access>.[^ ]*)\","resource":"(?<resource>.[^ ].*)\","resType":"(?<resType>.[^ ]*)\","action":"(?<action>.[^ ]*)\","result":(?<result>.[^ ]*)\,"agent":"(?<agent>.[^ ].*)\","policy":(?<policy>.[^ ]*)\,"enforcer":"(?<enforcer>.[^ ]*)\","cliIP":"(?<cliIP>.[^ ]*)\","agentHost":"(?<agentHost>.[^ ]*)\","logType":"(?<logType>.[^ ]*)\","id":"(?<id>.[^ ]*)\","seq_num":(?<seq_num>.[^ ]*)\,"event_count":(?<event_count>.[^ ]*)\,"event_dur_ms":(?<event_dur_ms>.[^ ]*)\,"tags":(?<tags>.[^ ].*)\,"cluster_name":(?<cluster_name>.[^ ].*),"policy_version":(?<policy_version>.[^ ]*)\}
        </parse>

Now we want to break the resource field down further into multiple fields, as below:

AssetType
Tags
Integration
Database
Schema
Table
Column

The issue is that the resource field does not necessarily contain every one of these components. It can be AssetType/Tags/Integration, or AssetType/Tags/Integration/Database, or AssetType/Tags/Integration/Database/Schema, or AssetType/Tags/Integration/Database/Schema/Table, or AssetType/Tags/Integration/Database/Schema/Table/Column.

If any of the fields is missing, we should send null for it.

Any suggestion or guidance on this would be highly appreciated.

Olaf Kock
chitender kumar

1 Answer

You can use the record_reformer plugin to parse the resource key and extract the value for each of the required keys. Below is an example of its usage:

    <match pattern.**>
      @type record_reformer
      tag new_tag.${tag_suffix[2]}
      renew_record false
      enable_ruby true
      <record>
        AssetType ${record['resource'].scan(/^([^\/]+\/){0}(?<param>[^\/]+)/).flatten.compact[0]}
        Tags ${record['resource'].scan(/^([^\/]+\/){1}(?<param>[^\/]+)/).flatten.compact[0]}
        Integration ${record['resource'].scan(/^([^\/]+\/){2}(?<param>[^\/]+)/).flatten.compact[0]}
        Database ${record['resource'].scan(/^([^\/]+\/){3}(?<param>[^\/]+)/).flatten.compact[0]}
        Schema ${record['resource'].scan(/^([^\/]+\/){4}(?<param>[^\/]+)/).flatten.compact[0]}
        Table ${record['resource'].scan(/^([^\/]+\/){5}(?<param>[^\/]+)/).flatten.compact[0]}
        Column ${record['resource'].scan(/^([^\/]+\/){6}(?<param>[^\/]+)/).flatten.compact[0]}
      </record>
    </match>
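
To check how these expressions behave before deploying them, the same scan call can be run in plain Ruby, outside Fluentd. The following is only a minimal sketch: the helper names `segment` and `split_resource` and the sample resource strings are hypothetical and are not part of the plugin or of the original config.

```ruby
# Field names in path order, as listed in the question.
FIELDS = %w[AssetType Tags Integration Database Schema Table Column].freeze

# Hypothetical helper: returns the n-th (0-based) slash-separated segment
# using the same scan expression as the config above, or nil if the
# resource has fewer than n + 1 segments.
def segment(resource, n)
  resource.scan(/^([^\/]+\/){#{n}}(?<param>[^\/]+)/).flatten.compact[0]
end

# Hypothetical helper: maps every field name to its segment (or nil).
def split_resource(resource)
  FIELDS.each_with_index.map { |name, i| [name, segment(resource, i)] }.to_h
end

# Made-up sample values, one short path and one with all seven parts.
p split_resource("AtlanTable/[]/glue")
p split_resource("AtlanColumn/[]/glue/db1/schema1/table1/col1")
```

Because the pattern contains a named group, Ruby does not capture the unnamed `([^\/]+\/)` group, so `scan` yields only the `param` value; when the path is too short the pattern does not match and the expression evaluates to nil. Note that the mapping is purely positional: the sample log line in the question has eight slash-separated segments, so only its first seven would be assigned to fields.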
Al-waleed Shihadeh