
Most blog posts cover how to set up HDFS audit logs. But is there a source that explains what each operation (cmd) stands for?

I found the following table in the Hadoop HowToConfigure wiki: HDFS Audit Logs format

What I still don't know is what each of the operations stands for.

For example, I was trying to categorize the operations as read or write, but it seems "open" is the general command for both reads and writes, and the rest look more like DDL and access control operations.

I understand that Hadoop distributions such as Cloudera or HDP have their own ways of reporting audit logs, but what do the default operations stand for? For example, create might mean creating a file, and mkdirs might mean a mkdir for a Hive table or Hive partition.

Most importantly, is there a way to differentiate read and write operations?

tk421
Tim Raynor

1 Answer


In most typical Hadoop jobs (Pig, Hive, MR, Sqoop, Spark) you rarely overwrite data in place, so create implies a write and open implies a read. To overwrite data, you actually delete it and then recreate it.
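As a rough illustration of that heuristic, here is a minimal Python sketch that pulls the cmd field out of an audit log line and buckets it. The grouping of commands into sets is my own assumption based on the reasoning above (append and delete are counted as writes), and the sample lines are made up for illustration, so adjust both to what your cluster actually logs.

```python
import re

# Assumed grouping, not an official Hadoop classification.
READ_CMDS = {"open"}
WRITE_CMDS = {"create", "append", "delete"}

# The audit line is a series of key=value fields; we only need cmd here.
CMD_RE = re.compile(r"\bcmd=(\S+)")


def classify(line):
    """Return 'read', 'write', 'other', or 'unknown' for one audit log line."""
    match = CMD_RE.search(line)
    if match is None:
        return "unknown"  # no cmd field found on this line
    cmd = match.group(1)
    if cmd in READ_CMDS:
        return "read"
    if cmd in WRITE_CMDS:
        return "write"
    return "other"  # metadata and access control (mkdirs, setPermission, ...)


# Illustrative lines only, not taken from a real cluster.
sample_read = "allowed=true\tugi=alice\tip=/10.0.0.5\tcmd=open\tsrc=/data/in/part-0\tdst=null\tperm=null"
sample_write = "allowed=true\tugi=alice\tip=/10.0.0.5\tcmd=create\tsrc=/data/out/part-0\tdst=null\tperm=alice:hadoop:rw-r--r--"

if __name__ == "__main__":
    for line in (sample_read, sample_write):
        print(classify(line))
```

Anything that lands in "other" is where you would decide case by case, for example whether rename counts as a write for your purposes.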

To determine which service performed the action, you also need to look at additional sources (Hive audit logs, YARN RM audit logs) or infer the service from the user and directory (for example, /user/hive/warehouse/* is probably a Hive query).
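Inferring the service from the user and directory could look something like the sketch below. The prefix and user tables are hypothetical, the path assumes Hive's default warehouse directory /user/hive/warehouse, and the sample line is made up. The regex-based parsing is also a simplification: it will not handle paths that contain whitespace.

```python
import re

# Hypothetical lookup tables; fill these in for your own cluster layout.
PATH_PREFIXES = {"/user/hive/warehouse/": "hive"}
SERVICE_USERS = {"hive": "hive"}

FIELD_RE = re.compile(r"(\w+)=(\S+)")


def parse_fields(line):
    """Collect key=value pairs from an audit log line into a dict."""
    return dict(FIELD_RE.findall(line))


def guess_service(fields):
    """Best-effort guess: directory prefix first, then the user name."""
    src = fields.get("src", "")
    for prefix, service in PATH_PREFIXES.items():
        if src.startswith(prefix):
            return service
    return SERVICE_USERS.get(fields.get("ugi", ""), "unknown")


# Illustrative line only, not taken from a real cluster.
sample = "allowed=true\tugi=alice\tip=/10.0.0.5\tcmd=open\tsrc=/user/hive/warehouse/sales/part-0\tdst=null\tperm=null"

if __name__ == "__main__":
    print(guess_service(parse_fields(sample)))
```

This is only a guess per line; the Hive or YARN audit logs remain the reliable way to confirm which service was involved.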

Overwrite / Append references:

How to force STORE (overwrite) to HDFS in Pig?

How does Sqoop append command will work in hadoop

Hive Audit logs:

https://cwiki.apache.org/confluence/display/Hive/GettingStarted#GettingStarted-AuditLogs

tk421