
I am no RegEx expert. I am trying to understand whether I can use RegEx to find a block of data in a JSON file.

My Scenario:

I am using an AWS RDS instance with enhanced monitoring. The monitoring data is sent to a CloudWatch log stream. I am trying to make the data posted in CloudWatch visible in the log management solution Loggly.

Ingestion is not a problem; I can see the data in Loggly. However, the whole message is contained in one big blob field whose content is a JSON document. I am trying to figure out whether I can use RegEx to extract only certain parts of that JSON document.

Here is a sample extract from the JSON payload I am using:

{
    "engine": "MySQL",
    "instanceID": "rds-mysql-test",
    "instanceResourceID": "db-XXXXXXXXXXXXXXXXXXXXXXXXX",
    "timestamp": "2017-02-13T09:49:50Z",
    "version": 1,
    "uptime": "0:05:36",
    "numVCPUs": 1,
    "cpuUtilization": {
        "guest": 0,
        "irq": 0.02,
        "system": 1.02,
        "wait": 7.52,
        "idle": 87.04,
        "user": 1.91,
        "total": 12.96,
        "steal": 2.42,
        "nice": 0.07
    },
    "loadAverageMinute": {
        "fifteen": 0.12,
        "five": 0.26,
        "one": 0.27
    },
    "memory": {
        "writeback": 0,
        "hugePagesFree": 0,
        "hugePagesRsvd": 0,
        "hugePagesSurp": 0,
        "cached": 505160,
        "hugePagesSize": 2048,
        "free": 2830972,
        "hugePagesTotal": 0,
        "inactive": 363904,
        "pageTables": 3652,
        "dirty": 64,
        "mapped": 26572,
        "active": 539432,
        "total": 3842628,
        "slab": 34020,
        "buffers": 16512
    },

My Question

My question is: can I use RegEx to extract a subset of the document, for example CPU utilization or memory? If that is possible, how do I write the RegEx? Ideally, I could then use it to drill down into the extracted document to get individual data elements as well.

Many thanks for your help.

sadeq68

1 Answer


First, I agree with Sebastian: a proper JSON parser is the better choice.
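For comparison, here is a minimal sketch of the parser route in Python. The payload is a shortened copy of the one in the question with a closing brace added so that it is valid JSON; whether Loggly lets you run a parser at this point is a separate matter, so treat this as an illustration only.

```python
import json

# Shortened copy of the payload from the question (closing brace added).
payload = """
{
    "engine": "MySQL",
    "cpuUtilization": {"irq": 0.02, "total": 12.96},
    "memory": {"free": 2830972, "total": 3842628}
}
"""

doc = json.loads(payload)

# Sub-documents are plain dictionaries, so drilling down is simple indexing.
cpu = doc["cpuUtilization"]
memory_total = doc["memory"]["total"]

print(cpu["total"], memory_total)
```

With a parser, the two "total" keys cannot be confused, because each is reached through its parent object.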

Still, sometimes the quick-and-dirty approach is necessary. If the layout of your text will not change, a regexp is simple:

E.g. "total": (\d+\.\d+) gets the CPU usage, and "total": (\d\d\d+) gets the total memory. The second pattern requires at least 3 digits so that it skips the first "total" entry (the CPU one); memory will probably never be less than 100 :-).

If changes are to be expected, make it a bit more robust: ["']total["']\s*:\s*(\d+\.\d+).
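The patterns so far can be tried out with a short Python sketch. Python's re module is an assumption here (the patterns in this answer were written with Perl in mind, but they use only syntax both engines share), and the sample text is a shortened copy of the payload from the question.

```python
import re

# Shortened copy of the payload from the question, same layout.
text = '''{
    "cpuUtilization": {
        "guest": 0,
        "irq": 0.02,
        "total": 12.96
    },
    "memory": {
        "free": 2830972,
        "total": 3842628
    }
}'''

# Decimal number after "total": matches the CPU entry.
cpu_total = re.search(r'"total": (\d+\.\d+)', text)

# At least three digits in a row: skips "12.96" and matches the memory entry.
mem_total = re.search(r'"total": (\d\d\d+)', text)

# Tolerates single or double quotes and variable whitespace around the colon.
tolerant = re.search(r'''["']total["']\s*:\s*(\d+\.\d+)''', text)

print(cpu_total.group(1), mem_total.group(1), tolerant.group(1))
```

Note that the three-digit trick stops working as soon as the CPU total reaches 100, which is one more reason to consider these patterns fragile.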

It may also be possible to match against the newline characters, like this: "cpuUtilization"\s*:\s*\{\s*\n.*\n\s*"irq"\s*:\s*(\d+\.\d+). This ties the match to the cpuUtilization block and so makes it a bit more robust (this time for the irq value).
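A sketch of the newline-anchored pattern in Python follows, again on a shortened copy of the payload. The second pattern is not from the answer above: it is an assumed variant that captures a whole sub-object, and it only works while that object contains no nested braces.

```python
import json
import re

# Shortened copy of the payload from the question, same layout.
text = '''{
    "cpuUtilization": {
        "guest": 0,
        "irq": 0.02,
        "total": 12.96
    },
    "memory": {
        "free": 2830972,
        "total": 3842628
    }
}'''

# Anchored to the cpuUtilization header; ".*" skips exactly one line ("guest").
irq = re.search(
    r'"cpuUtilization"\s*:\s*\{\s*\n.*\n\s*"irq"\s*:\s*(\d+\.\d+)', text
)

# Assumed variant: capture the whole "memory" object, provided it is flat.
block = re.search(r'"memory"\s*:\s*(\{[^{}]*\})', text)
memory = json.loads(block.group(1))

print(irq.group(1), memory["free"])
```

The first pattern depends on "irq" being the second key in the block; if the key order changes, it silently stops matching.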

And so on and so on.

As you can see, you quickly end up with very complex expressions. This approach is very fragile!

P.S. Depending on the exact regex dialect Loggly uses, details may differ. The examples above are based on Perl.

Dirk Stöcker