Query:

index = test
| stats values(*) as * by ip_addr, location
| where location="USA"
| fields timestamp, user, ip, location, message

Result:

+------------------+---------+-------------+----------+---------------+
| timestamp        | user    | ip          | location | message       |
+------------------+---------+-------------+----------+---------------+
| 08/08/2020 17:00 | thomas  | 10.10.10.10 | USA      | Hello, world! |
| 08/08/2020 17:05 | unknown |             |          | I love steak! |
| 08/08/2020 17:10 |         |             |          | I love soda!  |
+------------------+---------+-------------+----------+---------------+
| 08/08/2020 17:00 | jeffry  | 10.10.10.20 | USA      | Hello, world! |
| 08/08/2020 17:35 | unknown |             |          | I love pancke |
| 08/08/2020 17:40 |         |             |          | I love waffle |
+------------------+---------+-------------+----------+---------------+

I want to:

  1. make those multiple timestamps become one single timestamp
  2. remove the "unknown" value in the "user" field
  3. make the "message" field display only "Hello, world!" - I don't care about the rest.

I tried to do:

index = test
| stats values(*) as * by ip_addr
| where location="USA"
| eval user=replace(user, "unknown", "")
| fields timestamp, user, ip, location, message

But it removes all the values under the "user" field. Any advice? Goals 2 and 3 look similar. If I could crack either one of them, I think I could solve the other one easily.

ThomasWest
  • 1. Which timestamp do you want to display? 2. Do you want to remove just the "unknown" field or the entire event associated with it? 3. Do you want to display only "Hello, world" or retain only events containing that string? – RichG Nov 03 '20 at 13:26
  • @RichG 1. Is it possible to average it? If not, let's pick the middle one. 2. Yes, I only want to trim the "unknown" value. It sticks together with thomas (or any other user) because sometimes the log (index=test) can't display the username correctly. 3. Likewise, I only want to retain events containing the "Hello, World" string. – ThomasWest Nov 03 '20 at 15:09
  • @RichG Basically, the raw log looks something like " | unknown | | USA | Hello, World!" and " | thomas | | USA | I love steak!" and " | thomas | | USA | I love soda!". I use stats because I want richer information, namely the username, without using join. – ThomasWest Nov 03 '20 at 15:11
  • your search should start `index=test location="USA"` - don't use the `| where` clause unless absolutely necessary – warren Nov 04 '20 at 12:54

1 Answer

Yes, timestamps can be averaged if they are in epoch (integer) form. The values(*) function returns multi-value fields, which don't work well with replace or with most other commands and functions that aren't designed for them. That's why I use the mvfilter and mvdedup functions below. Note that the query keeps the earliest timestamp per group rather than an average.
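
To see the two functions in isolation, here is a minimal sketch that can be run without any indexed data. It assumes nothing about the real events: makeresults creates a single dummy event, and split builds hypothetical multi-value "user" and "message" fields by hand.

```spl
| makeresults
| eval user=split("thomas,unknown,thomas", ","),
       message=split("Hello, world!|I love steak!|I love soda!", "|")
``` mvfilter keeps only the values for which the expression is true;
``` mvdedup then drops repeated values from what is left.
| eval user=mvdedup(mvfilter(!match(user, "unknown"))),
       message=mvdedup(mvfilter(match(message, "Hello, world!")))
| table user, message
```

Because match takes a regular expression, an anchored pattern such as "^unknown$" may be safer if other usernames could contain the substring "unknown".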

index = test
| where location="USA"
| stats earliest(timestamp) as timestamp, values(*) as * by ip_addr
| eval user=mvdedup(mvfilter(!match(user, "unknown"))),
 message=mvdedup(mvfilter(match(message, "Hello, world!")))
| fields timestamp, user, ip, location, message
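
If an averaged timestamp is preferred over the earliest one, a variation along the following lines might work. It is only a sketch under assumptions not confirmed in the question: that "timestamp" is a string field in the form 08/08/2020 17:00 (parsed here as month/day/year), and that "ip_addr" is the field that identifies each group. The intermediate "epoch" field is a name introduced purely for illustration.

```spl
index=test location="USA"
``` Assumed format: convert the timestamp string to epoch seconds so it can be averaged.
| eval epoch=strptime(timestamp, "%m/%d/%Y %H:%M")
| stats avg(epoch) as epoch, values(user) as user, values(location) as location, values(message) as message by ip_addr
``` Turn the averaged epoch back into a readable string and clean up the multi-value fields.
| eval timestamp=strftime(epoch, "%m/%d/%Y %H:%M"),
       user=mvdedup(mvfilter(!match(user, "unknown"))),
       message=mvdedup(mvfilter(match(message, "Hello, world!")))
| fields timestamp, user, ip_addr, location, message
```

The fields are listed explicitly in stats instead of using values(*) as * so that the averaged "epoch" value is not also collected as a multi-value field. If the events already carry a usable _time, avg(_time) could replace the strptime step.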
RichG