I am parsing my database logs using fluentd where I am parsing the logs using regular expression and indexing them to solr, I have a problem where few logs are in the below format
2023-08-24 08:28:20.278 UTC [26154] abcd@rst STATEMENT:SELECT
purpose,
COUNT(*) AS std_count
FROM
emp.database
LEFT JOIN LATERAL (
SELECT
unnest(string_to_array(db_platform, ',')) AS platform
UNION ALL
SELECT
'Not Specified'
WHERE
db_platform IS NULL
OR db_platform = ''
) AS platform_data ON true
LEFT OUTER JOIN std.server ON std.database.std_id = emp.server.emp_id
LEFT OUTER JOIN emp.database ON std.server std_id = emp_id
WHERE
1 = 1
AND std.server_customer IN ('EMPX')
AND std.server.business IN ('IT Services')
AND std.server.department IN ('Product Development')
GROUP BY
1
ORDER BY
1
unnest(string_to_array(s.purpose, ',')) AS platform
UNION ALL
SELECT
'Not Specified'
WHERE
s.purpose IS NULL
OR s.purpose = ''
) AS platform_data ON true
and few logs are in below format
2023-08-24 08:32:36.963 UTC [14006] abc@vx LOG: could not receive data from client: Connection reset by peer
2023-08-24 08:32:36.963 UTC [3403] xyz@mdp LOG: ould not receive data from client: Connection reset by peer
while using the below regular expression I am able to parse the logs like below,
**expression**:
(?<time_stamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}.\d{3})[\s]
UTC[\s][\[](?<qid>[^ ]*)[\]][\s]
(?<user>[^ ]*)[\@](?<schema>[^ ]*)[\s](?<level>[^:]*):(?<message>.*)
**Match 1**
time_stamp :2023-08-24 08:28:20.278
qid :26154
user: writeuser
schema: abc
level: STATEMENT
message: SELECT
**Match 2**
time_stamp: 2023-08-24 08:32:20.586
qid :13788
user: abc
schema: vx
level: LOG
message: could not receive data from client: Connection reset by peer
**Match 3**
time_stamp: 2023-08-24 08:32:20.586
qid: 13779
user: abc
schema: vx
level: LOG
message: could not receive data from client: Connection reset by peer
As we can see that the 1st log is only showing message as SELECT , any chance that I can capture the complete message.