
I have two log files with multi-line log statements. Both of them have the same datetime format at the beginning of each log statement. The configuration looks like this:

state_file = /var/lib/awslogs/agent-state

[/opt/logdir/log1.0]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log1.0
log_stream_name = /opt/logdir/logs/log1.0
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group


[/opt/logdir/log2-console.log]
datetime_format = %Y-%m-%d %H:%M:%S
file = /opt/logdir/log2-console.log
log_stream_name = /opt/logdir/log2-console.log
initial_position = start_of_file
multi_line_start_pattern = {datetime_format}
log_group_name = my.log.group

The CloudWatch Logs agent is sending log1.0 logs correctly to my log group on CloudWatch; however, it's not sending the logs for log2-console.log.

awslogs.log says:

2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196444000, 'start_position': 42330916L, 'end_position': 42331504L}, reason: timestamp is more than 2 hours in future.
2016-11-15 08:11:41,308 - cwlogs.push.batch - WARNING - 3593 - Thread-4 - Skip event: {'timestamp': 1479196451000, 'start_position': 42331504L, 'end_position': 42332092L}, reason: timestamp is more than 2 hours in future.

The server time is correct, though. Another odd thing is that the line numbers mentioned in start_position and end_position do not exist in the actual log file being pushed.

Anyone else experiencing this issue?

Furhan S.
  • I have the same effect and am still looking for a solution. Restarting the service didn't help. BTW: start_position and end_position are not line numbers but byte positions. – Björn Weinbrenner Dec 14 '16 at 09:56

4 Answers


I was able to fix this.

The state of awslogs was broken. The state is stored in an SQLite database at /var/awslogs/state/agent-state. You can access it via

sudo sqlite3 /var/awslogs/state/agent-state

sudo is needed to have write access.

List all streams with

select * from stream_state;

Look up your log stream and note the source_id, which is part of a JSON data structure in the v column.

Then list all records with this source_id (in my case it was 7675f84405fcb8fe5b6bb14eaa0c4bfd) in the push_state table:

select * from push_state where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';

The resulting record has a JSON data structure in the v column which contains a batch_timestamp, and this batch_timestamp seems to be wrong: it was in the past, and any log entries more than 2 hours newer than it were no longer processed.

The solution is to update this record. Copy the v column, replace the batch_timestamp with the current timestamp, and update with something like

update push_state set v='... insert new value here ...' where k='7675f84405fcb8fe5b6bb14eaa0c4bfd';

Restart the service with

sudo /etc/init.d/awslogs restart
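
As an aside, if your sqlite3 build includes the JSON1 extension, the copy-and-edit can be collapsed into a single command. This is only a sketch: it assumes batch_timestamp is a top-level key in the JSON value, and the key below is the example source_id from above.

# Sketch: rewrite batch_timestamp to the current time in milliseconds.
# Requires sqlite3 compiled with the JSON1 extension; verify before relying on it.
sudo sqlite3 /var/awslogs/state/agent-state "
UPDATE push_state
SET v = json_set(v, '$.batch_timestamp', strftime('%s','now') * 1000)
WHERE k = '7675f84405fcb8fe5b6bb14eaa0c4bfd';
"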

I hope it works for you!

  • In my case the push_state table is empty - what do I do? – Andrey Jul 31 '17 at 17:03
  • But do you get the warning "...reason: timestamp is more than 2 hours in future."? Does restarting the service with "sudo /etc/init.d/awslogs restart" help? – Björn Weinbrenner Aug 02 '17 at 06:33
  • Hey, do you have some way to force-reset CloudWatch Logs? It seems I have this problem on several machines, and I cannot really afford to log into every machine and do this per instance. I'm okay with losing previously unsynchronized logs. When such problems occur, my disk space seems to fill by 1 GB every hour, so my web service just dies overnight... – Cyril Duchon-Doris Mar 07 '18 at 08:52
  • This happens again and again. Can't do this every time manually – Reyansh Kharga Jul 06 '20 at 15:53

We had the same issue, and the following steps fixed it.

If the log groups are not updating with the latest events, run these steps (a combined sketch of the commands follows the list):

  1. Stop the awslogs service.
  2. Delete the file /var/awslogs/state/agent-state.
  3. Update the /var/awslogs/etc/awslogs.conf configuration from hostname to instance ID, e.g.:

    log_stream_name = {hostname} to log_stream_name = {instance_id}

  4. Start the awslogs service.
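
A combined sketch of those steps, assuming the stock Amazon Linux paths (a comment below notes that on some installs the state file lives under /var/lib/awslogs/ instead):

# Sketch only: the paths and the {hostname} placeholder are assumptions;
# adjust them to match your install.
sudo service awslogs stop
sudo rm /var/awslogs/state/agent-state
# Step 3: switch the stream name from hostname to instance ID.
sudo sed -i 's/{hostname}/{instance_id}/' /var/awslogs/etc/awslogs.conf
sudo service awslogs start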
Sᴀᴍ Onᴇᴌᴀ
  • I don't know if this one is elegant, but it works for me, and I consider it faster and easier than the accepted answer. I would like to add that for me the agent-state is under /var/lib/awslogs/state/. You can see where this file is in your /etc/awslogs/awslogs.conf file – Simon Ernesto Cardenas Zarate Aug 16 '18 at 00:01
  • Following this does help and restarts the process, but this problem happens from time to time and I have to do it again and again. My concern is: how do I prevent it from happening in the first place? – Affan Shahab May 14 '20 at 19:29
  • This works for me. I think step 3 is not required. According to the awslogs log, the agent does not push log entries that are older than 14 days when we do step 4. – t_motooka Mar 21 '21 at 06:27

I was able to resolve this issue on Amazon Linux by:

  1. sudo yum reinstall awslogs
  2. sudo service awslogs restart

This method retained my config files in /var/awslogs/, though you may wish to back them up before a reinstall.

Note: In my troubleshooting, I had also deleted my Log Group via the AWS Console. The restart fully reloaded all historical logs, but at the present timestamp, which is of less value. I'm unsure whether deleting the Log Group was necessary for this method to work. You might want to look at setting the initial_position config to end_of_file before you restart.
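
If you want to try that, here is a hedged one-liner; the config path is an assumption based on the other answers, so check where your awslogs.conf actually lives:

# Sketch: start at the end of each file so a reinstall does not replay history.
sudo sed -i 's/initial_position = start_of_file/initial_position = end_of_file/' /var/awslogs/etc/awslogs.conf
sudo service awslogs restart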

johnsampson

I found the reason: the time zone in my Docker container was inconsistent with the time zone of my host machine. After setting the two time zones to be consistent, the problem was solved.
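
For example, two common ways to align a container's time zone with the host (a sketch only; my-image is a placeholder, and /etc/timezone exists on Debian-family hosts):

# Sketch: give the container the host's time zone.
docker run -e TZ="$(cat /etc/timezone)" my-image
# ...or bind-mount the host's zone info read-only:
docker run -v /etc/localtime:/etc/localtime:ro my-image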

simon