The message appears as the following string in the S3 bucket:

'\u00001\u00001\u0000/\u00002\u00003\u0000/\u00002\u00000\u00002\u00001\u0000 \u00001\u00007\u0000:\u00004\u00005\u0000,\u0000s\u0000e\u0000v\u0000e\u0000r\u0000i\u0000t\u0000y\u0000l\u0000e\u0000v\u0000e\u0000l\u0000,\u0000T\u0000h\u0000i\u0000s\u0000 \u0000i\u0000s\u0000 \u0000a\u0000 \u0000t\u0000e\u0000s\u0000t\u0000 \u0000m\u0000e\u0000s\u0000s\u0000a\u0000g\u0000e\u0000 \u0000'
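Assuming those \u0000 escapes stand for real NUL characters (rather than the literal six-character text), a NUL sits before every character of the value, and stripping the NULs recovers the original log line. A quick Ruby sketch on a truncated copy of the value:

# Truncated copy of the value above, written with real NUL characters.
sample = "\u00001\u00001\u0000/\u00002\u00003\u0000/\u00002\u00000\u00002\u00001\u0000 \u00001\u00007\u0000:\u00004\u00005"
puts sample.delete("\u0000")   # => 11/23/2021 17:45

That interleaved-NUL pattern is usually a sign that the tailed file is UTF-16-encoded text being read as single-byte data.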
My Fluentd configuration:

<source>
  @type tail
  path PATH_TO_LOG_FILE
  pos_file PATH_TO_LOG_FILE.pos
  read_from_head true
  tag test
  <parse>
    @type none
  </parse>
</source>

<filter test>
  @type record_transformer
  enable_ruby true
  <record>
    message ${ record["message"].gsub(/(\\u\d{4})/, "") }
  </record>
</filter>

<match test>
  @type s3
  aws_key_id KEY_ID
  aws_sec_key SEC_KEY
  s3_bucket S3_BUCKET
  s3_region S3_REGION
  #path logs/
  <buffer tag,time>
    @type file
    path PATH_TO_BUFFER
    timekey 60 # 1-minute partition
    timekey_wait 10s
    chunk_limit_size 256m
  </buffer>
</match>
For some reason the filter isn't replacing '\u0000' with an empty string. I've tried string interpolation, gsub!, and a couple of other things. When I paste that line into an online Ruby interpreter, the gsub appears to work fine. I was inspired by: How to remove unicode in fluentd tail/s3 plugin
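My best guess at what's going on (the sample strings below are made up, not the actual record contents): if the tailed file is UTF-16, the record that reaches the filter holds real NUL characters, and the literal \u0000 text only appears once the record is JSON-escaped on its way to S3, so pasting the escaped text into an online interpreter tests a different string than the one the filter sees.

# Case 1: the string holds the literal six characters \u0000 (what gets pasted
# into an online interpreter), so the pattern matches and the escapes are removed.
escaped = "T\\u0000e\\u0000s\\u0000t\\u0000"
puts escaped.gsub(/(\\u\d{4})/, "")      # => Test

# Case 2: the string holds real NUL characters (what the in-memory record
# would contain if the file is UTF-16), so the pattern finds no backslash and
# nothing is removed.
raw = "T\u0000e\u0000s\u0000t\u0000"
puts raw.gsub(/(\\u\d{4})/, "") == raw   # => true (unchanged)
puts raw.delete("\u0000")                # => Test (one way to strip real NULs)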