I want to fetch files from several remote machines: only files whose paths match a specific regex, and the directory structure should be preserved.
Considerations:
- The files I want to fetch are plain text log files
- There are probably many small files and some big ones
- I need at most the first 1000 lines of each file; I only need them to test parsing (see the building blocks sketched after this list)
- The storage on the remote machine is very limited
- The OS on the remote machines is RHEL 7
- I have no control over the remote machines. There is just a (quite busy) colleague who can execute a shell command or run an Ansible playbook on my request.
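
To make that concrete, these are the two building blocks I have in mind (MYBASEPATH and MYREGEX are placeholders, as in the playbook below):

find MYBASEPATH -type f | grep -E 'MYREGEX'    # select the matching files
head -n 1000 /path/to/one/matching/file.log    # keep only the first 1000 lines of it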
In a first approach I tried to fetch the files via Ansible without limiting them to 1000 lines. That failed because of the storage limitations on the remote machine: the archive task needs enough free space for a compressed copy of all matching files at full length. I used the following playbook:
---
- name: fetch logs
  hosts: remotemachines
  vars:
    local_path: "/tmp/demologs"
    remote_temp_path: "/tmp/logarchive.tgz"
    regex: 'MYREGEX'
    basedir: "MYBASEPATH"
  tasks:
    - name: looking for logfiles
      shell: "find {{ basedir }} -type f | grep -E '{{ regex }}'"
      register: logfiles

    # This is the step that ran out of space: the archive holds a
    # compressed copy of every matching file, at full size.
    - name: compressing log files
      archive:
        dest: "{{ remote_temp_path }}"
        format: gz
        path: "{{ logfiles.stdout_lines }}"

    - name: fetching...
      fetch:
        src: "{{ remote_temp_path }}"
        dest: '{{ local_path }}/demologs_{{ ansible_hostname }}.tgz'
        flat: true

    - name: delete remote archive
      file:
        path: "{{ remote_temp_path }}"
        state: absent
Yes, I could skip the compression part and fetch the files directly. But that might take a very long time, since fetch copies one file per operation and there are many small files.
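
For reference, the direct variant I have in mind would replace the archive/fetch/delete tasks with a single hypothetical loop like this (fetch without flat recreates the directory structure under the destination, but pays one round trip per file):

- name: fetch each matching file directly
  fetch:
    src: "{{ item }}"
    dest: "{{ local_path }}/"
  loop: "{{ logfiles.stdout_lines }}"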
I would like to use a set of piped-together shell commands that take each file, limit it to 1000 lines and add it to the compressed archive. But as far as I could find out, tar does not support appending to a compressed archive: append mode (tar -r) only works on uncompressed archives.
I cannot think of any solution that works without temporary files, which could be huge. Since the storage is limited, that is not an option.
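
The closest I have come is a rough, untested sketch along these lines (same MYBASEPATH/MYREGEX placeholders as above; it assumes bash and GNU tar, both of which RHEL 7 has). It appends each truncated file to an uncompressed tar and compresses only once at the end:

basedir="MYBASEPATH"
regex='MYREGEX'
archive="/tmp/logarchive.tar"
tmpdir="$(mktemp -d)"

tar -cf "$archive" --files-from /dev/null        # start with an empty archive
find "$basedir" -type f | grep -E "$regex" | while IFS= read -r f; do
    rel="${f#/}"                                 # strip the leading / for a relative member name
    mkdir -p "$tmpdir/$(dirname "$rel")"         # recreate the directory structure
    head -n 1000 "$f" > "$tmpdir/$rel"           # temporary copy: at most 1000 lines
    tar -rf "$archive" -C "$tmpdir" "$rel"       # -r appends, but only to an uncompressed tar
    rm -f "$tmpdir/$rel"                         # free the space again right away
done
gzip "$archive"                                  # compress once, at the very end
rm -rf "$tmpdir"

This bounds the temporary data to one truncated copy at a time plus the uncompressed archive of already-truncated files, but that intermediate tar still has to sit on the remote disk until the final gzip. Is there a way to stream each truncated file straight into a compressed archive instead?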