
I want to fetch files from several remote machines: only files whose paths match a specific regex, and I want to keep the directory structure.

Considerations:

  • The files I want to fetch are plain text log files
  • There are probably many small files and some big ones
  • I need at most the first 1000 lines of each file; I only need them to test parsing.
  • The storage on the remote machine is very limited
  • The OS on the remote site is RHEL 7
  • I have no control over the remote machines. There is just a (quite busy) colleague who can execute a shell command or run an Ansible playbook on my request.

In a first approach, I tried to fetch the files via Ansible without limiting them to 1000 lines. That failed because of the storage limitations on the remote machine. I used the following playbook:

---
- name: fetch logs
  hosts: remotemachines
  vars:
      local_path: "/tmp/demologs"
      remote_temp_path: "/tmp/logarchive.tgz"
      regex: 'MYREGEX'
      basedir: "MYBASEPATH"
  tasks:
    - name: looking for logfiles
      shell: "find {{ basedir }} -type f | egrep -E '{{ regex }}'"
      register: logfiles
    - name: compressing log files
      archive:
        dest: "{{ remote_temp_path }}"
        format: gz
        path: "{{ logfiles.stdout_lines }}"
    - name: fetching...
      fetch:
        src: "{{ remote_temp_path }}"
        dest: '{{ local_path }}/demologs_{{ ansible_hostname }}.tgz'
        flat: true
    - name: delete remote archive
      file:
        path: "{{ remote_temp_path }}"
        state: absent

Yes, I could skip the compression part and fetch the files directly, but that might take a very long time, since there are many small files.
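For reference, fetching directly (without the archive step) would look roughly like the task below, reusing the `logfiles` result registered above. Without `flat: true`, the fetch module recreates the full remote path under `{{ local_path }}/<hostname>/`, so the directory structure is kept, but every file is copied in its own loop iteration:

    - name: fetching files directly (no archive)
      fetch:
        src: "{{ item }}"
        dest: "{{ local_path }}/"
      loop: "{{ logfiles.stdout_lines }}"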

I would like to use a set of piped-together shell commands that take each file, limit it to 1000 lines, and add it to the compressed archive. But as far as I can tell, tar does not support appending to a compressed archive.
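To illustrate with a single matching file `$f`: even if I accept a small per-file temporary copy, appending it to the compressed archive fails; dropping `-z` works, but then the archive itself is not compressed:

    head -n 1000 "$f" > /tmp/snippet.log            # small temporary copy, at most 1000 lines
    tar -rzf /tmp/logarchive.tgz /tmp/snippet.log   # fails: "tar: Cannot update compressed archives"
    tar -rf  /tmp/logarchive.tar /tmp/snippet.log   # works, but the archive stays uncompressed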

I cannot think of any solution that works without temporary files, which could be huge. Since storage is limited, that is not an option.

  • You can try to [`slurp`](https://docs.ansible.com/ansible/latest/collections/ansible/builtin/slurp_module.html) the files rather than fetching them, cut the in-memory slurped result (after b64-decoding it) to 1k lines, and store it on the controller. I have no idea how big your files really are, so that could end up not being convenient at all or even fail (for a lack of memory...). But the real solution in that situation is to directly push the logs from all relevant machines to a centralized log solution. – Zeitounator Mar 28 '21 at 19:25
  • @Zeitounator Actually Splunk is already in place. But there is an issue with some new config and I need those logs to debug the new config in a test environment. I have not understood how `slurp` could help me in this context. Could you provide a MWE? – user406482 Mar 30 '21 at 11:41
  • Written on spot, not tested at all, but to give you a rough idea: https://gist.github.com/zeitounator/42404a19c05c42bb5d500c1b3512d9bd – Zeitounator Mar 30 '21 at 12:24
  • Since I'm not sure I totally got your problem (or at least have 2 ways to interpret it...): you could also read the file on the server, cut it to 1000 lines, compress it and add it already compressed to an uncompressed tar. Compression will not be as efficient, but you will still gain quite a lot of space. And if space is a real constraint and you are dealing with a majority of text files, you should use bzip rather than gz if available. – Zeitounator Mar 30 '21 at 12:33
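A rough, untested sketch of the approach from the last comment (truncate each file to 1000 lines, gzip it, and append the already-compressed member to a plain tar on the remote host). MYBASEPATH and MYREGEX are the placeholders from the playbook above, the paths under /tmp are arbitrary, and only one small temporary file exists at a time:

    find MYBASEPATH -type f | grep -E 'MYREGEX' | while read -r f; do
        tmp="/tmp/snippets${f}.gz"
        mkdir -p "$(dirname "$tmp")"
        head -n 1000 "$f" | gzip > "$tmp"                          # at most 1000 lines, already compressed
        tar -rf /tmp/logarchive.tar -C /tmp/snippets "${f#/}.gz"   # append, keeping the directory structure
        rm -f "$tmp"
    done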
