I'm trying to copy the contents of a large (~350 files, ~40MB total) directory from a Kubernetes pod to my local machine. I'm using the technique described here.

Sometimes it succeeds, but very frequently the standard output piped to the tar xf command on my host appears to get truncated. When that happens, I see errors like: <some file in the archive being transmitted over the pipe>: Truncated tar archive

The files in the source directory don't change. The file named in the error message is usually different (i.e., the archive appears to be truncated in a different place each time).

For reference (copied from the document linked to above), this is the analog to what I'm trying to do (I'm using a different pod name and directory names): kubectl exec -n my-namespace my-pod -- tar cf - /tmp/foo | tar xf - -C /tmp/bar

After running it, I expect the contents of my local /tmp/bar to be the same as those in the pod.
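A minimal sketch of how that expectation could be checked after each run, rather than relying on spotting tar's error message. It assumes a bash-like shell with `tar`, `head`, `diff`, and `mktemp` available; `produce_archive` is a hypothetical stand-in for the `kubectl exec ... -- tar cf - ...` side so the check can be tried without a cluster, and `head -c` only simulates a stream that gets cut short.

```shell
#!/usr/bin/env bash
# Sketch only: produce_archive is a hypothetical stand-in for
#   kubectl exec -n my-namespace my-pod -- tar cf - -C /tmp/foo .
# so that the check can be tried without a cluster.
src=$(mktemp -d)   # plays the role of /tmp/foo in the pod
dst=$(mktemp -d)   # plays the role of the local /tmp/bar
printf 'hello\n' > "$src/a.txt"
head -c 100000 /dev/zero > "$src/b.bin"

produce_archive() { tar cf - -C "$src" . ; }

# Complete stream: the extractor exits 0 and the two trees match.
if produce_archive | tar xf - -C "$dst" 2>/dev/null && diff -r "$src" "$dst" >/dev/null; then
  complete_status=ok
else
  complete_status=failed
fi

# Simulated truncation (head cuts the stream short): the extractor exits non-zero.
if produce_archive 2>/dev/null | head -c 4096 | tar xf - -C "$dst" 2>/dev/null; then
  truncated_status=ok
else
  truncated_status=failed
fi

echo "complete=$complete_status truncated=$truncated_status"
rm -rf "$src" "$dst"
```

In a real run, the same `if ... | tar xf - ...; then` test around the kubectl pipeline would let a script retry automatically when the extractor reports a short archive.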

However, more often than not, it fails. My current theory (I have a very limited understanding of how kubectl works, so this is all speculation) is that when kubectl determines that the tar command has completed, it terminates -- regardless of whether or not there are remaining bytes in transit (over the network) containing the contents of standard output.

I've tried various combinations of:

  1. stdbuf
  2. Changing tar's blocking factor
  3. Making the command take longer to run (by adding && sleep <x>)

I'm not going to list all combinations I've tried, but this is an example that uses everything: kubectl exec -n my-namespace my-pod -- stdbuf -o 0 tar -b 1 -c -f - -C /tmp/foo . && sleep 2 | tar xf - -C /tmp/bar

There are combinations of that command that I can make work pretty reliably. For example, forgetting about stdbuf and -b 1 and just sleeping for 100 seconds, ie: kubectl exec -n my-namespace my-pod -- tar -c -f - -C /tmp/foo . && sleep 100 | tar xf - -C /tmp/bar

But even more experimentation led me to believe that the block size of tar (512 bytes, I believe?) was still too large (the arguments of -b are a count of blocks, not the size of those blocks). This is the command I'm using for now: kubectl exec -n my-namespace my-pod -- bash -c 'dd if=<(tar cf - -C /tmp/foo .) bs=16 && sleep 10' | tar xf - -C /tmp/bar

And yes, I HAD to make bs that small and sleep "that big" to make it work. But this at least gives me two variables I can mess with. I did find that if I set bs=1, I didn't have to sleep... but it took a LONG time to move all the data (one byte at a time).

So, I guess my questions are:

  1. Is my theory that kubectl truncates standard output after it determines the command given to exec has finished correct?
  2. Is there a better solution to this problem?
Ed MacDonald
  • Perhaps it would have been more accurate to say that my theory is: The last chunk of data sent to standard output by tar (before it exits) appears to be in a race against time to get back to kubectl before kubectl learns that tar has finished. My command enables me to keep the "chunks" small while independently being able to change how long I have to wait for them. – Ed MacDonald Dec 07 '22 at 00:50
  • Possibly networking related. Have you tried adding `z` to the `tar`'s to compress? You don't describe your use-case but it may be preferable to mount a persistent volume (backed by NFS or cloud storage) into the Pod **or** have the Pod create the archive and then upload that to cloud storage. You may wish to file an [issue](https://github.com/kubernetes/kubectl/issues?q=is%3Aissue+cp) on the `kubectl` repo. – DazWilkin Dec 07 '22 at 18:03
  • ...and possibly (though I suspect it won't be helpful in this case) add `--v=8` to get full log verbosity on the `kubectl` command. – DazWilkin Dec 07 '22 at 18:04

1 Answer

Maybe the command line isn't specific enough about what the full command passed to kubectl really is. There might be ambiguity as to which process is responsible for the pipe. The "--" probably doesn't direct kubectl to include the pipe as part of the command; the pipe is probably being intercepted by the local shell.

Have you tried wrapping all of it in double quotes?

CMD="tar cf - /tmp/foo | tar xf - -C /tmp/bar"

kubectl exec -n my-namespace my-pod -- sh -c "${CMD}"

That way, the extraction step at the target would be included in the process that kubectl monitors for completion.
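The difference between the two forms can be seen without a cluster. The sketch below is only an illustration: `fake_exec` is a hypothetical stand-in for `kubectl exec ... --` that prints the arguments it receives, so it shows what the local shell hands over, not how kubectl itself behaves.

```shell
#!/usr/bin/env bash
# fake_exec is a hypothetical stand-in for `kubectl exec ... --`:
# it just prints each argument it was given, one per line.
fake_exec() { printf 'arg: %s\n' "$@"; }

# Unquoted: the local shell splits the line at `|`, so fake_exec receives
# only `echo hi`, and its output is piped into a local tr.
unquoted=$(fake_exec echo hi | tr '[:lower:]' '[:upper:]')

# Quoted and handed to `sh -c`: the whole pipeline arrives as one argument,
# to be interpreted by whatever shell runs on the other side.
quoted=$(fake_exec sh -c 'echo hi | tr a-z A-Z')

printf '%s\n---\n%s\n' "$unquoted" "$quoted"
```

Note that when the whole pipeline is passed through like this, both tar processes would run inside the pod, so /tmp/bar would refer to a path in the pod's filesystem rather than on the local machine.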

Eric Marceau