Let’s say I have two tar archives generated in slightly different ways:

$ tar tvf archive1.tar 
-rw-r--r-- root/root 567 2016-09-18 14:28 member1
-rw-r--r-- root/root 1696 2016-09-18 14:28 member2
$ tar tvf archive2.tar 
-rw-r--r-- root/root 567 2016-09-18 14:28 ./member1
-rw-r--r-- root/root 1696 2016-09-18 14:28 ./member2

How can I extract member1 reliably from either of the two archives? I’m receiving the tar over a pipe, the generator isn’t under my control, and while I can run the pipeline multiple times, I’d really like to avoid doing so unnecessarily.

P.S.:

$ tar xOvf archive1.tar member1 > /dev/null
member1
$ tar xOvf archive2.tar member1 > /dev/null
tar: member1: Not found in archive
tar: Exiting with failure status due to previous errors
andrewsh
  • BSD tar, apparently, doesn't have the problem: it worked for me both ways. When I needed the GNU tar format, I ended up passing both `member1` and `./member1`. It exits with an error code but does the extraction, so I have no guarantee the extraction worked. Did you find a solution? – Victor Sergienko Dec 11 '18 at 21:20
  • No, I did not; in fact, I haven’t touched that code since that day :D – andrewsh Dec 22 '18 at 14:14
  • What I was trying to do was write a script which would dump the contents of `.deb` and `.ipk` files of all existing subformats without requiring `dpkg-deb`. There is such a script as part of MC, but it only supports standard-compliant `.deb` files and requires dpkg. Many `.ipk` files do not follow the standard in multiple ways (e.g. using tar instead of ar, packing members in a different order, not providing `debian-binary`), which makes that script fail. – andrewsh Dec 22 '18 at 14:15
  • Unfortunately, it turned out that the root of the archive is another thing lots of packages (and also `ar` and `tar`) cannot agree on: many of them come with `./`, while others don’t have this prefix. – andrewsh Dec 22 '18 at 14:18
  • Probably writing such things in shell is a bad idea anyway. – andrewsh Dec 22 '18 at 14:19

2 Answers

It looks like the command-line switch --no-anchored may do what you want. From the tar(1) man page (man pages really are very useful to read, or at least scan):

--no-anchored
patterns match after any '/' (default for exclusion)

tar -tvf test
-rw-rw-r-- iain/iain         0 2016-09-18 16:14 ./member1
-rw-rw-r-- iain/iain         0 2016-09-18 16:14 ./member2

Then

tar -xvf test  member1
tar: member1: Not found in archive
tar: Exiting with failure status due to previous errors

And then

tar -xvf test --no-anchored member1
./member1
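
If member1 may also occur deeper in the hierarchy, --no-anchored alone matches too much. One possible way around that, sketched here under the assumption that the stream fits in a temporary file: spool the pipe once, list the names, and extract whichever spelling the archive actually uses. The function name extract_member is made up for illustration, and member names containing newlines are not handled.

```shell
#!/bin/sh
# extract_member NAME: read a tar stream on stdin and write the top-level
# member NAME (stored as NAME or ./NAME) to stdout.
# Hypothetical helper, not part of tar.
extract_member() {
    want=$1
    spool=$(mktemp) || return 1
    # A pipe cannot be rewound, so keep one copy on disk.
    cat > "$spool"
    # Find the exact spelling used by this archive.
    # grep -x -F compares whole lines literally, so sub/NAME never matches.
    found=$(tar -tf "$spool" | grep -x -F -e "$want" -e "./$want" | head -n 1)
    if [ -z "$found" ]; then
        rm -f "$spool"
        echo "extract_member: $want: not found at top level" >&2
        return 1
    fi
    # Extract just that member to stdout.
    tar -xOf "$spool" "$found"
    status=$?
    rm -f "$spool"
    return "$status"
}
```

It would be used as `some_generator | extract_member member1 > member1`, where some_generator stands for whatever produces the stream. The pipeline runs only once; the cost is the temporary copy.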
user9517
  • I thought about this, but this will work only when the archive is guaranteed to not have `member1` anywhere else in the hierarchy, which covers only one of the two cases I have. And sure, obviously I have studied the manpage before posting the question :) – andrewsh Sep 18 '16 at 15:31
  • You did not state that member1 could appear anywhere in the path or multiple times. – user9517 Sep 18 '16 at 15:48
  • It's at least worth mentioning that this solution works for the MCVE, but not in the general case. – Victor Sergienko Dec 11 '18 at 21:17

I've been in the same boat. tar is so bafflingly underdocumented for a de-facto standard utility! Most of us do not have Dan around, so unfortunately...

My use case involves tar archives, some of which are produced by software out of my control, and the task at hand is to extract the top-level directory named either './opt/' or 'opt/'. This is part of an automated VM imaging process that I obviously want to work robustly.

What I came up with is far from a general solution, but does the thing I need, in my limited scope. It won't generalize well if you need to extract a large number of top-level directories, and dealing with cases where the names of the top-level directories are not known in advance is tricky.

GNU tar has a --transform= option that takes a sed s expression, but unfortunately the command-line pattern match is performed before the transform is applied. However, building upon @user9517's answer, which I gladly upvote, I came up with the following horrible hack that does the job anyway. The idea is to match opt/ anywhere in the path, and to use the transform to redirect files that spuriously match opt/ anywhere except the top level into a temporary path, to be deleted after extraction. The full command looks like this:

tar xvf "$tarfile" --no-anchored \
                   --transform='s:^\(\./\)\?[^o][^p][^t]/:.deleteme/&:' \
                   --show-transformed-names \
        opt/

The sed-like transform breaks down as follows:

^              only at the path start
\(\./\)\?      match ./ if present
[^o][^p][^t]   followed by three characters: not o, then not p, then not t
/              followed by /

The & in the substitution part stands for the matched prefix, so each redirected file keeps its full original path under .deleteme/, which avoids any possible clashes. The v and --show-transformed-names options are only for logging the extracted names.
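
Since the transform is an ordinary sed-style s expression, it can be tried on a few member names with sed before running it against an archive. A small sketch, assuming GNU sed (which accepts \? in a basic regular expression); preview_transform is a name made up here:

```shell
#!/bin/sh
# preview_transform: read member names on stdin and print them the way the
# s expression from the command above would rewrite them.
# Assumes GNU sed, which supports \? in basic regular expressions.
preview_transform() {
    sed 's:^\(\./\)\?[^o][^p][^t]/:.deleteme/&:'
}

# Three sample names: the wanted directory, a spurious match under a
# three-character name, and a spurious match under a longer name.
printf '%s\n' ./opt/file1.opt sys/opt/file1.opt home/opt/file1.opt |
    preview_transform
```

With these inputs, ./opt/file1.opt and home/opt/file1.opt come out unchanged and sys/opt/file1.opt becomes .deleteme/sys/opt/file1.opt. In other words, the expression only redirects spurious matches that sit under a three-character top-level name; home/opt/ would be extracted in place.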


Proof of concept. Note the directory sys/opt/, which is spurious and unwanted. The working directory was cleaned before each extraction.

A tar packaged with the ./ prefix:

kkm@buba:~/.tmp/tarext$ tar tvf ../tar.withdot.tar
drwxr-xr-x kkm/kkm           0 2020-01-07 21:26 ./
drwxr-xr-x kkm/kkm           0 2020-01-07 21:26 ./dev/
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./dev/file1.dev
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./dev/file2.dev
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./dev/file3.dev
drwxr-xr-x kkm/kkm           0 2020-01-07 21:29 ./opt/
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./opt/file1.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./opt/file2.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./opt/file3.opt
drwxr-xr-x kkm/kkm           0 2020-01-07 21:26 ./sys/
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./sys/file1.sys
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./sys/file2.sys
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 ./sys/file3.sys
drwxr-xr-x kkm/kkm           0 2020-01-07 21:26 ./sys/opt/
-rw-r--r-- kkm/kkm         100 2020-01-07 20:45 ./sys/opt/file1.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 20:45 ./sys/opt/file2.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 20:45 ./sys/opt/file3.opt
kkm@buba:~/.tmp/tarext$ tar xvf ../tar.withdot.tar --no-anchored --transform='s:^\(\./\)\?[^o][^p][^t]/:.deleteme/&:' --show-transformed-names opt/
./opt/
./opt/file1.opt
./opt/file2.opt
./opt/file3.opt
.deleteme/sys/opt/
.deleteme/sys/opt/file1.opt
.deleteme/sys/opt/file2.opt
.deleteme/sys/opt/file3.opt
kkm@buba:~/.tmp/tarext$ rm -rf .deleteme/ ; ls -RA
.:
opt

./opt:
file1.opt  file2.opt  file3.opt

A tar packaged without a prefix:

kkm@buba:~/.tmp/tarext$ tar tvf ../tar.nodot.tar
drwxr-xr-x kkm/kkm           0 2020-01-07 21:26 dev/
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 dev/file1.dev
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 dev/file2.dev
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 dev/file3.dev
drwxr-xr-x kkm/kkm           0 2020-01-07 21:29 opt/
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 opt/file1.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 opt/file2.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 opt/file3.opt
drwxr-xr-x kkm/kkm           0 2020-01-07 21:26 sys/
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 sys/file1.sys
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 sys/file2.sys
-rw-r--r-- kkm/kkm         100 2020-01-07 19:09 sys/file3.sys
drwxr-xr-x kkm/kkm           0 2020-01-07 21:26 sys/opt/
-rw-r--r-- kkm/kkm         100 2020-01-07 20:45 sys/opt/file1.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 20:45 sys/opt/file2.opt
-rw-r--r-- kkm/kkm         100 2020-01-07 20:45 sys/opt/file3.opt
kkm@buba:~/.tmp/tarext$ tar xvf ../tar.nodot.tar --no-anchored --transform='s:^\(\./\)\?[^o][^p][^t]/:.deleteme/&:' --show-transformed-names opt/
opt/
opt/file1.opt
opt/file2.opt
opt/file3.opt
.deleteme/sys/opt/
.deleteme/sys/opt/file1.opt
.deleteme/sys/opt/file2.opt
.deleteme/sys/opt/file3.opt
kkm@buba:~/.tmp/tarext$ rm -rf .deleteme/ ; ls -RA
.:
opt

./opt:
file1.opt  file2.opt  file3.opt
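
A possible variation that avoids describing the unwanted names with a regex at all: let --no-anchored extract every match into a scratch directory, then keep only the top-level one. Both opt/ and ./opt/ unpack to the same place, so the two archive flavours need no special casing. This is only a sketch; extract_top_dir is a hypothetical helper, it assumes GNU tar, and it assumes the destination does not already contain the directory.

```shell
#!/bin/sh
# extract_top_dir NAME DEST: read a tar stream on stdin and put only the
# top-level directory NAME (stored as NAME/ or ./NAME/) into DEST.
# Hypothetical helper; assumes GNU tar for --no-anchored.
extract_top_dir() {
    name=$1
    dest=$2
    # Refuse to merge into an existing copy; mv would nest it instead.
    [ ! -e "$dest/$name" ] || return 1
    # Scratch space inside DEST so the final mv stays on one filesystem.
    scratch=$(mktemp -d "$dest/.extract.XXXXXX") || return 1
    status=1
    # --no-anchored matches NAME/ at any depth, e.g. sys/NAME/ as well.
    if tar -xf - -C "$scratch" --no-anchored "$name/" &&
        [ -d "$scratch/$name" ]; then
        # Keep only the top-level match.
        mv "$scratch/$name" "$dest/$name" && status=0
    fi
    # Whatever is left in scratch was a deeper, spurious match.
    rm -rf "$scratch"
    return "$status"
}
```

Usage would look like `some_generator | extract_top_dir opt /mnt/image`, with both the generator and the target path being placeholders. The trade-off is that spurious matches are still written to disk before being thrown away, just as with .deleteme/.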