I've set up a Packer template to generate a Vagrant base image of FreeBSD 10.3, and it was working well at least as of Mon Oct 3 00:34:41 2016 +0300.

Yesterday I went back to this project and found that it no longer works. Here are the details.

Packer does what it has to do, then runs my script to install FreeBSD using bsdinstall(8). The script is:

PARTITIONS="ada0 { 29G freebsd-ufs /, 5G freebsd-swap, 10G freebsd-ufs /var }"
DISTRIBUTIONS="base.txz kernel.txz"
#!/bin/sh
echo 'WITHOUT_X11="YES"' >> /etc/make.conf
echo 'OPTIONS_UNSET=X11' >> /etc/make.conf
echo 'nameserver 8.8.8.8' >> /etc/resolv.conf
cat >> /etc/rc.conf <<EOF
ifconfig_em0="DHCP"
sshd_enable="YES"
dumpdev="NO"
EOF

env ASSUME_ALWAYS_YES=1 pkg bootstrap #       <<stops here
pkg update    
pkg install -y sudo

[.....snip.....]

reboot

This stops at bootstrapping pkg with the message:

Bootstrapping pkg from pkg+http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly, please wait...
Signature for pkg not available.
pkg: Error fetching http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig: Connection reset by peer
A pre-built version of pkg could not be found for your system.
Consider changing PACKAGESITE or installing it from ports: 'ports-mgmt/pkg'.

If I stop the bsdinstall script and run chroot /mnt /bin/sh, I can fetch pkg.txz.sig from the above URL without any problems.

Any ideas what could cause the "Connection reset by peer"? Was something changed on pkg.FreeBSD.org recently?

I couldn't find anything about the issue.


UPD1

Looking at the captured traffic, the site really does answer 200 OK and then drops the connection for the pkg.txz.sig file.

But this 200 OK packet contains the signature file, and the packets are identical for both the manual fetch (which succeeds) and pkg bootstrap (which fails).

Both sessions are identical, so this is likely not a networking problem.


UPD2

truss was not helpful either.

So as a workaround I've modified my bsdinstall script to fetch the files manually:

[.....snip.....]

#env ASSUME_ALWAYS_YES=1 pkg bootstrap
fetch http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz
fetch http://pkg.FreeBSD.org/FreeBSD:10:amd64/quarterly/Latest/pkg.txz.sig
pkg add pkg.txz
pkg update

[.....snip.....]

PS: The only thing I can suspect now is the VirtualBox version update... in any case, downgrading is not an option. (The ISO checksum is hardcoded into the template, and the template and scripts are in a git repository, so accidental changes are impossible.)


UPD3

I've set up a debugging environment; so far I have only isolated the function where the error is raised.

It's the second buffer refill from the HTTP connection (the first one has already read 727 bytes, so the second should return EOF)...

Here is a small gdb log with the backtrace and the breakpoints needed to get there. I've also added a tcpdump capture made on the system (Wireshark compatible).

Eugene Petrov
  • Maybe ask on freebsd-pkg@FreeBSD.org. They maintain pkg.FreeBSD.org and the pkg tools – arved Oct 31 '16 at 14:10
  • @arved this seems to be either a libfetch or an `em` driver issue (plus VirtualBox networking, probably). Yes, I'll get in contact with the maintainers – Eugene Petrov Oct 31 '16 at 15:07

1 Answer

As I found out, part of the problem was in pkg: it tries to read 10240 bytes from the connection, expecting EOF if the file is smaller, but on my system EOF is not set even though the whole remote file has already been read.

# /release/10.3.0/usr.sbin/pkg/pkg.c (lines 185 and 242)

char buf[10240];

while ((r = fread(buf, 1, sizeof(buf), remote)) > 0) {

and the following loop runs twice: the first pass reads the file, the second gets a connection reset error instead of EOF

# /release/10.3.0/lib/libc/stdio/fread.c (lines 94 and 100-110)

        resid = count * size;                        /* == 10240 here */

        while (resid > (r = fp->_r)) {
                (void)memcpy((void *)p, (void *)fp->_p, (size_t)r);
                fp->_p += r;
                /* fp->_r = 0 ... done in __srefill */
                p += r;
                resid -= r;
                if (__srefill(fp)) {
                        /* no more input: return partial result */
                        return ((total - resid) / size);
                }
        }
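To make the two-pass behaviour concrete, here is a minimal sketch in C. It is a simplified model, not the libc implementation: `fake_conn`, `conn_read` and `model_fread` are hypothetical names invented for this illustration, and real stdio buffering is more involved. The idea is only that a request larger than the remaining data forces one more trip to the transport, and whatever that trip returns (clean EOF or a reset) decides the outcome.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the HTTP connection: hands out `avail` bytes,
 * then reports `tail` on the next read (0 = clean EOF, -1 = reset). */
struct fake_conn {
    size_t avail;   /* bytes still to deliver */
    int tail;       /* result of the read issued after the data is gone */
    int calls;      /* how many times the transport was asked for data */
    int failed;     /* set to 1 when a read reported an error */
};

/* One read from the fake transport. */
long conn_read(struct fake_conn *c, char *dst, size_t want)
{
    c->calls++;
    if (c->avail == 0) {
        if (c->tail < 0)
            c->failed = 1;
        return c->tail;
    }
    size_t n = want < c->avail ? want : c->avail;
    memset(dst, 'x', n);
    c->avail -= n;
    return (long)n;
}

/* Simplified model of the fread() loop: keep asking the transport until
 * the request is satisfied or a read returns nothing (EOF or error). */
size_t model_fread(struct fake_conn *c, char *dst, size_t want)
{
    size_t total = 0;
    while (total < want) {
        long r = conn_read(c, dst + total, want - total);
        if (r <= 0)
            break;
        total += (size_t)r;
    }
    return total;
}
```

With a 727-byte payload, a 10240-byte request reaches the transport twice, so a reset on the second read is observed; a request of exactly 727 bytes reaches it once and never sees what comes after the data.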

The manual fetch succeeds because the request size is adjusted for small files, so it only asks to read 727 bytes:

# /release/10.3.0/usr.bin/fetch/fetch.c (lines 720-724 and 733)

                if (us.size != -1 && us.size - count < B_size &&
                    us.size - count >= 0)
                        size = us.size - count;
                else
                        size = B_size;

                if ((readcnt = fread(buf, 1, size, f)) < size) {
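The size selection quoted above can be isolated into a small helper to see what fetch asks for in this case. This is only a sketch: `next_request` is a hypothetical name, its parameters stand in for `us.size`, `count` and `B_size`, and the buffer size of 16384 used in the note below is an arbitrary value chosen for the illustration, not something taken from the source.

```c
#include <assert.h>

/* Hypothetical helper mirroring the condition quoted from fetch.c.
 * total   - advertised size of the remote file, or -1 if unknown
 * done    - bytes read so far
 * bufsize - size of the read buffer
 * Returns how many bytes the next fread() call would request. */
long long next_request(long long total, long long done, long long bufsize)
{
    if (total != -1 && total - done < bufsize && total - done >= 0)
        return total - done;    /* clamp to what is left of the file */
    return bufsize;             /* size unknown, or plenty left */
}
```

For a 727-byte file and a 16384-byte buffer, `next_request(727, 0, 16384)` gives 727, so the read never asks for more than the server advertised. When the size is unknown (`-1`), the helper falls back to the full buffer size, which is the situation the pkg loop is always in.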

...but why EOF is not set is still an open question.

I posted this to the freebsd-pkg mailing list.


UPD1

I downgraded VirtualBox from 5.0.28 to 5.0.26 and EOF is set: _sread() at libc/stdio/refill.c:135 returns 0, and EOF is set on line 138.

So something changed in VirtualBox networking too. I added a pcap file for VirtualBox 5.0.26 to the gist. 5.0.28 really was the cause of the connection reset; here is a comparison of the captures.

VirtualBox 5.1.8 has this bug too. Version 5.1.6 works OK.

Opened ticket #16141 in their bugtracker.

Eugene Petrov
  • I wanted to leave my VB alone, so I downloaded ports.txz and tried to compile a simple port. pkg was bootstrapped as part of this, solving the problem. So this can be a workaround other than changing versions – Marco van de Voort Nov 13 '16 at 14:19
  • There will be lots of bugs with the NAT adapter on these versions. But it seems they've fixed it in 5.1.9 r111846 (available on the [test builds](https://www.virtualbox.org/wiki/Testbuilds) page). I haven't tried it yet, but people in the bugtracker say it works OK. – Eugene Petrov Nov 13 '16 at 14:33