
I am using syncoid (from the sanoid project) to create copies of ZFS filesystems on a different machine in my test environment (a couple of Raspberry Pis).

I messed up a snapshot on the origin machine: one server panicked during a snapshot transfer, and I later deleted the snapshot that was being transferred.

I manually created a new snapshot and successfully restored it on the target.

Now, when I run syncoid on the target server using:

 ${SYNCOID} --sshkey="${SSH_KEY}" root@${REMOTE_SERVER}:${SRC_POOL}/${SAMPLE_FILESYSTEM} ${DEST_POOL}

it complains that it cannot resume a send/receive transaction.

During normal operations, syncoid retrieves the receive_resume_token on the target machine:

/usr/local/sbin/zfs get -H receive_resume_token 'destpool/samplefs'
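When there is an interrupted receive to resume, the value column holds a long opaque token; when there is nothing to resume, the value is simply "-". The output looks roughly like this (tab-separated; the token below is a shortened, purely illustrative placeholder):

 destpool/samplefs	receive_resume_token	1-abc123def456-...	-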

If it finds one, it tries to retrieve the snapshot corresponding to that token from the source machine:

ssh sourceserver zfs send -t (token stored in receive_resume_token retrieved above) | (network stuff...) | zfs receive -s -F 'destpool/samplefs'

In my case this fails with:

cannot resume send: 'sourcepool/samplefs@samplesnap' used in the initial send no longer exists

The only way to get it working is adding the "--no-resume" flag to the syncoid command. This is not what I want, since some filesystems are very large and system crashes are likely in this environment.

I tried to clear that token by running:

 zfs recv -A 'srcpool/samplefs'

on the source machine, and:

 zfs recv -A 'destpool/samplefs'

on the target machine. In both cases I get:

srcpool/samplefs does not have any resumable receive state to abort

(on the target machine the message mentions destpool/samplefs)

The question is: is there a way to clear the receive_resume_token attribute on the target filesystem?

Please note that this problem affects only ONE filesystem. There are many other working transfers on both machines, in both directions, using the same set of commands.

Qippur

1 Answer


If zfs recv -A does not help, you can try destroying (or renaming) the destination dataset and resyncing it.
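Roughly, using the dataset name from your question (renaming is the safer first step, since destroying is irreversible):

 # on the target machine: move the stuck dataset out of the way...
 zfs rename destpool/samplefs destpool/samplefs.broken
 # ...or, if you no longer need the old copy, destroy it instead
 # zfs destroy -r destpool/samplefs
 # then re-run the usual syncoid command; it will start a fresh full sync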

Please also note that using syncoid with the --no-resume option should not be a problem: even on large datasets, incremental updates are generally quite small and do not benefit from resume support (which, on the contrary, can be useful for the first, full sync).
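For reference, that is simply your original command with the extra flag added:

 ${SYNCOID} --no-resume --sshkey="${SSH_KEY}" root@${REMOTE_SERVER}:${SRC_POOL}/${SAMPLE_FILESYSTEM} ${DEST_POOL}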

shodanshok
  • There are cases when datasets and/or deltas are very large, and the ability to resume would be helpful. The question is about how to regain the resume feature in the situation described. Your answer bypasses the problem in two ways: either by starting from scratch (not viable if the filesystem is large enough to require longer than the available time) or by using --no-resume (not viable if deltas are large enough to require longer than the available time). – Qippur Jun 28 '18 at 15:14