I ran across this problem just now, while setting up rsnapshot, which uses rsync.
I can now reliably repeat the problem when rsnapshot invokes rsync on a very large directory (47,616 directory entries) for the second repeat rsnapshot of the same directory. The rsync was going from, and to, a local disk formatted using ext4.
The first time I saw it, the usual /usr/bin/rsync binary, stripped of symbols, "rsync version 3.2.3 protocol version 31" that is in my Xubuntu "22.04.1 LTS (Jammy Jellyfish)" release was invoked, and I did not have it under strace. So gdb backtraces were undecipherable to me. Running strace on the three "hung" rsync processes (parent, sender,and receiver) showed all three processes, after printing the "expand file_list pointer array", were looping on select(2) system calls, with 60 second timeouts.
So I rebuilt that same 3.2.3 version of rsync from source, and repeated the same test (after I figured out how to reproduce this), and got the following three gdb stack backtraces:
============== Backtrace #1 ==============
#0 0x00007f69b109b74d in __GI___select (nfds=nfds@entry=5, readfds=readfds@entry=0x7ffcd48d9c90, writefds=writefds@entry=0x7ffcd48d9d90, exceptfds=exceptfds@entry=0x7ffcd48d9d10, timeout=timeout@entry=0x7ffcd48d9c60)
at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000564b6ef0f544 in perform_io (needed=needed@entry=18, flags=flags@entry=2) at io.c:741
#2 0x0000564b6ef115f5 in write_buf (f=f@entry=4, buf=0x7ffcd48da019 "3dec1fdb7cfc0341_0", len=len@entry=18) at io.c:2125
#3 0x0000564b6eee9088 in send_file_entry (first_ndx=<optimized out>, ndx=<optimized out>, symlink_len=0, symlink_name=0x0, file=0x564b70627350,
fname=0x7ffcd48d9fd0 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", f=4) at flist.c:565
#4 send_file_name (f=f@entry=4, flist=flist@entry=0x564b6f6ae160, fname=fname@entry=0x7ffcd48dc150 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", stp=stp@entry=0x0, flags=flags@entry=65540,
filter_level=filter_level@entry=2) at flist.c:1604
#5 0x0000564b6eeea49d in send_directory (f=f@entry=4, flist=flist@entry=0x564b6f6ae160, fbuf=fbuf@entry=0x7ffcd48dc150 ".config/_some_application__that_I_use_/Cache/3dec1fdb7cfc0341_0", len=len@entry=72,
flags=flags@entry=65540) at flist.c:1839
#6 0x0000564b6eeea9b6 in send1extra (f=f@entry=4, file=file@entry=0x564b6f454a50, flist=flist@entry=0x564b6f6ae160) at flist.c:1992
#7 0x0000564b6eeeb21f in send_extra_file_list (f=f@entry=4, at_least=at_least@entry=1000) at flist.c:2078
#8 0x0000564b6eef8add in send_files (f_in=f_in@entry=5, f_out=f_out@entry=4) at sender.c:245
#9 0x0000564b6ef02b3c in client_run (f_in=5, f_out=4, pid=pid@entry=2780890, argc=argc@entry=1, argv=argv@entry=0x564b6efe65b0) at main.c:1317
#10 0x0000564b6eee2227 in start_client (argv=<optimized out>, argc=1) at main.c:1580
#11 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
============== Backtrace #2 ==============
#0 0x00007f69b109b74d in __GI___select (nfds=nfds@entry=2, readfds=readfds@entry=0x7ffcd48d7cd0, writefds=writefds@entry=0x7ffcd48d7dd0, exceptfds=exceptfds@entry=0x7ffcd48d7d50, timeout=timeout@entry=0x7ffcd48d7ca0)
at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000564b6ef0f544 in perform_io (needed=76, flags=flags@entry=4) at io.c:741
#2 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=len@entry=69, convert=<optimized out>) at io.c:966
#3 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=69, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#4 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3f368 "[%s] expand file_list pointer array to %s bytes, did%s move\n") at log.c:442
#5 0x0000564b6eee4445 in flist_expand (extra=<optimized out>, flist=0x564b6f41b3e0) at flist.c:309
#6 flist_expand (flist=0x564b6f41b3e0, extra=<optimized out>) at flist.c:287
#7 0x0000564b6eeec9bd in recv_file_list (f=3, dir_ndx=dir_ndx@entry=2960) at flist.c:2584
#8 0x0000564b6ef13243 in wait_for_receiver () at io.c:1699
#9 0x0000564b6ef0f934 in wait_for_receiver () at io.c:1677
#10 perform_io (needed=89, flags=flags@entry=4) at io.c:862
#11 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=len@entry=82, convert=<optimized out>) at io.c:966
#12 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=82, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#13 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3fba5 "%s is uptodate\n") at log.c:442
#14 0x0000564b6eeee433 in set_file_attrs (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, sxp=<optimized out>, sxp@entry=0x7ffcd48dc010,
fnamecmp=fnamecmp@entry=0x0, flags=<optimized out>) at rsync.c:661
#15 0x0000564b6eef3b78 in recv_generator (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, ndx=44459, itemizing=itemizing@entry=1, code=code@entry=FLOG,
f_out=f_out@entry=1) at generator.c:1805
#16 0x0000564b6eef4c95 in generate_files (f_out=f_out@entry=1, local_name=local_name@entry=0x0) at generator.c:2318
#17 0x0000564b6ef01f7c in do_recv (f_in=<optimized out>, f_in@entry=0, f_out=f_out@entry=1, local_name=local_name@entry=0x0) at main.c:1106
#18 0x0000564b6ef026cc in do_server_recv (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:1219
#19 start_server (f_in=f_in@entry=0, f_out=f_out@entry=1, argc=<optimized out>, argv=<optimized out>) at main.c:1253
#20 0x0000564b6ef0281b in child_main (argc=<optimized out>, argv=<optimized out>) at main.c:1226
#21 0x0000564b6ef232b9 in local_child (argc=2, argv=argv@entry=0x7ffcd48df490, f_in=f_in@entry=0x7ffcd48df3f0, f_out=f_out@entry=0x7ffcd48df3f4, child_main=child_main@entry=0x564b6ef02800 <child_main>) at pipe.c:166
#22 0x0000564b6eee21d2 in do_cmd (f_out_p=0x7ffcd48df3f4, f_in_p=0x7ffcd48df3f0, remote_argc=<optimized out>, remote_argv=<optimized out>, user=0x0, machine=<optimized out>, cmd=<optimized out>) at main.c:651
#23 start_client (argv=<optimized out>, argc=1) at main.c:1569
#24 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
============== Backtrace #3 ==============
#0 0x00007f69b109b74d in __GI___select (nfds=nfds@entry=2, readfds=readfds@entry=0x7ffcd48d7cd0, writefds=writefds@entry=0x7ffcd48d7dd0, exceptfds=exceptfds@entry=0x7ffcd48d7d50, timeout=timeout@entry=0x7ffcd48d7ca0)
at ../sysdeps/unix/sysv/linux/select.c:69
#1 0x0000564b6ef0f544 in perform_io (needed=76, flags=flags@entry=4) at io.c:741
#2 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=len@entry=69, convert=<optimized out>) at io.c:966
#3 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48d87e0 "[generator] expand file_list pointer array to 524288 bytes, did move\n", len=69, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#4 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3f368 "[%s] expand file_list pointer array to %s bytes, did%s move\n") at log.c:442
#5 0x0000564b6eee4445 in flist_expand (extra=<optimized out>, flist=0x564b6f41b3e0) at flist.c:309
#6 flist_expand (flist=0x564b6f41b3e0, extra=<optimized out>) at flist.c:287
#7 0x0000564b6eeec9bd in recv_file_list (f=3, dir_ndx=dir_ndx@entry=2960) at flist.c:2584
#8 0x0000564b6ef13243 in wait_for_receiver () at io.c:1699
#9 0x0000564b6ef0f934 in wait_for_receiver () at io.c:1677
#10 perform_io (needed=89, flags=flags@entry=4) at io.c:862
#11 0x0000564b6ef1070a in send_msg (code=code@entry=MSG_INFO, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=len@entry=82, convert=<optimized out>) at io.c:966
#12 0x0000564b6ef05d63 in rwrite (code=<optimized out>, code@entry=FCLIENT, buf=buf@entry=0x7ffcd48da8d0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0 is uptodate\n", len=82, is_utf8=<optimized out>, is_utf8@entry=0) at log.c:339
#13 0x0000564b6ef063e5 in rprintf (code=code@entry=FCLIENT, format=format@entry=0x564b6ef3fba5 "%s is uptodate\n") at log.c:442
#14 0x0000564b6eeee433 in set_file_attrs (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, sxp=<optimized out>, sxp@entry=0x7ffcd48dc010,
fnamecmp=fnamecmp@entry=0x0, flags=<optimized out>) at rsync.c:661
#15 0x0000564b6eef3b78 in recv_generator (fname=fname@entry=0x7ffcd48de1c0 ".config/_some_application__that_I_use_/Cache/Cache_Data/7543a11227262604_0", file=file@entry=0x564b702ccae8, ndx=44459, itemizing=itemizing@entry=1, code=code@entry=FLOG,
f_out=f_out@entry=1) at generator.c:1805
#16 0x0000564b6eef4c95 in generate_files (f_out=f_out@entry=1, local_name=local_name@entry=0x0) at generator.c:2318
#17 0x0000564b6ef01f7c in do_recv (f_in=<optimized out>, f_in@entry=0, f_out=f_out@entry=1, local_name=local_name@entry=0x0) at main.c:1106
#18 0x0000564b6ef026cc in do_server_recv (argv=<optimized out>, argc=<optimized out>, f_out=1, f_in=0) at main.c:1219
#19 start_server (f_in=f_in@entry=0, f_out=f_out@entry=1, argc=<optimized out>, argv=<optimized out>) at main.c:1253
#20 0x0000564b6ef0281b in child_main (argc=<optimized out>, argv=<optimized out>) at main.c:1226
#21 0x0000564b6ef232b9 in local_child (argc=2, argv=argv@entry=0x7ffcd48df490, f_in=f_in@entry=0x7ffcd48df3f0, f_out=f_out@entry=0x7ffcd48df3f4, child_main=child_main@entry=0x564b6ef02800 <child_main>) at pipe.c:166
#22 0x0000564b6eee21d2 in do_cmd (f_out_p=0x7ffcd48df3f4, f_in_p=0x7ffcd48df3f0, remote_argc=<optimized out>, remote_argv=<optimized out>, user=0x0, machine=<optimized out>, cmd=<optimized out>) at main.c:651
#23 start_client (argv=<optimized out>, argc=1) at main.c:1569
#24 main (argc=<optimized out>, argv=<optimized out>) at main.c:1812
===
Notice that the expected error message is being printed in Stack Backtrace #2, and that recv_file_list
() and flist_expand
() are on that stack.
These rsync instances were being invoked with, more or less, the following command line options, the way I had rsnapshot configured to invoke it:
rsync -av --delete-during --archive --verbose --one-file-system --hard-links --xattrs --sparse src_dir dest_dir
I will see if I can make further progress debugging this, and make another post here if I do.