We have a merge publication which uses a range of replication methods across ~80 articles (host_name filtering on one table, join filtering for several others; some tables using bi-directional synchronization direction, others with 'download only, prohibit subscriber changes'; some tables using identity range management, others not needing it).
We are using push replication to subscriber databases which already have all the necessary tables, so we are using 'delete data' for the 'action if name is in use'. The tables have identical schemas on both the subscriber and publication databases, but are empty until replication has initiated.
The issue is that sometimes initialization of the subscription takes ~3 minutes, but other times it is timing out after ~20 minutes, using identically templated subscriber databases, and identical starting datasets (~10,000 records). And after initialization, when synchronizing, instead of taking ~5 seconds, it's again taking ~20 minutes. And looking at the history in the replication monitor, the synchronization history says it's making 1000's of schema changes (even when there have been no data changes).
I turned on verbose logging to see what the schema changes are, and it seems to be looping repeatedly through all the tables, turning all the constraints off, and then back on again. I'm at a loss as to why it's doing this.
Note in case it's relevant: I have been using a 100 character unicode string (created randomly from the full unicode range of characters) as the host_name for different subscribers. I suspected this might be causing the issue, but I have since reproduced the problem using a 50 character lowercase letter string.
Finally, all servers are hosted in the same data center, so I do not think network latency is an issue.