It's easy to find well regarded references stating that HDFS should not be stretched across data centers [1], while Kafka should be stretched [2].
What specific issues make HDFS ill-suited to being stretched?
I'm considering stretching HDFS across two DCs that are less than 50km apart, with an average latency of less than 1ms. I'm planning on running a soak test spanning a couple of weeks, with representative read and write workloads, but with volumes of a few hundred GB - orders of magnitude less than the cluster will store in a few years.
If the tests succeed, what level of confidence does this provide that stretching HDFS is likely to succeed? Specifically, are issues related to the relatively long inter-host latency likely to be hidden; that such issues would only be exposed with far larger volumes e.g. a couple of hundred TB?
Finally, if the inter-DC latency spikes e.g. to 10ms for a few minutes, what issues I am likely to encounter?
[1] Tom White: Hadoop: The Definitive Guide