
I was wondering if anybody could help me with this issue in deploying a Spark cluster using the bdutil tool. When the total number of cores increases (>= 1024), the deployment fails every time for one of the following reasons:

  1. Some machines never become sshable, e.g. "Tue Dec 8 13:45:14 PST 2015: 'hadoop-w-5' not yet sshable (255); sleeping"

  2. Some nodes fail with an "Exited 100" error when deploying the Spark worker nodes, e.g. "Tue Dec 8 15:28:31 PST 2015: Exited 100 : gcloud --project=cs-bwamem --quiet --verbosity=info compute ssh hadoop-w-6 --command=sudo su -l -c "cd ${PWD} && ./deploy-core-setup.sh" 2>>deploy-core-setup_deploy.stderr 1>>deploy-core-setup_deploy.stdout --ssh-flag=-tt --ssh-flag=-oServerAliveInterval=60 --ssh-flag=-oServerAliveCountMax=3 --ssh-flag=-oConnectTimeout=30 --zone=us-central1-f"

In the log file, it says:

    hadoop-w-40: ==> deploy-core-setup_deploy.stderr <==
    hadoop-w-40: dpkg-query: package 'openjdk-7-jdk' is not installed and no information is available
    hadoop-w-40: Use dpkg --info (= dpkg-deb --info) to examine archive files,
    hadoop-w-40: and dpkg --contents (= dpkg-deb --contents) to list their contents.
    hadoop-w-40: Failed to fetch http://httpredir.debian.org/debian/pool/main/x/xml-core/xml-core_0.13+nmu2_all.deb Error reading from server. Remote end closed connection [IP: 128.31.0.66 80]
    hadoop-w-40: E: Unable to fetch some archives, maybe run apt-get update or try with --fix-missing?

I tried 16 cores x 128 nodes, 32 cores x 64 nodes, 32 cores x 32 nodes, and other configurations totaling 1024 cores or more, but either Reason 1 or Reason 2 above always shows up.

I also tried modifying the ssh-flag to raise ConnectTimeout to 1200s, and changing bdutil_env.sh to set the polling interval to 30s, 60s, and so on. None of it works; there are always some nodes that fail.

Here is one of the configurations that I used:

    time ./bdutil \
        --bucket $BUCKET \
        --force \
        --machine_type n1-highmem-32 \
        --master_machine_type n1-highmem-32 \
        --num_workers 64 \
        --project $PROJECT \
        --upload_files ${JAR_FILE} \
        --env_var_files hadoop2_env.sh,extensions/spark/spark_env.sh \
        deploy

Parthus

1 Answer


To summarize some information that came out of a separate email discussion: as IP mappings change and different Debian mirrors get assigned, the many concurrent calls to apt-get install during a bdutil deployment can occasionally overload some unbalanced mirror or trigger its DDoS protections, leading to deployment failures. These failures tend to be transient, and at the moment it appears large clusters can be deployed successfully again in zones like us-east1-c and us-east1-d.

There are a few things you can try to reduce the load on the Debian mirrors:

  1. Set MAX_CONCURRENT_ASYNC_PROCESSES inside bdutil_env.sh to a much smaller value than the default of 150, such as 10, to deploy only 10 nodes at a time. This makes the deployment take longer, but lightens the load roughly as if you had done several back-to-back 10-node deployments (see the sketch after this list).
  2. If the VMs were created successfully but the deployment steps fail, then instead of retrying the whole delete/deploy cycle, you can try ./bdutil <all your flags> run_command -t all -- 'rm -rf /home/hadoop' followed by ./bdutil <all your flags> run_command_steps to re-run the whole deployment attempt.
  3. Incrementally build your cluster using resize_env.sh: initially set --num_workers 10 and deploy your cluster, then edit resize_env.sh to set NEW_NUM_WORKERS=20 and run ./bdutil <all your flags> -e extensions/google/experimental/resize_env.sh deploy; this deploys only the new workers 10-20 without touching the first 10. Then just repeat, adding another 10 workers to NEW_NUM_WORKERS each time. If a resize attempt fails, simply run ./bdutil <all your flags> -e extensions/google/experimental/resize_env.sh delete to delete only those extra workers without affecting the ones you already deployed successfully.
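
As a rough illustration, here is a sketch combining options 1 and 3. The flag values, worker counts, and the BDUTIL_FLAGS variable are placeholders to substitute with your own settings, and the sed edit assumes resize_env.sh contains a plain NEW_NUM_WORKERS= assignment as described above:

    # Option 1: throttle concurrent deployment processes in bdutil_env.sh
    # (10 is an illustrative value; the default is 150).
    MAX_CONCURRENT_ASYNC_PROCESSES=10

    # Option 3: grow the cluster in increments of 10 workers.
    # BDUTIL_FLAGS stands in for whatever flags you normally pass.
    BDUTIL_FLAGS="--bucket $BUCKET --project $PROJECT --force \
      --machine_type n1-highmem-32 --master_machine_type n1-highmem-32 \
      --env_var_files hadoop2_env.sh,extensions/spark/spark_env.sh"

    # Initial 10-worker deployment.
    ./bdutil $BDUTIL_FLAGS --num_workers 10 deploy

    # Repeatedly bump NEW_NUM_WORKERS (20, 30, ...) and deploy only
    # the newly added workers each time.
    for target in 20 30 40 50 60; do
      sed -i "s/^NEW_NUM_WORKERS=.*/NEW_NUM_WORKERS=${target}/" \
        extensions/google/experimental/resize_env.sh
      ./bdutil $BDUTIL_FLAGS -e extensions/google/experimental/resize_env.sh deploy
    done

    # If a resize attempt fails, delete only the extra workers:
    # ./bdutil $BDUTIL_FLAGS -e extensions/google/experimental/resize_env.sh delete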

Finally, if you're looking for more reproducible and optimized deployments, consider using Google Cloud Dataproc, which lets you use the standard gcloud CLI to deploy clusters, submit jobs, and manage/delete clusters without needing to remember your bdutil flags or keep track of which clusters you have on your client machine. You can SSH into Dataproc clusters and use them basically the same way as bdutil clusters, with some minor differences; for example, Dataproc's DEFAULT_FS is HDFS, so any GCS paths you use should fully specify the complete gs://bucket/object name.
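
For example, a minimal sketch of that workflow (the cluster name, jar path, and main class here are hypothetical, and exact flag spellings may vary across gcloud versions):

    # Create a Dataproc cluster (name and machine sizes are illustrative).
    gcloud dataproc clusters create my-spark-cluster \
        --num-workers 64 --worker-machine-type n1-highmem-32

    # Submit a Spark job; note GCS paths are fully specified as gs://bucket/object.
    gcloud dataproc jobs submit spark --cluster my-spark-cluster \
        --jars gs://mybucket/myjob.jar --class com.example.MyJob

    # Clean up when done.
    gcloud dataproc clusters delete my-spark-cluster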

Dennis Huo