
With the standard Dataproc 1.5 image (Debian 10, Hadoop 2.10, Spark 2.4), cluster creation fails. The region is set to europe-west2.

The Stackdriver log says:

"Failed to initialize node <name of cluster>-m: Component hdfs failed to activate See output in: gs://.../dataproc-startup-script_output"

Scanning through the output (gs://.../dataproc-startup-script_output), I can see that the HDFS activation failed:

Aug 18 13:21:59 activate-component-hdfs[2799]: + exit_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -ne 0 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + log_and_fail hdfs 'Component hdfs failed to activate' 1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local component=hdfs
Aug 18 13:21:59 activate-component-hdfs[2799]: + local 'message=Component hdfs failed to activate'
Aug 18 13:21:59 activate-component-hdfs[2799]: + local error_code=1
Aug 18 13:21:59 activate-component-hdfs[2799]: + local client_error_indicator=
Aug 18 13:21:59 activate-component-hdfs[2799]: + [[ 1 -eq 2 ]]
Aug 18 13:21:59 activate-component-hdfs[2799]: + echo 'StructuredError{hdfs, Component hdfs failed to activate}'
Aug 18 13:21:59 activate-component-hdfs[2799]: StructuredError{hdfs, Component hdfs failed to activate}
Aug 18 13:21:59 activate-component-hdfs[2799]: + exit 1

What am I missing?

EDIT

As @Dagang suggested, I SSHed into the master node and ran `grep "activate-component-hdfs" /var/log/dataproc-startup-script.log`. The output is here.
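
For reference, a minimal sketch of those steps; the cluster name `my-cluster` and the zone are placeholders, not values from the question:

```bash
# SSH into the Dataproc master node (replace the placeholders with your values)
gcloud compute ssh my-cluster-m --zone=europe-west2-a

# filter the startup log for the HDFS activation step
grep "activate-component-hdfs" /var/log/dataproc-startup-script.log
```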

tak
  • A few questions: Does it happen in a consistent manner? What is the size of the cluster and which machines are you using? Are there any additional initialization actions you have added? – David Rabinowitz Aug 18 '20 at 16:27
  • For this, I'm using all the default options, except the image. `n1-standard-4` for the master and the 2 workers. 500GB standard persistent disks for all the nodes. No custom initialization. The default image is version 1.3 but I want to use version 1.5. I've tried a handful of times but all of them failed with the same error. – tak Aug 18 '20 at 16:45
  • You should be able to find the failure reason in the log, just filter by "activate-component-hdfs". You can also ssh into the master node and check `/var/log/dataproc-startup-script.log`. – Dagang Aug 18 '20 at 17:49
  • I tried, but couldn't reproduce the problem with 1.5. – Dagang Aug 18 '20 at 18:30
  • Hi @tak, I'm afraid I was not able to reproduce this on a 1.5 cluster. Can you please add the log that Dagang had asked to the question? – David Rabinowitz Aug 18 '20 at 18:54
  • I believe the problem is you have a user name called "pete{" on which the `mkdir` command failed. Could you double check if this user is created intentionally or accidentally? – Henry Gong Aug 19 '20 at 18:26
  • Thank you @HenryGong. After removing the user, I can start a dataproc cluster without a problem. Please turn your comment to an answer and I will accept it. – tak Aug 20 '20 at 09:10
  • @HenryGong Where should we remove this user from? We are facing similar problems but we don't have these users in IAM. – Sourabh Jain Feb 19 '21 at 16:28

1 Answer


So the problem is that there is a user named "pete{", for which the `hadoop fs -mkdir -p` command failed. User names containing special characters, especially brackets such as `()[]{}`, can make the HDFS activation step fail during cluster creation.

So the easy solution is simply to remove the accidentally created user.
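
A rough sketch of how one might find and remove such a user on the affected node; the user name `pete{` comes from the question, and the bracket check is just an illustration:

```bash
# list local user names containing bracket characters
getent passwd | cut -d: -f1 | grep -E '[][(){}]'

# remove the offending user (here: "pete{" from the question)
sudo userdel 'pete{'
```

After removing the user, recreating the cluster should succeed, as tak confirmed in the comments above.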

Henry Gong