
When Cloud Data Fusion runs a data pipeline, the pipeline gets stuck in the provisioning state and then stops.

As a result, the Dataproc cluster cannot be created.

The Dataproc settings are as follows:

- Master
    - Number of Masters: 1
    - Master Cores: 2
    - Master Memory (GB): 4
    - Master Disk Size (GB): 1000
- Worker
    - Number of Workers: 2
    - Worker Cores: 4
    - Worker Memory (GB): 16
    - Worker Disk Size (GB): 1500

In the data pipeline, the driver and executor are as follows:

- Executor
    - CPU: 2
    - Memory: 4
- Driver
    - CPU: 2
    - Memory: 4

When I watch Dataproc in the Google Cloud Console, the cluster appears while it is being provisioned and then disappears. Please share your opinion on how to solve this problem.
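Since the ephemeral cluster is torn down on failure, it may help to capture its logs before it disappears. A sketch using the `gcloud` CLI; the region is an assumption, so adjust it to wherever your compute profile provisions the cluster:

```shell
# Watch for the ephemeral cluster that Data Fusion creates
# (region here is an assumption; use your compute profile's region).
gcloud dataproc clusters list --region=us-central1

# Pull recent warning/error logs from Dataproc clusters to find the
# provisioning failure reason before the cluster is deleted.
gcloud logging read \
  'resource.type="cloud_dataproc_cluster" AND severity>=WARNING' \
  --limit=50
```

If the cluster never reaches RUNNING, the Cloud Logging entries usually contain the concrete failure (quota, IAM, or networking) rather than the generic provisioning timeout shown in Data Fusion.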

Quack
    It seems you might not have followed the official documentation on granting the Dataproc Service Account User role to Cloud Data Fusion, which it needs to provision and run pipelines on Dataproc clusters. Check it [here](https://cloud.google.com/data-fusion/docs/how-to/granting-service-account-permission). If you have, check the logs for leads on the potential error: look for entries like `Failed to create cluster`, `Failed to initialize node`, or `Failed to register node`, and share any warning or error logs you can find. – aemon4 Nov 16 '20 at 11:19
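The grant described in the comment above can be sketched with `gcloud` as follows. This follows the linked documentation, but `PROJECT_NUMBER` is a placeholder you must replace with your own project number:

```shell
# Grant the Data Fusion service agent the Service Account User role on the
# Compute Engine default service account, which Dataproc VMs run as.
# PROJECT_NUMBER is a placeholder for your numeric GCP project number.
gcloud iam service-accounts add-iam-policy-binding \
  PROJECT_NUMBER-compute@developer.gserviceaccount.com \
  --member="serviceAccount:service-PROJECT_NUMBER@gcp-sa-datafusion.iam.gserviceaccount.com" \
  --role="roles/iam.serviceAccountUser"
```

After applying the binding, re-run the pipeline; without this role, Data Fusion can start provisioning but the cluster creation fails and is rolled back, which matches the appear-then-disappear behavior described in the question.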

0 Answers