
This is my first post here. I'm new to Data Fusion and I have little to no coding experience.

I want to get data from Zoho CRM into BigQuery, with each Zoho CRM module (e.g. Accounts, Contacts...) loaded into a separate BigQuery table.

To connect to Zoho CRM I obtained a grant code, access token, refresh token and everything else needed, as described at https://www.zoho.com/crm/developer/docs/api/v2/get-records.html. I then ran a successful Get Records request via Postman, as described there, and it returned the records from the Zoho CRM Accounts module as JSON.
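
For reference, the equivalent of that Postman call in Python is roughly the following (a minimal sketch; the token value is a placeholder, and your Zoho data center may use a different API domain):

```python
import requests

# Placeholder token: obtain a real access token via the OAuth flow
# described in the Zoho docs linked above.
ACCESS_TOKEN = "1000.xxxxxxxx.xxxxxxxx"

resp = requests.get(
    "https://www.zohoapis.com/crm/v2/Accounts",
    headers={"Authorization": f"Zoho-oauthtoken {ACCESS_TOKEN}"},
)
resp.raise_for_status()
records = resp.json()["data"]  # Zoho wraps the records in a "data" array
print(f"Fetched {len(records)} Accounts records")
```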

I thought it would all be fine, so I set the parameters in Data Fusion (screenshots: DataFusion_settings_1, DataFusion_settings_2) and it validated fine. Then I previewed and ran the pipeline without deploying it. It failed with the error shown in the logs (screenshot: logs_screenshot). I tried manually entering a few fields in the schema while the format was JSON, and I tried changing the format to CSV; neither worked. I also tried switching Verify HTTPS Trust Certificates on and off. That did not help either.

I'd be really thankful for some help. Thanks.

Update, 2020-12-03

I got in touch with a Google Cloud Account Manager, who took my question to their engineers. Here is the info:

The HTTP plugin can be used to "fetch Atom or RSS feeds regularly, or to fetch the status of an external system"; it does not seem to be designed for APIs. At the moment, a more suitable tool for data collected via APIs is Dataflow (https://cloud.google.com/dataflow): "Google Cloud Dataflow is used as the primary ETL mechanism, extracting the data from the API Endpoints specified by the customer, which is then transformed into the required format and pushed into BigQuery, Cloud Storage and Pub/Sub." (https://www.onixnet.com/insights/gcp-101-an-introduction-to-google-cloud-platform)

So in the coming weeks I'll be looking at Dataflow.
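
For context, a minimal Apache Beam sketch (Beam is the SDK that Dataflow runs) of the pattern the engineers describe, i.e. fetch from an API endpoint, transform, push to BigQuery. The token, table name and field mapping are placeholders, a real pipeline would also need to page through the API, and the BigQuery load needs a --temp_location when run:

```python
import apache_beam as beam
import requests
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholders: swap in your own token and BigQuery table.
ACCESS_TOKEN = "1000.xxxxxxxx.xxxxxxxx"
BQ_TABLE = "my-project:zoho.accounts"

def fetch_accounts(_):
    """Fetch one page of Accounts records from the Zoho CRM v2 API."""
    resp = requests.get(
        "https://www.zohoapis.com/crm/v2/Accounts",
        headers={"Authorization": f"Zoho-oauthtoken {ACCESS_TOKEN}"},
    )
    resp.raise_for_status()
    return resp.json()["data"]

with beam.Pipeline(options=PipelineOptions()) as p:
    (
        p
        | "Seed" >> beam.Create([None])          # one element to trigger the fetch
        | "Fetch" >> beam.FlatMap(fetch_accounts)
        | "Project" >> beam.Map(
            lambda r: {"id": r["id"], "account_name": r.get("Account_Name")}
        )
        | "Load" >> beam.io.WriteToBigQuery(
            BQ_TABLE,
            schema="id:STRING,account_name:STRING",
            create_disposition=beam.io.BigQueryDisposition.CREATE_IF_NEEDED,
            write_disposition=beam.io.BigQueryDisposition.WRITE_TRUNCATE,
        )
    )
```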

2 Answers


Can you please attach the complete logs of the preview run? Make sure to redact any PII. Also, what version of CDF are you using? Is the CDF instance private or public?

Thanks and Regards,

Sagar

Sagar Kapare
  • I can't attach the logs, as I deleted the instance a few hours ago. It was just sitting there with no activity and I saw the billing going up by $40 per day :) I just updated the description with some info from the GCP team. – Pavel Petrov Dec 03 '20 at 14:07

Did you end up using Dataflow?

I am also experiencing the same issue with the HTTP plugin. My temporary workaround was to use Cloud Scheduler to periodically trigger a Cloud Function that fetches my data from the API and exports it as JSON to GCS, where Data Fusion can then pick it up.
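
In case it helps anyone, the function is roughly this shape (a minimal sketch; the API URL and bucket name are placeholders), and Cloud Scheduler simply hits its HTTP trigger on a schedule:

```python
import json

import requests
from google.cloud import storage

# Placeholders: replace with your own API endpoint and bucket.
API_URL = "https://example.com/api/records"
BUCKET = "my-staging-bucket"

def export_to_gcs(request):
    """HTTP-triggered Cloud Function: fetch the API payload and stage it
    in GCS as JSON, where a Data Fusion pipeline can then read it."""
    resp = requests.get(API_URL, timeout=60)
    resp.raise_for_status()

    blob = storage.Client().bucket(BUCKET).blob("exports/records.json")
    blob.upload_from_string(
        json.dumps(resp.json()), content_type="application/json"
    )
    return "ok", 200
```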

My solution is of course non-ideal, so I am still looking for a way to use the Data Fusion HTTP plugin. I was able to make it work for sample data from public API endpoints, but for reasons still unknown to me I can't get it to work with my actual API.

Khawaja Asim
  • No, I ended up not using Dataflow. However, looking into Dataflow led me to Jupyter/Colab notebooks in Google Cloud, and I actually started using them. That didn't solve the problem of replicating data from Zoho CRM, though. For that, I saw that CData offers replication from Zoho to MySQL, and I took that; this was before I discovered the notebooks and Cloud Functions. Now I have Colab notebooks set up to load the initial data from the databases into BigQuery, and Cloud Functions to append daily changes where applicable. That was far easier, more stable, and more cost-efficient in my case. – Pavel Petrov Feb 03 '21 at 07:05
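
For anyone following the same route, the notebook-side load described in the comment above can be as small as this (a hedged sketch; the file name, project and table id are placeholders):

```python
import pandas as pd
from google.cloud import bigquery

# Placeholder export: data replicated from Zoho CRM into MySQL, dumped to CSV.
df = pd.read_csv("accounts_export.csv")

client = bigquery.Client(project="my-project")
job_config = bigquery.LoadJobConfig(write_disposition="WRITE_APPEND")
job = client.load_table_from_dataframe(df, "crm.accounts", job_config=job_config)
job.result()  # block until the load job finishes
```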