
I am trying to stream Twitter data into a BigQuery table using GCP Data Fusion. I've added my Twitter credentials to the Twitter component, and it validates with no errors. The BigQuery component also validates with no errors. When I run the preview, it stops after around 30 seconds and I get the following error:

java.lang.NoClassDefFoundError: org/apache/spark/Logging

Here is an image of my Data Fusion job

[Image: screenshot of the Data Fusion pipeline]

Any help would be greatly appreciated.

12/15/2020: Adding some basic information about my Data Fusion Instance

Logs: https://pastebin.com/PxKpqfCp

  • Hello @John Grieco, what is your current Data Fusion instance version? Can you share the full Java stack trace? It might help in evaluating the error. – Nick_Kh Dec 14 '20 at 12:26
  • @Nick_Kh I've added some basic information about my Data Fusion Instance. Let me know if this helps or if you need more. Thanks! – John Grieco Dec 16 '20 at 05:20
  • Have you switched the pipeline mode from batch to realtime? From the stack trace, the problem occurred while launching the Spark Streaming engine. It's strange that I can't find the Twitter plugin among the batch source plugins; moreover, in realtime mode on the same Developer edition instance version, the plugin is named Twitter Tweet Stream. – Nick_Kh Dec 17 '20 at 09:15
  • @Nick_Kh Yes, the pipeline mode is realtime. I am unable to add the Twitter component without being in realtime mode. Any other thoughts? – John Grieco Dec 18 '20 at 20:42

1 Answer


This appears to be a known issue. Looking through the CDAP issue tracker, I found the relevant ticket, PLUGIN-194, which describes the same problem and specifically affects the Twitter Tweet Stream plugin. I suggest following that ticket for further updates.
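For background, `org.apache.spark.Logging` was a public trait in Spark 1.x and is no longer available under that name in Spark 2.x, where the equivalent lives at `org.apache.spark.internal.Logging`. A `NoClassDefFoundError` for it therefore usually points to a library built against Spark 1.x running on a newer Spark runtime. Whether that is exactly what happens inside the plugin is an assumption on my part, not something confirmed by the ticket. The sketch below is a generic classpath probe, not part of Data Fusion or CDAP; `ClasspathProbe` and `isLoadable` are hypothetical names used only for illustration.

```java
final class ClasspathProbe {

    /** Returns true if the named class can be located by this class's class loader. */
    static boolean isLoadable(String className) {
        try {
            // initialize=false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathProbe.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers NoClassDefFoundError and similar loading failures
            return false;
        }
    }

    public static void main(String[] args) {
        String[] candidates = {
            "org.apache.spark.Logging",          // public trait in Spark 1.x
            "org.apache.spark.internal.Logging"  // location in Spark 2.x and later
        };
        for (String name : candidates) {
            System.out.println(name + " -> " + (isLoadable(name) ? "found" : "missing"));
        }
    }
}
```

Run with the same Spark jars on the classpath that the pipeline uses, this would show which of the two names is actually present. Without Spark on the classpath, both are reported as missing.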

Nick_Kh