Questions tagged [gobblin]

Apache Gobblin is a distributed data integration framework. It simplifies common aspects of big data integration such as data ingestion, replication, organization and lifecycle management for both streaming and batch data ecosystems.

44 questions
1 vote, 1 answer

Apache Gobblin build failed

I'm new to Gobblin. I'm trying to build a distribution from the master branch of the project, but I get the error below while following the instructions: FAILURE: Build failed with an exception. * Where: Script…
GihanDB
1 vote, 1 answer

Gobblin ERROR: Unable to convert field:derivedwatermarkcolumn for value:"abc" for record:

I am trying to ingest data from a MySQL table into HDFS, but it gives me the error below: IST ERROR [TaskExecutor-0] org.apache.gobblin.runtime.Task [demo_user_1582873318919_0] 504 - Processing record incurs an unexpected…
Chhaya Vankhede
1 vote, 1 answer

Gobblin: java.lang.ClassNotFoundException: org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource

I am trying MySQL-to-HDFS data ingestion using Gobblin, running mysql-to-gobblin.pull with the steps below: 1) start Hadoop: sbin\start-all.cmd 2) start the MySQL service: sudo service mysql start 3) set GOBBLIN_WORK_DIR: export…
Chhaya Vankhede
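
A ClassNotFoundException like the one above generally means that no jar on the job's classpath contains the named class. One way to check is to search the jars directly: a jar is a zip archive, and a class such as org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource is stored as the entry org/apache/gobblin/source/extractor/extract/jdbc/MysqlSource.class. The sketch below assumes all jars to search sit directly in one directory (for example a distribution's lib directory; that path is an assumption) and does not look inside jars nested within other jars. The function names are hypothetical.

```python
import pathlib
import zipfile


def class_entry(class_name: str) -> str:
    """Map a fully qualified class name to its entry path inside a jar."""
    # Package dots become '/'; a nested class keeps its '$' (Outer$Inner).
    return class_name.replace(".", "/") + ".class"


def find_class_in_jars(lib_dir: str, class_name: str) -> list[str]:
    """Return the file names of jars directly under lib_dir containing class_name."""
    entry = class_entry(class_name)
    hits = []
    # Assumption: no recursion into subdirectories.
    for jar in sorted(pathlib.Path(lib_dir).glob("*.jar")):
        try:
            # A jar is a zip archive, so zipfile can list its entries.
            with zipfile.ZipFile(jar) as zf:
                if entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that merely end in .jar but are not valid archives.
            continue
    return hits
```

For example, find_class_in_jars("/path/to/gobblin-dist/lib", "org.apache.gobblin.source.extractor.extract.jdbc.MysqlSource") (hypothetical path) returns the names of the jars that provide the class; an empty list suggests the providing jar is missing from that directory rather than that the job file is wrong.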
1 vote, 1 answer

Error with KafkaHDFS example: java.lang.NoSuchMethodError

I'm having trouble running the Kafka-HDFS data ingestion example. I have tried both versions 0.10.0 and 0.14.0. For 0.10.0 I used the prebuilt distribution, and for 0.14.0 I built it myself following the instructions in the…
1 vote, 0 answers

Gobblin MapReduce convert from protobuf to Parquet

I'm trying to find an example of converting Protobuf messages to Parquet with Gobblin, but I can't find any. Scenario: - Kafka messages are in Protobuf - Gobblin consumer: consumes Protobuf from Kafka and writes it as Parquet into HDFS Gobblin…
Pritam
1 vote, 0 answers

Kafka to Kafka using Gobblin behind krb5 security

Everything works when I run a simple Kafka-to-Kafka job without Kerberos security. I need to do the same behind Kerberos security. My job configuration is below: job.name=Kafka2KafkaSimple job.group=Kafka job.description=This is a job that runs…
Bruno Wego
1 vote, 2 answers

Gobblin Kafka to HDFS gobblin-api-***.jar FileNotFoundException

I want to collect Kafka messages and store them in HDFS with Gobblin. When I run gobblin-mapreduce.sh, the script throws an exception: 2017-10-19 11:49:18 CST ERROR [main] gobblin.runtime.AbstractJobLauncher 442 - Failed to launch and run job…
user1978965
1 vote, 2 answers

Gobblin QuickStart sample exception:ClassNotFoundException: org.apache.gobblin.example.wikipedia.WikipediaSource

I'm learning Gobblin by following the quickstart, subsection "Running Gobblin as a Daemon". I follow the guide step by step: create a config directory, set the environment variable GOBBLIN_JOB_CONFIG_DIR, and put wikipedia.pull in it; create work dir…
user1978965
1 vote, 0 answers

Gobblin grouping workunits for Kafka source

The https://gobblin.readthedocs.io/en/latest/case-studies/Kafka-HDFS-Ingestion/#grouping-workunits section of the Gobblin documentation describes single-level packing as follows: The single-level packer uses a worst-fit-decreasing…
Purple
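
The documentation excerpt above names worst-fit-decreasing as the strategy of the single-level packer. As a generic illustration of that heuristic (not Gobblin's implementation), work units are sorted by estimated size, largest first, and each one goes to the group with the smallest current total load. The work-unit names and sizes are hypothetical, and how Gobblin estimates the size of a Kafka work unit is not modeled here.

```python
import heapq


def worst_fit_decreasing(sizes: dict[str, int], num_bins: int) -> list[list[str]]:
    """Pack work units into num_bins groups with worst-fit-decreasing.

    Largest work unit first; each goes into the currently least-loaded
    group (the "worst fit"), which tends to even out total load.
    """
    if num_bins < 1:
        raise ValueError("num_bins must be >= 1")
    bins: list[list[str]] = [[] for _ in range(num_bins)]
    # Min-heap of (current load, group index): the root is the lightest group.
    heap = [(0, i) for i in range(num_bins)]
    # "Decreasing": sort by size, largest first; ties broken by name.
    for name, size in sorted(sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        load, i = heapq.heappop(heap)
        bins[i].append(name)
        heapq.heappush(heap, (load + size, i))
    return bins
```

For example, sizes {"t-0": 10, "t-1": 7, "t-2": 5, "t-3": 4, "t-4": 2} packed into two groups give [["t-0", "t-3"], ["t-1", "t-2", "t-4"]], a load of 14 in each. The min-heap keeps each assignment at O(log k) for k groups instead of scanning every group.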
1 vote, 1 answer

Gobblin: how to get posts from Facebook

I have been investigating Gobblin for a while and am currently having difficulty using it to get posts from Facebook. I could not find any connection example online, though I may have searched for the wrong thing. I am looking at…
Leo
1 vote, 1 answer

Gobblin Kafka to HDFS pull job error

I'm trying to pull data from Kafka to HDFS using Gobblin. Gobblin version (compiled from the GitHub source code with the command sudo ./gradlew clean build -PuseHadoop2 -PhadoopVersion=2.7.1 -x test): 0.6.2-546-g431188b Hadoop version: Hadoop…
Dmitry
1 vote, 1 answer

How do I use Java to read AVRO data in Spark 1.3.1?

I am trying to develop a Java Spark application that reads Avro records (https://avro.apache.org/) written to HDFS by Gobblin (https://github.com/linkedin/gobblin/wiki). A sample HDFS AVRO data…
Mark
0 votes, 1 answer

No AbstractFileSystem configured for scheme: gs

I am getting the error below while running a Gobblin job. My core-site.xml looks fine and has the required value: the property fs.AbstractFileSystem.gs.impl is set to com.google.cloud.hadoop.fs.gcs.GoogleHadoopFS.
1stenjoydmoment
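
Hadoop looks up the AbstractFileSystem implementation for a URI scheme such as gs under the property fs.AbstractFileSystem.<scheme>.impl. When that property looks correct but the error persists, one thing worth ruling out is that the core-site.xml the job actually loads is a different file from the one that was edited. A minimal sketch, assuming the standard Hadoop <configuration>/<property>/<name>/<value> layout, that reports what a given core-site.xml defines for a property (the function names are hypothetical):

```python
import pathlib
import xml.etree.ElementTree as ET
from typing import Optional


def hadoop_conf_value(xml_text: str, name: str) -> Optional[str]:
    """Return the value of property `name` in Hadoop-style XML, or None."""
    root = ET.fromstring(xml_text)
    found = None
    # Assumed layout: <configuration><property><name/><value/></property>...
    for prop in root.iter("property"):
        if (prop.findtext("name") or "").strip() == name:
            # Assumption: if a name appears twice, the last definition is kept;
            # Hadoop's <final> flag is ignored in this sketch.
            found = (prop.findtext("value") or "").strip()
    return found


def core_site_value(path: str, name: str) -> Optional[str]:
    """Read a core-site.xml file from disk and look up one property."""
    return hadoop_conf_value(pathlib.Path(path).read_text(encoding="utf-8"), name)
```

Running core_site_value on each candidate core-site.xml with the name fs.AbstractFileSystem.gs.impl shows which files define it; a None for the file on the job's classpath would point to a configuration-path problem rather than a wrong value.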
0 votes, 1 answer

How to set up Gobblin on Windows? Which versions of Gradle and Gobblin should I use?

I am trying to set up Gobblin on my system, but I'm facing an issue while running the Gradle build. Which versions of Gobblin and Gradle do I need to use? Error: Caused by: org.gradle.api.plugins.UnknownPluginException: Plugin with id 'pegasus' not found.
0 votes, 1 answer

Gradle sync failed : Cannot cast object 'main classesDirs' with class 'org.gradle.api.internal.file.collections.DefaultConfigurableFileCollection'

I am facing the error below while running the Gradle build. I am using Gradle 6.5 and the Gobblin apache-gobblin-incubating-sources-0.14.0 version. I have added the build.gradle file and idesSetup.gradle…