I am trying to convert my JSON file to Parquet format.

Here is my pom file:

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>com.mypackage</groupId>
    <artifactId>JSONToParquet</artifactId>
    <version>1.0-SNAPSHOT</version>
    <packaging>jar</packaging>

    <repositories>
        <repository>
            <id>wso2</id>
            <url>http://dist.wso2.org/maven2/</url>
        </repository>
    </repositories>
    <dependencies>
        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-data-core</artifactId>
            <version>1.1.0</version>
        </dependency>

        <dependency>
            <groupId>org.kitesdk</groupId>
            <artifactId>kite-morphlines-all</artifactId>
            <version>1.0.0</version> <!-- or whatever the latest version is -->
            <type>pom</type>
        </dependency>

        <!-- https://mvnrepository.com/artifact/ua_parser/ua-parser -->
        <dependency>
            <groupId>ua_parser</groupId>
            <artifactId>ua-parser</artifactId>
            <version>1.3.0</version>
            <type>pom</type>
        </dependency>

    </dependencies>

    <properties>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
        <maven.compiler.source>1.8</maven.compiler.source>
        <maven.compiler.target>1.8</maven.compiler.target>
    </properties>


</project>

Here is the conversion code:

Schema jsonSchema = JsonUtil.inferSchema(inputstream, "Movie", 10);
try (JSONFileReader<Movie> reader = new JSONFileReader<>(
        inputstream, jsonSchema, Movie.class)) {

    reader.initialize();

    ParquetWriter parquetWriter
            = new AvroParquetWriter(outputPath, jsonSchema, compressionCodecName, ParquetWriter.DEFAULT_BLOCK_SIZE, ParquetWriter.DEFAULT_PAGE_SIZE);

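    for (Movie record : reader) {
        parquetWriter.write(record);
    }
    // Closing the writer flushes buffered data and writes the Parquet footer.
    parquetWriter.close();
}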

In the code above, Movie is my POJO class.

When I run the program, I get the following exception:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/hadoop/mapreduce/RecordReader
    at com.mypackage.jsontoparquet.JsonToParquet.main(JsonToParquet.java:34)
Caused by: java.lang.ClassNotFoundException: org.apache.hadoop.mapreduce.RecordReader
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:338)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
    ... 1 more

I am using JDK 8.

I have no Hadoop background, so I cannot work out the root cause of this exception.

What is the issue?

Shivkumar Mallesappa

2 Answers

According to the Kite SDK documentation, JSONFileReader, ParquetWriter and AvroParquetWriter rely on Hadoop libraries. The NoClassDefFoundError means those Hadoop classes are not on your classpath at runtime, so you need to add the Hadoop dependencies to your pom.xml. At a minimum, add the following:

<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-core</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-common</artifactId>
    <version>2.6.0</version>
</dependency>
<dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-mapreduce-client-jobclient</artifactId>
    <version>2.6.0</version>
</dependency>
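
After adding the dependencies, one quick way to confirm that they actually reach the runtime classpath is to try loading the missing class by name. The following is only a minimal sketch: the class name ClasspathCheck and the helper isOnClasspath are hypothetical names made up for illustration, and the second entry in the list (org.apache.hadoop.conf.Configuration, which ships in hadoop-common) is just an assumed extra check, not something the stack trace mentions.

```java
public class ClasspathCheck {

    // Hypothetical helper: returns true if the named class can be located
    // by the class loader that loaded this class, without initializing it.
    public static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            // The class named in the exception from the question.
            "org.apache.hadoop.mapreduce.RecordReader",
            // Assumed extra check: a core class from hadoop-common.
            "org.apache.hadoop.conf.Configuration"
        };
        for (String name : required) {
            System.out.println(name + " -> " + (isOnClasspath(name) ? "found" : "MISSING"));
        }
    }
}
```

If a class is still reported as MISSING when you run this with the same classpath as your program, the dependency is probably not being resolved or is not included when the application is launched.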
Mobin Ranjbar

Your Kite setup is missing the Hadoop dependencies.

There are some cases where you may have to provide the relevant Hadoop component dependencies yourself, and Kite has grouping dependencies for this purpose.

For Hadoop 2 (the default), add this to your pom:

<dependency>
    <groupId>org.kitesdk</groupId>
    <artifactId>kite-hadoop2-dependencies</artifactId>
    <version>1.0.0</version>
    <type>pom</type>
    <scope>compile</scope>
</dependency>
Ori Marko