
I am able to read a Delta table created in Amazon S3 using the standalone API, but I am unable to create a Delta table and insert data into it. The Delta Lake documentation linked below uses a fictitious reader and writer called Zappy as a reference.

I tried using the Avro Parquet writer but ran into issues getting all the data needed for the AddFile object. Could you please share an example of a writer that can be used from Scala, and show how to commit the metadata to a Delta table?

https://docs.delta.io/latest/delta-standalone.html#-azure-blob-storage

ZappyDataFrame correctedSaleIdToTotalCost = ...;
ZappyDataFrame invalidSales = ZappyReader.readParquet(filteredFiles);
ZappyDataFrame correctedSales = invalidSales.join(correctedSaleIdToTotalCost, "id");

ZappyWriteResult dataWriteResult = ZappyWriter.writeParquet("/data/sales", correctedSales);

"Please note that this example uses a fictitious, non-Spark engine Zappy to write the actual parquet data, as Delta Standalone does not provide any data-writing APIs. Instead, Delta Standalone Writer lets you commit metadata to the Delta log after you’ve written your data"

Dependencies used - pom.xml


  <dependency>
    <groupId>org.scala-lang</groupId>
    <artifactId>scala-library</artifactId>
    <version>2.12.12</version>
  </dependency>
  <dependency>
    <groupId>io.delta</groupId>
    <artifactId>delta-standalone_2.12</artifactId>
    <version>0.5.0</version>
  </dependency>
  <dependency>
    <groupId>org.apache.hadoop</groupId>
    <artifactId>hadoop-client</artifactId>
    <version>3.3.1</version>
  </dependency>


Venom

1 Answer


As far as I can tell, Delta Standalone does not support creating a Delta table from scratch. From the documentation, we can see that they state:

The Delta Standalone library is a single-node Java library that can be used to read from and write to Delta tables

This is a real shame and rather limits the functionality of the standalone library. For example, if we look at the supported Hive connector, we can see the following in its README file:

Right now the connector supports only EXTERNAL Hive tables. The Delta table must be created using Spark before an external Hive table can reference it.

Having said that, you can still use Spark's DeltaTable API (for example, DeltaTable.convertToDelta) to create a table from existing Parquet files, and then use Delta Standalone to interact with the table further.
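That said, per the quoted documentation, Delta Standalone does let you commit metadata to the Delta log after you have written the Parquet data yourself. Below is a minimal sketch of that flow in Scala: write one Parquet file with parquet-avro, gather the file facts an AddFile action needs, and commit it together with the table schema via an OptimisticTransaction. The table path, file name, schema, and engine string are all illustrative, and parquet-avro is assumed to be on the classpath in addition to the dependencies above.

```scala
import java.util.Collections

import scala.collection.JavaConverters._

import org.apache.avro.SchemaBuilder
import org.apache.avro.generic.{GenericData, GenericRecord}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path
import org.apache.parquet.avro.AvroParquetWriter
import org.apache.parquet.hadoop.ParquetWriter

import io.delta.standalone.{DeltaLog, Operation}
import io.delta.standalone.actions.{AddFile, Metadata}
import io.delta.standalone.types.{IntegerType, StringType, StructField, StructType}

object StandaloneWriteSketch {
  def main(args: Array[String]): Unit = {
    val conf      = new Configuration()
    val tablePath = "/tmp/sales"                 // illustrative table root
    val fileName  = "part-00000.snappy.parquet"  // illustrative data file name
    val filePath  = new Path(tablePath, fileName)

    // 1. Write the Parquet data ourselves; Delta Standalone provides no data-writing API.
    val avroSchema = SchemaBuilder.record("sale").fields()
      .requiredInt("id")
      .requiredString("item")
      .endRecord()

    val writer: ParquetWriter[GenericRecord] =
      AvroParquetWriter.builder[GenericRecord](filePath)
        .withSchema(avroSchema)
        .withConf(conf)
        .build()
    try {
      val record = new GenericData.Record(avroSchema)
      record.put("id", 1)
      record.put("item", "widget")
      writer.write(record)
    } finally writer.close()

    // 2. Gather the file-level facts the AddFile action needs.
    val fs     = filePath.getFileSystem(conf)
    val status = fs.getFileStatus(filePath)

    val addFile = new AddFile(
      fileName,                                 // path relative to the table root
      Collections.emptyMap[String, String](),   // partition values (unpartitioned here)
      status.getLen,                            // file size in bytes
      status.getModificationTime,               // modification time
      true,                                     // dataChange
      null,                                     // stats (optional)
      null)                                     // tags (optional)

    // 3. Commit; on the first commit, also register the table schema via Metadata.
    val deltaSchema = new StructType(Array(
      new StructField("id", new IntegerType(), false),
      new StructField("item", new StringType(), false)))

    val log = DeltaLog.forTable(conf, tablePath)
    val txn = log.startTransaction()
    txn.updateMetadata(Metadata.builder().schema(deltaSchema).build())
    txn.commit(Seq(addFile).asJava, new Operation(Operation.Name.WRITE), "MyWriter/1.0")
  }
}
```

The key point is the division of labour: the Parquet writer (step 1) plays the role of the fictitious Zappy engine, while Delta Standalone only records what was written (steps 2 and 3) in the transaction log.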

delta-rs, the Rust implementation, has full support for writing Delta tables and interacting with them.

Ahmed Riza