
I am learning Apache Arrow and want to understand how to create a schema and an Arrow record. I looked at some reference material, but so far all of it uses only the primitive types when building a schema, like this:

schema := arrow.NewSchema(
    []arrow.Field{
        {Name: "f1-i32", Type: arrow.PrimitiveTypes.Int32},
        {Name: "f2-f64", Type: arrow.PrimitiveTypes.Float64},
    },
    nil,
)

Some data types that I want to work with are not present in PrimitiveTypes. For example, I want to use bool or decimal128. Looking through the Go Arrow library, I came across the file datatype.go, which lists all the data types I want to use. But the values there are not of type DataType, which is what constructing a schema requires.

So, I have the following three questions:

  1. How can I use these datatypes from datatype.go, if possible, for constructing my schema?
  2. How can I specify a precision and scale if I want to use a decimal type?
  3. Can you show an example of using an extension type?
A Beginner

1 Answer


The named constants defined in datatype.go are not meant to be used as field types directly; they are the IDs returned by the concrete type structs. For example, type Decimal128Type struct and type BooleanType struct each have an ID method that returns the constant in datatype.go whose name matches the struct's name. These structs implement the DataType interface, which means you can assign them to arrow.Field.Type, because that field's type is DataType.
To be concrete:
The BOOL constant defined in datatype.go is the return value of the ID method of type BooleanType struct in datatype_fixedwidth.go:
func (t *BooleanType) ID() Type { return BOOL }
The same holds for type Decimal128Type struct:
func (*Decimal128Type) ID() Type { return DECIMAL128 }

Here are the methods of one of these structs, which show that it implements the DataType interface:

func (*Decimal128Type) BitWidth() int
func (t *Decimal128Type) Fingerprint() string
func (*Decimal128Type) ID() Type
func (*Decimal128Type) Name() string
func (t *Decimal128Type) String() string

Those methods belong to type Decimal128Type struct.
And here is the definition of the DataType interface:

type DataType interface {
    ID() Type
    // Name is name of the data type.
    Name() string
    Fingerprint() string
}

type BooleanType struct also implements it.
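
Go interfaces are satisfied implicitly, so nothing in a struct's declaration says "implements DataType", which may be why it is hard to spot in the source. The sketch below is not the arrow package: Type, DataType, BooleanType and Decimal128Type are simplified local stand-ins modeled on the shapes quoted above, and the Name and Fingerprint return values are made up for illustration. It only demonstrates the compile-time check.

```go
package main

import "fmt"

// Local stand-ins modeled on the definitions quoted above.
// They are NOT the real arrow package.
type Type int

const (
	BOOL Type = iota
	DECIMAL128
)

type DataType interface {
	ID() Type
	Name() string
	Fingerprint() string
}

type BooleanType struct{}

func (t *BooleanType) ID() Type            { return BOOL }
func (t *BooleanType) Name() string        { return "bool" }
func (t *BooleanType) Fingerprint() string { return "bool" } // illustrative value

type Decimal128Type struct {
	Precision int32
	Scale     int32
}

func (*Decimal128Type) ID() Type     { return DECIMAL128 }
func (*Decimal128Type) Name() string { return "decimal" }
func (t *Decimal128Type) Fingerprint() string { // illustrative value
	return fmt.Sprintf("decimal(%d,%d)", t.Precision, t.Scale)
}

// Compile-time assertions: the build fails if any interface method is missing.
var (
	_ DataType = (*BooleanType)(nil)
	_ DataType = (*Decimal128Type)(nil)
)

// describe accepts any DataType, in the same way a DataType-typed field does.
func describe(dt DataType) string {
	return fmt.Sprintf("%s (id=%d)", dt.Name(), dt.ID())
}

func main() {
	fmt.Println(describe(&BooleanType{}))
	fmt.Println(describe(&Decimal128Type{Precision: 10, Scale: 2}))
}
```

Against the real library, the equivalent check would be a line such as var _ arrow.DataType = (*arrow.Decimal128Type)(nil), which compiles only if the pointer type implements the interface.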

Hence you can use them for the Type field of:

type Field struct {
    Name     string   // Field name
    Type     DataType // The field's data type
    Nullable bool     // Fields can be nullable
    Metadata Metadata // The field's metadata, if any
}

A demonstrative example:

package main

import (
    "fmt"

    "github.com/apache/arrow/go/arrow"
)

func main() {
    booltype := &arrow.BooleanType{}
    decimal128type := &arrow.Decimal128Type{Precision: 1, Scale: 1}

    schema := arrow.NewSchema(
        []arrow.Field{
            {Name: "f1-bool", Type: booltype},
            {Name: "f2-decimal128", Type: decimal128type},
        },
        nil,
    )

    fmt.Println(schema)
}

Output:

schema:
  fields: 2
    - f1-bool: type=bool
    - f2-decimal128: type=decimal(1, 1)

You can find these types in the package documentation.
The package also contains some things related to extension types.
I am not familiar with extension types, so I cannot show an example of one. If you are familiar with them, you should be able to work it out easily.

  • It's true that the boolean type struct implements `DataType`. But I don't see the same for `Decimal128Type`. – A Beginner Jun 05 '23 at 06:57
  • @ABeginner I have updated my answer to show `type Decimal128Type struct` also implements the `DataType` interface. –  Jun 05 '23 at 07:42
  • Since your answer almost answers all of my question, I am accepting this. Thanks :) – A Beginner Jul 18 '23 at 08:34
  • @ABeginner You are welcome. Sorry, while writing the answer I was using an old version of the arrow package by mistake, but my answer is still valid for the new version. Also, the arrow library has released some examples of extension types in an internal package, which cannot be imported from other modules by default, but reading its code should help you figure out how they work. They are here: [examples of extension type](https://github.com/apache/arrow/blob/de8df23a8cd9737b4df5bb1b68fc12a54f252d0d/go/internal/types/extension_types.go) –  Jul 18 '23 at 17:16
  • It should also be possible to use those ready-made extension types in your schema, but I don't know how. –  Jul 18 '23 at 17:47