
I'm new to Go and am trying to read a table from BigQuery and publish its rows as messages to PubSub. I searched online and came up with the code below.

package main

import (
    "context"
    "flag"
    "reflect"

    "github.com/apache/beam/sdks/v2/go/pkg/beam"
    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/bigqueryio"
    "github.com/apache/beam/sdks/v2/go/pkg/beam/io/pubsubio"
    "github.com/apache/beam/sdks/v2/go/pkg/beam/log"
    "github.com/apache/beam/sdks/v2/go/pkg/beam/options/gcpopts"
    "github.com/apache/beam/sdks/v2/go/pkg/beam/x/beamx"
)

// CommentRow models 1 row of HackerNews comments.
type CommentRow struct {
    Text string `bigquery:"text"`
}

const query = `SELECT text
FROM ` + "`bigquery-public-data.hacker_news.comments`" + `
WHERE time_ts BETWEEN '2013-01-01' AND '2014-01-01' and text IS NOT NULL
LIMIT 1000
`

func main() {
    flag.Parse()
    beam.Init()

    ctx := context.Background()
    p := beam.NewPipeline()
    s := p.Root()
    project := gcpopts.GetProject(ctx)

    // Build a PCollection<CommentRow> by querying BigQuery.
    rows := bigqueryio.Query(s, project, query,
        reflect.TypeOf(CommentRow{}), bigqueryio.UseStandardSQL())

    pc := beam.ParDo(s, func(row CommentRow, emit func(CommentRow)) {
        emit(row)
    }, rows)

    // pubsubio.Write takes the bare topic name and builds
    // projects/<project>/topics/<topic> from it.
    pubsubio.Write(s, project, "test-topic", pc)

    if err := beamx.Run(ctx, p); err != nil {
        log.Exitf(ctx, "Failed to execute job: %v", err)
    }
}

But I get the following error:

panic: pubsubio.Write only accepts PCollections of *pubsub.PubsubMessage and []uint8, received main.CommentRow

How can I convert the PCollection to the PubsubMessage type? I couldn't find much information about this.

My use case is to read multiple columns from a BigQuery table and publish the contents to a PubSub topic.

Ashok KS

1 Answer


A Pub/Sub message is basically the following, as defined in the proto:

message PubsubMessage {
  bytes data = 1;
  map<string, string> attributes = 2;
  string message_id = 3;
  google.protobuf.Timestamp publish_time = 4;
}

You can pass a PCollection of []byte to pubsubio.Write() and it will wrap each element in a PubsubMessage for you (see the pubsubio documentation). It uses an internal DoFn to do this.
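A minimal sketch of the conversion, assuming JSON is an acceptable payload format: the function below (the name `rowToBytes` is hypothetical) turns a `CommentRow` into `[]byte`, which is an element type `pubsubio.Write` accepts. The `json:"text"` tag is an addition to the question's struct so the payload key is lowercase `text` rather than `Text`.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// CommentRow mirrors the struct from the question. The json tag is an
// assumption added here so the encoded key is "text".
type CommentRow struct {
	Text string `bigquery:"text" json:"text"`
}

// rowToBytes is a hypothetical DoFn: it encodes one row as JSON so the
// output PCollection has element type []byte.
func rowToBytes(row CommentRow) ([]byte, error) {
	return json.Marshal(row)
}

func main() {
	b, err := rowToBytes(CommentRow{Text: "hello"})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b)) // {"text":"hello"}
}
```

In the pipeline it would replace the pass-through ParDo, e.g. `msgs := beam.ParDo(s, rowToBytes, rows)`, and `msgs` is then passed to `pubsubio.Write` in place of `pc`. The Beam Go SDK lets a DoFn return an `error` as its last result, so a marshalling failure surfaces as a pipeline error instead of being silently dropped.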

  • Just to elaborate: the model when something leaves the pipeline is that the user is responsible for the encoding/decoding. It's better than relying on internal Beam details in the durable storage. – Robert Burke Dec 12 '22 at 18:03
  • Sorry for the late response on this. How can I convert the PCollection to []byte type? – Ashok KS Feb 01 '23 at 06:45
  • You can use [json.Marshal](https://pkg.go.dev/encoding/json#Marshal) in a ParDo – Ritesh Ghorse Feb 01 '23 at 21:00
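Because Pub/Sub only carries opaque bytes, whoever reads the topic has to reverse the encoding. A sketch for the multi-column case, again assuming JSON: the row type `MultiColumnRow` and its `id` and `by` columns are hypothetical stand-ins for your table's schema (columns that can be NULL may need nullable Go types). `encodeRow` would run in the ParDo before `pubsubio.Write`; `decodeRow` shows what a subscriber would do with the message data.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// MultiColumnRow is a hypothetical row type; the column names are assumptions.
type MultiColumnRow struct {
	ID     int64  `bigquery:"id" json:"id"`
	Author string `bigquery:"by" json:"by"`
	Text   string `bigquery:"text" json:"text"`
}

// encodeRow is the producer side: used in a ParDo before pubsubio.Write.
func encodeRow(row MultiColumnRow) ([]byte, error) {
	return json.Marshal(row)
}

// decodeRow is the subscriber side: it turns the message data back into a row.
func decodeRow(data []byte) (MultiColumnRow, error) {
	var row MultiColumnRow
	err := json.Unmarshal(data, &row)
	return row, err
}

func main() {
	in := MultiColumnRow{ID: 1, Author: "alice", Text: "hi"}
	data, err := encodeRow(in)
	if err != nil {
		panic(err)
	}
	out, err := decodeRow(data)
	if err != nil {
		panic(err)
	}
	fmt.Println(string(data)) // {"id":1,"by":"alice","text":"hi"}
	fmt.Println(out == in)    // true
}
```

JSON keeps the payload readable by non-Go subscribers; any other format works as long as both sides agree on it.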