Questions tagged [beam-sql]

BeamSQL is built on top of the Apache Beam Java SDK as a relational API for unified batch and streaming data processing.

BeamSQL features

  1. Connect heterogeneous storage systems - access data from different systems with ease.
  2. Pure SQL pipelines in the SQL shell - lower the barrier to writing data processing pipelines.
  3. Embedded SQL in pipelines - more flexibility and productivity.
  4. Unified batch and streaming semantics - towards one SQL for batch, streaming, and mixed use cases.

Resources

45 questions
0
votes
1 answer

How do I use Apache Beam to trigger an aggregation based on a new incoming event?

Problem: I'm building a mobile game app with real-time scoring. Each time a player performs an action, it sends a message to Pub/Sub with the following keys: {"event_ts","event_id", "player_id", "score"} As soon as the Pub/Sub message is received, I…
Sid
  • 1
  • 1
0
votes
2 answers

Exception while writing multipart empty csv file from Apache Beam into netApp Storage Grid

Problem Statement: We are consuming multiple CSV files into PCollections -> applying Beam SQL to transform the data -> writing the resulting PCollection. This works absolutely fine if we have data in all the source PCollections and the Beam SQL generates new…
0
votes
1 answer

Dataflow / Beam Accumulator coder

I am developing a Dataflow pipeline that uses the SqlTransform library and also the Beam aggregation function defined in org.apache.beam.sdk.extensions.sql.impl.transform.agg.CountIf. Here is a slice of the code: import…
0
votes
1 answer

How to output nested Row from Beam SQL (SqlTransform)?

I want to have Row with nested Row from output of Beam SQL (SqlTransform), but failing. Questions: What is the proper way to output Row with nested Row from SqlTransform? (Row type is described in the docs, so I believe it's supported) If this is a…
0
votes
2 answers

TypeError: expected bytes, str found [while running 'Writing to DB/ParDo(_WriteToRelationalDBFn) while writing to db from using beam-nuggets

@mohaseeb I am trying the example below to write data from Pub/Sub to PostgreSQL. I am getting the error below while writing Pub/Sub data into PostgreSQL. "/usr/local/lib/python3.7/site-packages/sqlalchemy/engine/result.py", line 545, in…
0
votes
1 answer

How to cast int to boolean when doing SQL transform in Apache Beam

I'm trying to do a SQL transform with Apache Beam using Calcite SQL syntax, casting an int to a boolean. My SQL looks like this: ,CASE WHEN cast(IsService as BOOLEAN) THEN CASE WHEN IsEligible THEN 1 ELSE 0 END ELSE NULL END AS Reported Where…
artofdoe
  • 167
  • 2
  • 14
0
votes
2 answers

How to specify BeamSQL UDF for Numeric Types

I'm trying to add a User Defined Function (UDF) to a SqlTransform in a Beam pipeline, and the SQL parser doesn't seem to understand the function's type. The error I get is: No match found for function signature IF(, ,…
Mark P Neyer
  • 1,009
  • 2
  • 8
  • 19
0
votes
1 answer

How to integrate Beam SQL windowing query with KafkaIO?

First, we have a Kafka input source in JSON format: {"event_time": "2020-08-23 18:36:10", "word": "apple", "cnt": 1} {"event_time": "2020-08-23 18:36:20", "word": "banana", "cnt": 1} {"event_time": "2020-08-23 18:37:30", "word": "apple", "cnt":…
0
votes
0 answers

Beam SQL CURRENT_TIMESTAMP

My Unix Spark server timezone is CDT, but when I run Beam SQL CURRENT_TIMESTAMP as below, the result always comes back as UTC. I tried locally as well, and it always displays UTC. I want CURRENT_TIMESTAMP to return CDT, the same as the server's timezone.…
Syed Mohammed Mehdi
  • 183
  • 2
  • 5
  • 15
0
votes
2 answers

How to select a set of fields from input data as an array of repeated fields in beam SQL

Problem Statement: I have an input PCollection with following fields: { firstname_1, lastname_1, dob, firstname_2, lastname_2, firstname_3, lastname_3, } then I execute a Beam SQL operation such that output of…
0
votes
1 answer

row_number in Apache Beam SQL

I'm trying to generate row_number using Apache Beam SQL with below code: PCollection rwrtg = PCollectionTuple.of(new TupleTag<>("trrtg"), rrtg) .apply(SqlTransform.query("select appId, row_number() over…
Syed Mohammed Mehdi
  • 183
  • 2
  • 5
  • 15
0
votes
1 answer

How can I increase the thread stack size on Apache Beam pipeline workers with Google Cloud Dataflow?

I'm getting a StackOverflowError on my Beam workers due to running out of thread stack, and because it happens deep within the execution of a SqlTransform, it's not straightforward to reduce the number of calls being made. Is it possible to change the JVM…
wrp
  • 99
  • 6
0
votes
2 answers

Errors trying to start ZetaSQL planner

I'm trying to run a Beam pipeline with SQL transforms, parsed with ZetaSQL. I begin with setting options with options.setPlannerName("org.apache.beam.sdk.extensions.sql.zetasql.ZetaSQLQueryPlanner"); When I try creating my SqlTransform with any…
wrp
  • 99
  • 6
0
votes
2 answers

Apache beam get kafka data execute SQL error:Cannot call getSchema when there is no schema

I input data from multiple tables into Kafka, and Beam executes SQL after receiving the data, but now I get the following error: Exception in thread "main" java.lang.IllegalStateException: Cannot call getSchema when there is no schema …
smarctor
  • 3
  • 1
0
votes
1 answer

ZetaSQL Sample Using Apache beam

I am facing issues while using ZetaSQL in the Apache Beam framework (2.17.0-SNAPSHOT). After going through the Apache Beam documentation, I am not able to find any sample for ZetaSQL. I tried to add the planner: …
BackBenChers
  • 304
  • 2
  • 15