Questions tagged [beam-sql]

BeamSQL is built on top of the Apache Beam Java SDK as a relational API for unified batch and streaming data processing.

BeamSQL features

  1. Connect heterogeneous storage systems - access data from different systems with ease.
  2. Pure SQL pipelines in SQL shell - lower the barrier to write data processing pipelines.
  3. Embedded SQL in pipelines - more flexibility and productivity.
  4. Unified batch and streaming semantics - towards one SQL for batch, streaming and mixed use cases.

45 questions
0 votes, 1 answer

How can we use row_number() in Apache Beam SQL?

I tried that but am getting the following error. E.g.: SELECT RELEASE_ORDER_KEY,ORDER_LINE_KEY,ORDER_HEADER_KEY,ROW_NUMBER() OVER (PARTITION BY ORDER_LINE_KEY ORDER BY RELEASE_ORDER_KEY) row_num FROM OMS_DATALAKE_ORDER_RELEASE_TAX ORDER BY ORDER_LINE_KEY…
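To make the intended semantics of that query concrete, here is a plain-Java sketch (not Beam code) of what ROW_NUMBER() OVER (PARTITION BY ORDER_LINE_KEY ORDER BY RELEASE_ORDER_KEY) computes: rows are grouped by the partition key, sorted within each group, and numbered from 1. The OrderRow record and its two fields are hypothetical stand-ins for the table's columns, the query's trailing ORDER BY is not modeled, and whether a given Beam SQL version accepts this window function is not established here.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Plain-Java illustration of ROW_NUMBER() OVER (PARTITION BY k ORDER BY v). */
class RowNumberSketch {

    /** Hypothetical row: orderLineKey is the partition key, releaseOrderKey the sort key. */
    record OrderRow(String orderLineKey, long releaseOrderKey) {}

    /** A row paired with its 1-based position inside its partition. */
    record NumberedRow(OrderRow row, int rowNum) {}

    static List<NumberedRow> rowNumber(List<OrderRow> rows) {
        // Group rows by partition key, keeping partitions in first-seen order.
        Map<String, List<OrderRow>> partitions = new LinkedHashMap<>();
        for (OrderRow r : rows) {
            partitions.computeIfAbsent(r.orderLineKey(), k -> new ArrayList<>()).add(r);
        }
        List<NumberedRow> out = new ArrayList<>();
        for (List<OrderRow> partition : partitions.values()) {
            // Sort each partition by the ORDER BY column, then number it 1..n.
            partition.sort(Comparator.comparingLong(OrderRow::releaseOrderKey));
            for (int i = 0; i < partition.size(); i++) {
                out.add(new NumberedRow(partition.get(i), i + 1));
            }
        }
        return out;
    }
}
```

The numbering restarts at 1 for every new ORDER_LINE_KEY, which is the property that distinguishes ROW_NUMBER() from a global counter.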
0 votes, 3 answers

Delete a BigQuery table using Apache Beam Java

Is it possible to delete a BigQuery table from Apache Beam using Java? p.apply("Delete Table name", BigQueryIO.readTableRows().fromQuery("DELETE FROM Table_name where condition"));
0 votes, 1 answer

Apache Calcite: cast integer to datetime

I am using Beam SQL and trying to cast an integer to a datetime field. Schema resultSchema = Schema.builder() .addInt64Field("detectedCount") .addStringField("sensor") .addInt64Field("timestamp") .build(); …
Akshata (1,005 reputation)
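The right conversion depends on what the integer means. Assuming the Int64 "timestamp" field holds epoch milliseconds (the excerpt does not say), java.time converts it directly, which is one option if the conversion is done in Java before the SQL step rather than with a CAST. The class and method names are hypothetical.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

/** Converts an Int64 epoch value to a date-time; all names here are hypothetical. */
class EpochConversionSketch {

    // Assumption: the long holds milliseconds since 1970-01-01T00:00:00Z.
    static Instant toInstant(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis);
    }

    // Renders the instant as a wall-clock date-time in UTC (no local time-zone shift).
    static LocalDateTime toUtcDateTime(long epochMillis) {
        return LocalDateTime.ofInstant(toInstant(epochMillis), ZoneOffset.UTC);
    }
}
```

If the field holds epoch seconds instead, Instant.ofEpochSecond would be the matching call.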
0 votes, 1 answer

How to remove duplicates in sliding window - Apache Beam

I have implemented a data pipeline with multiple unbounded sources and side inputs that joins data in a sliding window (30 s, every 10 s) and emits the transformed output to a Kafka topic. The issue I have is that the data received in the first 10 seconds of…
Gowtham (87 reputation)
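With a sliding window of 30 s that starts every 10 s, each element falls into 30 / 10 = 3 overlapping windows, so a per-window join or aggregation can emit output for the same element up to three times, which is one likely source of the repeated output described. The plain-Java sketch below (not Beam's own SlidingWindows implementation) computes which windows a timestamp lands in, assuming millisecond timestamps, non-negative values, and a zero window offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of how sliding windows overlap; names are hypothetical, not Beam API. */
class SlidingWindowSketch {

    /** Half-open window [startMillis, endMillis). */
    record Window(long startMillis, long endMillis) {}

    /**
     * Returns every window of the given size whose start is a multiple of the
     * period and which contains the timestamp. Assumes a non-negative timestamp.
     */
    static List<Window> assign(long timestampMillis, long sizeMillis, long periodMillis) {
        List<Window> windows = new ArrayList<>();
        // Latest window start at or before the timestamp.
        long lastStart = timestampMillis - (timestampMillis % periodMillis);
        // Walk backwards while the window end is still after the timestamp.
        for (long start = lastStart; start > timestampMillis - sizeMillis; start -= periodMillis) {
            windows.add(new Window(start, start + sizeMillis));
        }
        return windows;
    }
}
```

For example, assign(25_000L, 30_000L, 10_000L) yields the windows starting at 20 s, 10 s and 0 s. When size equals period (fixed, tumbling windows), the same function returns exactly one window per element.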
0 votes, 3 answers

Apache Beam SqlTransform: Cannot call getSchema when there is no schema

I am trying to apply SqlTransform on a PCollection. Here, the CustomSource transform generates a POJO at runtime. Hence, the type of the object on which the SqlTransform runs is not known at compile time. Pipeline p =…
Akshata (1,005 reputation)
0 votes, 1 answer

Beam SQL - SqlValidatorException: Object 'PCOLLECTION' not found

I am doing some experiments with Beam SQL. I get a PCollection from the transform SampleSource and pass its output to a SqlTransform. String sql1 = "select c1, c2, c3 from PCOLLECTION where c1 > 1"; The code below runs without any…
Akshata (1,005 reputation)
0 votes, 2 answers

What is the equivalent data type for NUMERIC in org.apache.beam.sdk.schemas.Schema.FieldType?

Trying to write data into a BigQuery table using BeamSQL. To write the data, we need its schema, so we used org.apache.beam.sdk.schemas to define the schema of the data collection. The collection has a column of NUMERIC data type. Want to…
lourdu rajan (329 reputation)
0 votes, 1 answer

Unnest a nested PCollection using BeamSQL

Trying to use BeamSQL to unnest a nested field of a PCollection. Let's assume the PCollection holds employees and their details, where the details are a nested collection. So if we use BeamSQL like "SELECT PCOLLECTION.details FROM PCOLLECTION"…
lourdu rajan (329 reputation)
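Unnesting produces one output row per nested element, with the parent's scalar columns repeated next to it. The sketch below shows that target shape in plain Java rather than Beam SQL syntax; Employee, EmployeeDetail, and the string-typed details are hypothetical. As with a SQL CROSS JOIN UNNEST, an employee with no details contributes no rows here.

```java
import java.util.ArrayList;
import java.util.List;

/** Plain-Java illustration of what UNNEST does to a nested collection field. */
class UnnestSketch {

    /** Hypothetical input row: an employee with a nested list of details. */
    record Employee(String name, List<String> details) {}

    /** Hypothetical flattened output row: one per (employee, detail) pair. */
    record EmployeeDetail(String name, String detail) {}

    static List<EmployeeDetail> unnest(List<Employee> employees) {
        List<EmployeeDetail> out = new ArrayList<>();
        for (Employee e : employees) {
            // Each element of the nested list becomes its own output row,
            // with the parent's columns repeated alongside it.
            for (String d : e.details()) {
                out.add(new EmployeeDetail(e.name(), d));
            }
        }
        return out;
    }
}
```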
0 votes, 1 answer

Can't call `ApproximateDistinct.ApproximateDistinctFn` from Apache Beam SQL

Trying to use the aggregate function ApproximateDistinct.ApproximateDistinctFn from Apache Beam SQL fails. My SQL: SELECT ApproximateDistinct(user_id) as distinct_count, profile, country_code, FROM PCOLLECTION GROUP BY…
Brachi (637 reputation)
0 votes, 1 answer

RexCall cannot be cast to RexInputRef exception in Apache Beam SQL

I'm trying to do a simple join using Beam SQL, but I'm getting an exception during compilation: Exception in thread "main" java.lang.ClassCastException: org.apache.beam.repackaged.beam_sdks_java_extensions_sql.org.apache.calcite.rex.RexCall cannot be…
rish0097 (1,024 reputation)
0 votes, 1 answer

Apache Beam SqlTransforms schema issue

I'm trying to perform ETL that involves loading files from HDFS, applying transforms, and writing them to Hive. While using SqlTransforms to perform the transformations by following this doc, I'm encountering the issue below. Can you please…
Bluecrow (559 reputation)
0 votes, 2 answers

How does Calcite deal with data conversion?

I am trying to convert a date that's stored as a string to a date, e.g. YYYYMMDD (string) to YYYY-MM-DD (date). As far as I know there is no conversion function that checks the input format and output format. I tried manual logic, e.g. CASE WHEN…
Agni (19 reputation)
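Outside of SQL, the YYYYMMDD-to-date conversion is a single call in java.time, which is one option if the logic is moved into a Java function (for example a user-defined function) instead of CASE WHEN expressions. This sketch assumes the input is always the compact YYYYMMDD form; the class and method names are hypothetical, and registering the function with Beam SQL is not shown.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

/** Parses a YYYYMMDD string and renders it as YYYY-MM-DD; names are hypothetical. */
class DateConversionSketch {

    static LocalDate parseBasicDate(String yyyymmdd) {
        // BASIC_ISO_DATE expects the compact form (e.g. "20190315") and
        // resolves strictly, so impossible dates such as "20190230" are rejected.
        return LocalDate.parse(yyyymmdd, DateTimeFormatter.BASIC_ISO_DATE);
    }

    static String toIsoDate(String yyyymmdd) {
        // LocalDate.toString() prints the ISO-8601 form YYYY-MM-DD.
        return parseBasicDate(yyyymmdd).toString();
    }

    static boolean isValidBasicDate(String s) {
        // Format check without throwing: the parser itself validates the input.
        try {
            parseBasicDate(s);
            return true;
        } catch (DateTimeParseException e) {
            return false;
        }
    }
}
```

Because the formatter both checks the input format and produces a typed date, no hand-written digit or range checks are needed.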
0 votes, 1 answer

Beam SQL won't work when using aggregation in statement: "Cannot plan execution"

I have a basic Beam pipeline that reads from GCS, does a Beam SQL transform and writes the results to BigQuery. When I don't do any aggregation in my SQL statement it works fine: .. PCollection outputStream = sqlRows.apply( …
Graham Polley (14,393 reputation)
0 votes, 1 answer

Beam SQL / Apache Beam is Slower when Running Multiple Joins

When joining 2 tables with Beam SQL, it works properly and gives the expected performance, but as the number of joined tables increases, the performance gets much worse. Below is my snippet, which might help you debug my join condition in Beam…
0 votes, 1 answer

Is there a workaround for 'LIKE' in BeamSQL?

We have an Apache Beam 2.4.0 pipeline that runs BeamSql queries. In BeamSql, the SQL operator 'LIKE' throws the exception 'LIKE is not implemented yet'. Is there a workaround for 'LIKE' in BeamSql? We need to be able to perform wildcard queries on…
Anna Kasikova (37 reputation)
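Since the question reports that LIKE is not implemented in Beam 2.4.0, one possible workaround is to do the wildcard match in Java, for example in a filter step or a user-defined function. The sketch below translates a LIKE pattern (% for any run of characters, _ for exactly one) into a java.util.regex.Pattern. It ignores the ESCAPE clause and is case-sensitive, and the names are hypothetical rather than part of Beam.

```java
import java.util.regex.Pattern;

/** Translates a SQL LIKE pattern into a java.util.regex pattern (hypothetical helper). */
class LikeSketch {

    static Pattern likeToRegex(String likePattern) {
        StringBuilder regex = new StringBuilder();
        StringBuilder literal = new StringBuilder();
        for (char c : likePattern.toCharArray()) {
            if (c == '%' || c == '_') {
                // Flush pending literal text, quoted so regex metacharacters stay inert.
                if (literal.length() > 0) {
                    regex.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                // '%' matches any run of characters, '_' exactly one character.
                regex.append(c == '%' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            regex.append(Pattern.quote(literal.toString()));
        }
        // DOTALL lets '.' also match line terminators, like the LIKE wildcards.
        return Pattern.compile(regex.toString(), Pattern.DOTALL);
    }

    static boolean like(String value, String likePattern) {
        // LIKE must match the whole value, hence matches() rather than find().
        return likeToRegex(likePattern).matcher(value).matches();
    }
}
```

Quoting the literal runs matters: without it, a '.' in the LIKE pattern would act as a regex wildcard instead of a literal dot.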