Scio is a Scala API for Google Cloud Dataflow and Apache Beam inspired by Spark and Scalding.
Questions tagged [spotify-scio]
80 questions
1
vote
3 answers
How do I deploy an Apache Beam/Spotify Scio Pipeline?
I've created a Pipeline using the Scio wrapper for Apache Beam. I want to deploy it in Google Dataflow.
I want a specific button, endpoint, or Function that will execute this job regularly.
All of the instructions I can find…

Gabriel Curio
- 613
- 4
- 7
1
vote
1 answer
What to pass as arguments while creating scioContext using ContextAndArgs in Scio Spotify
I am new to Scio and was trying to learn more about it.
I saw some examples in the Scio source code and wanted to run them. But they ask for some arguments which I am unaware of and which are not specified in the docs.
val (sc, args) =…

Ankit Agrahari
- 13
- 3
1
vote
1 answer
How does Scio fallback to Kryo
I see that Scio falls back to the Kryo coder, rather than the Java serializer that is the default coder used for Dataflow, when a coder cannot be inferred or found via CoderRegistry. I don't see any reference to setFallbackCoderProvider anywhere, how does Scio…

user_1357
- 7,766
- 13
- 63
- 106
1
vote
1 answer
Scio/Apache beam, how to map grouped results
I have a simple pipeline that reads from Pub/Sub within a fixed window, parses messages, and groups them by a specific property. However, if I map after the groupBy, my function doesn't seem to get executed.
Am I missing…

Fabio Epifani
- 151
- 8
1
vote
1 answer
Scio saveAsTypedBigQuery write to a partition for SCollection of Typed Big Query case class
I'm trying to write an SCollection to a partition in BigQuery using:
import java.time.LocalDate
import java.time.format.DateTimeFormatter
val date = LocalDate.parse("2017-06-21")
val col =…

Andrew Cassidy
- 2,940
- 1
- 22
- 46
1
vote
1 answer
scio type safe BigQuery classes - annotation issue
I am trying to use type-safe BigQuery classes. I have also installed the IntelliJ Scio plugin, but I get the error:
Error:(37, 21) type arguments [RowElement] do not conform to method typedBigQuery's type parameter bounds [T <:…

Akhilesh
- 11
- 2
1
vote
1 answer
Why in Scio do you prefer aggregate over groupByKey?
From:
https://github.com/spotify/scio/wiki/Scio-data-guideline
"Prefer combine/aggregate/reduce transforms over groupByKey. Keep in mind that a reduce operation must be associative and commutative."
Why in particular would one prefer an aggregate…

Andrew Cassidy
- 2,940
- 1
- 22
- 46
1
vote
1 answer
Scio JobTest, PubSubIO, pubsubSubscriptionWithAttributes, timestampAttribute and Windowing issue
I'm building a pipeline to back up data from PubSub into GCS. I wanted to create a test using JobTest, but I'm struggling to get the PubSubIO to pick up the event time properly.
PubSub is read using…

Carlos
- 2,883
- 2
- 18
- 19
1
vote
1 answer
Scio testing not accessed counters
I'm building some tests around my pipeline. In particular, I have two branches (one for errors, another for successes). On the errors side I have an incrementing counter (ScioMetrics.counter("MetricName").inc()), and when building…

Carlos
- 2,883
- 2
- 18
- 19
1
vote
1 answer
Establishing singleton connection with Google Cloud Bigtable in Scala similar to Cassandra
I am trying to implement a real-time recommendation system using the Google Cloud Services. I've already built the engine using Kafka, Apache Storm and Cassandra, but I want to create the same engine in Scala using Cloud Pub/Sub, Cloud Dataflow and…

billiout
- 695
- 1
- 8
- 22
1
vote
0 answers
Simple Dataflow job stuck when using Scio and BigQuery
(4ea13f859044f090): Workflow failed. Causes: (4ea13f859044f04d): The Dataflow appears to be stuck. Please reach out to the Dataflow team at http://stackoverflow.com/questions/tagged/google-cloud-dataflow.
I have no idea why my job failed, is there…

Gösta Forsum
- 61
- 3
1
vote
1 answer
process XML files with Spotify Scio (scala wrapper for apache beam)
The Apache Beam Java SDK supports reading large XML input files with org.apache.beam.sdk.io.xml.XmlIO (I looked at version 2.1.0).
Does anyone know if Scio allows you to do the same, or have an example? I have a set of very large XML files that I'd like…

ASingh
- 53
- 5
1
vote
1 answer
Read file in order in Google Cloud Dataflow
I'm using Spotify Scio to read logs that are exported from Stackdriver to Google Cloud Storage. They are JSON files where every line is a single entry. Looking at the worker logs it seems like the file is split into chunks, which are then read in…

Idrees Khan
- 107
- 2
- 12
0
votes
0 answers
Deploy Scio jobs with custom containers
I would like to deploy a Scio job on Dataflow with custom containers (https://beam.apache.org/documentation/runtime/environments/).
Is it possible to package the app with sbt-pack for example, and extend the base image to build an image with all the…
0
votes
0 answers
SCIO DataFlow: Error message from worker: java.lang.ClassCastException: class cannot be cast to class org.apache.avro.generic.IndexedRecord
Tried with Scio 0.12.5, both with beamVersion = "2.45.0" and "2.46.0", and with Scio 0.12.8 (Beam "2.46.0"), reading from BigQuery (type-safe annotations: https://spotify.github.io/scio//io/BigQuery.html#type-annotations). Java version 11.0.17…

Alberto Serna
- 1
- 3