Questions tagged [flinkml]

FlinkML is the machine learning library for the Apache Flink distributed streaming engine.

FlinkML is the Machine Learning (ML) library for Flink. It is a new effort in the Flink community, with a growing list of algorithms and contributors. FlinkML aims to provide scalable ML algorithms, an intuitive API, and tools that help minimize glue code in end-to-end ML systems.

Getting Started

To get started, set up a Flink program and then add the FlinkML dependency to your project's pom.xml:

<dependency>
  <groupId>org.apache.flink</groupId>
  <artifactId>flink-ml</artifactId>
  <version>1.0-SNAPSHOT</version>
</dependency>

Now you can start on your analysis task. The following snippet shows how to train a multiple linear regression model and use it for prediction.

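import org.apache.flink.api.scala._
import org.apache.flink.ml.common.LabeledVector
import org.apache.flink.ml.math.Vector
import org.apache.flink.ml.regression.MultipleLinearRegression

// LabeledVector is a feature vector with a label (class or real value)
val trainingData: DataSet[LabeledVector] = ...
val testingData: DataSet[Vector] = ...

// Configure the solver: step size, maximum number of iterations,
// and the convergence threshold used to stop early
val mlr = MultipleLinearRegression()
  .setStepsize(1.0)
  .setIterations(100)
  .setConvergenceThreshold(0.001)

// Train the model (fit also accepts an optional ParameterMap as a second argument)
mlr.fit(trainingData)

// The fitted model can now be used to make predictions;
// each result pairs an input vector with its predicted value
val predictions: DataSet[(Vector, Double)] = mlr.predict(testingData)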
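
To make the three solver settings above more concrete, here is a minimal, single-machine sketch of batch gradient descent for linear regression in plain Scala, with no Flink dependency. It is not FlinkML's implementation: the object `GradientDescentSketch`, its `fit` and `predict` helpers, the fixed (non-decaying) step size, and the stopping rule (relative change of the sum of squared residuals below the threshold) are all assumptions made for illustration.

```scala
// Hypothetical sketch of batch gradient descent for multiple linear regression.
// NOT FlinkML's implementation; it only mirrors the snippet's three settings.
object GradientDescentSketch {

  /** Prediction of a linear model: w . x + b */
  def predict(w: Array[Double], b: Double, x: Array[Double]): Double =
    w.indices.map(j => w(j) * x(j)).sum + b

  /**
   * Fits weights w and intercept b by minimizing SSR / (2n).
   *  - stepsize: fixed learning rate (assumption: no decay schedule)
   *  - iterations: maximum number of gradient steps
   *  - convergenceThreshold: stop once the relative change of the sum of
   *    squared residuals (SSR) drops below this value (assumed rule)
   */
  def fit(
      xs: Array[Array[Double]],
      ys: Array[Double],
      stepsize: Double,
      iterations: Int,
      convergenceThreshold: Double
  ): (Array[Double], Double) = {
    require(xs.nonEmpty && xs.length == ys.length, "need matching, non-empty data")
    val n = xs.length
    val d = xs.head.length
    val w = Array.fill(d)(0.0)
    var b = 0.0

    // Sum of squared residuals of the current model.
    def ssr(): Double =
      xs.indices.map { i => val r = predict(w, b, xs(i)) - ys(i); r * r }.sum

    var previous = ssr()
    var iter = 0
    var converged = false
    while (iter < iterations && !converged) {
      // Gradient of the loss SSR / (2n) with respect to w and b.
      val gradW = Array.fill(d)(0.0)
      var gradB = 0.0
      for (i <- 0 until n) {
        val r = predict(w, b, xs(i)) - ys(i)
        for (j <- 0 until d) gradW(j) += r * xs(i)(j) / n
        gradB += r / n
      }
      // Take one fixed-size step against the gradient.
      for (j <- 0 until d) w(j) -= stepsize * gradW(j)
      b -= stepsize * gradB

      // Stop early when the relative change of the SSR is below the threshold.
      val current = ssr()
      if (previous == 0.0 || math.abs(previous - current) / previous < convergenceThreshold)
        converged = true
      previous = current
      iter += 1
    }
    (w, b)
  }
}
```

For example, fitting the points (0, 1), (1, 3), (2, 5), (3, 7), which lie on y = 2x + 1, with `stepsize = 0.1` and `iterations = 5000` should recover a weight close to 2 and an intercept close to 1. With a fixed step of 1.0, as in the FlinkML snippet, this plain version would diverge on that unscaled data, which is why the example uses a smaller step.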

Learn more about FlinkML in the FlinkML section of the Apache Flink documentation.

34 questions
1 vote, 1 answer

Using Flink window and fold functions: elements missing?

When I try to aggregate elements using a window and a fold function, some of the elements are not aggregated. I consume elements from Kafka (value:0, value:1, value:2, value:3) and aggregate them into odd and even values. The output is:…
Sharath
1 vote, 0 answers

Triggering execution of a LinearRegression model in Flink: slower than Spark?

I've developed multiple linear regression and k-means in both Spark and Flink to compare their performance in batch mode (I'm using Zeppelin to write and execute the code, and Ganglia to measure). I read in an answer to this post that I have to trigger the…
Borja
1 vote, 1 answer

Is there an Apache Flink machine learning tutorial in Java?

I am looking for a tutorial that explains how to set up a basic Apache Flink machine learning application. The currently available material is in Scala.
Narendra Pandey
1 vote, 2 answers

FlinkML 0.10.1 Multiple Linear Regression with Sparse Vectors for Training

All, I'm trying to test out Flink ML 0.10.1 by doing a linear regression as described here: https://ci.apache.org/projects/flink/flink-docs-master/libs/ml/multiple_linear_regression.html. I'm using SparseVectors instead of DenseVectors, but…
0 votes, 0 answers

Flink-ML shows "Failed to fetch next result"

I am totally new to Flink and was trying FlinkML by following the docs. When I entered $FLINK_HOME/bin/flink run -c org.apache.flink.ml.examples.clustering.KMeansExample $FLINK_HOME/lib/flink-ml-examples*.jar after looking in the…
0 votes, 1 answer

Apply anomaly detection on Flink sliding windows

I am new to Flink, so I hope what I am saying makes sense. I would like to apply sliding windows to a DataStream and then perform anomaly detection on each of those windows, using FlinkML or maybe FlinkCEP (in fact, I want to use both). My…
0 votes, 1 answer

Flink ML DenseVector API missing functionality

I’m new to Flink (and to Java) and come from an ML/DS background, so I decided to implement something related to what I know: a linear regression learner. For that I figured I’d use the DenseVector primitives available in flink.ml.*. This is where I’m…
drsealks
0 votes, 2 answers

Installing FlinkML DenseVector dependency - why are there two different implementations?

I'm a bit confused about how to install the dependencies I actually need. I'm new to both Java and Flink, and I think I'm missing something minor here. I'm doing a basic exercise where I need the DenseVector class, which supports basic mathematical…
drsealks
0 votes, 1 answer

Can I extract the Linear SVC model coefficients and intercept in Apache Flink ML?

I have trained a Linear SVC model using the Flink ML library. I wish to extract the SVM hyperplane so that I can use the rules in the pattern matching API of Flink CEP. This is possible with the sklearn library in Python, but is there a way to extract the…
Ashok Arora
0 votes, 3 answers

Using a pre-trained ML model in Apache Flink

I am new to Flink and am trying to use a pre-trained classifier in Flink to detect hate speech on Twitter. I have an SVM classifier that I trained in Python, but I have no idea how to use it in the Flink code. One of the posts here talks about Async…
Vishnu Prasad
0 votes, 1 answer

Real-time ALS recommendations with Apache Flink

I want to implement real-time recommendations in Apache Flink with the ALS algorithm. The model can be trained beforehand in batch mode and then simply loaded into Flink. An input stream of data should then be processed and used for the…
jkempter
0 votes, 1 answer

Error with Flink Kubernetes HA: "job 00000000000000000000000000000000 is not in state RUNNING but SCHEDULED instead. Aborting checkpoint"

When I deploy a Flink job to Kubernetes with ZooKeeper HA, I get the error below. Our setup is a job cluster with 1 job and 1 task. We want the task to keep working after we delete the job pod. job 00000000000000000000000000000000 is not in state…
Jeff
0 votes, 1 answer

How can I get the job submission time and use it in a Flink application?

I'm currently developing a stream processing application. One of its features is to take events that happen in the time interval [time of submitting the job, time of submitting the job + T]. How can I access that particular variable (time of…
0 votes, 1 answer

Apache Flink: SVM predictions on streaming data

I am using Apache Flink to make predictions on streams from Twitter. The code is implemented in Scala. My problem is that my SVM model, trained with the DataSet API, needs a DataSet as input for the predict() method. I have already seen a question here where a user…
IboJaan
0 votes, 1 answer

Flink: ERROR parse numeric value format

I'm trying to develop a K-means model in Flink (Scala), using Zeppelin. This is part of my simple code:

//Reading data
val mapped : DataSet[Vector] = data.map {x => DenseVector (x._1,x._2) }
//Create algorithm
val knn = KNN()
  .setK(3)
  …
Borja