0

I am new to Flink and am trying to use a pre-trained classifier in Flink to detect hate speech on Twitter. I have an SVM classifier that I trained in Python, but I have no idea how to use it from my Flink code.

One of the posts here talks about Async operations, but it goes way over my head. I have also tried using PMML but am facing an issue that I have detailed in a separate question.

Are there other methods or simple examples that can help me resolve this doubt?

P.S. I am using Flink in Java (not PyFlink).

Vishnu Prasad

3 Answers

1

You can look at Stateful Functions, which provides a bridge between Python and Java.

I don't think the documentation is clear enough, so you may want to check this thread as well.

0

I implemented a solution to this problem by creating a REST API with Flask and setting up a POST endpoint that calls the pre-trained model.

The server exposes the model to clients.

On the Flink side, I added a map function that acts as a client: it sends the input as JSON in a POST request to my server and receives the response, i.e. the prediction.
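Since the original screenshots are not available, here is a minimal sketch of what the client side of such a map function might look like, using only the JDK's `java.net.http` client. The class name `PredictionClient`, the JSON field name `text`, and the endpoint URL are assumptions for illustration, not taken from the original code.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

/** Blocking client for a hypothetical Flask endpoint that wraps the model. */
final class PredictionClient {
    private final HttpClient http = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    private final URI endpoint;

    PredictionClient(String url) {
        this.endpoint = URI.create(url);
    }

    /** Builds a body like {"text": "..."}; the field name "text" is an assumption. */
    static String toJson(String text) {
        StringBuilder sb = new StringBuilder("{\"text\": \"");
        for (char c : text.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.append("\"}").toString();
    }

    /** Sends the text to the model server and returns the raw response body. */
    String predict(String text) throws IOException, InterruptedException {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(toJson(text)))
                .build();
        HttpResponse<String> response =
                http.send(request, HttpResponse.BodyHandlers.ofString());
        if (response.statusCode() != 200) {
            throw new IOException("Model server returned HTTP " + response.statusCode());
        }
        return response.body();
    }
}
```

In a Flink job, a client like this would typically be created in the `open()` method of a `RichMapFunction` (the `HttpClient` is not serializable) and `predict` would be called from `map()`, for example against something like `http://localhost:5000/predict`. Note that each call blocks the operator until the server responds.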

Worked splendidly!

Vishnu Prasad
0

If you prefer the micro-service approach, you can implement it similarly to the Flask example above, but more efficiently by using Flink's Async IO operator:

https://nightlies.apache.org/flink/flink-docs-master/docs/dev/datastream/operators/asyncio/

This way you're not blocking your pipeline waiting for the HTTP call to return.
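As a rough sketch of the non-blocking variant, the HTTP call can return a `CompletableFuture` via the JDK's `HttpClient.sendAsync`. The class name `AsyncPredictionClient` and the endpoint are hypothetical; the Flink wiring is shown only in comments, since it needs the Flink dependency on the classpath.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.CompletableFuture;

/** Non-blocking client for a hypothetical model endpoint. */
final class AsyncPredictionClient {
    private final HttpClient http = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_1_1)
            .connectTimeout(Duration.ofSeconds(2))
            .build();
    private final URI endpoint;

    AsyncPredictionClient(String url) {
        this.endpoint = URI.create(url);
    }

    /** Posts an already-serialized JSON body and completes with the response body. */
    CompletableFuture<String> predictAsync(String jsonBody) {
        HttpRequest request = HttpRequest.newBuilder(endpoint)
                .timeout(Duration.ofSeconds(5))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
        return http.sendAsync(request, HttpResponse.BodyHandlers.ofString())
                .thenApply(response -> {
                    if (response.statusCode() != 200) {
                        throw new IllegalStateException(
                                "Model server returned HTTP " + response.statusCode());
                    }
                    return response.body();
                });
    }
}

// Inside a Flink AsyncFunction#asyncInvoke(input, resultFuture), the future
// could be bridged to Flink roughly like this (not compiled here):
//
//   client.predictAsync(json).whenComplete((prediction, error) -> {
//       if (error != null) {
//           resultFuture.completeExceptionally(error);
//       } else {
//           resultFuture.complete(Collections.singleton(prediction));
//       }
//   });
```

The function is then attached to the stream with `AsyncDataStream.unorderedWait(...)` or `orderedWait(...)`, which take a timeout and a capacity bounding the number of in-flight requests.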

Rafi Aroch