We have a large existing PHP application that:

  1. Accepts a log file
  2. Initialises all database and in-memory store resources
  3. Processes every line
  4. Creates a set of output files

The above process runs once per input file. The input files are written by a Kafka consumer. Is it possible to fit this application into Spark Streaming without porting all the code to Java? For example, in the following manner:

  1. Get a message from a Kafka topic
  2. Pass this message to Spark Streaming
  3. Spark Streaming somehow interacts with the legacy app and generates output
  4. Spark then writes the output back to Kafka

What I have described is very high level. I just want to know whether this is possible without recoding the existing app in Java. If so, can anyone tell me roughly how it could be done?

Shades88

2 Answers

I don't think you can use PHP in Spark directly. According to the documentation (http://spark.apache.org/) and to my knowledge, it supports only Java, Scala, R and Python.

However, you can change the architecture of your app: expose the legacy logic as external services (web services, REST, etc.) and call them from Spark, using whichever client library you want. Not every module of the old app has to be rewritten in Java. I would try going that way :)
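As a rough illustration of the "call an external service from Spark" idea, here is a minimal Python sketch that groups the lines of one partition into batches and hands each batch to the legacy service in a single call. The names `batched`, `process_partition`, `fake_legacy_service` and the batch size of 500 are all hypothetical; a real `call_legacy` would issue an HTTP or RPC request to a service wrapping the PHP code.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List


def batched(lines: Iterable[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive lists of at most batch_size lines."""
    it = iter(lines)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def process_partition(
    lines: Iterable[str],
    call_legacy: Callable[[List[str]], List[str]],
    batch_size: int = 500,
) -> Iterator[str]:
    """Send one partition's lines to the legacy service, one batch per call."""
    for batch in batched(lines, batch_size):
        # One request per batch instead of one request per line.
        yield from call_legacy(batch)


def fake_legacy_service(batch: List[str]) -> List[str]:
    """Hypothetical stand-in for the PHP app; a real version would make a network call."""
    return [line.upper() for line in batch]


if __name__ == "__main__":
    print(list(process_partition(["a", "b", "c"], fake_legacy_service, batch_size=2)))
```

In PySpark this could plausibly be plugged in as `rdd.mapPartitions(lambda part: process_partition(part, call_legacy))`. Batching is the point of the design: with 500 lines per request, a stream of 20,000 lines per second turns into about 40 requests per second across the cluster, which is a far easier load for a web service than one request per line.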

mariusz-s
  • So every message streamed out of Kafka should be fed to a web service? The current production rate of those messages is upwards of 20K. Could you please guide me a little towards a scalable solution? I mean, if I implement a web service, will it be able to handle such loads? – Shades88 Jun 21 '16 at 12:38
  • Can Thrift be used in any way for cross-language communication? Could Storm also fit the bill? – Shades88 Jun 21 '16 at 12:39
  • @Shades88 when you mention "20K" what units are we talking about? 20k files/second? lines/second? records/hour? – maasg Jun 21 '16 at 13:05
  • 20K lines per second – Shades88 Jun 22 '16 at 19:29
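I think Storm is an excellent choice in this case because it supports non-JVM languages: topologies are defined as Thrift structures, and spouts and bolts written in other languages talk to Storm through its multi-language protocol. I am also sure that there is a PHP Thrift client.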

So basically what you have to do is find a PHP implementation of the multi-language protocol that Storm's ShellSpout and ShellBolt speak (this is the integration part your application needs in order to interact with Storm), and then write your own spouts and bolts that consume from Kafka and process each line.

You can use this library for that: https://github.com/Lazyshot/storm-php
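To give a feel for what such a library does on the non-JVM side, here is a small Python sketch of the message framing as I understand it from the Storm multi-language documentation: each message is a JSON object followed by a line containing only `end`, exchanged over stdin/stdout. The function names and `process_line` are hypothetical, and the sketch leaves out the initial handshake and the task-id replies Storm may send after an emit. A PHP component would follow the same shape.

```python
import io
import json


def read_message(stream):
    """Read lines up to a line containing only 'end' and parse them as JSON."""
    lines = []
    for raw in stream:
        line = raw.rstrip("\n")
        if line == "end":
            break
        lines.append(line)
    if not lines:
        return None  # stream exhausted, nothing to parse
    return json.loads("\n".join(lines))


def write_message(message, stream):
    """Write one JSON message followed by the 'end' terminator line."""
    stream.write(json.dumps(message) + "\nend\n")
    stream.flush()


def handle_tuple(msg, process_line, out):
    """Process one incoming tuple: emit each result anchored to it, then ack it."""
    for result in process_line(msg["tuple"][0]):
        write_message(
            {"command": "emit", "anchors": [msg["id"]], "tuple": [result]}, out
        )
    write_message({"command": "ack", "id": msg["id"]}, out)


if __name__ == "__main__":
    # Simulate Storm sending one tuple; a real bolt would use sys.stdin/sys.stdout.
    incoming = io.StringIO('{"id": "42", "tuple": ["some log line"]}\nend\n')
    outgoing = io.StringIO()
    handle_tuple(read_message(incoming), lambda line: [line.upper()], outgoing)
    print(outgoing.getvalue())
```

In this model the legacy line-processing code would sit behind `process_line`, so the resources it initialises can be set up once when the process starts rather than once per line.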

Then you will also have to find a PHP Thrift client to interact with the Storm cluster.

The Storm Thrift definition can be found here: https://github.com/apache/storm/blob/master/storm-core/src/storm.thrift

And a PHP Thrift client example can be found here: https://thrift.apache.org/tutorial/php

Putting these pieces together, you can write your own Apache Storm app in PHP.

Information sources: http://storm.apache.org/about/multi-language.html http://storm.apache.org/releases/current/Using-non-JVM-languages-with-Storm.html

Alma Alma