
John Snow Labs' NLP library is built on top of Apache Spark and the Spark ML library. All its examples are provided in Scala and Python. Does it support Java? If yes, where can I find the related guides? If not, is there any plan to support Java?

Mahesha999
  • Scala is compatible with Java; sometimes (I'm looking at closures) it's not easy, but it is still possible – T. Gawęda Mar 23 '18 at 14:22
  • There are some Java-specific classes in the Spark core API, such as JavaRDD and JavaSparkContext, so I feel it's not 100% compatible. Also, there is an [open issue](https://github.com/JohnSnowLabs/spark-nlp/issues/31) raised on GitHub tagged as "new feature". I guess it's not possible. – Mahesha999 Mar 23 '18 at 17:42
  • JavaRDD returns a Java List instead of a Scala List. That is the difference ;) – T. Gawęda Mar 23 '18 at 17:47
  • Yes, that's correct, but I didn't get what you meant exactly. Does that mean Spark NLP is not yet supported in Java? – Mahesha999 Mar 23 '18 at 20:13
  • I don't know if it has a dedicated API; if not, you will have to use JavaConverters from Scala (see the sketch after these comments) – T. Gawęda Mar 23 '18 at 21:12
  • It does not have a dedicated Java API, only Scala and Python. – Glennie Helles Sindholt Mar 27 '18 at 07:10
  • There is an example for Java: https://github.com/JohnSnowLabs/spark-nlp-workshop/tree/master/java/annotation Also, it is a good idea to follow this issue for Java documentation: https://github.com/JohnSnowLabs/spark-nlp/issues/576 – Maziyar Nov 11 '19 at 11:35
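
As a minimal sketch of the JavaConverters suggestion above, the following Java snippet converts a collection in both directions, assuming Scala 2.12's scala.collection.JavaConverters is on the classpath (in Scala 2.13 the equivalents live in scala.jdk.CollectionConverters); the class name and values are invented for illustration:

```java
import java.util.Arrays;
import java.util.List;

import scala.collection.JavaConverters;
import scala.collection.Seq;

public class ConvertersDemo {
    public static void main(String[] args) {
        // Java List -> Scala Seq, e.g. to pass arguments into a Scala API.
        List<String> javaList = Arrays.asList("a", "b", "c");
        Seq<String> scalaSeq = JavaConverters.asScalaBufferConverter(javaList).asScala();

        // Scala Seq -> Java List, e.g. to consume what a Scala API returns.
        List<String> backToJava = JavaConverters.seqAsJavaListConverter(scalaSeq).asJava();

        System.out.println(backToJava); // prints [a, b, c]
    }
}
```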

1 Answer


In general, a Scala library only needs a dedicated Java API if its public API (not the implementation) exposes functionality with no Java equivalent. Unfortunately, standard Scala function types were such a case until Scala 2.12 and Java 8. For example, Spark core makes heavy use of ClassTags and implicits, which makes it hard to use directly from Java.

But this library is built on Spark ML, which doesn't have a separate Java API and, from a quick look, doesn't seem to need one (at least for the newer DataFrame-based API). You can see its Java examples at https://spark.apache.org/docs/2.3.0/ml-pipeline.html.
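
To make that concrete, here is a minimal Java sketch in the style of the linked pipeline docs; the toy training data and column names are invented for illustration:

```java
import java.util.Arrays;

import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineModel;
import org.apache.spark.ml.PipelineStage;
import org.apache.spark.ml.classification.LogisticRegression;
import org.apache.spark.ml.feature.HashingTF;
import org.apache.spark.ml.feature.Tokenizer;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.RowFactory;
import org.apache.spark.sql.SparkSession;
import org.apache.spark.sql.types.DataTypes;
import org.apache.spark.sql.types.Metadata;
import org.apache.spark.sql.types.StructField;
import org.apache.spark.sql.types.StructType;

public class JavaPipelineExample {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("JavaPipelineExample").master("local[*]").getOrCreate();

        // A tiny labelled training set, invented for the example.
        StructType schema = new StructType(new StructField[] {
                new StructField("label", DataTypes.DoubleType, false, Metadata.empty()),
                new StructField("text", DataTypes.StringType, false, Metadata.empty())
        });
        Dataset<Row> training = spark.createDataFrame(Arrays.asList(
                RowFactory.create(1.0, "spark nlp is great"),
                RowFactory.create(0.0, "boring unrelated text")
        ), schema);

        // The same Transformer/Estimator types the Scala examples use.
        Tokenizer tokenizer = new Tokenizer().setInputCol("text").setOutputCol("words");
        HashingTF hashingTF = new HashingTF().setInputCol("words").setOutputCol("features");
        LogisticRegression lr = new LogisticRegression().setMaxIter(10);

        // Scala's Array(tokenizer, hashingTF, lr) becomes a plain Java array.
        Pipeline pipeline = new Pipeline()
                .setStages(new PipelineStage[] { tokenizer, hashingTF, lr });

        PipelineModel model = pipeline.fit(training);
        model.transform(training).show();
        spark.stop();
    }
}
```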

So the NLP library just creates instances of Transformer, Pipeline, and other Spark ML types, and the code that creates them is trivially translatable to Java. You mainly need to know that Scala's Array(...) corresponds to Java's new T[] { ... } (where T is the element type). From this, it doesn't seem to need a dedicated Java API, even if it could benefit from providing examples in Java. Unfortunately, it doesn't appear to provide even a Scaladoc link, so I couldn't check whether anything in the API is problematic to use from Java.
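
As a hedged sketch of such a translation, the following Java code is modeled on the workshop example linked in the comments above; the DocumentAssembler, Tokenizer, and Finisher class names come from Spark NLP, but the exact packages and setters may vary between versions:

```java
import org.apache.spark.ml.Pipeline;
import org.apache.spark.ml.PipelineStage;

import com.johnsnowlabs.nlp.DocumentAssembler;
import com.johnsnowlabs.nlp.Finisher;
import com.johnsnowlabs.nlp.annotators.Tokenizer;

public class SparkNlpFromJava {
    public static Pipeline buildPipeline() {
        // Reads the raw "text" column into Spark NLP's document representation.
        DocumentAssembler assembler = new DocumentAssembler();
        assembler.setInputCol("text");
        assembler.setOutputCol("document");

        // Scala's Array("document") becomes a plain Java String[] here.
        Tokenizer tokenizer = new Tokenizer();
        tokenizer.setInputCols(new String[] { "document" });
        tokenizer.setOutputCol("token");

        // Converts annotations back into plain string arrays for downstream use.
        Finisher finisher = new Finisher();
        finisher.setInputCols(new String[] { "token" });

        // The annotators are ordinary Spark ML stages, so the Pipeline API is identical.
        return new Pipeline().setStages(
                new PipelineStage[] { assembler, tokenizer, finisher });
    }
}
```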

Alexey Romanov