
I am using Spark SQL with the Java API. I am trying to broadcast a Dataset and then use the broadcast Dataset. Here is the sample piece of code that is causing the issue.

Dataset<Rules> rulesDS= loadTrustRulesAsDataset("Rules.csv");
final Broadcast<Dataset<Rules>> broadcastTrustRulesDS = sqlcontext.broadcast(rulesDS);

Dataset<Rules> ds = broadcastTrustRulesDS.getValue();
ds.show();

As suggested in the comments section, I have updated the code as below:

Dataset<Rules> broadcastTrustRulesDS = org.apache.spark.sql.functions.broadcast(rulesDS);

Dataset<Rules> ds = broadcastTrustRulesDS.value();
ds.show();

It throws a NullPointerException at ds.show().

This ds.show() does not produce any result.

When I run it in Eclipse, the following messages are shown in the console:

18/05/03 09:51:31 WARN NettyUtil: Found Netty's native epoll transport, but not running on linux-based operating system. Using NIO instead.
18/05/03 09:51:32 INFO Cluster: New Cassandra host localhost/127.0.0.1:9042 added
18/05/03 09:51:32 INFO CassandraConnector: Connected to Cassandra cluster: Test Cluster
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
[Stage 16:=======================================>                 (7 + 3) / 10]
  • As in the duplicate, this is not how you broadcast `Datasets`. Also, `Broadcast` has no `getValue` method AFAIK, and broadcasting won't have any impact on data which is not used in `joins`. – zero323 May 03 '18 at 17:15
  • Would you please explain in detail what you mean by "Broadcast has no getValue method AFAIK"? Any pointers on this concept would really be helpful. I am trying to understand whether broadcasting a dataset only works when it is used with joins. – hasha May 03 '18 at 17:19
  • I mean [`Broadcast`](https://spark.apache.org/docs/latest/api/java/org/apache/spark/broadcast/Broadcast.html) literally doesn't have a `getValue` method. [It has a `value` method](https://spark.apache.org/docs/latest/api/java/org/apache/spark/broadcast/Broadcast.html#value--) :) And as linked, to broadcast a dataset used in a `join`, use the `org.apache.spark.sql.functions.broadcast` function. – zero323 May 03 '18 at 17:22
  • I have updated the code as suggested, but it is still not working. Does broadcast ever work for distributed data like Datasets? – hasha May 03 '18 at 17:41
  • This code wouldn't even compile, as the `broadcast` function doesn't return a `Broadcast` object and as a result doesn't have a `value` method. Please read the linked question carefully, and if you have further questions, make sure you post a [mcve]. – zero323 May 03 '18 at 18:36
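
To make the distinction in the comments concrete, here is a minimal sketch of the two separate things called "broadcast" in Spark. It is not the asker's actual code: the class name `BroadcastSketch`, the column name `rule_id`, and the sample values are all made up for illustration, and it assumes Spark 2.x or later with a local master. The first method uses `org.apache.spark.sql.functions.broadcast`, which returns a `Dataset` marked with a broadcast join hint (so there is no `value()` to call on it). The second collects a small Dataset to the driver and broadcasts the resulting local list through `JavaSparkContext.broadcast`, which does return a `Broadcast` object with a `value()` method.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.FilterFunction;
import org.apache.spark.broadcast.Broadcast;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Encoders;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.broadcast;

// Hypothetical class name; data and column names are illustrative only.
public class BroadcastSketch {

    // Option 1: broadcast join hint. functions.broadcast returns a Dataset,
    // not a Broadcast object, so it is only meaningful as one side of a join.
    static long countMatches(SparkSession spark) {
        Dataset<Row> events = spark.range(0, 100).toDF("rule_id"); // stand-in for the large side
        Dataset<Row> rules = spark.range(0, 10).toDF("rule_id");   // stand-in for the small rules table
        return events.join(broadcast(rules), "rule_id").count();
    }

    // Option 2: collect the small Dataset to the driver, then broadcast the
    // local copy. Only reasonable when the data comfortably fits in memory.
    static long broadcastLocalCopy(SparkSession spark) {
        JavaSparkContext jsc = JavaSparkContext.fromSparkContext(spark.sparkContext());
        Dataset<String> ruleNames =
                spark.createDataset(Arrays.asList("r1", "r2", "r3"), Encoders.STRING());
        // Copy into a plain ArrayList so the broadcast value is a simple serializable list.
        List<String> local = new ArrayList<>(ruleNames.collectAsList());
        Broadcast<List<String>> bc = jsc.broadcast(local);

        Dataset<String> candidates =
                spark.createDataset(Arrays.asList("r1", "x", "r3"), Encoders.STRING());
        // Broadcast exposes value(), not getValue(); it is read inside the task closure.
        return candidates
                .filter((FilterFunction<String>) s -> bc.value().contains(s))
                .count();
    }

    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .master("local[*]")
                .appName("broadcast-sketch")
                .getOrCreate();
        System.out.println("join matches: " + countMatches(spark));
        System.out.println("filter matches: " + broadcastLocalCopy(spark));
        spark.stop();
    }
}
```

Neither variant wraps a `Dataset` itself in a `Broadcast` object, which is what the original snippet attempts. A `Dataset` is a handle to a distributed plan tied to the driver's session, so the usual choices are either the join hint or broadcasting a collected local copy.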

0 Answers