
I have written a program whose purpose is to read from Aerospike and convert the data into an RDD in Spark.

public void sparkTest() throws UnsupportedDataTypeException {

        log.debug("TESTING SPARK WITH AEROSPIKE");
        String host = "localhost";
        int port = 3000;

        String namespace = "mynamespace";
        String inputSet = "myset";

        AerospikeDeepJobConfig inputConfigCell = AerospikeConfigFactory.createAerospike()
                .host(host)
                .port(port)
                .namespace(namespace)
                .set(inputSet);

        log.debug("Print inputConfigCell ......");
        log.debug(inputConfigCell.getNamespace());
        log.debug(inputConfigCell.getSet());
        log.debug(inputConfigCell.getAerospikePort());
        log.debug(inputConfigCell.getHost());

        JavaRDD inputRDDCell = sparkContext.createJavaRDD(inputConfigCell);
        log.debug("Print RDD .............");
        log.debug(inputRDDCell);
}

I know that there are many records in my Aerospike set, but I cannot get at the actual contents of 'inputRDDCell', even though the logged namespace, set, port, and host are all correct. When I call inputRDDCell.first() it throws an exception, and when I simply print the RDD object I get very weird output.

Please guide me on how to properly generate usable, functional RDDs from this. I am using this link as guidance: http://www.programcreek.com/java-api-examples/index.php?source_dir=deep-examples-master/deep-aerospike/src/main/java/com/stratio/deep/examples/java/factory/ReadingCellFromAerospike.java

I have tried RDD, JavaRDD, everything, but I get the same output.
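For context, the kind of access I am after looks roughly like the sketch below. It reuses sparkContext and inputConfigCell from the method above; the Cells element type and the count()/take() actions are my assumptions about how a Deep RDD is meant to be consumed, not code that currently works for me:

    // Printing an RDD only shows its lineage (e.g. "DeepRDD[0] at RDD at DeepRDD.java:62").
    // An action such as count(), take(n), or collect() is needed to actually pull records.
    JavaRDD<Cells> cellsRDD = sparkContext.createJavaRDD(inputConfigCell);

    long total = cellsRDD.count();           // forces the read from Aerospike
    log.debug("record count = " + total);

    for (Cells record : cellsRDD.take(10)) { // inspect the first few records
        log.debug(record.toString());
    }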

The log output is:

[2016-03-10 15:58:05.812] boot - 13535 DEBUG [main] --- PushAnalysisService: TESTING SPARK WITH AEROSPIKE
[2016-03-10 15:58:05.825] boot - 13535 DEBUG [main] --- PushAnalysisService: Print inputConfigCell ......
[2016-03-10 15:58:05.827] boot - 13535 DEBUG [main] --- PushAnalysisService: mynamespace
[2016-03-10 15:58:05.829] boot - 13535 DEBUG [main] --- PushAnalysisService: myset
[2016-03-10 15:58:05.831] boot - 13535 DEBUG [main] --- PushAnalysisService: 3000
[2016-03-10 15:58:05.832] boot - 13535 DEBUG [main] --- PushAnalysisService: localhost
[2016-03-10 15:58:06.025] boot - 13535  INFO [main] --- MemoryStore: ensureFreeSpace(552) called with curMem=0, maxMem=539724349
[2016-03-10 15:58:06.035] boot - 13535  INFO [main] --- MemoryStore: Block broadcast_0 stored as values in memory (estimated size 552.0 B, free 514.7 MB)
[2016-03-10 15:58:06.161] boot - 13535  INFO [main] --- MemoryStore: ensureFreeSpace(901) called with curMem=552, maxMem=539724349
[2016-03-10 15:58:06.165] boot - 13535  INFO [main] --- MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 901.0 B, free 514.7 MB)
[2016-03-10 15:58:06.196] boot - 13535  INFO [sparkDriver-akka.actor.default-dispatcher-5] --- BlockManagerInfo: Added broadcast_0_piece0 in memory on localhost:49368 (size: 901.0 B, free: 514.7 MB)
[2016-03-10 15:58:06.205] boot - 13535  INFO [main] --- SparkContext: Created broadcast 0 from broadcast at DeepRDD.java:65
[2016-03-10 15:58:06.294] boot - 13535 DEBUG [main] --- PushAnalysisService: Print RDD .............
[2016-03-10 15:58:06.302] boot - 13535 DEBUG [main] --- PushAnalysisService: DeepRDD[0] at RDD at DeepRDD.java:62
Hafsa Asif

1 Answer


There is a significant difference between the community project sasha-polev/aerospark and the fork supported by Aerospike, aerospike/aerospark.

The community one is fairly dormant and only provides basic RDD support, while the one maintained by Aerospike supports RDDs, DataFrames, and Spark SQL. You should try your existing code with it.
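For example, reading a set through the Aerospike-supported connector into a DataFrame looks roughly like the sketch below (Spark 1.x API, to match your logs). The format name com.aerospike.spark.sql and the aerospike.* option keys are my recollection of that project's documentation, so treat them as assumptions and check the README for your version:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;

public class AerospikeSparkRead {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("aerospike-read").setMaster("local[*]");
        JavaSparkContext jsc = new JavaSparkContext(conf);
        SQLContext sqlContext = new SQLContext(jsc);

        // Read the set as a DataFrame; the option keys below are assumptions
        // about the connector's configuration names.
        DataFrame records = sqlContext.read()
                .format("com.aerospike.spark.sql")
                .option("aerospike.seedhost", "localhost")
                .option("aerospike.port", "3000")
                .option("aerospike.namespace", "mynamespace")
                .option("aerospike.set", "myset")
                .load();

        records.show();                           // triggers the scan and prints a few rows
        JavaRDD<Row> asRdd = records.javaRDD();   // plain RDD view if you still need one
        System.out.println("count = " + asRdd.count());

        jsc.stop();
    }
}

The key point either way is that show(), count(), take(), or collect() are what actually execute the read; printing the RDD or DataFrame object itself only shows a description of the job.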

Ronen Botzer