
I am trying to test out Spark so I can summarize some data I have in Cassandra. I've been through all the DataStax tutorials, and they are very vague about how you actually enable Spark. The only indication I can find is that it comes enabled automatically when you select an "Analytics" node during install. However, I have an existing Cassandra node, and I don't want to have to use a different machine for testing, as I am just evaluating everything on my laptop.

Is it possible to just enable Spark on the same node and deal with any performance implications? If so how can I enable it so that it can be tested?

I see the folders there for Spark (although I'm not positive all the files are present), but when I check whether a Spark master is set with the following command, it says that no Spark nodes are enabled:

dsetool sparkmaster

I am using Linux Mint (an Ubuntu-based distribution).

I'm just looking for a quick and dirty way to get my data averaged and so forth and Spark seems like the way to go since it's a massive amount of data, but I want to avoid having to pay to host multiple machines (at least for now while testing).

KingOfHypocrites

2 Answers


Yes. Spark can interact with the cluster even if it is not enabled on all of the nodes.

Package install

Edit the /etc/default/dse file, setting the appropriate lines depending on the type of node you want:
...

Spark nodes:
SPARK_ENABLED=1
HADOOP_ENABLED=0
SOLR_ENABLED=0

Then restart the DSE service.

http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseServ.html
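As a concrete sketch of those two steps (assumptions: the package-install service is named dse, and the three *_ENABLED lines already exist in /etc/default/dse, as they do in a stock DSE 4.5 package install):

```shell
# Enable Spark (and disable Hadoop/Solr) in the DSE defaults file.
sudo sed -i 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' /etc/default/dse
sudo sed -i 's/^HADOOP_ENABLED=.*/HADOOP_ENABLED=0/' /etc/default/dse
sudo sed -i 's/^SOLR_ENABLED=.*/SOLR_ENABLED=0/' /etc/default/dse

# Restart DSE so the node comes back up as an Analytics (Spark) node.
sudo service dse restart

# Verify: this should now print the Spark master address instead of
# reporting that no Spark nodes are enabled.
dsetool sparkmaster
```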

Tar Install

Stop DSE on the node, then restart it using the following command.

From the install directory:
...
Spark-only node: $ bin/dse cassandra -k (starts Spark trackers on a cluster of Analytics nodes).

http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseStandalone.html
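For a tarball install, the same idea in command form (a sketch; bin/dse cassandra-stop is the documented way to stop a tarball-installed DSE 4.x node and is assumed to be available here):

```shell
# Run from the DSE install directory.
# Stop the running node first...
bin/dse cassandra-stop

# ...then bring it back up with Spark trackers enabled (-k).
# The node still runs Cassandra; -k just adds the Analytics role.
bin/dse cassandra -k
```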

RussS
  • I just want to clarify that I am trying to install Spark on the same machine as my database, where I am doing the logging. In other words, I am using one machine for everything. Is that still in line with what you are saying? – KingOfHypocrites May 09 '15 at 00:39
  • Yep, DSE always starts C* on any node you run it on. – RussS May 09 '15 at 00:40
  • I think that worked! I'll probably end up creating a second issue, but any idea why "DSE_ENV could not be determined" pops up? I get it when running demos and simple things like nodetool status. I noticed that echo $DSE_HOME returns nothing in the console as well, and this appears to get set in the DSE_ENV file. – KingOfHypocrites May 09 '15 at 01:25
  • In case you want to take a look, I created another issue: http://stackoverflow.com/questions/30135081/dse-env-could-not-be-determined – KingOfHypocrites May 09 '15 at 01:42

Enable Spark by setting SPARK_ENABLED=1 in the dse.default file, which you can open with: sudo nano /usr/share/dse/resources/dse/conf/dse.default
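If you prefer a non-interactive edit, a one-liner sketch (this assumes a SPARK_ENABLED line already exists in that file):

```shell
# Flip SPARK_ENABLED to 1 in place; restart DSE afterwards for it to take effect.
sudo sed -i 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' /usr/share/dse/resources/dse/conf/dse.default
```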

Manoj