
I am trying to test out Spark so I can summarize some data I have in Cassandra. I've been through all the DataStax tutorials, and they are very vague about how you actually enable Spark. The only indication I can find is that it comes enabled automatically when you select an "Analytics" node during install. However, I have an existing Cassandra node, and I don't want to have to use a different machine for testing, as I am just evaluating everything on my laptop.

Is it possible to just enable Spark on the same node and deal with any performance implications? If so how can I enable it so that it can be tested?

I see the folders there for Spark (although I'm not positive all the files are present), but when I check whether a Spark master is set with the following command, it says that no Spark nodes are enabled:

dsetool sparkmaster

I am using Linux Mint (an Ubuntu-based distribution).

I'm just looking for a quick and dirty way to get my data averaged and so forth and Spark seems like the way to go since it's a massive amount of data, but I want to avoid having to pay to host multiple machines (at least for now while testing).

KingOfHypocrites

2 Answers


Yes. Spark can interact with the cluster even if it is not enabled on all of the nodes.

Package install

Edit the /etc/default/dse file, setting the appropriate lines depending on the type of node you want:
...

Spark nodes:
SPARK_ENABLED=1
HADOOP_ENABLED=0
SOLR_ENABLED=0

Then restart the DSE service.

http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseServ.html
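As a concrete sketch of those two steps (assumptions: the package-install service is named dse, and the three *_ENABLED lines already exist in /etc/default/dse, as they do in a stock DSE 4.5 package install):

```shell
# Enable Spark (and disable Hadoop/Solr) in the DSE defaults file.
sudo sed -i 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' /etc/default/dse
sudo sed -i 's/^HADOOP_ENABLED=.*/HADOOP_ENABLED=0/' /etc/default/dse
sudo sed -i 's/^SOLR_ENABLED=.*/SOLR_ENABLED=0/' /etc/default/dse

# Restart DSE so the node comes back up as an Analytics (Spark) node.
sudo service dse restart

# Verify: this should now print the Spark master address instead of
# reporting that no Spark nodes are enabled.
dsetool sparkmaster
```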

Tar Install

Stop DSE on the node, then restart it using the following command.

From the install directory:
...
Spark-only node: $ bin/dse cassandra -k (starts Spark trackers on a cluster of Analytics nodes).

http://docs.datastax.com/en/datastax_enterprise/4.5/datastax_enterprise/reference/refDseStandalone.html
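For a tarball install, the same idea in command form (a sketch; bin/dse cassandra-stop is the documented way to stop a tarball-installed DSE 4.x node and is assumed to be available here):

```shell
# Run from the DSE install directory.
# Stop the running node first...
bin/dse cassandra-stop

# ...then bring it back up with Spark trackers enabled (-k).
# The node still runs Cassandra; -k just adds the Analytics role.
bin/dse cassandra -k
```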

RussS
  • I just want to clarify that I am trying to install Spark on the same machine as my database, where I am doing the logging. In other words, I am using one machine for everything. Is that still in line with what you are saying? – KingOfHypocrites May 09 '15 at 00:39
  • Yep, DSE always starts C* on any node you run it on. – RussS May 09 '15 at 00:40
  • I think that worked! I'll probably end up creating a second issue, but any idea why "DSE_ENV could not be determined" pops up? I get it when running demos and simple things like nodetool status. I noticed that echo $DSE_HOME returns nothing in the console as well, and this appears to get set in the DSE_ENV file. – KingOfHypocrites May 09 '15 at 01:25
  • In case you want to take a look, I created another issue: http://stackoverflow.com/questions/30135081/dse-env-could-not-be-determined – KingOfHypocrites May 09 '15 at 01:42

Enable Spark by setting SPARK_ENABLED=1 in the dse.default file, which you can open with: sudo nano /usr/share/dse/resources/dse/conf/dse.default
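If you prefer a non-interactive edit, a one-liner sketch (this assumes a SPARK_ENABLED line already exists in that file):

```shell
# Flip SPARK_ENABLED to 1 in place; restart DSE afterwards for it to take effect.
sudo sed -i 's/^SPARK_ENABLED=.*/SPARK_ENABLED=1/' /usr/share/dse/resources/dse/conf/dse.default
```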

Manoj