
Our current DataStax datacenter contains 6 nodes, each with both Solr and Graph enabled:

root@ip-10-10-5-36:~# cat /etc/default/dse | grep -E 'SOLR_ENABLED|GRAPH_ENABLED'

GRAPH_ENABLED=1
SOLR_ENABLED=1

root@ip-10-10-5-36:~# nodetool status

Datacenter: SearchGraph
=======================
Status=Up/Down
|/ State=Normal/Leaving/Joining/Moving
--  Address     Load       Tokens       Owns    Host ID                               Rack
UN  10.10.5.56  456.58 MiB  1            ?       936a1ac0-6d5e-4a94-8953-d5b5a2016b92  rack1
UN  10.10.5.46  406.24 MiB  1            ?       3f41dc2a-2672-47a1-90b5-a7c2bf17fb50  rack1
UN  10.10.5.76  392.99 MiB  1            ?       29f8fe44-3431-465e-b682-5d24e37d41d7  rack2
UN  10.10.5.66  414.16 MiB  1            ?       1f7de531-ff51-4581-bdb8-d9a686f1099e  rack2
UN  10.10.5.86  424.3 MiB   1            ?       27d37833-56c8-44bd-bac0-7511b8bd74e8  rack2
UN  10.10.5.36  511.44 MiB  1            ?       0822145f-4225-4ad3-b2be-c995cc230830  rack1

We are planning to add Spark to our existing datacenter. My questions are:

1) Will enabling Spark affect the existing data and services in DataStax?

2) Or, instead of setting SPARK_ENABLED=1, do we need to set up a separate datacenter for Spark?

Updated:

3) How do DC1 and DC2 connect to each other in the ring? Is it based on the same cluster name being specified in the cluster_name: parameter of /etc/dse/cassandra/cassandra.yaml?

4) Is any separate configuration needed to specify the Spark master in the datacenter?

5) Do I need to specify the SearchGraph (DC1) seed IPs in the Spark (DC2) seed configuration section, or do only the Spark seed IPs need to be specified in the DC2 configuration (cassandra.yaml)?

Sreeraju V

2 Answers


It's recommended to create a separate datacenter for DSE Analytics. The full process is described in the documentation.

Alex Ott
  • Thanks. We have created a separate DC for the Spark workload. Our cluster now contains 6 SearchGraph nodes in DC1 and 3 Spark nodes in DC2. We altered the keyspace [as described in the documentation](https://docs.datastax.com/en/dse/5.1/dse-dev/datastax_enterprise/spark/dseAnalyticsSolo.html) and configured the same **cluster_name:** in both datacenters. Both datacenters now show up in the cluster. – Sreeraju V Feb 02 '18 at 05:38
  • I need some clarification on how the datacenters connect within the cluster. **(1)** How do DC1 and DC2 connect to each other in the ring? Is it based on the same cluster name being specified in the **cluster_name:** parameter? **(2)** Is any separate configuration needed to specify the Spark master in the datacenter? **(3)** Do I need to specify the SearchGraph (DC1) seed IPs in the Spark (DC2) seed configuration section, or do only the Spark seed IPs need to be specified in the DC2 configuration (cassandra.yaml)? – Sreeraju V Feb 02 '18 at 07:09
  • At least one node from each DC should be in the seed list for all nodes in both DCs (https://docs.datastax.com/en/cassandra/3.0/cassandra/initialize/initMultipleDS.html). – Alex Ott Feb 02 '18 at 08:50
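
As a rough illustration of the seed-list advice in the comment above, a cassandra.yaml fragment might look like the sketch below. It assumes the stock `SimpleSeedProvider`. The two `10.10.5.x` addresses are SearchGraph nodes taken from the `nodetool status` output in the question, while `10.10.6.10` and the cluster name are hypothetical placeholders standing in for a Spark DC node and the real cluster name.

```yaml
# /etc/dse/cassandra/cassandra.yaml (same seed list on every node in both DCs)

# Must be identical on all nodes; 'Example Cluster' is a placeholder.
cluster_name: 'Example Cluster'

seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          # At least one seed from each datacenter:
          # 10.10.5.36, 10.10.5.76 -> SearchGraph (DC1) nodes from the question
          # 10.10.6.10             -> hypothetical Spark (DC2) node
          - seeds: "10.10.5.36,10.10.5.76,10.10.6.10"
```

Nodes that share a cluster name and can reach a common seed discover each other through gossip, which is how the two datacenters end up in the same ring.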

To augment Alex's answer, this will depend on whether you'd like to execute Graph Analytics or not. What type of Spark work will be performed when it's enabled?

jlacefie