
I need to get the data in a column of a table in a Cassandra database. I am using RCassandra for this. After getting the data I need to do some text mining on it. How do I connect to Cassandra and get the data into my R script using RCassandra?

My R script:

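library(RCassandra)
# Connect over Thrift (9160 is Cassandra's default Thrift port)
connect.handle <- RC.connect(host="127.0.0.1", port=9160)
RC.cluster.name(connect.handle)       # sanity check: returns the cluster name
RC.use(connect.handle, 'mykeyspace')  # select the keyspace

# Read the whole column family into a data frame
sourcetable <- RC.read.table(connect.handle, "sourcetable")
print(ncol(sourcetable))
print(nrow(sourcetable))
print(sourcetable)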

This prints:

> print(ncol(sourcetable))
[1] 1
> print(nrow(sourcetable))
[1] 18
> print(sourcetable)

144 BBC News
158 IBN Live
123  Reuters
131 IBN Live

My Cassandra table contains four columns, but only one column is shown here. I need each column's values separately. How do I get the individual column values (e.g. each feedurl)? What changes should I make to my R script?

My Cassandra table is named sourcetable.

2 Answers


I have used Cassandra from R with the corresponding CRAN packages and JAR files, but RCassandra is easier: it is a direct interface to Cassandra that does not use Java. To connect, call RC.connect, which returns a connection handle, and log in with RC.login if your cluster requires authentication, like this:

conn <- RC.connect(host = "127.0.0.1", port = 9160)  # host/port as in the question
RC.login(conn, username = "bar", password = "foo")   # only needed if authentication is enabled

You can then use RC.get to retrieve the columns of a single row (by row key), or RC.read.table to read a whole table (column family) into a data frame.
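RC.get returns a data frame with one line per Cassandra column, where key holds the column name, value the column value and ts the timestamp. Assuming that shape, the hypothetical helper row_to_vector below (not part of RCassandra) turns one such result into a named vector so a single column can be looked up by name. The mock data, the column name "source", the example URL and the row key "some-row-key" are made up for illustration. For tables created through CQL the column names may come back encoded, so plain-name lookups are not guaranteed to work there.

```r
# Hypothetical helper (not part of RCassandra): convert the data frame that
# RC.get() returns for one row (columns key, value, ts; key holds the
# column names) into a named character vector of column values.
row_to_vector <- function(row_df) {
  setNames(as.character(row_df$value), as.character(row_df$key))
}

# Mock data in the assumed RC.get() shape, so the sketch runs without a server.
# Against a live cluster this would be something like:
#   row_df <- RC.get(conn, "sourcetable", "some-row-key")
row_df <- data.frame(
  key   = c("feedurl", "source"),
  value = c("http://example.com/feed", "BBC News"),
  ts    = c(1, 1),
  stringsAsFactors = FALSE
)

vals <- row_to_vector(row_df)
print(vals[["source"]])  # prints [1] "BBC News"
```

Indexing with vals[["feedurl"]] gives the feed URL in the same way.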

But first, you should read this.

apesa
  • But when I use RC.read.table, I get a data frame with ncol = 1. My Cassandra table has 4 columns, and only two of them are shown in that data frame, as a single column. Why does this happen? Also, if I use RC.get, I get a list of rows of the corresponding table. But how do I get each column value? I am totally confused here! – Lal Krishna Apr 02 '16 at 05:08

I am confused as well. Table demo.emp has 4 rows and 4 columns (empid, deptid, first_name and last_name). Neither RC.get nor RC.read.table retrieves all the data.

cqlsh:demo> select * from emp;

 empid | deptid | first_name | last_name
-------+--------+------------+-----------
     1 |      1 |       John |       Doe
     1 |      2 |        Mia |     Lewis
     2 |      1 |       Jean |       Doe
     2 |      2 |      Manny |     Lewis

> RC.get.range.slices(c, "emp", limit=10)
[[1]]
key value           ts
1           1.474796e+15
2      John 1.474796e+15
3       Doe 1.474796e+15
4           1.474796e+15
5       Mia 1.474796e+15

[[2]]
 key value           ts
1           1.474796e+15
2      Jean 1.474796e+15
3       Doe 1.474796e+15
4           1.474796e+15
5     Manny 1.474796e+15
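RC.get.range.slices returns a list with one data frame per row key, as printed above. The blank key column and the empty values suggest (an assumption, not verified here) that RCassandra sees this CQL table through the Thrift storage layout. In that layout the clustering column deptid is folded into encoded column names, and the empty values are likely CQL row markers. A first step is to get all cells into one data frame. The hypothetical helper stack_slices below (not part of RCassandra) does that and records which row key position each cell came from. The mock input copies the values printed above; the timestamps are made up.

```r
# Hypothetical helper (not part of RCassandra): stack the list returned by
# RC.get.range.slices() (one data frame per row key) into a single data
# frame, recording the position of the row key each cell came from.
stack_slices <- function(slices) {
  parts <- lapply(seq_along(slices), function(i) {
    df <- slices[[i]]
    if (nrow(df) == 0) return(NULL)  # skip row keys that returned no cells
    data.frame(row = i, df, stringsAsFactors = FALSE)
  })
  parts <- Filter(Negate(is.null), parts)
  if (length(parts) == 0) return(NULL)
  out <- do.call(rbind, parts)
  rownames(out) <- NULL  # renumber rows 1..n
  out
}

# Mock input mirroring the printed output (key left empty, ts made up).
slices <- list(
  data.frame(key = c("", "", ""), value = c("", "John", "Doe"),
             ts = c(1, 1, 1), stringsAsFactors = FALSE),
  data.frame(key = c("", "", ""), value = c("", "Jean", "Doe"),
             ts = c(2, 2, 2), stringsAsFactors = FALSE)
)

combined <- stack_slices(slices)
print(combined)
```

This does not decode the column names by itself, so first_name and last_name still arrive as separate cells rather than separate columns.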