
I have some code that updates a table periodically. Every time it runs, it should delete the existing rows from the table and then insert new records.

The problem is that DSE Search has a gap in indexing the table between the delete and the new inserts.

This is the code:

session_statis.execute('DELETE FROM statistics WHERE source = %s', [source])

timeone = datetime.now(tz) - timedelta(hours=1)

channels_rdd = channels.map(lambda x:(x.id,{'author':x.name,'category':x.category}))

article_rdd=rdd.map(lambda x:(x[1][0]['channel'],{'source':x[1][0]['source'],'id':x[1][0]['id'],'title':x[1][0]['title'],'thumbnail':x[1][0]['thumbnail'],'url':x[1][0]['url'],'created_at':x[1][0]['created_at'],'genre':x[1][0]['genre'],'reads':0,'likes':x[1][1]['attitudes'],'comments':x[1][1]['comments'],'shares':x[1][1]['reposts']})) \
                .join(channels_rdd).map(lambda x:{'source':x[1][0]['source'],'id':x[1][0]['id'],'title':x[1][0]['title'],'thumbnail':x[1][0]['thumbnail'],'url':x[1][0]['url'],'created_at':parse(x[1][0]['created_at']),'genre':x[1][0]['genre'],'reads':0,'likes':x[1][0]['likes'],'comments':x[1][0]['comments'],'shares':x[1][0]['shares'],'speed':x[1][0]['shares'],'category':x[1][1]['category'],'author':x[1][1]['author']})

result1=article_rdd.filter(lambda x:x['created_at']>=timeone).filter(lambda x:x['speed']>0).map(lambda x:{'timespan':'1','source':x['source'],'id':x['id'],'title':x['title'],'thumbnail':x['thumbnail'],'url':x['url'],'created_at':x['created_at'],'genre':x['genre'],'reads':0,'likes':x['likes'],'comments':x['comments'],'shares':x['shares'],'speed':x['shares'],'category':x['category'],'author':x['author']})

for row in result1.collect():
    session_statis.execute('INSERT INTO statistics(source, timespan, id, title, thumbnail, url, created_at, category, genre, author, reads, likes, comments, shares, speed) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)', (row['source'],row['timespan'],row['id'],row['title'],row['thumbnail'],row['url'],row['created_at'],row['category'],row['genre'],row['author'],row['reads'],row['likes'],row['comments'],row['shares'],row['speed']))

Thanks for your replies.


1 Answer


You may have to consider different CONSISTENCY LEVELs according to your usage pattern; for example, setting the level to ONE may give good results.
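
For illustration only, a minimal sketch (not from the answer) of setting a per-statement consistency level with the DataStax Python driver; the contact point, keyspace, and source value are assumptions:

    from cassandra.cluster import Cluster
    from cassandra.query import SimpleStatement
    from cassandra import ConsistencyLevel

    cluster = Cluster(['127.0.0.1'])            # assumed contact point
    session_statis = cluster.connect('stats')   # assumed keyspace name

    # Run the DELETE from the question at consistency level ONE.
    delete_stmt = SimpleStatement(
        'DELETE FROM statistics WHERE source = %s',
        consistency_level=ConsistencyLevel.ONE)
    session_statis.execute(delete_stmt, ['some_source'])  # placeholder source value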

  • Thanks for your reply. What level do you think will make delete and insert atomic? I am just not sure. – peter Feb 25 '16 at 04:04
  • 1
    you can try 1 for read, also try executing these stuff under transaction scope. Whatever it is solr usage should be very carefully considered coz it can bring some latency to write operation. – Gomes Feb 25 '16 at 04:11
  • I don't have reads, just writes. I am not sure how to do it with transaction scope? – peter Feb 25 '16 at 04:55
  • 1
    pls refer here http://docs.datastax.com/en//cql/3.1/cql/cql_reference/batch_r.html – Gomes Feb 25 '16 at 05:02
  • 1
    also read this nice answer for frequent delete http://stackoverflow.com/questions/35392430/cassandra-delete-not-working compaction and read repair play vital role with respect to delete – Gomes Feb 25 '16 at 05:10
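
Following up on the BATCH link in the comments, a rough sketch (assuming the session_statis, source, and result1 objects from the question) of grouping the DELETE and the INSERTs into a single logged batch:

    from cassandra.query import BatchStatement
    from cassandra import ConsistencyLevel

    # Apply the DELETE and the INSERTs together instead of as separate statements.
    batch = BatchStatement(consistency_level=ConsistencyLevel.ONE)
    batch.add('DELETE FROM statistics WHERE source = %s', [source])
    for row in result1.collect():
        batch.add(
            'INSERT INTO statistics(source, timespan, id, title, thumbnail, url, created_at, category, genre, author, reads, likes, comments, shares, speed) VALUES (%s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s, %s)',
            (row['source'], row['timespan'], row['id'], row['title'], row['thumbnail'],
             row['url'], row['created_at'], row['category'], row['genre'], row['author'],
             row['reads'], row['likes'], row['comments'], row['shares'], row['speed']))
    session_statis.execute(batch)

Note that within a single batch the delete and the inserts for the same partition can end up with the same write timestamp, in which case the delete wins, so explicit timestamps may be needed if this approach is used.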