I am using Cassandra 0.8.7, Aquiles as the C# client, and Thrift 0.7. I am trying to read a fairly large amount of data out of a super column family that has the following definition:

create column family SCF with column_type=Super and comparator=TimeUUIDType and subcomparator=AsciiType;

I want to load the data fetched from Cassandra into a DataTable so that I can filter the rows and generate some reports from it, but I always get an OutOfMemoryException.

[OutOfMemoryException: Exception of type 'System.OutOfMemoryException' was thrown.]
   Thrift.Transport.TFramedTransport.ReadFrame() +191
   Thrift.Transport.TFramedTransport.Read(Byte[] buf, Int32 off, Int32 len) +101
   Thrift.Transport.TTransport.ReadAll(Byte[] buf, Int32 off, Int32 len) +76
   Thrift.Protocol.TBinaryProtocol.ReadAll(Byte[] buf, Int32 off, Int32 len) +66
   Thrift.Protocol.TBinaryProtocol.ReadI32() +47
   Thrift.Protocol.TBinaryProtocol.ReadMessageBegin() +75
   Apache.Cassandra.Client.recv_multiget_slice() in D:\apache-cassandra-0.8.0-beta2\interface\gen-csharp\Apache\Cassandra\Cassandra.cs:304
   Apache.Cassandra.Client.multiget_slice(List`1 keys, ColumnParent column_parent, SlicePredicate predicate, ConsistencyLevel consistency_level) in D:\apache-cassandra-0.8.0-beta2\interface\gen-csharp\Apache\Cassandra\Cassandra.cs:286

I tried several approaches to optimize my code. My final version splits the time period used to slice the super columns into smaller ranges (and splits the list of keys when it exceeds a preset number), but nothing helped: eventually I always get the same exception.

Could it be a bug in the Thrift library? When I get the exception, it always points to the following portion of the code inside Thrift.Transport.TFramedTransport:

private void ReadFrame()
        {
            byte[] i32rd = new byte[header_size];
            transport.ReadAll(i32rd, 0, header_size);
            int size =
                ((i32rd[0] & 0xff) << 24) |
                ((i32rd[1] & 0xff) << 16) |
                ((i32rd[2] & 0xff) <<  8) |
                ((i32rd[3] & 0xff));

            byte[] buff = new byte[size]; //Here the exception is thrown
            transport.ReadAll(buff, 0, size);
            readBuffer = new MemoryStream(buff);
        }
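
To make the failing step easier to reason about: ReadFrame decodes a 4-byte big-endian length prefix and then allocates a buffer of exactly that size, so the allocation is as large as whatever the server announces. The following is only a minimal sketch of that decoding plus a sanity check. `FrameHeader`, `MaxFrameSize`, and `IsPlausible` are hypothetical names used for illustration and are not part of the Thrift library.

```csharp
using System;

public static class FrameHeader
{
    // Hypothetical limit chosen for illustration; not a Thrift constant.
    public const int MaxFrameSize = 15 * 1024 * 1024;

    // Decodes the 4-byte big-endian length prefix, the same way ReadFrame does.
    public static int DecodeSize(byte[] header)
    {
        if (header == null || header.Length != 4)
            throw new ArgumentException("Expected a 4-byte header.", "header");

        return ((header[0] & 0xff) << 24) |
               ((header[1] & 0xff) << 16) |
               ((header[2] & 0xff) << 8) |
               (header[3] & 0xff);
    }

    // True when the announced size is non-negative and within the assumed limit.
    public static bool IsPlausible(int size)
    {
        return size >= 0 && size <= MaxFrameSize;
    }
}
```

Logging the decoded size just before the allocation would show whether a single response frame is unexpectedly large or whether the process is simply running out of address space.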

Following is the code I am trying to run:

    string columnFamily = "SCF";
    ICluster cluster = AquilesHelper.RetrieveCluster(ConfigurationManager.AppSettings["CLUSTERNAME"].ToString());
    ColumnParent columnParent = new ColumnParent()
    {
        Column_family = columnFamily
    };
    List<byte[]> keys = // Function that returns the list of keys I want to query

    SlicePredicate predicate = new SlicePredicate();
    foreach (DateTime[] dates in dateList)
    {
        from = GuidGenerator.GenerateTimeBasedGuid(dates[0]);
        to = GuidGenerator.GenerateTimeBasedGuid(dates[1]);
        predicate = new SlicePredicate()
        {
            Slice_range = new SliceRange()
            {
                Count = int.MaxValue,
                Reversed = false,
                Start = Aquiles.Helpers.Encoders.ByteEncoderHelper.UUIDEnconder.ToByteArray(from),
                Finish = Aquiles.Helpers.Encoders.ByteEncoderHelper.UUIDEnconder.ToByteArray(to)
            },
        };
        cluster.Execute(new ExecutionBlock(delegate(CassandraClient client)
        {
            int maxKeys = Convert.ToInt32(ConfigurationManager.AppSettings["maxKeys"]);
            CassandraMethods.TableCreator(ref dt, columnParent, predicate, keys, client, maxKeys);
            return null;
        }), ConfigurationManager.AppSettings["KEYSPACE"].ToString());
    }

And this is the function that is supposed to insert the data from Cassandra into the DataTable:

public static DataTable TableCreator(ref DataTable dt, ColumnParent columnParent, SlicePredicate predicate, List<byte[]> keys, CassandraClient client, int maxKeys)
{
   int keyCount = keys.Count;
   if (keyCount < maxKeys)
      CassandraMethods.CassandraToDataTable(ref dt, client.multiget_slice(keys, columnParent, predicate, ConsistencyLevel.ONE));
   else
   {
      int counter = 0;
      while (counter < keyCount)
      {
         if (counter + maxKeys <= keyCount)
            CassandraMethods.CassandraToDataTable(ref dt, client.multiget_slice(keys.GetRange(counter, maxKeys), columnParent, predicate, ConsistencyLevel.ONE));
         else
            CassandraMethods.CassandraToDataTable(ref dt, client.multiget_slice(keys.GetRange(counter, keyCount - counter), columnParent, predicate, ConsistencyLevel.ONE));
         counter += maxKeys;
      }
   }
   return dt;
}
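
The key-splitting logic in TableCreator can also be expressed as a small reusable helper. This is a sketch only, not a change to the approach: `KeyBatcher` and `Batch` are hypothetical names, and the helper just yields consecutive chunks of at most `batchSize` keys.

```csharp
using System;
using System.Collections.Generic;

public static class KeyBatcher
{
    // Yields consecutive batches of at most batchSize items from keys.
    // Because this is an iterator, the argument checks run when enumeration starts.
    public static IEnumerable<List<T>> Batch<T>(List<T> keys, int batchSize)
    {
        if (keys == null) throw new ArgumentNullException("keys");
        if (batchSize <= 0) throw new ArgumentOutOfRangeException("batchSize");

        for (int offset = 0; offset < keys.Count; offset += batchSize)
        {
            // The last batch may be shorter than batchSize.
            int count = Math.Min(batchSize, keys.Count - offset);
            yield return keys.GetRange(offset, count);
        }
    }
}
```

With such a helper, the body of TableCreator could become a single `foreach (var batch in KeyBatcher.Batch(keys, maxKeys))` loop that passes each batch to `client.multiget_slice`. One difference from the code above: an empty key list would result in no call at all.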

Am I missing anything? What am I doing wrong?

Update 1: I also tried Cassandra 1.0 and Aquiles 1.0 with both Thrift 0.6 and 0.7, but I still get the same exception.

Update 2: Problem solved; see my answer below.

Dennis
kefer9

2 Answers

Problem solved :) I experimented with memory usage and the garbage collector, and I fixed the problem.

What happened was that the exception was thrown whenever my application reached about 1.5 GB of RAM, because Visual Studio had compiled it as a 32-bit application.

Compiling and running as x64 solved the issue. To make sure I do not use too much memory, I also added the following three lines of code before each Cassandra multiget_slice call:
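
A quick way to confirm how a process is actually running is to check its bitness at startup. The sketch below assumes .NET 4.0 or later, where `Environment.Is64BitOperatingSystem` is available. `ProcessInfo` and `DescribeBitness` are hypothetical names used only for illustration.

```csharp
using System;

public static class ProcessInfo
{
    // Describes the bitness of the current process and of the operating system.
    public static string DescribeBitness()
    {
        // IntPtr.Size is 4 in a 32-bit process and 8 in a 64-bit process.
        int processBits = IntPtr.Size * 8;
        int osBits = Environment.Is64BitOperatingSystem ? 64 : 32;
        return string.Format("{0}-bit process on a {1}-bit OS", processBits, osBits);
    }
}
```

If this reports a 32-bit process on a 64-bit OS, the platform target in the project's build settings is the likely reason; a 32-bit process can fail large contiguous allocations well before physical memory is exhausted.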

GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);
GC.WaitForPendingFinalizers();
GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

Thanks, N.

kefer9

How big is the data in your SuperColumnFamily? Thrift has a default maximum frame size of 15 MB. This is set in /etc/cassandra/conf/cassandra.yaml; you could try increasing it.

Note that it's not possible to split your data smaller than a single supercolumn.

Theodore Hong
  • I already tried increasing that parameter. I don't think it's a matter of the size of the data; if it were, I should always get the exception when I query a row that is too big. For example, if I am querying from 2011-01-01 to 2011-03-31, I might get the exception at, let's say, the 28th of February. If it were a matter of big data, I should always get the exception on the 28th of February, right? Instead, if I start my slice on that day, everything goes smoothly for a while and I get the exception later. – kefer9 Oct 19 '11 at 15:15