I am creating a C++ implementation of HDFS, using ONC RPC and Google Protobuf. The issue I'm facing is that I send a protobuf object with multiple fields populated (I send the serialized string and parse from it at the receiving end), but the receiving end erroneously reports that one of the fields has not been set / does not exist.

This is a part of my hdfs.proto file :

message AssignBlockRequest {
  optional int32 handle = 1; // obtain using call to OpenFile
}

message AssignBlockResponse {
  optional int32 status = 1;
  optional BlockLocations newBlock = 2;
}

message BlockLocations {
  optional int32 blockNumber = 1;
  repeated DataNodeLocation locations = 2;
}

message DataNodeLocation {
  optional string ip = 1;
  optional int32 port = 2;
}

I'm using this in the "client" application to ask the "namenode server" for a new block and a list of DataNodeLocations to which it can write data.

So, in my client :

AssignBlockResponse assignnewblock_ ( int fhandle, CLIENT* clnt ) {
  AssignBlockRequest req;
  req.set_handle(fhandle);

  //send request to nn
  string str;
  req.SerializeToString(&str);
  static char *cstr = new char[str.length() + 1];
  memcpy(cstr, str.c_str(), str.length()+1);
  char **result_abreq;
  result_abreq = assignblock_1( &cstr, clnt );

  //handle response
  AssignBlockResponse rsp;
  string str_arg (*result_abreq);
  rsp.ParseFromString(str_arg);
  cout << "NN RETURNED : " << rsp.status() << " " << rsp.has_newblock() << endl;

  return rsp;
}

And in my namenode server.cc:

char **
assignblock_1_svc(char **argp, struct svc_req *rqstp)
{

  AssignBlockRequest req;
  string str_arg (*argp);
  req.ParseFromString(str_arg);

  AssignBlockResponse rsp;

  if ( DataNodeList.empty() ) { // no DN available
    rsp.set_status (1);
  }
  else {
    rsp.set_status (0);

    int BL_NUM = 0;
    vector<int> shuf;

    BlockLocations bl;// = new BlockLocations;
    bl.set_blocknumber(BL_NUM);

    rsp.mutable_newblock()->CopyFrom(bl);
  }
  cout << "NN RETURNED : " << rsp.status() << " " << rsp.has_newblock() << endl;


  string str;
  rsp.SerializeToString(&str);
  static char *cstr = new char[str.length() + 1];
  memcpy(cstr, str.c_str(), str.length()+1);

  return &cstr;
}

The NN prints "0 1", while the client, on receiving this AssignBlockResponse, prints "0 0". That is, it gets the status right (tested by varying the status set in the AssignBlockResponse message), but never detects the "newblock" field that server.cc sent to it.

Any help would be greatly appreciated.

-- EDIT 1 --

Protocol buffer serializing with inheritance. Derived classes are empty

This may be of interest. I still can't get my code to work, however.

lip

1 Answer

I've come across this in my early work with protocol buffers.

Don't use SerializeToString. Use SerializeToArray, having first built a vector big enough (call ByteSize() on the Message).

The problem is that your serialised byte stream contains a zero byte, which is interpreted as an end-of-string when converting the char* to a string.

This means you end up parsing an incomplete message, hence missing fields.
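
The truncation can be reproduced without protobuf at all. The sketch below is only an illustration: the byte array is written by hand to approximate what an AssignBlockResponse with status = 0 and newBlock.blockNumber = 0 would look like on the wire, and the names kWire, from_c_string, from_buffer and demo are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hand-written bytes (an assumption for illustration, not captured output):
//   0x08 0x00            field 1 (status), varint 0
//   0x12 0x02 0x08 0x00  field 2 (newBlock), length 2, holding blockNumber = 0
static const char kWire[] = {0x08, 0x00, 0x12, 0x02, 0x08, 0x00};
static const std::size_t kWireLen = sizeof(kWire);

// Mimics `string str_arg(*result_abreq)`: the length is found by scanning
// for the first zero byte, so everything after it is dropped.
inline std::string from_c_string(const char* p) {
  return std::string(p);
}

// Keeps every byte because the length is passed explicitly.
inline std::string from_buffer(const char* p, std::size_t n) {
  return std::string(p, n);
}

// Small self-check showing the difference between the two conversions.
inline void demo() {
  assert(from_c_string(kWire).size() == 1);                 // stops at the first 0x00
  assert(from_buffer(kWire, kWireLen).size() == kWireLen);  // all 6 bytes survive
}
```

With the first conversion the bytes that carry newBlock never reach the parser, which would be consistent with has_newblock() coming back false on the client.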

Richard Hodges
  • I changed it to use SerializeToArray and ParseFromArray, but to no avail. I'm convinced that your suggestion should work, but my code fails still. – lip Mar 26 '15 at 22:00
  • I did get it to work, but only by hard-coding the value of the server-side object's ByteSize() call into the ParseFromArray call on the client-side object. Is there a good way to send the value of ByteSize() over the network, or to figure out ByteSize from the received c_str? I could just use the status field to store it as well. Thanks! – lip Mar 26 '15 at 22:07
  • Ah, that's another thing. Protobuf has no concept of a 'frame'. You have to send the message in your own framing protocol. The Parse functions expect to work on a complete frame. – Richard Hodges Mar 26 '15 at 22:18
  • Something very simple, like a 4-byte length (in network byte order) followed by the serialised bytes, followed by (if you're paranoid) a CRC or checksum of the data, would be a reasonable framing protocol over TCP. If you're sending over HTTP then of course the HTTP protocol has the Content-Length: header. – Richard Hodges Mar 26 '15 at 22:20
  • `AssignBlockResponse rsp; rsp.ParseFromString(*result_abreq); int size = rsp.status(); rsp.ParseFromArray(*result_abreq, size);` I set the ByteSize() result as the status field of AssignBlockResponse, and it seems to work. Please let me know if this has any potential caveats. – lip Mar 26 '15 at 22:26
  • Well, as I mentioned, my results from using ParseFromString required that the serialised message did not contain any 0 bytes. This is not something you can reliably predict in all cases. I would send the message length prior to the serialised message and then use this value to create an array buffer for parsing. – Richard Hodges Mar 26 '15 at 23:16
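
The length-prefix framing suggested in the comments could be sketched roughly as follows. This is a minimal illustration, not code from the thread: frame and unframe are hypothetical helper names, the checksum is left out, and the payload is treated as opaque bytes so the sketch does not depend on protobuf.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Prepend a 4-byte length in network byte order (big-endian) to the payload.
inline std::string frame(const std::string& payload) {
  assert(payload.size() <= 0xFFFFFFFFu);  // length must fit in 32 bits
  const std::uint32_t n = static_cast<std::uint32_t>(payload.size());
  std::string out;
  out.reserve(4 + payload.size());
  out.push_back(static_cast<char>((n >> 24) & 0xFF));
  out.push_back(static_cast<char>((n >> 16) & 0xFF));
  out.push_back(static_cast<char>((n >> 8) & 0xFF));
  out.push_back(static_cast<char>(n & 0xFF));
  out.append(payload);  // append(string) copies all bytes, zeros included
  return out;
}

// Read the length prefix and copy out exactly that many payload bytes.
// Returns false if the buffer does not yet hold a complete frame.
inline bool unframe(const char* data, std::size_t len, std::string* payload) {
  if (len < 4) return false;
  const unsigned char* p = reinterpret_cast<const unsigned char*>(data);
  const std::uint32_t n = (static_cast<std::uint32_t>(p[0]) << 24) |
                          (static_cast<std::uint32_t>(p[1]) << 16) |
                          (static_cast<std::uint32_t>(p[2]) << 8) |
                          static_cast<std::uint32_t>(p[3]);
  if (len - 4 < n) return false;  // frame is incomplete
  payload->assign(data + 4, n);
  return true;
}
```

In the code from the question, the payload would presumably be the std::string filled by SerializeToString, and the recovered bytes could then be handed to ParseFromArray(payload.data(), payload.size()). The transport still has to carry the framed buffer as raw bytes with an explicit length, not as a NUL-terminated C string.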