I have browsed many questions on this board about TCP sockets and big-endian versus little-endian formats, but none of them seems to apply to my case.

And I'm sorry for my bad English, I'm working on it :)

I'm losing my mind over an unexpected behaviour in a simple client-server configuration. Here's the scenario:

Server (C++) <--- TCP socket ---> Client(Java).

Here's the client code:

package NetServ.apps.bigServer.NSLPClient;

import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.net.Socket;
import java.net.SocketException;
import java.net.UnknownHostException;
import java.nio.charset.Charset;
import java.util.ArrayList;


public class Communicator {

    private Socket sock;
    private final int port = 6666;
    private final String address="127.0.0.1";
    private DataOutputStream out;
    private DataInputStream in;
    public Communicator(){

    System.out.println("Creating communicator. Trying to bind to the tcp socket");

    try {
        sock = new Socket(address, port);
        out=new DataOutputStream(sock.getOutputStream());
        in=new DataInputStream(sock.getInputStream());
    } catch (UnknownHostException e) {
        System.out.println("Unable to resolve host");
        e.printStackTrace();
    } catch (IOException e) {
        System.out.println("Generic I/O exception");
        e.printStackTrace();
    }
    System.out.println("Communicator created");
}


  public void sendRequest(Request req) throws IOException{
    int cmd=0;
    if(req.getCmd().equals(CommandType.tg_setup_message))
        cmd=0;
    if(req.getCmd().equals(CommandType.tg_remove_message))
        cmd=1;
    if(req.getCmd().equals(CommandType.tg_trigger_message))
        cmd=2;
    if(req.getCmd().equals(CommandType.tg_probe_message))
        cmd=3;
    byte[] buff;
    Charset charset = Charset.forName("ISO-8859-1");

    out.writeInt(cmd);

    //out.writeUTF(req.getDstAddr().toString().substring(1));
    buff = req.getDstAddr().toString().substring(1).getBytes(charset);
    out.writeShort((short)buff.length);
    out.write(buff, 0, buff.length);

    out.writeInt(req.getProtocol());
    out.writeInt(req.getSecure());

    //out.writeUTF(req.getDataId());
    buff = req.getDataId().getBytes(charset);
    out.writeShort((short)buff.length);
    out.write(buff, 0, buff.length);

    //out.writeUTF(req.getUser());
    buff = req.getUser().getBytes(charset);
    out.writeShort((short)buff.length);
    out.write(buff, 0, buff.length);


    out.flush();
    out.writeInt(req.getOffpath_type());
    if(req.getOffpath_type()!=-1){
        out.writeInt(req.getMetric_type());

        String tmp = "" + req.getMetric();

        //out.writeUTF(tmp);
        buff = tmp.getBytes(charset);
        out.writeShort((short)buff.length);
        out.write(buff, 0, buff.length);

    }

    switch (req.getCmd()){
    case tg_setup_message:
        out.writeUTF(req.getUrl());         
        out.writeInt(req.getLifetime());
        out.writeUTF(req.getParameters().toString());
        break;
    case tg_remove_message:
        //TODO
        break;
    case tg_trigger_message:
        //TODO
        break;
    case tg_probe_message:
        for (Short s : req.getProbes()){
            //System.out.println("Writing probe code " + s.shortValue());
                out.writeShort(s.shortValue());
        }
        break;
    }   


    if(req.getSignature()!=null){
        out.writeInt(1);
        out.writeUTF(req.getSignature());
    }else{          
        out.writeInt(0);
    }

    if(req.getDep()!=null){
        out.writeInt(1);
        out.writeUTF(req.getDep());
    }else{
        out.writeInt(0);
    }

    if(req.getNotif()!=null){
        out.writeInt(1);
        out.writeUTF(req.getNotif());
    }else{
        out.writeInt(0);
    }

    if(req.getNode()!=null){
        out.writeInt(1);
        out.writeUTF(req.getNode());
    }else{
        out.writeInt(0);
    }
    out.flush();
    //out.close();
    System.out.println("request sent");
}

public ArrayList<String> rcvProbeResponse() throws IOException, SocketException{
    ArrayList<String> response= new ArrayList<String>();
    System.out.println("Waiting for response...");
    boolean timeout=false;

    int responseCode=-1;
    responseCode=in.readInt();
    //responseCode = in.readInt();
    //System.out.println("Response code "+responseCode);
    if(responseCode==1){ //response is ready! !
        System.out.println("Response arriving from NSLP (code 1 )");

        int responseCmdCode = in.readInt();
        if(responseCmdCode!=2)
            return null;
        //System.out.println("Response Command Code " + responseCmdCode );
        int probeSize = in.readInt();
        //System.out.println("Number of probes " + probeSize);
        for(int i=0; i<probeSize; i++){
            //System.out.println("i: "+i);
            String out = in.readUTF();
            response.add(out);
        }
    }
    in.close();
    if(timeout)
        return null;
    else
        return response;
}

}

Nothing special about that: the protocol between the entities is simply an exchange of integers, shorts and strings, that triggers the server to execute some signaling tasks (the server is the daemon of a signaling protocol).

On the other side, the server is legacy code that I modified to communicate with Java. Here's the relevant code:

[...]
// Set the current socket
communicator->setSocket(sockfd);

// FSM data structure
NetservNslpFsmData * data = new NetservNslpFsmData();

//give the address list of this node to all FSMs created by the client
data->nodeAddressList = &(param.addresses);
// Read from socket the parameters and use them
int ret;
NetservNslpCommunicator::command cmd;
ret = communicator->recvCommandFromJava(&cmd);
if (ret <= 0) {
    logSocketError(sockfd, "Command");
    // free up the memory allocated
    delete data;
    return;
}

switch(cmd){
case NetservNslpCommunicator::tg_setup_message:
    DLog(param.name, "cmd set: setup");
    break;
case NetservNslpCommunicator::tg_remove_message:
    DLog(param.name, "cmd set: remove");
    break;
case NetservNslpCommunicator::tg_probe_message:
    DLog(param.name, "cmd set: probe");
    break;
case NetservNslpCommunicator::tg_trigger_message:
    DLog(param.name, "cmd set: trigger");
    break;
}
ret = communicator->recvIPFromJava(&(data->destAddr));
DLog(param.name, "Dst Address set: "<< data->destAddr.get_ip_str());
if (ret <= 0) {
    logSocketError(sockfd, "Destination IP");
    // free up the memory allocated
    delete data;
    return;
}

[...]
int reliable = communicator->recvIntFromJava();
data->reliability = (reliable == NetservNslpCommunicator::TCP);
DLog(param.name, "Reliability set : "<< data->reliability);
int secure = communicator->recvIntFromJava();
data->security = (secure == NetservNslpCommunicator::TCP);
DLog(param.name, "Security set : "<< data->security);

data->dataId = communicator->recvStringFromJava();
DLog(param.name, "DataId : "<< data->dataId);
if (data->dataId == NULL) {
    logSocketError(sockfd, "dataId");
    // free up the memory allocated
    delete data;
    return;
}
data->user = communicator->recvStringFromJava();
DLog(param.name, "User : "<< data->user);
if (data->user == NULL) {
    logSocketError(sockfd, "user");
    // free up the memory allocated
    delete data;
    return;
}

//Receiving OffPath parameters
data->offpath_type=communicator->recvIntFromJava();
DLog(param.name, "OffType : "<< data->offpath_type);
if(data->offpath_type != -1){

    data->metric_type=communicator->recvIntFromJava();
    DLog(param.name, "MetricType : "<< data->metric_type);
    if(data->metric_type>3|| data->metric_type<1){
        logSocketError(sockfd, "metric type");
        // free up the memory allocated
        delete data;
        return;
    }
    char * tmpStr = communicator->recvStringFromJava();
    if (tmpStr == NULL) {
        logSocketError(sockfd, "metric");
        // free up the memory allocated
        delete data;
        return;
    }
    data->metric = tmpStr;
    DLog(param.name, "MetricValue : "<< data->metric);
    DLog(param.name, "MetricLength : "<< data->metric.length());
}

// check if socket is still alive or some errors occurred
if (!communicator->isAlive(sockfd)) {
    logSocketError(sockfd, "Socket not alive!");
    // free up the memory allocated
    delete data;
    return;
}
DLog(param.name,"Reading command-specific configuration");
switch(cmd)
{
case NetservNslpCommunicator::tg_setup_message:
    data->urlList.push_back(communicator->recvString());
    //check if the service data is exchanged together with signaling messages
    if (data->urlList.front() != NULL && (strncmp(data->urlList.front(), "file://", 7) == 0))
        data->data_included = true;
    data->lifetime = communicator->recvIntFromJava();
    data->setupParams = communicator->recvStringFromJava();
    break;
case NetservNslpCommunicator::tg_remove_message:
    break;
case NetservNslpCommunicator::tg_probe_message:
{
    DLog(param.name, "Reading probe codes list.");
    short probe = 0;
    do {
        probe = communicator->recvShortFromJava();
        DLog(param.name,"Probe Code " << probe);
        data->probes.push_back(probe);
    } while (probe != 0);
    data->probes.pop_back(); //delete the last 0
    if (data->probes.empty()) {
        logSocketError(sockfd, "Probe list is empty!");
        return;
    }
    break;
}
case NetservNslpCommunicator::tg_trigger_message:
    data->triggerType = communicator->recvInt();

    switch (data->triggerType){
    case NETSERV_MESSAGETYPE_SETUP:
        data->urlList.push_back(communicator->recvString());
        data->lifetime = communicator->recvInt();
        data->setupParams = communicator->recvString();
        break;
    case NETSERV_MESSAGETYPE_REMOVE:
        break;
    case NETSERV_MESSAGETYPE_PROBE:
    {
        short probe = 0;
        do {
            probe = communicator->recvShortFromJava();
            data->probes.push_back(probe);
        } while (probe != 0);
        data->probes.pop_back(); //delete the last 0
        break;
    }
    default:
        ERRLog(param.name, "Trigger type not supported");
        closeSocket(sockfd);
        return;
    }
    break;
    default:
        logSocketError(sockfd, "Trigger type not supported!");
        return;
}
DLog(param.name,"Reading optional parameters.");
// Optional parameters passing
bool addParam = 0;
addParam = communicator->recvIntFromJava();
if (addParam) {
    data->signature = communicator->recvStringFromJava();
    if (data->signature == NULL) {
        logSocketError(sockfd, "signature");
        // free up the memory allocated
        delete data;
        return;
    }
    DLog(param.name, "Message signature : "<< data->signature);
}

addParam = communicator->recvIntFromJava();
if (addParam) {
    data->depList.push_back(communicator->recvStringFromJava());
    if (data->depList.front() == NULL) {
        logSocketError(sockfd, "dependency list");
        // free up the memory allocated
        delete data;
        return;
    }
    DLog(param.name, "Message dependency list : "<< data->depList.front());
}

addParam = communicator->recvIntFromJava();
if (addParam) {
    data->notification = communicator->recvStringFromJava();
    if (data->notification == NULL) {
        logSocketError(sockfd, "notification");
        // free up the memory allocated
        delete data;
        return;
    }
    DLog(param.name, "Message notification : "<< data->notification);
}

addParam = communicator->recvIntFromJava();
if (addParam) {
    data->node = communicator->recvStringFromJava();
    if (data->node == NULL) {
        logSocketError(sockfd, "node");
        // free up the memory allocated
        delete data;
        return;
    }
    DLog(param.name, "Node destination : "<< data->node);
}
[...]

The communicator wraps the socket and uses standard calls to write and read types:

int NetservNslpCommunicator::recvCommandFromJava(NetservNslpCommunicator::command * cmd){
    int code = recvIntFromJava();
    cout<<"received int "<<code<<endl;
    if(code>=0){
        switch(code){
        case 0:
            *cmd=NetservNslpCommunicator::tg_setup_message;
            break;
        case 1:
            *cmd=NetservNslpCommunicator::tg_remove_message;
            break;
        case 2:
            *cmd=NetservNslpCommunicator::tg_trigger_message;
            break;
        case 3:
            *cmd=NetservNslpCommunicator::tg_probe_message;
            break;
        }
    }
    return code;
}

int NetservNslpCommunicator::recvIPFromJava(protlib::hostaddress * addr){
    cout<<"receiving an IP"<<endl;
    char* str = recvStringFromJava();
    cout<<"String received "<< str << endl;
    addr->set_ipv4(str);
    return 1;
}

char * NetservNslpCommunicator::recvStringFromJava(){
    short length = recvShortFromJava();
    cout<< "receiving a string..."<<endl<<"String length "<<length<<endl;
    if(length <= 0)
        return NULL; // empty string, or the length could not be read
    char * string = new char[length + 1]; // one extra byte for the terminator
    int received = 0;
    while(received < length)
        {
            // append after the bytes already received instead of overwriting them
            int r = recv(sock, string + received, length - received, 0);
            if(r <= 0)
                break; // Socket closed or an error occurred
            received += r;
        }
    string[received]='\0';
    return string;
}

int NetservNslpCommunicator::recvIntFromJava(){
    int x = 0;
    recvBuffer(sock, &x, 4);
    return x;
}

short NetservNslpCommunicator::recvShortFromJava()
{
    short x = 0;
    recvBuffer(sock, &x, 2);
    return x;
}


int NetservNslpCommunicator::recvBuffer(int sock, void * buf, size_t size)
{
    int counter = 0;
    // Create a pollfd struct for use in the mainloop
    struct pollfd poll_fd;
    poll_fd.fd = sock;
    poll_fd.events = POLLIN | POLLPRI;
    poll_fd.revents = 0;   

    int r;
    while (size && !stop)
    {
        /* Non-blocking behavior */
        // wait on number_poll_sockets for the events specified above for sleep_time (in ms)
        int poll_status = poll(&poll_fd, 1/*Number of poll socket*/, 100);
        if (poll_fd.revents & POLLERR) // Error condition
        {
            if (errno != EINTR)
                cout << "NetservNslpCommunicator : " << "Poll caused error " << strerror(errno) << " - indicated by revents" << endl;
            else
                cout << "NetservNslpCommunicator : " << "poll(): " << strerror(errno) << endl;
        }
        //ignore hangups when reading from a socket
        if (poll_fd.revents & POLLHUP) // Hung up
        {
            cout << "NetservNslpCommunicator : " << "Poll hung up" << endl;
            //          return -1;
        }
        if (poll_fd.revents & POLLNVAL) // Invalid request: fd not open
        {
            cout << "NetservNslpCommunicator : " << "Poll Invalid request: fd not open" << endl;
            return -1;
        } 

        switch (poll_status)
        {
        case -1:
            if (errno != EINTR)
                cout << "NetservNslpCommunicator : " << "Poll status indicates error: " << strerror(errno) << endl;
            else
                cout << "NetservNslpCommunicator : " << "Poll status: " << strerror(errno) << endl;
            break;

        case 0:

            if (isTriggerTimerEnabled){
                counter++;
                if (counter == triggerTimerValue){
                    isTriggerTimerEnabled = false;
                    return -1;
                }
            }
            continue;
            break; 

        default:
            r = recv(sock, buf, size, 0);
            if (r <= 0)
            {
                if (r == 0) { // connection closed
                    r = -1; // return an error if socket closes
                    cout << "NetservNslpCommunicator : " << "No data received from socket!" << endl;
                    stop=true;
                    break;
                }
                if (r == -1 && errno == EINTR) // received interrupt during recv, continuing
                    continue;
                if (r == -1 && errno != EINTR) // socket error, raise exception
                    break;
            }

            if (r != -1)
                size -= r;
            break;
        }
    }

    counter = 0;
    isTriggerTimerEnabled = false;
    return r;
}

Please focus only on the tg_probe_message part. The other messages are yet to be implemented.

The strange behaviour is this: the first time the client sends a request to the server, everything goes well and all values are read perfectly. The server then answers by sending back some integers and a sequence of strings. This is a trace (application layer only, one TCP packet per line) of what I capture on the socket:

00  //
00  //
00  //
03  // First integer

00  //
0a  // Short representing string length

31:37:32:2e:31:36:2e:33:2e:32  //the string: "172.16.3.2"

00
00
00
01

00
00
00
00

00
1b

4e:65:74:53:65:72:76:2e:61:70:70:73:2e:4c:6f:69:62:46:61:6b:65:5f:30:2e:30:2e:31 //The string "NetServ.apps.LoibFake_0.0.1"

00
03

6a:61:65 //the string "jae"

00
00
00
03

00
00
00
01

00
01

31 //the string "1"

00
02

00
00

00 //
00 //
00 // 4 times
00 //

The server answers:

00:00:00:01 //response code
00:00:00:02 //response type
00:00:00:04 //number of strings to read
00:12 //str length
31:30:2e:31:30:2e:30:2e:35:20:41:43:54:49:56:45:20:31
00:12 //str length
31:30:2e:31:30:2e:30:2e:34:20:41:43:54:49:56:45:20:31
00:12 //str length
31:30:2e:31:30:2e:30:2e:33:20:41:43:54:49:56:45:20:32
00:12 //str length
31:30:2e:31:30:2e:30:2e:36:20:41:43:54:49:56:45:20:32

The second time the client sends a request (the same request), something weird occurs. This is what I captured with tcpdump during the second connection:

00  //
00  //
00  //
03  // First integer

00  //
0a  // Short representing string length

31:37:32:2e:31:36:2e:33:2e:32  //the string: "172.16.3.2"

00
00
00
01

00
00
00
00

00:1b:4e:65:74:53:65:72:76:2e:61:70:70:73:2e:4c:6f:69:62:46:61:6b:65:5f:30:2e:30:2e:31:00:03:6a:61:65:00:00:00:03:00:00:00:01:00:01:31:00:02:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00:00

With a little patience you can recognise that the last packet contains all the remaining bytes of the request (the same bytes as in the first request).

With some debugging I can see that the call communicator->recvCommandFromJava(&cmd) returns 50331648 (03:00:00:00) instead of 3 (00:00:00:03). Then communicator->recvIPFromJava(&(data->destAddr)) is executed; it calls recvStringFromJava(), which uses recvShortFromJava(), and the short representing the string length 00:0a (10) comes out byte-swapped as 0a:00 (2560). I believe this causes TCP to put all the available data into the next packet and spoils the subsequent calls.
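One possible reading of these numbers, offered as a hedged sketch rather than a confirmed diagnosis: recvBuffer() passes the same `buf` pointer to every recv() call, so when a 4-byte integer arrives one byte at a time, each byte lands at offset 0 and only the last one survives. The helper names below (`hostIsLittleEndian`, `simulateRecvWithoutAdvance`) are hypothetical and only simulate that loop on an in-memory array; the little-endian host is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: reports the byte order of the machine running this code.
bool hostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first = 0;
    std::memcpy(&first, &probe, 1);
    return first == 1;
}

// Hypothetical helper: mimics a receive loop that gets `wire` in pieces of
// `chunk` bytes and always writes at the START of the destination, as a loop
// does when it never advances its buffer pointer.
std::int32_t simulateRecvWithoutAdvance(const unsigned char* wire,
                                        std::size_t size,
                                        std::size_t chunk)
{
    std::int32_t x = 0;
    std::size_t consumed = 0;
    while (consumed < size) {
        std::size_t n = (size - consumed < chunk) ? (size - consumed) : chunk;
        std::memcpy(&x, wire + consumed, n); // always offset 0 inside x
        consumed += n;
    }
    return x;
}
```

On a little-endian host, feeding the bytes 00:00:00:03 one at a time leaves 03 in the lowest byte, so the result is 3; feeding them in a single chunk gives 50331648. Under that reading the host byte order never changes at run time; only the way TCP happens to split the stream does.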

As you can see from the code, I did not apply any conversion from network order to host order in the server (because it works fine for the first request), but it seems that conversion is required during the second request. The documentation for DataOutputStream specifies that int and short are written in big-endian order.
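For illustration, a minimal sketch of the missing conversion, assuming a POSIX system and assuming that all bytes of the value are already in a buffer. The names `decodeJavaInt` and `decodeJavaShort` are hypothetical and are not part of the server code; ntohl() and ntohs() are the standard calls from <arpa/inet.h>.

```cpp
#include <arpa/inet.h> // ntohl, ntohs (POSIX)
#include <cstdint>
#include <cstring>

// Hypothetical helper: decodes 4 bytes written by DataOutputStream.writeInt()
// (big-endian, i.e. network order) into a host-order integer.
std::int32_t decodeJavaInt(const unsigned char* bytes)
{
    std::uint32_t net = 0;
    std::memcpy(&net, bytes, sizeof net);       // copy the raw wire bytes
    return static_cast<std::int32_t>(ntohl(net)); // network order -> host order
}

// Hypothetical helper: decodes 2 bytes written by DataOutputStream.writeShort().
std::int16_t decodeJavaShort(const unsigned char* bytes)
{
    std::uint16_t net = 0;
    std::memcpy(&net, bytes, sizeof net);
    return static_cast<std::int16_t>(ntohs(net));
}
```

Because ntohl() and ntohs() are no-ops on big-endian hosts and swap bytes on little-endian ones, the same code gives the same value on either kind of machine.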

Hence, in the end, this is the question: is it possible for a C++ program to change its host byte order during execution? How could this possibly happen? What can I do to get predictable byte ordering between the Java client and the C++ server?

2 Answers

Endianness has nothing to do with the data ending up in the next packet. That happens simply because TCP is a byte-stream protocol.

You have two separate problems to solve: one with ntohl() and friends, the other by continuing to read until you have all the data you're expecting.
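A minimal sketch of both fixes together, assuming POSIX sockets. `recvAll` and `recvJavaInt` are hypothetical names, and the sketch deliberately leaves out the poll() timeout and the `stop` flag that the server's recvBuffer() uses.

```cpp
#include <sys/types.h>
#include <sys/socket.h>
#include <arpa/inet.h>
#include <cerrno>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: keeps calling recv() until exactly `size` bytes have
// been stored, advancing the write position after every partial read.
bool recvAll(int sock, void* buf, std::size_t size)
{
    char* p = static_cast<char*>(buf);
    while (size > 0) {
        ssize_t r = recv(sock, p, size, 0);
        if (r == 0)
            return false;            // peer closed the connection
        if (r < 0) {
            if (errno == EINTR)
                continue;            // interrupted, try again
            return false;            // real socket error
        }
        p += r;                      // next bytes go AFTER the ones received
        size -= static_cast<std::size_t>(r);
    }
    return true;
}

// Hypothetical helper: reads one big-endian 32-bit integer and converts it
// to host order.
bool recvJavaInt(int sock, std::int32_t* out)
{
    std::uint32_t net = 0;
    if (!recvAll(sock, &net, sizeof net))
        return false;
    *out = static_cast<std::int32_t>(ntohl(net));
    return true;
}
```

With the pointer advanced and ntohl() applied, the decoded value no longer depends on how TCP splits the bytes across segments.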

user207421
  • I understand your point, and it is correct, but the last packet contains all the data because the server reads a _short_ to learn the length of the subsequent string. The ntohl() problem swaps the bytes of this short, so the server tries to read 2560 bytes instead of 10. This triggers the TCP stack to pack the data into a single packet, as it does with the other strings. At least, this is how I explain the second problem, so I strongly believe the two are linked. Do you agree with me? – Dario Valocchi Sep 25 '14 at 13:28
  • To be as clear as possible, I repeat: the ntohl and ntohs problem occurs only during the second request. – Dario Valocchi Sep 25 '14 at 13:56

I found a solution to my problem that works and that I think is elegant enough. Because I can't predict the behaviour of the server when it reads primitive types larger than one byte, I use a standard contract mechanism. Every time a client wants to push commands to the server, it first sends a known integer code. The server reads the integer and checks whether the value equals the predetermined integer; if so, it can read all the values without reordering them. Otherwise it sets a flag and reads all the subsequent values, swapping them with the functions ntohl() and ntohs().
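The check described above could look roughly like this. It is only a sketch: the constant `kHandshakeMagic`, the enum `ByteOrderCheck` and the function names are hypothetical, and a hand-written `swap32` stands in for ntohl() so that the swap is unconditional and easy to test.

```cpp
#include <cstdint>

// Hypothetical agreed value. It must not read the same when its bytes are
// reversed, otherwise the two byte orders cannot be told apart.
const std::uint32_t kHandshakeMagic = 0x01020304u;

enum class ByteOrderCheck { SameOrder, Swapped, Invalid };

// Reverses the four bytes of a 32-bit value.
std::uint32_t swap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}

// Hypothetical helper: classifies the first integer of a request, exactly as
// it was read from the socket, against the agreed value.
ByteOrderCheck checkHandshake(std::uint32_t rawValueAsRead)
{
    if (rawValueAsRead == kHandshakeMagic)
        return ByteOrderCheck::SameOrder; // read later values as they are
    if (swap32(rawValueAsRead) == kHandshakeMagic)
        return ByteOrderCheck::Swapped;   // swap every later multi-byte value
    return ByteOrderCheck::Invalid;       // not a handshake at all
}
```

The check can only be trusted if all four bytes of the handshake integer were received intact, so it complements, rather than replaces, a receive loop that reads until the full value has arrived.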