
I am writing a program for creating, sending, receiving and interpreting ARP packets. I have a structure representing the ARP header like this:

struct ArpHeader
{
    unsigned short hardwareType;
    unsigned short protocolType;
    unsigned char hardwareAddressLength;
    unsigned char protocolAddressLength;
    unsigned short operationCode;
    unsigned char senderHardwareAddress[6];
    unsigned char senderProtocolAddress[4];
    unsigned char targetHardwareAddress[6];
    unsigned char targetProtocolAddress[4];
};

This only works for hardware addresses with length 6 and protocol addresses with length 4. The address lengths are given in the header as well, so to be correct the structure would have to look something like this:

struct ArpHeader
{
    unsigned short hardwareType;
    unsigned short protocolType;
    unsigned char hardwareAddressLength;
    unsigned char protocolAddressLength;
    unsigned short operationCode;
    unsigned char senderHardwareAddress[hardwareAddressLength];
    unsigned char senderProtocolAddress[protocolAddressLength];
    unsigned char targetHardwareAddress[hardwareAddressLength];
    unsigned char targetProtocolAddress[protocolAddressLength];
};

This obviously won't work since the address lengths are not known at compile time. Template structures aren't an option either since I would like to fill in values for the structure and then just cast it from (ArpHeader*) to (char*) in order to get a byte array which can be sent on the network or cast a received byte array from (char*) to (ArpHeader*) in order to interpret it.

One solution would be to create a class with all header fields as member variables, a function to create a byte array representing the ARP header which can be sent on the network and a constructor which would take only a byte array (received on the network) and interpret it by reading all header fields and writing them to the member variables. This is not a nice solution though since it would require a LOT more code.

In contrast, a similar structure for a UDP header, for example, is simple since all header fields are of known constant size. I use

#pragma pack(push, 1)
#pragma pack(pop)

around the structure declaration so that I can actually do a simple C-style cast to get a byte array to be sent on the network.
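For comparison, here is a minimal sketch of the packed UDP header approach described above. The names UdpHeader and writeUdpHeader are made up for illustration, and byte-order conversion of the fields is deliberately left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

#pragma pack(push, 1)
struct UdpHeader {
    std::uint16_t sourcePort;
    std::uint16_t destinationPort;
    std::uint16_t length;
    std::uint16_t checksum;
};
#pragma pack(pop)

// All fields have a fixed size, so the layout can be checked at compile time.
static_assert(sizeof(UdpHeader) == 8, "UdpHeader must match the 8-byte wire format");

// Copies the header into 'out' and returns the number of bytes written,
// or 0 if the buffer is too small. Fields are copied as stored; converting
// them to network byte order is left to the caller in this sketch.
inline std::size_t writeUdpHeader(const UdpHeader& header,
                                  unsigned char* out, std::size_t outSize) {
    assert(out != nullptr);
    if (outSize < sizeof(UdpHeader)) return 0;
    std::memcpy(out, &header, sizeof(UdpHeader));
    return sizeof(UdpHeader);
}
```

The static_assert is what cannot be written for the ARP case: there, the total size depends on two length fields that are only known at run time.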

Is there any solution I could use here which would be close to a structure, or at least not require a lot more code than a structure? I know the last field in a structure (if it is an array) does not need a specific compile-time size. Can I use something similar for my problem? Just leaving the sizes of those 4 arrays empty will compile, but I have no idea how that would actually function. Logically speaking, it cannot work, since the compiler would have no idea where the second array starts if the size of the first array is unknown.
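To make the offset problem concrete: only the first 8 bytes have a fixed layout, and the position of each address has to be computed from the two length fields at run time. A rough sketch of that computation follows; ArpFixedHeader, ArpLayout and arpLayout are hypothetical names used only for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

#pragma pack(push, 1)
struct ArpFixedHeader {              // the part whose layout never changes
    std::uint16_t hardwareType;
    std::uint16_t protocolType;
    std::uint8_t  hardwareAddressLength;
    std::uint8_t  protocolAddressLength;
    std::uint16_t operationCode;
};
#pragma pack(pop)
static_assert(sizeof(ArpFixedHeader) == 8, "fixed ARP prefix is 8 bytes");

// Byte offsets of the four variable-length fields, counted from the
// start of the packet, plus the total packet size.
struct ArpLayout {
    std::size_t senderHardware;
    std::size_t senderProtocol;
    std::size_t targetHardware;
    std::size_t targetProtocol;
    std::size_t total;
};

inline ArpLayout arpLayout(std::uint8_t hardwareLength, std::uint8_t protocolLength) {
    ArpLayout l;
    l.senderHardware = sizeof(ArpFixedHeader);
    l.senderProtocol = l.senderHardware + hardwareLength;
    l.targetHardware = l.senderProtocol + protocolLength;
    l.targetProtocol = l.targetHardware + hardwareLength;
    l.total          = l.targetProtocol + protocolLength;
    return l;
}

// Example: Ethernet (6-byte) hardware and IPv4 (4-byte) protocol addresses.
inline void layoutExample() {
    const ArpLayout l = arpLayout(6, 4);
    assert(l.senderHardware == 8 && l.targetProtocol == 24 && l.total == 28);
}
```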

PlanckMax
  • If the max address size is 6, can't you make the arrays of size [6] and then interpret them accordingly? That's the easiest solution if you want to avoid a lot of code. Another option would be to use one big array of fixed length for all addresses and write a function that would prepare a byte array based on the addresses' lengths – Adam Kosiorek Sep 19 '14 at 16:08
  • Zero length arrays or flexible array members are not valid C++. – pmr Sep 19 '14 at 16:09
  • Why not use a std::string or std::vector as a structure member? And if you use a struct, why don't you give it functionality? Dividing code into data and program is exactly the opposite of object-oriented programming. Your question itself sounds like a design failure! – Klaus Sep 19 '14 at 16:15
  • Well, the interpretation of the memory block coming after `operationCode` actually depends on `protocolType` (IPv4/IPv6), right? I'd recommend just putting an opaque placeholder pointer there and interpreting the rest as two additional structures for the MAC and the IP addresses, similar to how this is handled in the [`netinet/in.h`](http://pubs.opengroup.org/onlinepubs/009695399/basedefs/netinet/in.h.html) structure definitions. – πάντα ῥεῖ Sep 19 '14 at 16:27
  • Wouldn't overloading `operator char*` and `ArpHeader(char* data)` suit your needs? Did you already try it? AFAIK then the real underlying structure becomes irrelevant. – Paweł Stawarz Sep 19 '14 at 16:36
  • @PawełStawarz That sounds like a good idea for c++. – πάντα ῥεῖ Sep 19 '14 at 16:37
  • The whole point of using a structure was to be able to simply cast it to a byte array to avoid a long conversion function and getter and setter methods for the header fields, similar to the way a UDP header structure would be used. – PlanckMax Sep 19 '14 at 19:49

3 Answers


You want a fairly low level thing, an ARP packet, and you are trying to find a way to define a datastructure properly so you can cast the blob into that structure. Instead, you can use an interface over the blob.

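#include <stddef.h>   // size_t
#include <stdint.h>   // uint8_t, uint16_t
#include <string.h>   // memcpy
#include <stdexcept>  // std::out_of_range
#include <vector>

struct ArpHeader {
    mutable std::vector<uint8_t> buf_;

    // Proxy for a field of type T stored at an arbitrary (possibly unaligned) address.
    template <typename T>
    struct ref {
        uint8_t * const p_;
        ref (uint8_t *p) : p_(p) {}
        operator T () const { T t; memcpy(&t, p_, sizeof(t)); return t; }
        T operator = (T t) const { memcpy(p_, &t, sizeof(t)); return t; }
    };

    template <typename T>
    ref<T> get (size_t offset) const {
        // Reject accesses that would run past the end of the buffer.
        if (offset + sizeof(T) > buf_.size()) throw std::out_of_range("ArpHeader: field outside buffer");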
        return ref<T>(&buf_[0] + offset);
    }

    ref<uint16_t> hwType() const { return get<uint16_t>(0); }
    ref<uint16_t> protType () const { return get<uint16_t>(2); }
    ref<uint8_t> hwAddrLen () const { return get<uint8_t>(4); }
    ref<uint8_t> protAddrLen () const { return get<uint8_t>(5); }
    ref<uint16_t> opCode () const { return get<uint16_t>(6); }

    uint8_t *senderHwAddr () const { return &buf_[0] + 8; }
    uint8_t *senderProtAddr () const { return senderHwAddr() + hwAddrLen(); }
    uint8_t *targetHwAddr () const { return senderProtAddr() + protAddrLen(); }
    uint8_t *targetProtAddr () const { return targetHwAddr() + hwAddrLen(); }
};

If you need const correctness, you remove mutable, create a const_ref, and duplicate the accessors into non-const versions, and make the const versions return const_ref and const uint8_t *.
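As a rough sketch of what that const-correct variant might look like, here it is reduced to one length field and two address accessors. ArpView and check are hypothetical names chosen for this illustration, and the remaining accessors would follow the same pattern:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

struct ArpView {
    std::vector<std::uint8_t> buf_;  // no longer mutable

    template <typename T>
    struct const_ref {               // read-only proxy
        const std::uint8_t* const p_;
        explicit const_ref(const std::uint8_t* p) : p_(p) {}
        operator T() const { T t; std::memcpy(&t, p_, sizeof t); return t; }
    };

    template <typename T>
    struct ref {                     // read-write proxy
        std::uint8_t* const p_;
        explicit ref(std::uint8_t* p) : p_(p) {}
        operator T() const { T t; std::memcpy(&t, p_, sizeof t); return t; }
        T operator=(T t) const { std::memcpy(p_, &t, sizeof t); return t; }
    };

    // Throws if [offset, offset + size) does not fit inside the buffer.
    void check(std::size_t offset, std::size_t size) const {
        if (offset + size > buf_.size())
            throw std::out_of_range("ArpView: field outside buffer");
    }

    const_ref<std::uint8_t> hwAddrLen() const {
        check(4, 1);
        return const_ref<std::uint8_t>(buf_.data() + 4);
    }
    ref<std::uint8_t> hwAddrLen() {
        check(4, 1);
        return ref<std::uint8_t>(buf_.data() + 4);
    }

    const std::uint8_t* senderHwAddr() const { check(8, 0); return buf_.data() + 8; }
    std::uint8_t* senderHwAddr() { check(8, 0); return buf_.data() + 8; }

    const std::uint8_t* senderProtAddr() const {
        return senderHwAddr() + static_cast<std::uint8_t>(hwAddrLen());
    }
};

inline void constViewExample() {
    ArpView v;
    v.buf_.assign(28, 0);
    v.hwAddrLen() = 6;               // non-const access may write
    const ArpView& cv = v;           // const access can only read
    assert(static_cast<std::uint8_t>(cv.hwAddrLen()) == 6);
}
```

Note that neither this sketch nor the accessors above convert the 16-bit fields from network byte order. That step is still needed before the values are interpreted.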

jxh

Short answer: you just cannot have variable-sized types in C++.

Every type in C++ must have a known (and stable) size at compile time, i.e. the sizeof operator must give a consistent answer. Note that you can have types that hold a variable amount of data (e.g. std::vector<int>) by using the heap, yet the size of the actual object is always constant.
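A small illustration of that last point, with a hypothetical function name: the two vectors below hold very different amounts of data, but sizeof reports the same value for both because the elements live on the heap.

```cpp
#include <cassert>
#include <vector>

inline bool vectorObjectSizeIsFixed() {
    std::vector<int> empty;
    std::vector<int> large(1000, 42);
    // The element counts differ...
    assert(empty.size() != large.size());
    // ...but the objects themselves have the same compile-time size.
    return sizeof(empty) == sizeof(large);
}
```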

So, you can never produce a type declaration that you would cast and get the fields magically adjusted. This goes deeply into the fundamental object layout - every member (aka field) must have a known (and stable) offset.

Usually, this issue is solved by writing (or generating) member functions that parse the input data and initialize the object's data. This is basically the age-old data serialization problem, which has been solved countless times in the last 30 or so years.

Here is a mockup of a basic solution:

class packet { 
public:
    // simple things
    uint16_t hardware_type() const;

    // variable-sized things
    size_t sender_address_len() const;
    bool copy_sender_address_out(char *dest, size_t dest_size) const;

    // initialization
    bool parse_in(const char *src, size_t len);

private:    
    uint16_t hardware_type_;    
    std::vector<char> sender_address_;
};

Notes:

  • the code above shows the very basic structure that would let you do the following:

    packet p;
    if (!p.parse_in(input, sz))
        return false;
    
  • the modern way of doing the same thing via RAII would look like this:

    if (!packet::validate(input, sz))
        return false;
    
    packet p = packet::parse_in(input, sz);  // static function 
                                             // returns an instance or throws
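
To give an idea of what the bodies behind the mockup might contain, here is a minimal sketch of the first (bool-returning) variant. It assumes the ARP wire layout from the question (8 fixed bytes, hardware address length at offset 4, sender hardware address starting at offset 8) and only fills in the two members shown in the mockup:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

class packet {
public:
    std::uint16_t hardware_type() const { return hardware_type_; }
    std::size_t sender_address_len() const { return sender_address_.size(); }

    // Copies the sender address into 'dest'; fails if 'dest' is too small.
    bool copy_sender_address_out(char* dest, std::size_t dest_size) const {
        if (dest_size < sender_address_.size()) return false;
        if (!sender_address_.empty())
            std::memcpy(dest, sender_address_.data(), sender_address_.size());
        return true;
    }

    // Validates the lengths first, so a failed parse leaves the object unchanged.
    bool parse_in(const char* src, std::size_t len) {
        const std::size_t fixed = 8;  // size of the fixed ARP prefix
        if (src == nullptr || len < fixed) return false;
        const unsigned char* p = reinterpret_cast<const unsigned char*>(src);
        const std::size_t hw_len = p[4];
        if (len < fixed + hw_len) return false;
        // 16-bit fields arrive in network byte order (most significant byte first).
        hardware_type_ = static_cast<std::uint16_t>((p[0] << 8) | p[1]);
        sender_address_.assign(src + fixed, src + fixed + hw_len);
        return true;
    }

private:
    std::uint16_t hardware_type_ = 0;
    std::vector<char> sender_address_;
};

inline void parseExample() {
    const char raw[] = {0, 1, 8, 0, 6, 4, 0, 1, 'a', 'b', 'c', 'd', 'e', 'f'};
    packet p;
    assert(p.parse_in(raw, sizeof raw));
    assert(p.hardware_type() == 1 && p.sender_address_len() == 6);
}
```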
    
os_
  • I understand how object oriented programming including classes etc work. The purpose of this post was to find out if there is a quicker way with less code to write an ARP header object like I would write a UDP header struct which can essentially be cast to a byte array in order to avoid long conversion functions. But thank you anyways for clarifying this. – PlanckMax Sep 19 '14 at 19:54

If you want to keep access to the data simple and the data itself public, there is a way to achieve what you want without changing the way you access data. First, you can use std::string instead of the char arrays to store the addresses:

#include <cstring>  // memcpy
#include <iostream> // cout, endl
#include <string>
using namespace std; // using this to shorten notation. Preferably put 'std::'
                     // everywhere you need it instead.
struct ArpHeader
{
    unsigned char hardwareAddressLength;
    unsigned char protocolAddressLength;

    string senderHardwareAddress;
    string senderProtocolAddress;
    string targetHardwareAddress;
    string targetProtocolAddress;
};

Then, you can overload the conversion operator operator const char*() and the constructor ArpHeader(const char*) (and preferably operator=(const char*) too), in order to keep your current sending/receiving functions working, if that's what you need.

A simplified conversion operator (some fields are skipped to keep it short, but you should have no problem adding them back) would look like this:

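operator const char*() const {
    // The returned buffer is allocated with new[]; the caller must delete[] it.
    char* myRepresentation;
    // Note: storing the size in an unsigned char limits it to 255 bytes.
    unsigned char mySize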
            = 2+ senderHardwareAddress.length()
            + senderProtocolAddress.length()
            + targetHardwareAddress.length()
            + targetProtocolAddress.length();

    // We need to store the size, since it varies
    myRepresentation = new char[mySize+1];
    myRepresentation[0] = mySize;
    myRepresentation[1] = hardwareAddressLength;
    myRepresentation[2] = protocolAddressLength;

    unsigned int offset = 3; // just to shorten notation
    memcpy(myRepresentation+offset, senderHardwareAddress.c_str(), senderHardwareAddress.size());
    offset += senderHardwareAddress.size();
    memcpy(myRepresentation+offset, senderProtocolAddress.c_str(), senderProtocolAddress.size());
    offset += senderProtocolAddress.size();
    memcpy(myRepresentation+offset, targetHardwareAddress.c_str(), targetHardwareAddress.size());
    offset += targetHardwareAddress.size();
    memcpy(myRepresentation+offset, targetProtocolAddress.c_str(), targetProtocolAddress.size());

    return myRepresentation;
}

The assignment operator and the constructor can then be defined like this:

ArpHeader& operator=(const char* buffer){

    hardwareAddressLength = buffer[1];
    protocolAddressLength = buffer[2];

    unsigned int offset = 3; // just to shorten notation
    senderHardwareAddress = string(buffer+offset, hardwareAddressLength);
    offset += hardwareAddressLength;
    senderProtocolAddress = string(buffer+offset, protocolAddressLength);
    offset += protocolAddressLength;
    targetHardwareAddress = string(buffer+offset, hardwareAddressLength);
    offset += hardwareAddressLength;
    targetProtocolAddress = string(buffer+offset, protocolAddressLength);

    return *this;
}
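ArpHeader(const char* buffer){
    *this = buffer; // Re-using the operator=
}
// Declaring the constructor above suppresses the implicit default constructor,
// so it is defined explicitly to keep 'ArpHeader h1, h2;' compiling.
ArpHeader() : hardwareAddressLength(0), protocolAddressLength(0) {}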

Then using your class is as simple as:

ArpHeader h1, h2;
h1.hardwareAddressLength = 3;
h1.protocolAddressLength = 10;
h1.senderHardwareAddress = "foo";
h1.senderProtocolAddress = "something1";
h1.targetHardwareAddress = "bar";
h1.targetProtocolAddress = "something2";

cout << h1.senderHardwareAddress << ", " << h1.senderProtocolAddress
<< " => " << h1.targetHardwareAddress << ", " << h1.targetProtocolAddress << endl;

const char* gottaSendThisSomewhere = h1;
h2 = gottaSendThisSomewhere;

cout << h2.senderHardwareAddress << ", " << h2.senderProtocolAddress
<< " => " << h2.targetHardwareAddress << ", " << h2.targetProtocolAddress << endl;

delete[] gottaSendThisSomewhere;

This should offer you the utility needed and keep your code working without changing anything outside the class.
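
One drawback of the conversion operator is that the caller has to remember the delete[]. As a possible variant (not part of the approach above), the serialization could return a std::vector<char> so that ownership is handled automatically. The names ArpAddresses, toBytes and fromBytes are made up for this sketch, and it assumes every address is shorter than 256 bytes and that the sender and target lengths match:

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct ArpAddresses {
    std::string senderHardwareAddress;
    std::string senderProtocolAddress;
    std::string targetHardwareAddress;
    std::string targetProtocolAddress;

    // Simplified layout: [hardware length][protocol length][four addresses].
    std::vector<char> toBytes() const {
        std::vector<char> out;
        out.push_back(static_cast<char>(senderHardwareAddress.size()));
        out.push_back(static_cast<char>(senderProtocolAddress.size()));
        const std::string* parts[] = {&senderHardwareAddress, &senderProtocolAddress,
                                      &targetHardwareAddress, &targetProtocolAddress};
        for (const std::string* s : parts)
            out.insert(out.end(), s->begin(), s->end());
        return out;
    }

    // Returns false (leaving 'result' untouched) if the input is too short.
    static bool fromBytes(const std::vector<char>& in, ArpAddresses& result) {
        if (in.size() < 2) return false;
        const std::size_t hlen = static_cast<unsigned char>(in[0]);
        const std::size_t plen = static_cast<unsigned char>(in[1]);
        if (in.size() < 2 + 2 * (hlen + plen)) return false;
        const char* p = in.data() + 2;
        result.senderHardwareAddress.assign(p, hlen); p += hlen;
        result.senderProtocolAddress.assign(p, plen); p += plen;
        result.targetHardwareAddress.assign(p, hlen); p += hlen;
        result.targetProtocolAddress.assign(p, plen);
        return true;
    }
};

inline void roundTripExample() {
    ArpAddresses a;
    a.senderHardwareAddress = "foo";
    a.senderProtocolAddress = "something1";
    a.targetHardwareAddress = "bar";
    a.targetProtocolAddress = "something2";
    ArpAddresses b;
    assert(ArpAddresses::fromBytes(a.toBytes(), b));
    assert(b.targetHardwareAddress == "bar");
}
```

The vector's data() pointer and size() can still be handed to an existing send function that expects a byte array.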

Note, however, that if you're willing to change the rest of the code a bit (meaning the code you've already written outside of the class), jxh's answer should work as fast as this, and is more elegant on the inside.

Paweł Stawarz
  • Thank you, this is a totally valid solution to the problem, but like you already mentioned, jxh's answer is more elegant on a lower level and it is actually kind of what I was looking for since it requires only a little code. – PlanckMax Sep 19 '14 at 19:58