C and Erlang: Erlang Port example

Question

Disclaimer: The author of the question has an average knowledge of Erlang and a basic knowledge of C.

I am reading the Interoperability Tutorial User Guide now. I have successfully compiled the complex.c example and it works with the Erlang Port without any problems.

However, I would like to understand how the actual C code works. I understand it in general: in the example it reads 2 bytes from the standard input and checks the first byte. Depending on the first byte, it calls either the foo or the bar function. This is the limit of my understanding of it right now.

So, if we take both erl_comm.c:

/* erl_comm.c */

typedef unsigned char byte;

read_cmd(byte *buf)
{
  int len;

  if (read_exact(buf, 2) != 2)
    return(-1);
  len = (buf[0] << 8) | buf[1];
  return read_exact(buf, len);
}

write_cmd(byte *buf, int len)
{
  byte li;

  li = (len >> 8) & 0xff;
  write_exact(&li, 1);

  li = len & 0xff;
  write_exact(&li, 1);

  return write_exact(buf, len);
}

read_exact(byte *buf, int len)
{
  int i, got=0;

  do {
    if ((i = read(0, buf+got, len-got)) <= 0)
      return(i);
    got += i;
  } while (got<len);

  return(len);
}

write_exact(byte *buf, int len)
{
  int i, wrote = 0;

  do {
    if ((i = write(1, buf+wrote, len-wrote)) <= 0)
      return (i);
    wrote += i;
  } while (wrote<len);

  return (len);
}

and port.c:

/* port.c */

typedef unsigned char byte;

int main() {
  int fn, arg, res;
  byte buf[100];

  while (read_cmd(buf) > 0) {
    fn = buf[0];
    arg = buf[1];

    if (fn == 1) {
      res = foo(arg);
    } else if (fn == 2) {
      res = bar(arg);
    }

    buf[0] = res;
    write_cmd(buf, 1);
  }
}

What does each function actually do there? What purpose do li, len, i, wrote, got variables actually serve?

Some more small questions:

1. Why don't the functions have any return types, not even void?
2. When the Erlang port sends data to C, the first byte determines which function is called: if the byte holds decimal 1, foo() is called; if it holds decimal 2, bar() is called. Left unchanged, this protocol can be used to call up to 255 different C functions with only 1 parameter each. Is that right?
3. "Adding the length indicator will be done automatically by the Erlang port, but must be done explicitly in the external C program". What does that mean? On which line of code is it done?
4. From the Tutorial: "By default, the C program should read from standard input (file descriptor 0) and write to standard output (file descriptor 1)." Then: "Note that stdin and stdout are for buffered input/output and should not be used for the communication with Erlang!" What is the catch here?
5. Why is buf declared with a size of [100]?

Inaimathi · Accepted Answer · 2012-05-08T03:15:40.720

This answer is likewise disclaimed (I'm not an Erlang or C programmer, I just happen to be going through the same material)

Your initial model is a bit off. The way the code actually works is by reading the first two bytes from stdin, assuming that they signify the length of the actual message, then reading that many more bytes from stdin. In this specific case, it happens that the actual message is always two bytes (a number corresponding to a function and a single integer argument to pass to it).

0 - a) read_exact reads len bytes from stdin. read_cmd uses read_exact twice: first to determine how many bytes it should read (either the number signified by the first two bytes, or none if there are fewer than two bytes available), and then to read that many bytes. write_exact writes len bytes to stdout. write_cmd uses write_exact to output a two-byte length header, followed by a message (hopefully) of the appropriate length.

0 - b) I think len is sufficiently covered above. li is the name of the variable used to generate that two-byte header for the write function (I can't take you through the bit shift operations step by step, but the end result is that len is represented in the first two bytes sent). i is an intermediate variable that holds the return value of each read or write call; if that value signals end of input or an error, it is returned as the result of read_exact/write_exact. wrote and got keep track of how many bytes have been written/read so far; the enclosing loops exit once that count reaches len.

1 - I'm actually not sure. The versions I was working with are of type int, but otherwise identical. I got mine out of chapter 12 of Programming Erlang rather than the guide you link.
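
For what it's worth, older C (C89 and earlier) treats a function declared without a return type as returning int; C99 removed that "implicit int" rule, which is presumably why the book's versions spell the type out. Below is a minimal sketch of read_exact with explicit types. The name read_exact_fd and the fd parameter are assumptions added for illustration (the tutorial's version always reads from descriptor 0); the loop is otherwise the same idea, and it shows what i and got are for.

```c
#include <unistd.h>   /* read(), ssize_t */

typedef unsigned char byte;

/* Read exactly len bytes from fd into buf.
   Returns len on success, 0 on end of input, -1 on error. */
int read_exact_fd(int fd, byte *buf, int len)
{
  int got = 0;                      /* bytes collected so far */

  while (got < len) {
    /* read() may deliver fewer bytes than requested, hence the loop */
    ssize_t i = read(fd, buf + got, (size_t)(len - got));
    if (i <= 0)
      return (int)i;                /* 0 = end of input, -1 = error */
    got += (int)i;
  }
  return len;
}
```

Calling read_exact_fd(0, buf, 2) would behave like the original read_exact(buf, 2).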

2 - That's correct, but the point of the port protocol is that you can change it to send different arguments (if you're sending arbitrary arguments, it would probably be a better idea to just use the C Node method rather than ports). As an example, I modified it subtly in a recent piece so that it sends a single string, since I only have one function I want to call on the C side, eliminating the need for specifying a function. I should also mention that if you have a system which needs to call more than 255 different operations written in C, you may want to rethink its structure (or just go the whole nine and write it all in C).
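
As a rough sketch of the one-byte function code idea: a byte can hold 256 distinct values (0 through 255), and each one can be mapped to a C function. The helper name dispatch and the bodies of foo and bar below are assumptions made for illustration, not the tutorial's code. Unlike the main loop in port.c, this version reports an unknown code instead of leaving res unassigned.

```c
typedef unsigned char byte;

/* Stand-ins for the functions in complex.c (bodies assumed here). */
static int foo(int x) { return x + 1; }
static int bar(int y) { return y * 2; }

/* Hypothetical helper: run the function selected by fn on arg.
   Stores the one-byte result in *res and returns 0,
   or returns -1 if fn is not a known function code. */
int dispatch(byte fn, byte arg, byte *res)
{
  switch (fn) {
  case 1:
    *res = (byte)foo(arg);
    return 0;
  case 2:
    *res = (byte)bar(arg);
    return 0;
  default:
    return -1;              /* unknown code: leave *res untouched */
  }
}
```

Adding a third operation would then be a matter of adding one more case label.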

3 - This is done

read_cmd(byte *buf)
{
  int len;

  if (read_exact(buf, 2) != 2)   // HERE
    return(-1);                  // HERE
  len = (buf[0] << 8) | buf[1];  // HERE
  return read_exact(buf, len);
}

in the read_cmd function and

write_cmd(byte *buf, int len)
{
  byte li;

  li = (len >> 8) & 0xff;        // HERE
  write_exact(&li, 1);           // HERE

  li = len & 0xff;               // HERE
  write_exact(&li, 1);           // HERE

  return write_exact(buf, len);
}

in the write_cmd function. I think the explanation is covered in 0 - a); that's a header that tells/finds out how long the rest of the message will be (yes, this means that it can only be a finite length, and that length must be expressible in two bytes).

4 - I'm not entirely sure why that would be a catch here. Care to elaborate?
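
One plausible reading of the tutorial's note: "stdin" and "stdout" there mean the C stdio streams (the FILE objects used by printf, fread and friends), which keep their own buffer, whereas the example talks to file descriptors 0 and 1 directly through read() and write(). The sketch below contrasts the two; the function names are hypothetical. With a stdio stream, a reply can sit in the buffer until it is flushed, so the Erlang side may wait for bytes that have not been sent yet.

```c
#include <stdio.h>    /* FILE, fputc(), fflush() */
#include <unistd.h>   /* write() */

typedef unsigned char byte;

/* Descriptor-level write: the byte is handed to the OS immediately.
   (Short writes are ignored here for brevity.) */
int send_unbuffered(int fd, byte b)
{
  return (int)write(fd, &b, 1);
}

/* Stream-level write: the byte lands in the FILE buffer first and
   only reaches the OS once the stream is flushed. */
int send_buffered(FILE *out, byte b)
{
  if (fputc(b, out) == EOF)
    return -1;
  return fflush(out) == 0 ? 1 : -1;
}
```

In the port program, send_unbuffered(1, res) corresponds to what write_exact does; send_buffered(stdout, res) only works because of the explicit fflush.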

5 - buf is a byte array, and has to be explicitly bounded (for memory management purposes, I think). I read "100" here as "a number larger than the maximum message size we plan to accommodate". The actual number picked seems arbitrary, it seems like anything 4 or higher would do, but I could stand to be corrected on that point.
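
To make the sizing concern concrete: the length comes from the two-byte header, so it can be anything from 0 to 65535, while buf holds only 100 bytes. The original read_cmd trusts the header. The sketch below is a hypothetical variant (read_cmd_checked, with an explicit descriptor and capacity parameter, both assumptions for illustration) that refuses packets larger than the buffer.

```c
#include <unistd.h>   /* read(), ssize_t */

typedef unsigned char byte;

/* Read exactly len bytes; returns len, 0 on end of input, -1 on error. */
static int read_exact_fd(int fd, byte *buf, int len)
{
  int got = 0;
  while (got < len) {
    ssize_t i = read(fd, buf + got, (size_t)(len - got));
    if (i <= 0)
      return (int)i;
    got += (int)i;
  }
  return len;
}

/* Read one length-prefixed packet into buf, which holds cap bytes.
   Returns the payload length, or -1 on error or oversized packet. */
int read_cmd_checked(int fd, byte *buf, int cap)
{
  byte hdr[2];
  int len;

  if (read_exact_fd(fd, hdr, 2) != 2)
    return -1;
  len = (hdr[0] << 8) | hdr[1];     /* 0..65535 */
  if (len > cap)
    return -1;                      /* payload would overrun buf */
  return read_exact_fd(fd, buf, len);
}
```

With this shape, main could call read_cmd_checked(0, buf, sizeof buf), and the 100 becomes an explicit upper bound on message size rather than a hope.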

Your assumption about `buf` is correct. It is an array, i.e., a contiguous memory area able to hold `n` elements of a specified type. In this scenario the memory is allocated on the stack. A different way of allocating the memory would be to make `buf` a pointer and allocate it on the heap using `malloc` (but then you must make sure to `free` the memory yourself when you're done with it). — Emil Vikström, May 08 '12 at 05:39
Regarding 4. First the tutorial says that the C program should read from stdin (file descriptor 0) and write to stdout (file descriptor 1). Then the tutorial says that "stdin/stdout should not be used for the communication with Erlang". Isn't that a direct contradiction? They provide an example C program that uses stdin/stdout and then they say that it should not be used, because stdin/stdout is buffered. I guess I am missing something here. — skanatek, May 09 '12 at 16:45
@MartinLee - Oh. True, I suppose. I took it to mean that you shouldn't use these ports to have back-and-forth communication between Erlang and C processes (that's what the C Node thing is for). We're not though; Erlang is sending a single request and expecting a single `int` response per invocation of the program. I guess I may have misunderstood or misread. — Inaimathi, May 09 '12 at 16:50
I have read the exercise in Programming Erlang, Chapter 12 - it is much more friendly than the online tutorial; thanks for the reference. Regarding (0-a): "and then to read that many bytes" - is it this line: "len = (buf[0] << 8) | buf[1];"? How does it work? It looks like there is some kind of bitshift and bitlogic combination, but I do not understand why we need it to calculate len. Moreover, on page 215 of the book, the header length is explained, but I do not understand why we should use the first 2 bytes in 0,3,2,45,32 in order to get the length of the packet. Why can't we use just 1 byte? — skanatek, May 09 '12 at 21:05
@MartinLee - How else would we calculate `len`? I guess you could change the protocol such that it uses one byte and lets it get interpreted literally as an unsigned byte, but that would leave you with a maximum message length of 255. The `2`-byte length is likely arbitrary; some idiosyncratic balance of ease-of-storage vs. message-length. — Inaimathi, May 10 '12 at 03:24
Ok, I get it about length. I have read the bitshift chapter in the K&R book and still do not understand the "len = (buf[0] << 8) | buf[1]" operation. Ok, this line calculates len, but how does it do that? (the only thing that I understand is that there is a left bitshift and a bitwise OR). And regarding the packet length (page 215 of the exercise): "The first two bytes encode the packet length. This is followed by the result, 77 (again 1-byte long)." So, the packet "0,2,77" has a header 0,2 which says that the length of the packet is 1. How do these two bytes (0,2) tell us that the length is 1? — skanatek, May 13 '12 at 15:44
That has to do with how integers are represented at the byte level. The bitshift + bitwise OR convert two bytes into a single `int` value (though I think `0,2` would actually convert to 2, not 1). Ctd. — Inaimathi, May 13 '12 at 17:34
Think in terms of the bits involved. `0,2` -> `[00000000], [00000010]`. `len` is an integer, so it's typically 4 bytes; `[00000000:00000000:00000000:00000000]`. Doing `[00000000] << 8` shifts that to the left by 8 bits (in this case, it makes no difference, since the first byte is 0), giving you `[00000000:00000000]`. Doing `([00000000] << 8) | [00000010]` leaves you with `[00000000:00000010]`, which is the bit representation of "the short integer 2". That's assigned to `len` with two bytes left over (so you could technically declare `len` as a `short integer` with no ill effects). — Inaimathi, May 13 '12 at 17:35
That was my own "non-C programmer" understanding, so take it with a grain of salt. It may also help to watch [this](http://youtu.be/jTSvthW34GU?t=9m14s). That's a ~5 minute segment wherein he talks about shorts specifically. There's also about 10 minutes of fairly useful examples at the [~20 minute mark](http://youtu.be/jTSvthW34GU?t=20m28s). — Inaimathi, May 13 '12 at 17:36
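
The shift-and-OR discussed in the comments above can be read as base-256 arithmetic: the first header byte counts 256s and the second counts ones, so `(buf[0] << 8) | buf[1]` equals `buf[0] * 256 + buf[1]`. A small sketch follows; the helper names decode_len and encode_len are made up for illustration, and encode_len mirrors the two li assignments in write_cmd.

```c
#include <assert.h>

typedef unsigned char byte;

/* Combine the two header bytes into one integer (high byte first). */
int decode_len(const byte hdr[2])
{
  return (hdr[0] << 8) | hdr[1];    /* same as hdr[0] * 256 + hdr[1] */
}

/* Split a length into two header bytes, as write_cmd does. */
void encode_len(int len, byte hdr[2])
{
  assert(len >= 0 && len <= 0xFFFF); /* must fit in two bytes */
  hdr[0] = (len >> 8) & 0xff;        /* how many 256s */
  hdr[1] = len & 0xff;               /* the remainder */
}
```

Under this scheme the header 0,2 decodes to 0 * 256 + 2 = 2, and a length of 300 encodes as 1,44 because 1 * 256 + 44 = 300. A single header byte would cap messages at 255 bytes; two bytes raise the cap to 65535.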
