
I have a service that uses WebSockets to transfer data.

I need a way to encode a tree structure and transfer it over the WebSocket. I have been reading about TLV and sub-TLV encoding, which seems like a great idea: it is already used in protocols such as RADIUS and LLDP, which proves that the approach works. However, those protocols usually run between trusted devices, i.e. between switches/routers (except LLDP). My problem is that I will be transferring TLVs that include sub-TLVs of random size/length and with no statically defined structure. For example, take a look at the first TLV where the concept of sub-TLVs was defined:

The Extended IS Reachability TLV (#22) has a structure like this:

  /* +-------+-------+-------+-------+-------+-------+-------+-------+
   * |                        Type                                   | 1
   * +---------------------------------------------------------------+
   * |                        Length ID                              | 1
   * +---------------------------------------------------------------+
   * |                        Neighbour ID                           | 7
   * +---------------------------------------------------------------+
   * |                        TE Metric                              | 3
   * +---------------------------------------------------------------+
   * |                        SubTLVs Length                         | 1
   * +---------------------------------------------------------------+
   * |                        SubTLVs value                          | variable
   * +---------------------------------------------------------------+
   * :                                                               :
   */

By structure I mean that it has a predefined 7-byte Neighbour ID, a 3-byte TE Metric, and a 1-byte SubTLVs Length, and only then comes the variable part; at least some bytes are defined in advance and cannot change.
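The fixed part described above can be sketched in Java. This is a minimal illustration, not a full IS-IS implementation; the class and method names are mine, and the layout simply follows the diagram (1-byte type, 1-byte length, 7-byte Neighbour ID, 3-byte big-endian TE Metric, 1-byte sub-TLV length, then the raw sub-TLV bytes):

```java
import java.nio.ByteBuffer;

public class ExtendedIsReachability {
    static final int TYPE = 22;

    // Encode the fixed part of TLV #22 followed by the variable sub-TLV region.
    static byte[] encode(byte[] neighbourId, int teMetric, byte[] subTlvs) {
        if (neighbourId.length != 7)
            throw new IllegalArgumentException("Neighbour ID must be 7 bytes");
        if (subTlvs.length > 255)
            throw new IllegalArgumentException("sub-TLVs exceed one length byte (255)");
        int valueLen = 7 + 3 + 1 + subTlvs.length;        // everything after the length byte
        ByteBuffer buf = ByteBuffer.allocate(2 + valueLen);
        buf.put((byte) TYPE);                             // Type
        buf.put((byte) valueLen);                         // Length
        buf.put(neighbourId);                             // Neighbour ID (7 bytes)
        buf.put((byte) (teMetric >>> 16));                // TE Metric, 3 bytes big-endian
        buf.put((byte) (teMetric >>> 8));
        buf.put((byte) teMetric);
        buf.put((byte) subTlvs.length);                   // SubTLVs Length
        buf.put(subTlvs);                                 // SubTLVs value (variable)
        return buf.array();
    }
}
```

Because the fixed fields always occupy the same offsets, a receiver can at least validate them before touching the variable tail.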

Now, by reading some books (mainly H. Gredler, The Complete IS-IS Routing Protocol, 2005, page 296) I found four techniques for validating those TLVs:

1) Maximum Length Checking

2) Sub-TLV Overrun Checking

3) Discrete Length Checking

4) TLV Content Pattern Checking
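Two of these checks can be sketched together in Java: maximum length checking (reject a sub-TLV region larger than an agreed bound) and sub-TLV overrun checking (walk the region and verify every declared sub-TLV length stays inside it). This is a hypothetical sketch; the class name and the 700 KB bound taken from the question are my assumptions:

```java
public class SubTlvCheck {
    static final int MAX_REGION = 700 * 1024;   // maximum length check, per the question's bound

    // Sub-TLV overrun checking: each sub-TLV is assumed to be a
    // 1-byte type, 1-byte length, then `length` value bytes.
    static boolean subTlvsWellFormed(byte[] region) {
        if (region.length > MAX_REGION) return false;   // maximum length check
        int offset = 0;
        while (offset < region.length) {
            if (offset + 2 > region.length) return false;   // truncated type/length header
            int len = region[offset + 1] & 0xFF;            // length byte is unsigned
            offset += 2 + len;
            if (offset > region.length) return false;       // overrun: value runs past the region
        }
        return offset == region.length;                     // exact fit, no trailing bytes
    }
}
```

Note that these checks only look at the framing, never at the value bytes themselves, which is why they still work when the value is encrypted.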

I cannot trust anything coming from the user, but I have two other problems: how do I validate a TLV whose value (a) has a random length/size (I only have a range the value can fall in: no smaller than 1 byte, no larger than 700 KB) and (b) cannot be pattern-checked, since the value is encrypted, i.e. not in readable form?

Thus my question is: how can I achieve the same goal, i.e. sending a tree structure, using some other structure, maybe key-value pairs or something similar (HTTP uses them; there should be a reason for that)?

Is the TLV approach really the best choice for transferring data in a tree structure? I know that sending it in binary form kills two birds with one stone: when sending files such as pictures I will not need some funny Base64 encoding. But what I really want is a protocol that works at L5-7 (i.e. over WebSocket) that lets me send data in the form of a tree, where the receiving side can identify and reassemble the tree without me having to think about the serialization and deserialization parts. So what would be the second-best alternative to TLVs, taking into account that I am working with Java on one side and JavaScript on the other?
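The key-value alternative mentioned above can be sketched with a tree of nested maps serialized to JSON, which JavaScript rebuilds natively with `JSON.parse`. This is a hand-rolled illustration (class name and limited escaping are my simplifications; in practice a library such as Jackson would do this on the Java side, with binary leaves Base64-encoded):

```java
import java.util.Map;

public class TreeJson {
    // Serialize a tree whose keys are Strings and whose leaves are Strings;
    // nested Maps become nested JSON objects. Escaping covers only quotes
    // and backslashes, enough for this sketch.
    @SuppressWarnings("unchecked")
    static String toJson(Map<String, Object> node) {
        StringBuilder sb = new StringBuilder("{");
        boolean first = true;
        for (Map.Entry<String, Object> e : node.entrySet()) {
            if (!first) sb.append(',');
            first = false;
            sb.append('"').append(escape(e.getKey())).append("\":");
            Object v = e.getValue();
            if (v instanceof Map) {
                sb.append(toJson((Map<String, Object>) v));   // recurse into subtree
            } else {
                sb.append('"').append(escape(String.valueOf(v))).append('"');
            }
        }
        return sb.append('}').toString();
    }

    static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }
}
```

The trade-off against TLV is that malformed input fails loudly at parse time, which answers the validation concern, at the cost of text framing for binary payloads.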

Tito
  • If you already established your TLV message structure, why not send binary data over the websockets? (Websockets will wrap and unwrap the data in its own protocol, but you will get the original binary data after the unwrapping.) – Myst Dec 15 '15 at 16:42
  • Because I do not trust the data coming from the user, i.e. how can I validate this TLV at all? It will contain other sub-TLVs whose sizes I do not know, so basically I wonder whether I could validate them somehow. Thus I am looking for the second-best alternative to TLV. – Tito Dec 15 '15 at 18:49
  • Hmm... I'm not sure what the actual issue might be... Can't you just hang up the connection when there's a parsing error? – Myst Dec 15 '15 at 20:17
  • See, that is exactly my issue: how do I recognize that there is a parsing error? I mentioned 4 different ways to parse the TLVs in my question, but since the value is encrypted, i.e. that part of the TLV cannot be read/pattern-matched and is always a variable value, how do you validate something like that? This is the only reason I intend to use some other structure that allows me to pass a tree over the websocket but also allows strict checking, e.g. if it were key-value pairs with a carriage return or line feed, I would know where each "sub-TLV" ends. – Tito Dec 15 '15 at 21:26
  • Basically all I am trying to say is that in my case, of those 4 validation techniques mentioned for TLVs, I end up with only one left, the length checking, which gives me the feeling that problems might occur in the future. That is the reason I am asking this question. – Tito Dec 15 '15 at 21:28
  • If you're afraid that the protocol is reading invalid data, then length checking isn't very effective for data validation. Consider adding a fixed-size validation field using known techniques such as data hashing or first-last byte review (good for protocol validity but not data integrity). – Myst Dec 16 '15 at 00:14
  • @Myst can you explain why adding a fixed-size validation field, e.g. a checksum of the whole TLV, will not be good for data integrity? I think that might be a very good idea. – Tito Dec 16 '15 at 09:00
  • @Myst can you please comment on why data hashing of the first-last byte is "good for protocol validity but not data integrity"? Please supply an example. – Tito Dec 17 '15 at 06:46

1 Answer


This is an answer to the question in the comments, not the original question.

There are a number of options that will allow you to check for protocol errors:

  1. Since your type uses a whole byte (and you probably don't have 256 different types), you can check that the type is valid. If it isn't, there's a protocol error.

  2. You can add a two-byte field after the variable-length value field, holding the first and last bytes of the value field. If these don't match, there's a protocol error; or

  3. You can add a fixed-size MD5 hash field after the value, and check the content of the value against the hash value. If the data isn't valid, there's a protocol error (or a man-in-the-middle attack).

Using a hash (checksum), as in option 3 (I suggested MD5, but any checksum method would work), is a good way to review both data integrity and protocol conformity.

Using first- and last-byte review (option 2) will only check protocol conformity, not data integrity.

Validating the type field (option 1) is a simple way to review protocol integrity, but it is error prone (as random data is more likely to seem valid).

Checking the length of the data received against the actual data's length (options 1-3 in the question) is error prone and likely to let errors through. This is because excess data might be considered part of the next "frame", so an error will be revealed only if data is missing and no more data is received.
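Option 3 can be sketched with the JDK's built-in `MessageDigest` (the class name here is mine; `MessageDigest.isEqual` is used for the comparison since it is constant-time):

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class ChecksumFrame {
    // Compute the MD5 digest that would be appended after the value field.
    static byte[] digest(byte[] value) {
        try {
            return MessageDigest.getInstance("MD5").digest(value);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e);   // MD5 is always present in the JDK
        }
    }

    // On receipt: recompute the digest and compare against the received field.
    static boolean verify(byte[] value, byte[] receivedDigest) {
        return MessageDigest.isEqual(digest(value), receivedDigest);
    }
}
```

The receiver treats a mismatch as a protocol error and can simply drop the connection.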

Edit: More details regarding option 2 vs. 3

When using first- and last-byte validation, there is a ~1:65,536 chance of random data passing this validation test per frame (which is probably a good enough test for non-critical single-frame data).

Also, if your data tree contains a number of "data field" values, that number grows super fast (a 4-data-field tree has only a 1:2^64 chance of validity when random data is supplied).

i.e.

Assume the data is 1 byte long and the byte value is 10 (0A in Hex).

Both the first byte and the last byte equal 0A, meaning the validation field must be a two byte value of 0A0A. This is the only acceptable value out of 2^16 options.
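The check itself is tiny; here is a hypothetical sketch (class and method names are mine) that also demonstrates why it is only a protocol check:

```java
public class FirstLastCheck {
    // Option 2: the two-byte validation field carries the first and last
    // bytes of the value. A mismatch signals a protocol error, but any
    // tampering that preserves those two bytes goes undetected.
    static boolean firstLastValid(byte[] value, byte first, byte last) {
        return value.length > 0
                && value[0] == first
                && value[value.length - 1] == last;
    }
}
```

For the 1-byte value `0x0A`, only the field `0A0A` passes; but the account numbers 100001 and 199991 from the example below both pass the same check.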

BUT, it does not promise data integrity.

Let's assume the client is connected to our server through evilproxy.com's services...

If the value was a bank account number with the value 100001, evilproxy.com could change the account number to any value, as long as the first and last digits are the same (i.e. 199991), allowing evilproxy.com to change the data while the protocol's message would still remain valid.

On the other hand, using a checksum such as an MD5 hash would mean that the checksum's value changes when the account number is manipulated in this example.

MD5 uses 128 bits (16 bytes), meaning that a random validation field value has a 1:2^128 chance of "hitting the mark" (That's a negligible chance).

On the other hand, if evilproxy.com knows the "salt" of the algorithm you use for the checksum, this field could be updated along with the data.

Hence, if you expect to communicate more sensitive data, it's better to consider a checksum.
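If the concern is a proxy that can recompute the checksum, the usual step up from a salted hash is a keyed MAC, where the key is shared by the two endpoints and never sent on the wire. A minimal sketch using the JDK's `javax.crypto.Mac` (class name and the HmacSHA256 choice are mine, not from the answer):

```java
import javax.crypto.Mac;
import javax.crypto.spec.SecretKeySpec;
import java.security.GeneralSecurityException;

public class KeyedMac {
    // Without the key, an intermediary cannot produce a valid tag for
    // modified data, unlike a plain (or merely salted) checksum.
    static byte[] hmac(byte[] key, byte[] value) {
        try {
            Mac mac = Mac.getInstance("HmacSHA256");
            mac.init(new SecretKeySpec(key, "HmacSHA256"));
            return mac.doFinal(value);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);   // HmacSHA256 is always present in the JDK
        }
    }
}
```

Note that WebSocket over TLS (`wss://`) already gives this kind of protection at the transport layer, so an application-level MAC mainly matters when TLS terminates before your server.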

Myst
  • I agree with the answer except part number 3, where you mentioned a "middle man attack": if the data isn't valid, it does not mean that a man-in-the-middle attack is occurring. I think those two are unrelated. – Tito Dec 17 '15 at 06:52
  • @Tito, I agree that if the data isn't valid, it does not mean that there is necessarily a man-in-the-middle attack... but it is a possibility, and the checksum allows for more security against man-in-the-middle attacks. You can see my updated answer. – Myst Dec 17 '15 at 10:14