Dereferencing Microdata item type URLs: "should not" vs. "must not"

Question

In W3C’s HTML Microdata, it says (and it’s currently the same in WHATWG’s HTML Living Standard):

Except if otherwise specified by that specification, the URLs given as the item types should not be automatically dereferenced.

Note: A specification could define that its item type can be dereferenced to provide the user with help information, for example. In fact, vocabulary authors are encouraged to provide useful information at the given URL.

And this is directly followed by:

Item types are opaque identifiers, and user agents must not dereference unknown item types, or otherwise deconstruct them, in order to determine how to process items that use them.

I’m confused about this. In the first paragraph it says that URLs in the itemtype attribute "should not be automatically dereferenced" (should, not must; so according to this paragraph, user agents are allowed to dereference). But in the last paragraph it says that user agents "must not dereference unknown item types".

Is this a contradiction or do they mean something different?

Maybe it is only about known vs. unknown (although the first paragraph doesn’t mention "known" at all, so I’d assume it applies to all vocabularies, whether known or not)? But why should it make a difference if a user agent knows a vocabulary? And what exactly does it mean to "know" a vocabulary in the first place?

Or maybe the "in order to determine how to process items that use them" part is the crux of the matter here? So user agents are allowed to dereference for any reason except if they try to determine how to process the items?

score 0 · Answer 1 · answered Sep 30 '14 at 00:22

I think, but I do not know for sure ...

The distinction is known vs unknown. As for what "known" means, remember that this is just data, and user agents are not necessarily browsers. For example, a particular data set could, in theory at least, be interpreted to control real world machinery.

The first part is saying that if the UA knows the data type, it shouldn't need to dereference it, because the UA will already know what resource that dereference would return; the request is just network overhead. It's the same reason UAs shouldn't dereference DTDs: they should already know what the DTD resource contains. It's a "should" because it's impossible to say that, for an arbitrary data type known to the UA, there is no circumstance where a dereference might yield a useful result.

The latter part is saying that if the UA doesn't know the data type, there's no defined protocol by which dereferencing will yield a meaningful resource, so the UA would at best be guessing. The dereference has no value to any system and some network cost, so the UA must not do it.
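To make the known vs. unknown reading concrete, here is a minimal Python sketch of how a UA might decide. It only illustrates my interpretation above, not anything the spec defines: the `Vocabulary` class, the `KNOWN_VOCABULARIES` table, the `dereference_policy` function, and the example.org URLs are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vocabulary:
    """A vocabulary the UA was built to understand (hypothetical model)."""
    item_type: str
    # True only if the vocabulary's own specification says its item type
    # may be dereferenced (e.g. to show help information to the user).
    allows_dereference: bool = False


# Hypothetical table of item types this UA knows; the URLs are placeholders.
KNOWN_VOCABULARIES = {
    v.item_type: v
    for v in (
        Vocabulary("https://example.org/vocab/Book"),
        Vocabulary("https://example.org/vocab/Help", allows_dereference=True),
    )
}


def dereference_policy(item_type: str) -> str:
    """Return 'must not', 'should not', or 'allowed' for an item type URL.

    The URL is compared as an opaque string: it is never parsed or
    otherwise deconstructed to guess how the item should be processed.
    """
    vocab = KNOWN_VOCABULARIES.get(item_type)
    if vocab is None:
        # Unknown type: nothing defines what a fetch would return.
        return "must not"
    if vocab.allows_dereference:
        # Known type whose specification explicitly permits it.
        return "allowed"
    # Known type with no such permission: fetching is only overhead.
    return "should not"


if __name__ == "__main__":
    for url in (
        "https://example.org/vocab/Book",
        "https://example.org/vocab/Help",
        "https://example.org/vocab/Unknown",
    ):
        print(url, "->", dereference_policy(url))
```

Note that the lookup is an exact string match, so a URL that merely resembles a known one (a different path on the same host, say) still counts as unknown.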
