Uniform resource locator or universal resource locator
There is some confusion in the web community about URI space partitioning, specifically, about the relationship between the concepts of URL, URN and URI. The confusion is due to the incompatibility between two different views of URI partitioning, which we call the “classical” and “contemporary” views.
During the early years of discussion of web identifiers (early to mid-1990s), it was assumed that an identifier type would fall into one of two (or possibly more) classes. An identifier could specify the location of a resource (a URL) or its name (a URN), independent of location. Thus, a URI was either a URL or a URN. There was discussion of generalizing this by adding a discrete number of additional classes; for example, a URI could point to metadata rather than to the resource itself, in which case the URI would be a URC (citation). Thus, the URI space was considered to be divided into subspaces: URL and URN, and additional subspaces, to be defined. The only additional space that was proposed was the URC, and it was never accepted; so, without loss of generality, it is reasonable to say that the URI space was considered divided into two classes: URL and URN. So, for example, “http:” was a URL scheme and “isbn:” would (someday) be a URN scheme. Any new scheme would fall into one or the other of these two classes.
Over time, the importance of this additional level of hierarchy seemed to diminish; it was concluded that an individual scheme need not be included in one of a discrete set of URI types such as “URL”, “URN”, “URC”, etc. Web identification schemes are, in general, URI schemes; a given URI scheme can define subspaces. Thus, “http:” is a URI scheme. “urn:” is also a URI scheme; it defines subspaces, called “namespaces”. For example, the set of URNs of the form “urn:isbn:n-nn-nnn-nnnn-n” is a URN namespace. (“isbn” is a URN namespace identifier. It is neither a “URN scheme” nor a “URI scheme”.)
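The contemporary distinction between a URI scheme and a URN namespace identifier can be illustrated with a minimal sketch. The function name `split_uri` and the sample identifiers are assumptions made for illustration; the code performs only a naive split on “:” and is not a full URI parser.

```python
def split_uri(uri: str) -> dict:
    """Return the URI scheme and, for "urn:" URIs only, the namespace ID (NID).

    Illustrative sketch: splits on ":" without validating the syntax further.
    """
    scheme, sep, rest = uri.partition(":")
    if not sep or not scheme:
        raise ValueError(f"no scheme found in {uri!r}")

    result = {"scheme": scheme.lower(), "nid": None, "nss": None}

    # Only the "urn" scheme defines namespaces; "isbn" below is a NID,
    # not a URI scheme in its own right.
    if result["scheme"] == "urn":
        nid, sep, nss = rest.partition(":")
        if not sep or not nid:
            raise ValueError(f"no namespace ID found in {uri!r}")
        result["nid"] = nid.lower()
        result["nss"] = nss  # namespace-specific string, left untouched

    return result


http_parts = split_uri("http://www.example.org/")
urn_parts = split_uri("urn:isbn:0-395-36341-1")  # sample value, illustrative only
```

Here `http_parts["scheme"]` is “http” with no NID, while `urn_parts["scheme"]` is “urn” and `urn_parts["nid"]` is “isbn”, which mirrors the point that “isbn” names a namespace within the “urn:” scheme.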
Moreover, in the contemporary view, the term “URL” does not refer to a formal partitioning of the URI space; rather, URL is a useful but informal concept: a URL is a type of URI that identifies a resource through a representation of its primary access mechanism (e.g., its “location” on the network), rather than by any other attributes it may have. Thus, as we have noted, “http:” is a URI scheme. An http URI is a URL. The phrase “URL scheme” is now used infrequently, usually to refer to some subclass of URI schemes that excludes URNs.
The body of documents (RFCs, etc.) covering URI architecture, syntax, registration, etc., spans both classical and contemporary times. Those versed in URI issues tend to use “URL” and “URI” in a way that makes them appear to be interchangeable. Among these experts, this is not a problem. But among the Internet community at large, it is. People are not convinced that URI and URL mean the same thing, in documents where they (apparently) do. When one sees one RFC that talks about URI schemes (e.g., [RFC 2396]), another that talks about URL schemes (e.g., [RFC 2717]), and another that talks about URN schemes ([RFC 2276]), it is natural to wonder what the difference is, and how they relate to each other. Although Section 1.2 of RFC 2396 attempts to address the distinction between URIs, URLs and URNs, it has failed to clear up the confusion.
This section examines the status of registration of URI schemes and URN namespaces, as well as the mechanisms by which registration is currently performed.
A distinction should be made between:
- Registered URI schemes. The official registry of URI scheme names, at http://www.iana.org/assignments/uri-schemes, is maintained by IANA. For each scheme, the RFC that defines it is listed, e.g., “http:” is defined by [RFC 2616]. The table currently contains 30 schemes. In addition, there are some “reserved” scheme names; at one time they were intended to become registered schemes, but they have since been dropped.
- Unregistered URI schemes. Among unregistered schemes, we distinguish between public and private ones. A public scheme (registered or unregistered) is one for which there is some public document describing it.
- Registration of URI schemes. [RFC 2717] specifies procedures for registering scheme names, and points to [RFC 2718], which provides guidelines. RFC 2717 describes an organization of schemes into “trees.”
Regarding unregistered URI schemes:
- Unregistered public schemes. There exists a list of known public URI schemes, both registered and unregistered, with a total of 84 schemes. About 50 of them are unregistered (not listed in the IANA registry). Some may be obsolete (for example, it appears that “phone” is obsolete, superseded by “tel”). Some have an RFC, but are not included in the IANA list.
- Private schemes. It is probably impossible to determine all of them, and it is not clear that it is worth trying, except perhaps to get an idea of their number. The minutes of the August 1997 IETF meeting note that there may be between 20 and 40 in use at Microsoft, with 2-3 added per day, and that WebTV has 24, with 6 added per year.
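The categories above (registered, unregistered public, private) can be expressed as a small classification sketch. The two sets below are tiny illustrative samples, not copies of the IANA registry or of any published list, and the name `x-private-example` is hypothetical.

```python
# Illustrative samples only; a real check would consult the IANA registry.
REGISTERED = {"http", "ftp", "mailto"}
# Schemes with a public description that are absent from the registry sample.
PUBLIC_UNREGISTERED = {"phone"}


def classify_scheme(scheme: str) -> str:
    """Classify a scheme name as registered, unregistered public, or private."""
    name = scheme.rstrip(":").lower()
    if name in REGISTERED:
        return "registered"
    if name in PUBLIC_UNREGISTERED:
        return "unregistered-public"
    # No public document is known for the scheme, so treat it as private.
    return "private"


labels = [classify_scheme(s) for s in ("http:", "phone", "x-private-example")]
```

Because private schemes cannot be enumerated, the sketch treats “private” as the fallback category rather than trying to list them.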
Regarding the registration of URI schemes:
- IETF tree. The IETF tree is intended for schemes of general interest to the Internet community that require a substantive review and approval process. Registration in the IETF tree requires the publication of the syntax and semantics of the scheme in an RFC.
- Other trees. RFC 2717 describes “alternative trees”, but none have been registered to date; a vendor-supplied tree (“vnd”) is pending. URI schemes in alternative trees will be distinguishable because they will have a “.” in the scheme name.
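The “.” convention for alternative trees lends itself to a one-line check. The following is a minimal sketch; the function name `registration_tree` and the scheme name `vnd.example` are hypothetical, and the code deliberately does not try to infer which alternative tree a name belongs to, since only the presence of the “.” is stated above.

```python
def registration_tree(scheme: str) -> str:
    """Report whether a scheme name belongs to the IETF tree or an alternative tree.

    Relies only on the convention that alternative-tree scheme names contain a ".".
    """
    name = scheme.rstrip(":").lower()
    return "alternative" if "." in name else "ietf"


trees = {s: registration_tree(s) for s in ("http:", "vnd.example")}
```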
A URN namespace is identified by a “namespace ID”, NID, which is registered with IANA.
Pending URN NIDs
There are a number of pending URN NID registration requests, but there is no reliable way to discover them or their status. For example, ‘isbn’ and ‘nbn’ have been approved by the IESG and are in the RFC Editor queue. In particular, ‘isbn’, as a potential URN namespace (or URI scheme), has been a source of much speculation and confusion for several years. It would be useful if there were some formal means of tracking the status of NID requests such as ‘isbn’.
In the “unregistered” category (besides the experimental case, which is not described in this document) there are bona fide NIDs whose owners simply have not explored the registration process; the most prominent one that comes to mind is ‘hdl’. In the case of ‘hdl’, there has been speculation that it has not been registered because its owners are unclear whether it should be registered as a URI scheme or as a URN namespace.
Registration procedures for URN NIDs
A request for a NID should describe features including: structural features of identifiers (e.g., features relevant to caching/shortening approaches); specific character encoding rules (e.g., which character should be used for single quotes); regulations, standards, etc., explaining namespace structure; identifier uniqueness considerations; delegation of allocation authority, including how to become an identifier allocator; identifier persistence considerations; quality of service considerations; process for resolving identifiers; lexical equivalence rules; any special considerations necessary to conform to URN syntax (particularly applicable in the case of legacy naming systems); validation mechanisms (to determine whether a given string is currently a validly assigned URN); and scope (e.g., “U.S. Social Security numbers”).
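The items a NID request should describe can be treated as a checklist. The sketch below models them as a record and reports which items are still empty; the class name `NidRequest`, its field names, and the sample values are assumptions made for illustration and do not correspond to any official registration template.

```python
from dataclasses import dataclass, fields


@dataclass
class NidRequest:
    """Checklist of the features a NID registration request should describe."""
    nid: str
    structure: str = ""               # structural features of identifiers
    character_encoding: str = ""      # specific character encoding rules
    governing_standards: str = ""     # regulations, standards, etc.
    uniqueness: str = ""              # identifier uniqueness considerations
    delegation: str = ""              # delegation of allocation authority
    persistence: str = ""             # identifier persistence considerations
    quality_of_service: str = ""      # quality of service considerations
    resolution_process: str = ""      # process for resolving identifiers
    lexical_equivalence: str = ""     # lexical equivalence rules
    urn_syntax_conformance: str = ""  # special considerations for URN syntax
    validation: str = ""              # validation mechanisms
    scope: str = ""                   # e.g. "U.S. Social Security numbers"


def missing_items(request: NidRequest) -> list:
    """Return the names of checklist items that have not been filled in."""
    return [f.name for f in fields(request)
            if not getattr(request, f.name).strip()]


# Hypothetical, partially completed request.
draft = NidRequest(nid="example", scope="illustrative scope only")
todo = missing_items(draft)
```

For the draft above, `todo` lists the eleven items that still need a description before the request would be complete under this checklist.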