Note: this is a multilingual thesaurus; that is, a thesaurus that uses more than one language, in which each concept is represented by a preferred term in each of the languages, and there is a single structure of hierarchical and associative relationships between concepts that is independent of language.
RDF plain literals are formally defined as character strings with optional language tags. SKOS thereby enables a simple form of multilingual labeling: the language tag of a lexical label restricts its scope to a particular language. The following example shows a concept that receives one preferred label in English and one in French:
ex:animals rdf:type skos:Concept;
  skos:prefLabel "animals"@en;
  skos:prefLabel "animaux"@fr.
Following common practice in KOS design, the preferred label of a concept can also be used to represent that concept unambiguously within a KOS and its applications. Therefore, although the SKOS data model does not formally mandate it, it is recommended that no two concepts in the same KOS be given the same preferred lexical label for any given language tag.
Note that the notion of a preferred label implies that a resource can have only one such label per language tag, as discussed below.
Labeling and language tags
Language tags (see below) are defined to identify languages. Note that “en”, “en-GB” and “en-US” are three different language tags, used with English, British English and US English, respectively. Similarly, “ja”, “ja-Hani”, “ja-Hira”, “ja-Kana” and “ja-Latn” are five different language tags, used with Japanese, Japanese written in kanji, in hiragana script, in katakana script and in Latin characters (rōmaji), respectively.
The graph below is consistent with the SKOS data model, because “en”, “en-US” and “en-GB” are different language tags.
<Color> skos:prefLabel "color"@en , "color"@en-US , "color"@en-GB .
In the following graph, there is no conflict between the lexical labeling properties, again because “en” and “en-GB” are different language tags, and therefore the graph is consistent with the SKOS data model.
<Love> skos:prefLabel "love"@en ; skos:altLabel "love"@en-GB .
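The two graphs above can be checked mechanically. The following is a minimal Python sketch, not part of SKOS itself: the function name `check_labels` and the representation of a label as a `(lexical_form, language_tag)` pair are assumptions made for illustration. It tests the two conditions discussed here: at most one preferred label per language tag, and no literal (same form and same tag) used as both a preferred and an alternative label.

```python
from collections import defaultdict


def check_labels(pref_labels, alt_labels):
    """Return a list of problems found in one concept's labels.

    Each label is a (lexical_form, language_tag) pair. Language tags are
    compared case-insensitively; "en" and "en-GB" count as different tags.
    """
    problems = []

    # Condition 1: at most one preferred label per language tag.
    forms_by_tag = defaultdict(set)
    for form, tag in pref_labels:
        forms_by_tag[tag.lower()].add(form)
    for tag, forms in sorted(forms_by_tag.items()):
        if len(forms) > 1:
            problems.append(f"more than one prefLabel for tag '{tag}'")

    # Condition 2: the same literal must not be both a preferred
    # and an alternative label.
    pref = {(form, tag.lower()) for form, tag in pref_labels}
    alt = {(form, tag.lower()) for form, tag in alt_labels}
    for form, tag in sorted(pref & alt):
        problems.append(f"'{form}'@{tag} is both prefLabel and altLabel")

    return problems


# The two graphs from the text: both are consistent.
color = check_labels(
    [("color", "en"), ("color", "en-US"), ("color", "en-GB")], []
)
love = check_labels([("love", "en")], [("love", "en-GB")])

# A made-up inconsistent concept, for contrast.
clash = check_labels(
    [("love", "en"), ("affection", "en")], [("love", "en")]
)
```

Here `color` and `love` are empty lists, while `clash` reports two problems, one for each condition.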
Note, however, that these examples serve only to illustrate general features of the SKOS data model, and do not necessarily indicate best practice for providing labels with different language tags. Application- and language-specific usage conventions for labels and language tags are beyond the scope of the SKOS Reference.
It is suggested that applications match requests for labels in a given language against the labels with related language tags that a SKOS concept scheme provides, e.g., by implementing the “lookup” algorithm defined in RFC 4647. Applications that perform matching in this way do not require labels to be provided in every possible language variant (of which there could be many), and are compatible with SKOS concept schemes that provide only those labels whose lexical forms are distinct for a given language or group of languages.
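As an illustration of the matching suggested above, here is a simplified Python sketch of lookup-style matching: the requested language range is truncated from the end, one subtag at a time, until it equals a tag that the concept scheme provides. The function name `lookup` and its parameters are assumptions for this example, and wildcard ranges and priority lists of several ranges are not handled.

```python
def lookup(language_range, available_tags, default=None):
    """Return the best available tag for a requested language range."""
    # Map lowercase forms back to the tags as the scheme wrote them.
    available = {tag.lower(): tag for tag in available_tags}
    subtags = language_range.lower().split("-")

    while subtags:
        candidate = "-".join(subtags)
        if candidate in available:
            return available[candidate]
        # No match: truncate the range from the end.
        subtags.pop()
        # A single-character subtag (such as "x") cannot end a tag,
        # so it is removed together with the subtag that followed it.
        while subtags and len(subtags[-1]) == 1:
            subtags.pop()

    return default


# A scheme that only provides "en" labels still serves a British user.
british = lookup("en-GB", ["en", "fr"])
# Truncation steps over the private-use singleton "x".
chinese = lookup("zh-Hant-CN-x-private1", ["zh", "zh-Hant"])
# Nothing related is available, so the default is returned.
german = lookup("de-CH", ["en", "fr"], default="en")
```

With these inputs, `british` is `"en"`, `chinese` is `"zh-Hant"`, and `german` falls back to the default `"en"`. Note that a request for “en-US” would not match a scheme that only provides “en-GB”: lookup only ever shortens the requested range.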
Language tagging
Humans on our planet have, in the past and present, used a number of languages. There are many reasons why one would want to identify the language used when presenting or requesting information.
It is often necessary to identify the language of an item of information, or the language preferences of a user, so that appropriate processing can be applied. For example, the user’s language preferences in a web browser can be used to select web pages appropriately. Language information can also be used to select among tools (such as dictionaries) that help process or understand content in different languages. Knowledge of the particular language used by some information content may be useful or even necessary for some types of processing, e.g., spell checking, computer-synthesized speech, Braille transcription, or high-quality print rendering.
One way to indicate the language used is to label the information content with an identifier or “tag”. These tags can also be used to specify user preferences for selecting information content or to label additional attributes of the content and associated resources.
Sometimes language tags are used to indicate additional linguistic attributes of the content. For example, indicating specific information about the dialect, writing system, or spelling used in a document or resource may enable the user to obtain the information in a form that he or she can understand, or may be important for processing or representing the given content in an appropriate form or style.
The language tag
Language tags are used to help identify languages, whether spoken, written, signed or otherwise, for the purpose of communication. This includes constructed and artificial languages, but excludes languages not primarily intended for human communication, such as programming languages.
A language tag is composed of a sequence of one or more “subtags”, each of which refines or narrows the range of languages identified by the overall tag.
There are different types of subtag, each distinguished by its length, its position in the tag and its content: each type can be recognized from these characteristics alone. This makes it possible to extract the subtags and assign them a certain amount of semantic information even when their specific values are not recognized. A language tag processor therefore does not need a list of valid tags or subtags (i.e., a copy of some version of the IANA Language Subtag Registry) to perform common search and matching operations. The only exceptions to this ability to infer meaning from subtag structure are the grandfathered tags listed in the “regular” and “irregular” productions.
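To make the idea concrete, the following Python sketch labels each subtag of a tag using only length, position and content, with no registry lookup. It is a simplified illustration under stated assumptions: the function name `classify_subtags` is invented for this example, the input is assumed to be a well-formed ASCII tag, and the grandfathered tags mentioned above are not handled.

```python
def classify_subtags(tag):
    """Label each subtag of a well-formed language tag by its type."""
    parts = tag.split("-")
    labels = [(parts[0], "language")]
    i, n = 1, len(parts)

    # Up to three extended language subtags: three letters each,
    # directly after a two- or three-letter primary language subtag.
    if len(parts[0]) <= 3:
        count = 0
        while i < n and count < 3 and len(parts[i]) == 3 and parts[i].isalpha():
            labels.append((parts[i], "extlang"))
            i += 1
            count += 1

    # Script: exactly four letters.
    if i < n and len(parts[i]) == 4 and parts[i].isalpha():
        labels.append((parts[i], "script"))
        i += 1

    # Region: two letters or three digits.
    if i < n and (
        (len(parts[i]) == 2 and parts[i].isalpha())
        or (len(parts[i]) == 3 and parts[i].isdigit())
    ):
        labels.append((parts[i], "region"))
        i += 1

    # Variants: five to eight characters, or four starting with a digit.
    while i < n and (
        5 <= len(parts[i]) <= 8
        or (len(parts[i]) == 4 and parts[i][0].isdigit())
    ):
        labels.append((parts[i], "variant"))
        i += 1

    # Extensions and private use are introduced by a one-character
    # subtag; "x" starts private use, which runs to the end of the tag.
    kind = None
    while i < n:
        if len(parts[i]) == 1 and kind != "private use":
            kind = "private use" if parts[i].lower() == "x" else "extension"
            labels.append((parts[i], "singleton"))
        else:
            labels.append((parts[i], kind or "unrecognized"))
        i += 1

    return labels


japanese = classify_subtags("ja-Latn")
german = classify_subtags("de-CH-1901")
private = classify_subtags("en-US-x-twain")
```

For example, `japanese` is `[("ja", "language"), ("Latn", "script")]`: the processor can tell that “Latn” is a script subtag from its length and position alone, without knowing which script it denotes.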
Extended language subtags
Extended language subtags are used to identify certain specially selected languages that, for various historical and compatibility reasons, are closely identified with or tagged using an existing primary language subtag. When used to form a language tag, an extended language subtag is always accompanied by its primary language subtag (indicated by the “Prefix” field in the registry record).
All languages that have an extended language subtag in the registry also have an identical primary language subtag record in the registry. This primary language subtag is RECOMMENDED for forming the language tag.
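The recommendation can be sketched in Python as follows. The table `EXTLANG_PREFIX` is a hand-written excerpt for illustration only (Mandarin “cmn” and Cantonese “yue”, both with Prefix “zh”); a real application would read the IANA registry instead. The function name `prefer_primary_subtag` is likewise an assumption of this example.

```python
# Illustrative excerpt only: extended language subtag -> its "Prefix".
EXTLANG_PREFIX = {"cmn": "zh", "yue": "zh"}


def prefer_primary_subtag(tag):
    """Rewrite 'prefix-extlang-...' to the recommended 'extlang-...' form."""
    parts = tag.split("-")
    if len(parts) >= 2:
        prefix = EXTLANG_PREFIX.get(parts[1].lower())
        if prefix == parts[0].lower():
            # Drop the prefix: the extended language subtag is also
            # registered as a primary language subtag in its own right.
            return "-".join(parts[1:])
    return tag


mandarin = prefer_primary_subtag("zh-cmn-Hans-CN")
cantonese = prefer_primary_subtag("zh-yue")
unchanged = prefer_primary_subtag("zh-Hant")
```

Here `mandarin` becomes `"cmn-Hans-CN"` and `cantonese` becomes `"yue"`, while `"zh-Hant"` is returned unchanged because “Hant” is not in the extended language table.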
Fact checker: Chris