Legal Thesaurus: Linked Open Data
Semantic Representation
Version: 1.3
Last updated: 24 May 2016
HTML version: http://legalthesaurus.org (for linking)
PDF version: http://legalthesaurus.org/gvp-lod.pdf (for printing)

 

1        Introduction

This document explains the representation of the Legal Thesaurus in semantic format, using appropriate ontologies.

The document is published in HTML format, with appropriately and permanently named anchors for each section that can be shared in discussions. It is also published in PDF appropriate for printing.

Much of the data in the Thesaurus comes from the Encyclopedia of Law. The quality of the data is higher because it comes from a centralized database (a “single source of truth” for each claim) with a stricter editorial process.

1.1       The Legal Thesaurus and the Lawi Project

The Legal Thesaurus was first built to help users categorize, describe, and index the legal information available in the Lawi projects. The thesaurus is partially compliant with international standards.

Lawi Data

Lawi Data is a database of facts. Lawi Data is intended to provide a central data store for all the legal Encyclopedias.

1.1.1        About the Legal Thesaurus

The Legal Thesaurus is a structured, multilingual vocabulary including terms, descriptions, and other information for generic concepts related to law, tax, politics and public administration.

Terms for any concept may include the plural form of the term, singular form, natural order, inverted order, spelling variants, academic and common forms, various forms of speech, and synonyms that have various etymological roots. Among these terms, one is flagged as the term (or descriptor) preferred by the Legal Thesaurus. There may be multiple descriptors reflecting usage in multiple languages.

The Legal Thesaurus is a thesaurus in partial compliance with ISO and NISO standards.

The focus of each Legal Thesaurus record is a concept. Linked to each concept are terms, related concepts, its position in the hierarchy, sources for the data, and notes. The conceptual framework of facets and hierarchies in the Legal Thesaurus is designed to allow a general classification scheme for legal subjects and related fields.
The thesaurus contains hierarchical, equivalence, and associative relationships. Currently there are several facets.

There may be multiple broader contexts, making the Legal Thesaurus polyhierarchical. In addition to hierarchical relationships, the Legal Thesaurus has equivalence and associative relationships. The temporal coverage of the Legal Thesaurus ranges from Antiquity to the present and the scope is global.

See more information about the history, purpose and scope of the Legal Thesaurus.

Biographic Micro-thesauro

The Biographic Micro-thesauro is a structured vocabulary that includes proper names or anonymous appellations (e.g., Master of the Aachen Altar), biographies, related people or corporate bodies, and other information about judges, lawyers, firms, tribunals, and other people and groups involved in legal issues. Records in the Biographic Micro-thesauro include either individuals (persons) or groups of individuals working together (corporate bodies).

The focus of each Biographic Micro-thesauro record is a person or group (judge, corporate body, lawyer, etc.). Linked to each record are names, sources for the data, and notes. The temporal coverage of the Biographic Micro-thesauro ranges from Antiquity to the present and the scope is global.

Names in the Biographic Micro-thesauro may include given names, pseudonyms, variant spellings, names in multiple languages, and names that have changed over time (e.g., married names). Among these names, one is flagged as the preferred name.

Even though the structure is relatively flat, the Biographic Micro-thesauro is constructed as a hierarchical thesaurus, compliant with ISO and NISO standards for thesaurus construction. It currently has five published facets (see Hierarchy and Classes below). There may be multiple parents, making the Biographic Micro-thesauro structure polyhierarchical.

See more information about the history, purpose and scope of the Biographic Micro-thesauro; and see Biographic Micro-thesauro Specifics for the semantic representation of its data.

1.2       Revisions, Review, Feedback

1.2.1        Revisions

We anticipate that in the future, the data will be refreshed more often.

1.2.2        External Review Process

Numerous individuals are serving as External Advisors on this project. Most have been recommended by colleagues in our community (e.g., International Terminology Working Group members who are currently translating the AAT into various languages). We ended up inviting a fairly large group because we wanted to make sure that we had expertise in many areas. It has been very important to us that these trusted colleagues had a chance to comment on our ontology choices prior to the release of the datasets.

1.2.3        Providing Feedback

We welcome comments if you find something in this document or our dataset that needs clarification or improvement. We ask specifically for your opinion in several places in this document, as listed in Future Versions.

We have established a public discussion forum (Google Group) which we hope the community will use to ask questions, discuss issues, and find solutions related to the technical aspects of this publication.

1.2.4        Disclaimer

The vocabulary datasets are provided “as is”. The Getty disclaims all other warranties, either express or implied, including, but not limited to, implied warranties of merchantability and fitness for a particular purpose, with respect to the database. The Getty Vocabularies are compiled by the Getty Vocabulary Program from contributions from various contributors, including museums, libraries, archives, bibliographic indexing projects, international translation projects, and others. Not all contributor data complies precisely with the GVP Editorial Guidelines; therefore, absolute consistency in the dataset is not possible. The data is subject to frequent updates and corrections.

1.5       Prefixes

The prefixes that we use (both internal and external) are defined in the following sections. See Prefixes.

External Ontologies

Our mapping uses a number of external ontologies (as listed in External Prefixes):
· SKOS and ISO 25964 for representing thesaurus information: The two are similar, but skos:Collection has limitations: a collection cannot be placed under a Concept, there is no explicit way to say which collections are Top Collections in a scheme, and there are no inverse or transitive versions of skos:member. We therefore use iso:ThesaurusArray, which is a subclass of skos:Collection but can be placed under a Concept using iso:superOrdinate.

· Dublin Core (DC, DCT) for common properties: DCT properties are newer and stricter than DC properties, and many of them require a URI. We use DC/DCT for various common properties, e.g. dc:identifier, dct:source, dct:contributor, dct:created, dct:modified. If both a DC and a DCT property fit a purpose, we use the DCT property if the target is a URL.

· Schema.org for Geographic Features: We use schema:Place.

· Schema.org for Agents: We selected it in preference to FOAF, since it has more of what we need:

  • Class schema:Person and schema:Organization (this could mean an incorporated or non-incorporated group).
  • Class schema:Event with schema:location. bio:Event is designated more specifically as a life event, so we use both of these classes.
  • schema:nationality.
  • Biography places: schema:birthPlace, deathPlace, foundationLocation, dissolutionLocation (we also use this property for the dates).
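
The prefixed names used throughout this document (skos:broader, dct:source, and so on) are shorthand for full URIs. The following is a minimal sketch of that expansion. It assumes the commonly published namespace URIs for each vocabulary; the authoritative list for this dataset is the one in Prefixes, and the `expand` helper is purely illustrative.

```python
# Assumed namespace URIs (the commonly published ones for each vocabulary).
# The dataset's own Prefixes section remains the authoritative list.
PREFIXES = {
    "skos": "http://www.w3.org/2004/02/skos/core#",
    "iso": "http://purl.org/iso25964/skos-thes#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dct": "http://purl.org/dc/terms/",
    "schema": "http://schema.org/",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "bio": "http://purl.org/vocab/bio/0.1/",
}


def expand(curie: str) -> str:
    """Expand a prefixed name such as 'skos:broader' into a full URI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in PREFIXES:
        raise ValueError(f"unknown prefix in {curie!r}")
    return PREFIXES[prefix] + local


print(expand("iso:superOrdinate"))
print(expand("dct:source"))
```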

Subject

Subjects have the following information, described in the respective sections:
· Subjects are connected to the Legal Thesaurus ConceptScheme using skos:inScheme.
· Associative Relations: apply to Concepts only
· dc:identifier (see Identifiers)
· skos:exactMatch to other thesauri (see Alignment): applies to Concepts only
· skos:scopeNote links to Scope Notes
· dct:source links to Source (can be to Local Sources as shown, or directly to a global Source)

        Standard Hierarchical Relations

SKOS and ISO 25964 provide a number of hierarchical relations. We use the following (d shows the direction up/down):

Relation | d | Domain | Range | Description
skos:broader | up | skos:Concept | skos:Concept | Parent concept of a concept
iso:broaderGeneric | up | skos:Concept | skos:Concept | Parent in the case of Genus/Species relation
iso:broaderPartitive | up | skos:Concept | skos:Concept | Parent in the case of Part/Whole relation
iso:broaderInstantial | up | skos:Concept | skos:Concept | Parent in the case of Kind/Instance relation
skos:broaderTransitive | up | skos:Concept | skos:Concept | Ancestor concepts (transitive version of broader)
iso:superOrdinate | up | iso:ThesaurusArray | skos:Concept | Parent concept of array
skos:narrower | down | skos:Concept | skos:Concept | Children concepts of a concept
skos:member | down | iso:ThesaurusArray | skos:Concept | Children concepts/arrays of array. See skos:member Structure for an illustration. skos:memberList is also used if the array is ordered, see skos:memberList Structure
iso:subordinateArray | down | skos:Concept | iso:ThesaurusArray | Children arrays of a concept
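
To illustrate how skos:broaderTransitive relates to skos:broader in a polyhierarchy, here is a small sketch that derives all ancestors from direct parent links. The concept identifiers are hypothetical and the in-memory pair list stands in for real triples; it is not a description of how the published data is generated.

```python
from collections import defaultdict

# Hypothetical concepts. Each pair reads "child skos:broader parent".
BROADER = [
    ("tax-law", "public-law"),
    ("tax-law", "economic-law"),  # a second parent makes this polyhierarchical
    ("public-law", "law"),
    ("economic-law", "law"),
]


def broader_transitive(pairs):
    """Return {concept: set of all ancestors} derived from direct broader links."""
    parents = defaultdict(set)
    for child, parent in pairs:
        parents[child].add(parent)

    def collect(concept, seen):
        # Depth-first walk upward; 'seen' also guards against accidental cycles.
        for parent in parents.get(concept, ()):
            if parent not in seen:
                seen.add(parent)
                collect(parent, seen)
        return seen

    return {concept: collect(concept, set()) for concept in list(parents)}


ancestors = broader_transitive(BROADER)
print(sorted(ancestors["tax-law"]))
```

The inverse direction (skos:narrower) can be obtained by swapping the two positions of each pair before building the map.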

Sort Order

Sorting with Thesaurus Array

skos:OrderedCollection defines a standard way to order its children. In addition to skos:member, this uses skos:memberList. iso:ThesaurusArray borrows the same paradigm.

skos:member Structure

· A Guide Term is represented as an iso:ThesaurusArray. A Concept is represented as a skos:Concept, but it may also have a subordinate array: an anonymous iso:ThesaurusArray that serves only to hold the skos:OrderedCollection. The anonymous array should not be displayed as a level in the hierarchy.

skos:memberList Structure

While the skos:member links establish collection membership (used both with unordered skos:Collection and ordered skos:OrderedCollection), additional skos:memberList structure provides the ordering of the members.
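
The difference between the two structures can be sketched as follows: skos:member gives an unordered set, while skos:memberList points to an RDF list (a chain of rdf:first/rdf:rest nodes) that carries the order. The array, concept and blank-node identifiers below are hypothetical, and plain dictionaries stand in for a triple store.

```python
RDF_NIL = "rdf:nil"

# Hypothetical ordered array with three members.
MEMBER = {"array1": {"c-statute", "c-regulation", "c-bylaw"}}  # skos:member (unordered)
MEMBER_LIST = {"array1": "_:n1"}                               # skos:memberList -> list head
LIST_NODES = {
    "_:n1": ("c-statute", "_:n2"),     # (rdf:first, rdf:rest)
    "_:n2": ("c-regulation", "_:n3"),
    "_:n3": ("c-bylaw", RDF_NIL),
}


def ordered_members(array_id):
    """Walk the rdf:first/rdf:rest chain and return the members in order."""
    ordered = []
    visited = set()
    node = MEMBER_LIST[array_id]
    while node != RDF_NIL:
        if node in visited:
            raise ValueError("cycle in member list")
        visited.add(node)
        first, rest = LIST_NODES[node]
        ordered.append(first)
        node = rest
    return ordered


print(ordered_members("array1"))
```

A consumer that only needs membership can ignore the list entirely and read skos:member; the two views should contain the same concepts.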

Associative Relationships

All associative relations in the Thesaurus are sub-properties of skos:related.

Obsolete Subject

We publish some information about Obsolete concepts:
· skos:prefLabel: only the preferred label
· schema:endDate: when it was obsoleted
· dct:isReplacedBy: merged to which subject
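
A consumer holding an old identifier can follow dct:isReplacedBy links until it reaches a subject that has not been replaced. The sketch below assumes hypothetical numeric IDs and a simple lookup table in place of real triples.

```python
# Hypothetical subject IDs. Each entry reads "key dct:isReplacedBy value".
REPLACED_BY = {"1001": "1002", "1002": "1005"}


def current_subject(subject_id, replaced_by=REPLACED_BY):
    """Follow dct:isReplacedBy links until a non-obsolete subject is reached."""
    seen = {subject_id}
    while subject_id in replaced_by:
        subject_id = replaced_by[subject_id]
        if subject_id in seen:
            raise ValueError("cycle in dct:isReplacedBy chain")
        seen.add(subject_id)
    return subject_id


print(current_subject("1001"))
```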

Terms

Terms may carry the following information:
· dc:identifier: numeric ID, also used in the term URL. See Identifiers
· dct:language see Language

Scope Note

Scope notes define the meaning of a concept, or provide a historical description of a place. Notes have the following information (compare to Terms):
· dc:identifier: numeric ID, also used in the URL. See Identifiers
· dct:language: see Language

Identifiers

We map the database IDs of Subjects, Terms and Scope Notes to dc:identifier. If a Subject is merged into another, we emit it as an Obsolete Subject.

Notations

A notation is a code or number used to uniquely identify a concept within the scope of a given concept scheme. Unlike Terms, notations are not normally recognizable as a sequence of words in any natural language. DDC, UDC, STW and other well-known thesauri use notations. We use skos:notation.

Revision History Representation

We use dc:description for additional narrative about the action, such as which Recessive subject was merged, or the language of the note that was added. We also use dct:created, dct:modified and dct:issued.

To link an entity to its actions we use skos:changeNote, pointing from the entity to the action. The SKOS Advanced Documentation pattern shows this for skos:Concept.

Things and their Conceptualisations

As Dan puts it in a message to the W3C public-esw-thes list:

“a SKOS “butterflies” concept is a social and technological artifact designed to help interconnect descriptions of butterflies, documents (and data) about butterflies, and people with interest or expertise relating to butterflies. I’m quite consciously avoiding saying what a “butterflies” concept in SKOS “refers to”, because theories of reference are hard to choose between. Instead, I prefer to talk about why we bother building SKOS and what we hope can be achieved by it.”

The FOAF vocabulary specification says of the property “foaf:focus”:

“The focus property relates a conceptualisation of something to the thing itself. Specifically, it is designed for use with W3C’s SKOS vocabulary, to help indicate specific individual things (typically people, places, artifacts) that are mentioned in different SKOS schemes (eg. thesauri).

W3C SKOS is based around collections of linked ‘concepts’, which indicate topics, subject areas and categories. In SKOS, properties of a skos:Concept are properties of the conceptualization (see 2005 discussion for details); for example administrative and record-keeping metadata. Two schemes might have an entry for the same individual; the foaf:focus property can be used to indicate the thing in they world that they both focus on. Many SKOS concepts don’t work this way; broad topical areas and subject categories don’t typically correspond to some particular entity. However, in cases when they do, it is useful to link both subject-oriented and thing-oriented information via foaf:focus.”

Dan summarises how he sees the property “foaf:focus” in a message to the W3C public-esw-thes list:

“The addition of foaf:focus is intended as a modest and pragmatic bridge between SKOS-based descriptions of topics, and other more entity-centric RDF descriptions. When a SKOS Concept stands for a person or agent, FOAF and its extensions are directly applicable; however we expect foaf:focus to also be used with places, events and other identifiable entities that are covered both by SKOS vocabularies as well as by factual datasets like wikipedia/dbpedia and Freebase.”
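
As a rough illustration of the bridge described in these quotations, the sketch below groups concept records from two schemes by the real-world thing they point to through foaf:focus. All example.org URIs are hypothetical, and the list of tuples stands in for real RDF triples.

```python
from collections import defaultdict

FOAF_FOCUS = "http://xmlns.com/foaf/0.1/focus"

# Hypothetical triples: concept records from two schemes, two of them about one person.
TRIPLES = [
    ("http://example.org/schemeA/concept/123", FOAF_FOCUS, "http://example.org/person/mr-black"),
    ("http://example.org/schemeB/concept/987", FOAF_FOCUS, "http://example.org/person/mr-black"),
    ("http://example.org/schemeA/concept/124", FOAF_FOCUS, "http://example.org/person/jane-doe"),
]


def concepts_by_focus(triples):
    """Group concept URIs by the thing they focus on."""
    groups = defaultdict(list)
    for subject, predicate, obj in triples:
        if predicate == FOAF_FOCUS:
            groups[obj].append(subject)
    return dict(groups)


for thing, concepts in concepts_by_focus(TRIPLES).items():
    print(thing, "<-", concepts)
```

Statements about the record (who edited it, when it changed) stay on the concept URI, while statements about the person belong on the focus URI.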

VIAF model

VIAF has interchanged data with the Encyclopedia of Law, with resulting links between library authorities and the Encyclopedia of Law. But using Lawi Data means more entities (people and organizations) and more coded information about the entities. We are keeping the entry URLs in the links file as well as the data identifiers; we think it’s much more convenient to have them together.

A commonly used bulk file from VIAF is the ‘links’ file that shows all the links made between VIAF identifiers and source file identifiers (pointers to the bulk files can be found here).  The links file includes external links, so the individual Wikipedia pages will show up in the file along with the Wikidata WKP IDs.

The VIAF authority headings, which were treated as first-class entities (viaf:Heading), have in many cases been dropped in favor of simpler skos:prefLabel and skos:altLabel forms. The VIAF model helps avoid dependence on specific ontologies by using the opaque VIAF URI to identify the primary entity.

URIs

We may want to refer to something that doesn’t live on the web, with the base URI providing information about that thing. For example:

http://lawin.org/black-dictionary is the URI for the Encyclopedia of Law entry
http://lawin.org/black-dictionary#thing is a URI for the dictionary itself

A problem with this solution is the server isn’t told what the fragment identifier is, and therefore it can’t be used as the basis for a redirection, for example. To avoid this problem, we can use particular HTTP headers (eg Link or Content-Location or other specialist headers).
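
The limitation can be seen by splitting the two URIs given above: once the fragment is removed, both produce the same request URL, so the server cannot tell them apart. This is a small sketch using Python's standard urllib.parse; the `request_url` helper is illustrative only.

```python
from urllib.parse import urldefrag

entry_uri = "http://lawin.org/black-dictionary"
thing_uri = "http://lawin.org/black-dictionary#thing"


def request_url(uri):
    """Return the part of the URI that a client actually sends to the server."""
    url, _fragment = urldefrag(uri)
    return url


# Both URIs lead to the same request; the fragment stays on the client side.
print(request_url(entry_uri) == request_url(thing_uri))
print(urldefrag(thing_uri).fragment)
```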

The above is based on the assumption that we need to have separate URIs for things that are not on the web (eg dictionaries) and documents on the web about them (eg entries about dictionaries). This is useful because it enables people to make separate statements about the author of a book:

http://lawin.org/thing/black-dictionary
  dct:creator http://wikipedia.org/thing/mr-black .

from statements about the authors of the Encyclopedia of Law entry about that dictionary:

http://lawin.org/black-dictionary
  dct:creator http://lawin.org/user/john-smith .

Another solution may be to interpret particular properties as describing a relationship between a resource and a value, such as this:

dct:creator http://lawin.org/user-john-smith

Fragment Identifiers

In the Encyclopedia of Law, we use fragment identifiers (the part of a URI after the #) to point to scrollable positions within an entry (as in a table of contents), but they can be used for more than that. The important and useful thing about fragment identifiers is that they are stripped from the URI before it is submitted to the server. You can therefore have multiple fragment identifiers on the same actual page, which can then be served from a (local, intermediate or accelerator) cache without adding load to the server.

Within the Lawi projects, we’re planning to use this technique in the presentation of the results of free-text searches. Following a search for text within legislation, the visitor will be presented with a list of items of legislation that contain sections that contain that search term. From there, they will click through to a table of contents in which the relevant sections are highlighted; this must be a standard query URI.

What we ideally want to support next is for the visitor to click through to a relevant section, and then on to the next section and so on, but ensure they can click through to the highlighted table of contents at any point. This behaviour isn’t essential as the highlighted table of contents is always accessible through the Back button, and it doesn’t change the actual content of the page, but it’s helpful — an enhancement of the page.

We rely heavily on caching to support the large number of visitors to the Encyclopedia of Law and we really don’t want to have to handle the number of distinct query-based requests that we’d get for the section views that we anticipate will result from free-text searches. Equally, we don’t really want to use cookies to record the original free-text query, as we would like to keep the URIs bookmarkable and sharable.

So we will use a fragment identifier of the form #text={search}. If JavaScript is enabled, the fragment identifier will be used to rewrite the links to the table of contents and other sections. It might, in the future, be used to highlight the terms within the page or provide a status message reminding the visitor of what they originally searched for.
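
The link-rewriting logic can be sketched as follows. On the site this would run in the browser; Python is used here only to show the steps. The function names and the example section paths are hypothetical, and percent-encoding of the search text is an assumption about how {search} would be serialized.

```python
from urllib.parse import quote, unquote, urlsplit, urlunsplit

PREFIX = "text="


def search_from_fragment(uri):
    """Return the search text carried in a '#text=...' fragment, or None."""
    fragment = urlsplit(uri).fragment
    if fragment.startswith(PREFIX):
        return unquote(fragment[len(PREFIX):])
    return None


def carry_search(link, current_uri):
    """Copy the current page's search fragment onto another link."""
    search = search_from_fragment(current_uri)
    if search is None:
        return link
    parts = urlsplit(link)
    return urlunsplit(parts._replace(fragment=PREFIX + quote(search)))


# Hypothetical section URIs within one item of legislation.
current = "http://lawin.org/act/section/1#text=limited%20liability"
print(search_from_fragment(current))
print(carry_search("http://lawin.org/act/section/2", current))
```

Because only the fragment changes, every rewritten link still resolves to the same cacheable page on the server.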