Index Configuration

Introduction

For each OCFL repository, there MUST be an index configuration file named repository.xml in the [root]/{repository}/config directory. Additional XSLT files may be stored in the same directory; they are referenced by the xslt mapper described below.

[root]
└── {repository}
    └── config
        ├── metadata_format1.xsl
        ├── metadata_format2.xsl
        └── repository.xml

Use the WebDAV API or a WebDAV client to edit the configuration file.

Basic Configuration

Wrapper Elements

An index configuration file is an XML file with the following basic structure:

<repository>
    <index>
        <object></object>
        <query></query>
        <annotation></annotation>
        <oai></oai>
    </index>
</repository>

All of these elements are required. The four wrapper elements inside <index> hold the configurations of the four index types: object, query, annotation, and oai.
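
The requirement on the wrapper elements can be checked mechanically before a configuration file is uploaded. The following is a minimal sketch in Python and not part of the repository software; the function name missing_index_types and the two sample documents are hypothetical and serve only as an illustration.

```python
import xml.etree.ElementTree as ET

# The four index types named in the text above.
REQUIRED_INDEX_TYPES = ("object", "query", "annotation", "oai")


def missing_index_types(config_xml):
    """Return the required wrapper elements that are absent below <index>."""
    root = ET.fromstring(config_xml)
    if root.tag != "repository":
        raise ValueError("root element must be <repository>")
    index = root.find("index")
    if index is None:
        # Without <index>, every index type is missing.
        return list(REQUIRED_INDEX_TYPES)
    return [name for name in REQUIRED_INDEX_TYPES if index.find(name) is None]


# Hypothetical sample documents, reduced to the wrapper elements.
complete = "<repository><index><object/><query/><annotation/><oai/></index></repository>"
incomplete = "<repository><index><object/><query/></index></repository>"

print(missing_index_types(complete))    # []
print(missing_index_types(incomplete))  # ['annotation', 'oai']
```

The sketch only tests for the presence of the wrapper elements; it does not validate their content.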

File Filter Element

Each of the index types object, query, annotation, and oai MUST contain a <fileFilter> XML element:

<repository>
    <index>
        <object>
            <fileFilter exclude="{regexp}">{regexp}</fileFilter>
        </object>
        <query>
            <fileFilter exclude="{regexp}">{regexp}</fileFilter>
        </query>
        <annotation>
            <fileFilter exclude="{regexp}">{regexp}</fileFilter>
        </annotation>
        <oai>
            <fileFilter exclude="{regexp}">{regexp}</fileFilter>
        </oai>
    </index>
</repository>

The text content of the <fileFilter> element MUST contain a regular expression that selects the OCFL content files to be added to the respective index during reindexing. The @exclude attribute MAY contain a regular expression that selects the OCFL content files of an OCFL repository to be ignored during reindexing.

Example
<fileFilter exclude="\.snapshot|TRASH">.*\/content\/[^\/]*\.xml$</fileFilter>

The example selects all OCFL content files that end with .xml and that are located directly below an OCFL content directory. OCFL content paths containing the strings .snapshot or TRASH are ignored.
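
To try out a file filter before reindexing, the two expressions can be tested against a few sample paths. The following Python sketch uses the patterns from the example above. It assumes that both expressions are searched within the full content path, which the text does not state explicitly for the include expression, and that Python's re module behaves like the repository's regular expression engine for these patterns. The paths and the function name is_selected are hypothetical.

```python
import re

# Patterns copied from the <fileFilter> example above.
INCLUDE = re.compile(r".*\/content\/[^\/]*\.xml$")
EXCLUDE = re.compile(r"\.snapshot|TRASH")


def is_selected(path):
    """Return True if the path matches the include pattern and not the exclude pattern."""
    # Assumption: the exclude pattern is searched anywhere in the path.
    if EXCLUDE.search(path):
        return False
    return INCLUDE.search(path) is not None


# Hypothetical content paths, for illustration only.
paths = [
    "obj1/v1/content/metadata.xml",            # directly below content: selected
    "obj1/v1/content/sub/metadata.xml",        # nested below content: not selected
    "obj1/v1/content/audio.wav",               # wrong extension: not selected
    "TRASH/obj2/v1/content/metadata.xml",      # excluded by TRASH
    "obj3/.snapshot/v1/content/metadata.xml",  # excluded by .snapshot
]

selected = [p for p in paths if is_selected(p)]
print(selected)  # ['obj1/v1/content/metadata.xml']
```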

Mapper Element

Each of the index types object, query, annotation, and oai MUST contain a <mapper> XML element:

<repository>
    <index>
        <object>
            <mapper type="groovy">{mapper}</mapper>
        </object>
        <query>
            <mapper type="xquery">{mapper}</mapper>
        </query>
        <annotation>
            <mapper type="groovy">{mapper}</mapper>
        </annotation>
        <oai>
            <mapper type="xslt">{mapper}</mapper>
        </oai>
    </index>
</repository>

For detailed information about the respective {mapper} of each index type, see the configuration guide below.
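
The two rules above (every index type needs a <fileFilter> and a <mapper>) lend themselves to a quick consistency check. The Python sketch below is an illustration only. It assumes that the mapper type of each index type is fixed as shown in the structure above; the function name find_problems and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET

# Mapper types as they appear in the structure above
# (assumed here to be fixed per index type).
EXPECTED_MAPPER_TYPES = {
    "object": "groovy",
    "query": "xquery",
    "annotation": "groovy",
    "oai": "xslt",
}


def find_problems(config_xml):
    """Return a list of human-readable problems found in the configuration."""
    index = ET.fromstring(config_xml).find("index")
    problems = []
    for name, expected_type in EXPECTED_MAPPER_TYPES.items():
        section = None if index is None else index.find(name)
        if section is None:
            problems.append(f"<{name}> is missing")
            continue
        file_filter = section.find("fileFilter")
        if file_filter is None or not (file_filter.text or "").strip():
            problems.append(f"<{name}> has no <fileFilter> with a regular expression")
        mapper = section.find("mapper")
        if mapper is None:
            problems.append(f"<{name}> has no <mapper>")
        else:
            actual_type = mapper.get("type")
            if actual_type != expected_type:
                problems.append(
                    f"<{name}> mapper type is {actual_type!r}, expected {expected_type!r}"
                )
    return problems


# Hypothetical, deliberately incomplete configuration.
sample = """
<repository>
    <index>
        <object>
            <fileFilter>.*.xml</fileFilter>
            <mapper type="groovy">a5.mapper.de.unikoeln.dch.blam.BlamObjectCollectionMapper</mapper>
        </object>
        <query>
            <fileFilter>.*.xml</fileFilter>
            <mapper type="xquery"></mapper>
        </query>
        <annotation>
            <mapper type="xslt"></mapper>
        </annotation>
    </index>
</repository>
"""

problems = find_problems(sample)
for problem in problems:
    print(problem)
```

For the sample document, the sketch reports the missing <fileFilter> and the unexpected mapper type in <annotation> as well as the missing <oai> element.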

Query Specific Elements

The query index configuration MAY contain a <namespace> element with child elements of any name and text content:

<repository>
    <index>
        <query>
            <fileFilter exclude="{regexp}">{regexp}</fileFilter>
            <namespace>
                <{any}>{any}</{any}>
            </namespace>
            <mapper type="xquery">{mapper}</mapper>
        </query>
    </index>
</repository>

The <namespace> element declares the namespaces available to the xquery mapper: the name of each child element is a namespace prefix, and its text content is the corresponding namespace URI.

Example
<repository>
    <index>
        <query>
            <fileFilter>.*.xml</fileFilter>
            <namespace>
                <cmd>http://www.clarin.eu/cmd/</cmd>
                <json>http://basex.org/modules/json</json>
            </namespace>
            <mapper type="xquery"></mapper>
        </query>
    </index>
</repository>

The example configuration results in the following XQuery namespace declarations:

xquery version "3.0";
declare namespace cmd="http://www.clarin.eu/cmd/";
declare namespace json="http://basex.org/modules/json";
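
The mapping from <namespace> child elements to XQuery declarations can be reproduced with a few lines of Python. This is a sketch for illustration and not the repository's own implementation; the function name namespace_declarations is hypothetical, and the configuration is the example from above.

```python
import xml.etree.ElementTree as ET

# The query configuration from the example above.
CONFIG = """
<repository>
    <index>
        <query>
            <fileFilter>.*.xml</fileFilter>
            <namespace>
                <cmd>http://www.clarin.eu/cmd/</cmd>
                <json>http://basex.org/modules/json</json>
            </namespace>
            <mapper type="xquery"></mapper>
        </query>
    </index>
</repository>
"""


def namespace_declarations(config_xml):
    """Build one XQuery namespace declaration per child of <namespace>."""
    root = ET.fromstring(config_xml)
    namespace = root.find("./index/query/namespace")
    if namespace is None:
        # <namespace> is optional, so there may be nothing to declare.
        return []
    return [
        f'declare namespace {child.tag}="{(child.text or "").strip()}";'
        for child in namespace
    ]


declarations = namespace_declarations(CONFIG)
for line in declarations:
    print(line)
# declare namespace cmd="http://www.clarin.eu/cmd/";
# declare namespace json="http://basex.org/modules/json";
```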

OAI Specific Elements

The oai index configuration MUST contain three additional elements, <repositoryName>, <adminEmails>, and <metadataFormats>.

<repository>
    <index>
        <oai>
            <fileFilter>{regexp}</fileFilter>
            <repositoryName>{any}</repositoryName>
            <adminEmails>
                <adminEmail>{email}</adminEmail>
                <adminEmail>{email}</adminEmail>
            </adminEmails>
            <metadataFormats>
                <metadataFormat>
                    <prefix>{prefix}</prefix>
                    <schema>{schemaURI}</schema>
                    <namespace>{namespaceURI}</namespace>
                </metadataFormat>
                <metadataFormat>
                    <prefix>{prefix}</prefix>
                    <schema>{schemaURI}</schema>
                    <namespace>{namespaceURI}</namespace>
                </metadataFormat>
            </metadataFormats>
            <mapper type="xslt"></mapper>
        </oai>
    </index>
</repository>

Example
<repository>
    <index>
        <oai>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
            <repositoryName>Language Archive Cologne</repositoryName>
            <adminEmails>
                <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
            </adminEmails>
            <metadataFormats>
                <metadataFormat>
                    <prefix>oai_dc</prefix>
                    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
                    <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
                </metadataFormat>
                <metadataFormat>
                    <prefix>cmd</prefix>
                    <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
                    <namespace>http://www.clarin.eu/cmd/</namespace>
                </metadataFormat>
            </metadataFormats>
            <mapper type="xslt"></mapper>
        </oai>
    </index>
</repository>

The example configuration results in the following OAI-PMH Identify response:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
    <responseDate>2020-08-27T16:46:58Z</responseDate>
    <request verb="Identify">{}</request>
    <Identify>
        <repositoryName>Language Archive Cologne</repositoryName>
        <baseURL>{}</baseURL>
        <protocolVersion>2.0</protocolVersion>
        <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
        <earliestDatestamp>1990-01-01T12:00:00Z</earliestDatestamp>
        <deletedRecord>no</deletedRecord>
        <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
    </Identify>
</OAI-PMH>

The example configuration results in the following OAI-PMH ListMetadataFormats Response:

<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
    <responseDate>2020-08-27T16:50:29Z</responseDate>
    <request verb="ListMetadataFormats">{}</request>
    <ListMetadataFormats>
        <metadataFormat>
            <metadataPrefix>oai_dc</metadataPrefix>
            <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
            <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
        </metadataFormat>
        <metadataFormat>
            <metadataPrefix>cmd</metadataPrefix>
            <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
            <metadataNamespace>http://www.clarin.eu/cmd/</metadataNamespace>
        </metadataFormat>
    </ListMetadataFormats>
</OAI-PMH>
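
Comparing the configuration with the response shows how the element names correspond: <prefix> becomes <metadataPrefix>, <schema> stays <schema>, and <namespace> becomes <metadataNamespace>. The Python sketch below makes this correspondence explicit. It is an illustration derived from the two examples, not the repository's implementation; the function name to_response_formats is hypothetical.

```python
import xml.etree.ElementTree as ET

# Configuration element name -> OAI-PMH response element name,
# as observed in the example configuration and response above.
FIELD_NAMES = {
    "prefix": "metadataPrefix",
    "schema": "schema",
    "namespace": "metadataNamespace",
}

# The <metadataFormats> part of the example oai configuration.
OAI_CONFIG = """
<oai>
    <metadataFormats>
        <metadataFormat>
            <prefix>oai_dc</prefix>
            <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
            <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
        </metadataFormat>
        <metadataFormat>
            <prefix>cmd</prefix>
            <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
            <namespace>http://www.clarin.eu/cmd/</namespace>
        </metadataFormat>
    </metadataFormats>
</oai>
"""


def to_response_formats(oai_xml):
    """Return one dict per <metadataFormat>, keyed by the response element names."""
    oai = ET.fromstring(oai_xml)
    formats = []
    for fmt in oai.findall("./metadataFormats/metadataFormat"):
        formats.append({
            response_name: (fmt.findtext(config_name) or "").strip()
            for config_name, response_name in FIELD_NAMES.items()
        })
    return formats


formats = to_response_formats(OAI_CONFIG)
for fmt in formats:
    print(fmt["metadataPrefix"], fmt["metadataNamespace"])
# oai_dc http://www.openarchives.org/OAI/2.0/oai_dc/
# cmd http://www.clarin.eu/cmd/
```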

Configuration Guide

Object Index

File Filter

The object file filter should select all the content metadata files containing the references and persistent IDs (Handle PIDs, DOIs) for all data available in an OCFL repository.

Groovy Mapper

Choose one of the available object mappers. The example below uses a5.mapper.de.unikoeln.dch.blam.BlamObjectCollectionMapper.

Example
<repository>
    <index>
        <object>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
            <mapper type="groovy">a5.mapper.de.unikoeln.dch.blam.BlamObjectCollectionMapper</mapper>
        </object>
    </index>
</repository>

Query Index

File Filter

The query file filter should select all metadata files that are to be added to the query index. Normally, these are the content metadata files selected by the object file filter plus additional XML files, such as metadata translations or transcriptions, that should be searchable.

XQuery Mapper

An XQuery mapper is a list of XML elements of any name. Each element can carry two XML attributes, @fulltext and @facet, each with a boolean value, and contains a text node representing a valid XQuery script:

<mapper type="xquery">
    <{any} fulltext="{boolean}" facet="{boolean}">{xquery}</{any}>
</mapper>

Example
<repository>
    <index>
        <query>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*.xml</fileFilter>
            <namespace>
                <cmd>http://www.clarin.eu/cmd/</cmd>
                <json>http://basex.org/modules/json</json>
            </namespace>
            <mapper type="xquery">
                <MetadataType fulltext="false" facet="true">
                    let $element-name := (//cmd:BundleGeneralInfo/local-name(.),//cmd:CollectionGeneralInfo/local-name(.))[1]
                    return
                    if ($element-name = "BundleGeneralInfo") then
                    "Bundle"
                    else if ($element-name = "CollectionGeneralInfo") then
                    "Collection"
                    else
                    "Other"
                </MetadataType>
                <Title>
                    //cmd:BundleGeneralInfo/cmd:BundleDisplayTitle/text() |
                    //cmd:CollectionGeneralInfo/cmd:CollectionDisplayTitle/text()
                </Title>
                <MetadataObject fulltext="false" facet="false">
                    json:serialize(/cmd:CMD, map { 'format': 'jsonml'})
                </MetadataObject>
            </mapper>
        </query>
    </index>
</repository>

The example configuration results in the following Query API JSON response:

{
    "@odata.context": "{}",
    ...
    "value":
    [
        {
            "MetadataType": ["..."],
            "Title": ["..."],
            "MetadataObject": []
        }
    ]
}
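
In the example, each child element of the mapper shows up as a key of the same name in the response. To get an overview of the fields a mapper defines, the mapper element can be parsed as in the following Python sketch. It is an illustration only: the function name describe_fields is hypothetical, the XQuery scripts are shortened, and an absent @fulltext or @facet attribute is reported as None because its default value is not stated here.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example mapper above.
MAPPER = """
<mapper type="xquery">
    <MetadataType fulltext="false" facet="true">
        (//cmd:BundleGeneralInfo/local-name(.))[1]
    </MetadataType>
    <Title>
        //cmd:BundleGeneralInfo/cmd:BundleDisplayTitle/text() |
        //cmd:CollectionGeneralInfo/cmd:CollectionDisplayTitle/text()
    </Title>
    <MetadataObject fulltext="false" facet="false">
        json:serialize(/cmd:CMD, map { 'format': 'jsonml'})
    </MetadataObject>
</mapper>
"""


def describe_fields(mapper_xml):
    """Return name, flags, and whitespace-normalized XQuery for each field."""
    mapper = ET.fromstring(mapper_xml)
    fields = []
    for field in mapper:
        fields.append({
            "name": field.tag,
            "fulltext": field.get("fulltext"),  # None if the attribute is absent
            "facet": field.get("facet"),        # None if the attribute is absent
            "xquery": " ".join((field.text or "").split()),
        })
    return fields


fields = describe_fields(MAPPER)
for field in fields:
    print(field["name"], field["fulltext"], field["facet"])
# MetadataType false true
# Title None None
# MetadataObject false false
```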

Annotation Index

File Filter

The annotation file filter should select all OCFL content files of an OCFL repository that contain annotations and that are to be added to the annotation index behind the IIIF-compatible Annotation API.

Groovy Mapper

Choose one of the available annotation mappers. The example below uses a5.mapper.nl.mpi.eaf.EafAnnotationMapper.

Example
<repository>
    <index>
        <annotation>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\.eaf</fileFilter>
            <mapper type="groovy">a5.mapper.nl.mpi.eaf.EafAnnotationMapper</mapper>
        </annotation>
    </index>
</repository>

OAI Index

File Filter

The oai file filter should select all metadata files of an OCFL repository that are to be added to the OAI index. Normally, the oai file filter is the same as the object file filter.

XSLT Mapper

An xslt mapper is a list of XML elements, where the name of each element represents a namespace prefix and its @script attribute references an XSLT script.

<mapper type="xslt">
    <{any} script="{path}"/>
</mapper>

Example
<repository>
    <index>
        <oai>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
            <repositoryName>Language Archive Cologne</repositoryName>
            <adminEmails>
                <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
            </adminEmails>
            <metadataFormats>
                <metadataFormat>
                    <prefix>oai_dc</prefix>
                    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
                    <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
                </metadataFormat>
                <metadataFormat>
                    <prefix>cmd</prefix>
                    <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
                    <namespace>http://www.clarin.eu/cmd/</namespace>
                </metadataFormat>
            </metadataFormats>
            <mapper type="xslt">
                <oai_dc script="./blam-cmdi2oai_dc.xsl"/>
                <cmd script="./identity.xsl"/>
            </mapper>
        </oai>
    </index>
</repository>
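
In this example, the element names inside the xslt mapper (oai_dc, cmd) match the <prefix> values of the configured metadata formats. Assuming this correspondence is intended, which the example suggests but the text does not state, a configuration can be checked for metadata formats without a script. The Python sketch below is an illustration only; the function names are hypothetical, and the sample deliberately omits the cmd script.

```python
import xml.etree.ElementTree as ET

# Hypothetical oai configuration: two metadata formats, but only one script.
OAI_CONFIG = """
<oai>
    <metadataFormats>
        <metadataFormat>
            <prefix>oai_dc</prefix>
        </metadataFormat>
        <metadataFormat>
            <prefix>cmd</prefix>
        </metadataFormat>
    </metadataFormats>
    <mapper type="xslt">
        <oai_dc script="./blam-cmdi2oai_dc.xsl"/>
    </mapper>
</oai>
"""


def mapper_scripts(oai_xml):
    """Return a dict from mapper element name to its @script value."""
    mapper = ET.fromstring(oai_xml).find("mapper")
    if mapper is None:
        return {}
    return {child.tag: child.get("script") for child in mapper}


def prefixes_without_script(oai_xml):
    """Return the <prefix> values that have no element of the same name in <mapper>."""
    oai = ET.fromstring(oai_xml)
    scripts = mapper_scripts(oai_xml)
    prefixes = [
        (prefix.text or "").strip()
        for prefix in oai.findall("./metadataFormats/metadataFormat/prefix")
    ]
    return [prefix for prefix in prefixes if prefix not in scripts]


print(mapper_scripts(OAI_CONFIG))           # {'oai_dc': './blam-cmdi2oai_dc.xsl'}
print(prefixes_without_script(OAI_CONFIG))  # ['cmd']
```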

A Complete Example Configuration File

<repository>
    <index>
        <object>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
            <mapper type="groovy">a5.mapper.de.unikoeln.dch.blam.BlamObjectCollectionMapper</mapper>
        </object>
        <query>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*.xml</fileFilter>
            <namespace>
                <cmd>http://www.clarin.eu/cmd/</cmd>
                <json>http://basex.org/modules/json</json>
            </namespace>
            <mapper type="xquery">
                <id fulltext="false" facet="true">
                    /cmd:CMD/cmd:Header/cmd:MdSelfLink
                </id>
                <MetadataType fulltext="false" facet="true">
                    let $element-name := (/cmd:CMD/cmd:Components/*/cmd:BundleGeneralInfo/local-name(.),/cmd:CMD/cmd:Components/*/cmd:CollectionGeneralInfo/local-name(.))[1]
                    return
                    if ($element-name = "BundleGeneralInfo" and not(exists(/cmd:CMD/cmd:Resources/cmd:ResourceProxyList/*))) then
                    "BundleTranslation"
                    else if ($element-name = "BundleGeneralInfo") then
                    "Bundle"
                    else if ($element-name = "CollectionGeneralInfo") then
                    "Collection"
                    else
                    "Other"
                </MetadataType>
                <Title>
                    /cmd:CMD/cmd:Components/*/cmd:BundleGeneralInfo/cmd:BundleDisplayTitle/text()
                    |
                    /cmd:CMD/cmd:Components/*/cmd:CollectionGeneralInfo/cmd:CollectionDisplayTitle/text()
                </Title>
                <Creator>
                    concat(
                    string-join(/cmd:CMD/cmd:Components/*/cmd:BundlePublicationInfo/cmd:BundleCreators/cmd:BundleCreator/cmd:CreatorName/cmd:CreatorGivenName, ' '),
                    " ",
                    string-join(/cmd:CMD/cmd:Components/*/cmd:BundlePublicationInfo/cmd:BundleCreators/cmd:BundleCreator/cmd:CreatorName/cmd:CreatorFamilyName, ' '),
                    " ",
                    string-join(/cmd:CMD/cmd:Components/*/cmd:CollectionPublicationInfo/cmd:CollectionCreators/cmd:CollectionCreator/cmd:CreatorName/cmd:CreatorGivenName, ' '),
                    " ",
                    string-join(/cmd:CMD/cmd:Components/*/cmd:CollectionPublicationInfo/cmd:CollectionCreators/cmd:CollectionCreator/cmd:CreatorName/cmd:CreatorFamilyName, ' ')
                    )
                </Creator>
                <ObjectLanguage facet="true">
                    for $language in /cmd:CMD/cmd:Components//(cmd:BundleObjectLanguage | cmd:CollectionObjectLanguage)
                    return
                    ($language/cmd:ObjectLanguageDisplayName/text()|$language/cmd:ObjectLanguageName/text())[1]
                </ObjectLanguage>
                <Description fulltext="true" facet="false">
                    /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleDescription/text() | cmd:CollectionGeneralInfo/cmd:CollectionDescription/text())
                </Description>
                <Keywords fulltext="true" facet="true">
                    /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleKeywords/cmd:BundleKeyword/text() | cmd:CollectionGeneralInfo/cmd:CollectionKeywords/cmd:CollectionKeyword/text())
                </Keywords>
                <RecordingDate>
                    /cmd:CMD/cmd:Components/*/cmd:BundleGeneralInfo/cmd:BundleRecordingDate
                </RecordingDate>
                <GeoLocation fulltext="false" facet="true">
                    /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleGeoLocation/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionGeoLocation/text())
                </GeoLocation>
                <Location fulltext="true" facet="true">
                    /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleLocationFacet/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionLocationFacet/text())
                </Location>
                <Region fulltext="true" facet="true">
                    /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleRegionFacet/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionRegionFacet/text())
                </Region>
                <Country fulltext="true" facet="true">
                    /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleCountryFacet/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionCountryFacet/text())
                </Country>
                <ProjectName fulltext="true" facet="false">
                    /cmd:CMD/cmd:Components/*/cmd:ProjectInfo/cmd:Project/cmd:ProjectDisplayName/text()
                </ProjectName>
                <ProjectDescription fulltext="true" facet="false">
                    /cmd:CMD/cmd:Components/*/cmd:ProjectInfo/cmd:Project/cmd:ProjectDescription/text()
                </ProjectDescription>
                <ResourceMimeType fulltext="false" facet="true">
                    /cmd:CMD/cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy/cmd:ResourceType/data(@mimetype)
                </ResourceMimeType>
                <ResourceType fulltext="false" facet="true">
                    /cmd:CMD/cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy/cmd:ResourceType/text()
                </ResourceType>
                <IsPartOf>
                    /cmd:CMD/cmd:Resources/cmd:IsPartOfList/cmd:IsPartOf
                </IsPartOf>
                <MetadataObject fulltext="false" facet="false">
                    json:serialize(/cmd:CMD, map { 'format': 'jsonml'})
                </MetadataObject>
            </mapper>
        </query>
        <annotation>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\.eaf</fileFilter>
            <mapper type="groovy">a5.mapper.nl.mpi.eaf.EafAnnotationMapper</mapper>
        </annotation>
        <oai>
            <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
            <repositoryName>Language Archive Cologne</repositoryName>
            <adminEmails>
                <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
            </adminEmails>
            <metadataFormats>
                <metadataFormat>
                    <prefix>oai_dc</prefix>
                    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
                    <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
                </metadataFormat>
                <metadataFormat>
                    <prefix>cmd</prefix>
                    <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
                    <namespace>http://www.clarin.eu/cmd/</namespace>
                </metadataFormat>
            </metadataFormats>
            <mapper type="xslt">
                <oai_dc script="./blam-cmdi2oai_dc.xsl"/>
                <cmd script="./identity.xsl"/>
            </mapper>
        </oai>
    </index>
</repository>