For each OCFL repository, there MUST be an index configuration file named repository.xml, stored on the file system in the [root]/{repository}/config directory. Additional XSLT files may exist here. See below for more information.
[root]
└── {repository}
    └── config
        ├── metadata_format1.xsl
        ├── metadata_format2.xsl
        └── repository.xml
Use the WebDAV API or a WebDAV client to edit the configuration file.
An index configuration file is an XML file with the following basic structure:
<repository>
  <index>
    <object></object>
    <query></query>
    <annotation></annotation>
    <oai></oai>
  </index>
</repository>
All these elements are required. The wrapper elements represent four index type configurations, object, query, annotation, and oai.
Each of the index types object, query, annotation, and oai MUST contain a <fileFilter> XML element:
<repository>
  <index>
    <object>
      <fileFilter exclude="{regexp}">{regexp}</fileFilter>
    </object>
    <query>
      <fileFilter exclude="{regexp}">{regexp}</fileFilter>
    </query>
    <annotation>
      <fileFilter exclude="{regexp}">{regexp}</fileFilter>
    </annotation>
    <oai>
      <fileFilter exclude="{regexp}">{regexp}</fileFilter>
    </oai>
  </index>
</repository>
The text content of the <fileFilter> element MUST contain a regular expression selecting those OCFL content files that are to be added to the index of the respective type during a reindex. The value of the @exclude attribute MAY contain a regular expression selecting those OCFL content files of an OCFL repository that are to be ignored during a reindex.
<fileFilter exclude="\.snapshot|TRASH">.*\/content\/[^\/]*\.xml$</fileFilter>
The example selects all OCFL content files that end with .xml and that are located directly below an OCFL content directory. OCFL content paths containing the strings .snapshot or TRASH are ignored.
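The behaviour of the example filter can be tried out with a short Python sketch. It assumes that the include expression has to match the content path and that the exclude expression only has to occur somewhere in the path; the indexer's actual regular expression engine and matching mode are not specified here, and the paths and the is_selected helper are hypothetical.

```python
import re

# Expressions taken from the <fileFilter> example above.
INCLUDE = re.compile(r".*\/content\/[^\/]*\.xml$")
EXCLUDE = re.compile(r"\.snapshot|TRASH")


def is_selected(path):
    """Assumed semantics: the include pattern must match the path and
    the exclude pattern must not be found anywhere in it."""
    return bool(INCLUDE.search(path)) and not EXCLUDE.search(path)


# Hypothetical OCFL content paths, for illustration only.
paths = [
    "obj1/v1/content/metadata.xml",            # directly below content
    "obj1/v1/content/media/transcript.xml",    # nested one level deeper
    "obj1/.snapshot/v1/content/metadata.xml",  # contains .snapshot
    "obj1/TRASH/v1/content/old.xml",           # contains TRASH
    "obj1/v1/content/audio.wav",               # not an .xml file
]

selected = [p for p in paths if is_selected(p)]
print(selected)
```

Under these assumptions only the first path is selected; the nested file is skipped because [^\/]* does not cross a directory separator.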
Each of the index types object, query, annotation, and oai MUST contain a <mapper> XML element:
<repository>
  <index>
    <object>
      <mapper type="groovy">{mapper}</mapper>
    </object>
    <query>
      <mapper type="xquery">{mapper}</mapper>
    </query>
    <annotation>
      <mapper type="groovy">{mapper}</mapper>
    </annotation>
    <oai>
      <mapper type="xslt">{mapper}</mapper>
    </oai>
  </index>
</repository>
The object mapper MUST be of type groovy. The query mapper MUST be of type xquery. The annotation mapper MUST be of type groovy. The oai mapper MUST be of type xslt.
For detailed information about the respective {mapper} of each index type, see the configuration guide below.
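The structural rules stated so far (four index types, each with a <fileFilter> and a <mapper> of the required type) can be checked mechanically. The following Python sketch is only an illustration of those rules; the validate function and its messages are hypothetical and not part of the software described here.

```python
import xml.etree.ElementTree as ET

# Required mapper type per index type, as stated above.
REQUIRED_MAPPER_TYPES = {
    "object": "groovy",
    "query": "xquery",
    "annotation": "groovy",
    "oai": "xslt",
}

VALID = """
<repository>
  <index>
    <object><fileFilter>{regexp}</fileFilter><mapper type="groovy">{mapper}</mapper></object>
    <query><fileFilter>{regexp}</fileFilter><mapper type="xquery">{mapper}</mapper></query>
    <annotation><fileFilter>{regexp}</fileFilter><mapper type="groovy">{mapper}</mapper></annotation>
    <oai><fileFilter>{regexp}</fileFilter><mapper type="xslt">{mapper}</mapper></oai>
  </index>
</repository>
"""


def validate(config_xml):
    """Return a list of rule violations; an empty list means none were found."""
    root = ET.fromstring(config_xml)
    index = root.find("index") if root.tag == "repository" else None
    if index is None:
        return ["missing <repository>/<index>"]
    problems = []
    for name, mapper_type in REQUIRED_MAPPER_TYPES.items():
        section = index.find(name)
        if section is None:
            problems.append(f"missing <{name}>")
            continue
        if section.find("fileFilter") is None:
            problems.append(f"<{name}> has no <fileFilter>")
        mapper = section.find("mapper")
        if mapper is None:
            problems.append(f"<{name}> has no <mapper>")
        elif mapper.get("type") != mapper_type:
            problems.append(f"<{name}> mapper must be of type {mapper_type}")
    return problems


print(validate(VALID))
```

The sketch checks structure only; it does not verify that the regular expressions or mapper contents are themselves valid.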
The query index configuration MAY contain a <namespace> element with child elements of any name and text content:
<repository>
  <index>
    <query>
      <fileFilter exclude="{regexp}">{regexp}</fileFilter>
      <namespace>
        <{any}>{any}</{any}>
      </namespace>
      <mapper type="xquery">{mapper}</mapper>
    </query>
  </index>
</repository>
The <namespace> element declares namespaces for the xquery mapper: the name of each child element represents a namespace prefix, and its text content represents the namespace URI.
<repository>
  <index>
    <query>
      <fileFilter>.*.xml</fileFilter>
      <namespace>
        <cmd>http://www.clarin.eu/cmd/</cmd>
        <json>http://basex.org/modules/json</json>
      </namespace>
      <mapper type="xquery"></mapper>
    </query>
  </index>
</repository>
The example configuration results in the following XQuery namespace declarations:
xquery version "3.0";
declare namespace cmd="http://www.clarin.eu/cmd/";
declare namespace json="http://basex.org/modules/json";
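The mapping from <namespace> child elements to XQuery declarations can be illustrated with a few lines of Python. This is a sketch of the correspondence shown above, not the indexer's actual code; the namespace_declarations function is a hypothetical name.

```python
import xml.etree.ElementTree as ET

# The <namespace> element from the example configuration above.
NAMESPACE = """
<namespace>
  <cmd>http://www.clarin.eu/cmd/</cmd>
  <json>http://basex.org/modules/json</json>
</namespace>
"""


def namespace_declarations(namespace_xml):
    """Turn each child element into one declaration:
    element name -> prefix, text content -> namespace URI."""
    root = ET.fromstring(namespace_xml)
    return [
        f'declare namespace {child.tag}="{child.text.strip()}";'
        for child in root
    ]


declarations = namespace_declarations(NAMESPACE)
print("\n".join(declarations))
```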
The oai index configuration MUST contain three additional elements, <repositoryName>, <adminEmails>, and <metadataFormats>.
<repository>
  <index>
    <oai>
      <fileFilter>{regexp}</fileFilter>
      <repositoryName>{any}</repositoryName>
      <adminEmails>
        <adminEmail>{email}</adminEmail>
        <adminEmail>{email}</adminEmail>
      </adminEmails>
      <metadataFormats>
        <metadataFormat>
          <prefix>{prefix}</prefix>
          <schema>{schemaURI}</schema>
          <namespace>{namespaceURI}</namespace>
        </metadataFormat>
        <metadataFormat>
          <prefix>{prefix}</prefix>
          <schema>{schemaURI}</schema>
          <namespace>{namespaceURI}</namespace>
        </metadataFormat>
      </metadataFormats>
      <mapper type="xslt"></mapper>
    </oai>
  </index>
</repository>
The <repositoryName> and <adminEmails> elements contain the information for the OAI-PMH Identify request, which is individual for each repository.
The <metadataFormats> element contains the information for the OAI-PMH ListMetadataFormats request, which is individual for each repository.
<repository>
  <index>
    <oai>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
      <repositoryName>Language Archive Cologne</repositoryName>
      <adminEmails>
        <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
      </adminEmails>
      <metadataFormats>
        <metadataFormat>
          <prefix>oai_dc</prefix>
          <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
          <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
        </metadataFormat>
        <metadataFormat>
          <prefix>cmd</prefix>
          <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
          <namespace>http://www.clarin.eu/cmd/</namespace>
        </metadataFormat>
      </metadataFormats>
      <mapper type="xslt"></mapper>
    </oai>
  </index>
</repository>
The example results in the following OAI-PMH Identify Response:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2020-08-27T16:46:58Z</responseDate>
  <request verb="Identify">{}</request>
  <Identify>
    <repositoryName>Language Archive Cologne</repositoryName>
    <baseURL>{}</baseURL>
    <protocolVersion>2.0</protocolVersion>
    <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
    <earliestDatestamp>1990-01-01T12:00:00Z</earliestDatestamp>
    <deletedRecord>no</deletedRecord>
    <granularity>YYYY-MM-DDThh:mm:ssZ</granularity>
  </Identify>
</OAI-PMH>
The example configuration results in the following OAI-PMH ListMetadataFormats Response:
<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://www.openarchives.org/OAI/2.0/ http://www.openarchives.org/OAI/2.0/OAI-PMH.xsd">
  <responseDate>2020-08-27T16:50:29Z</responseDate>
  <request verb="ListMetadataFormats">{}</request>
  <ListMetadataFormats>
    <metadataFormat>
      <metadataPrefix>oai_dc</metadataPrefix>
      <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </metadataFormat>
    <metadataFormat>
      <metadataPrefix>cmd</metadataPrefix>
      <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
      <metadataNamespace>http://www.clarin.eu/cmd/</metadataNamespace>
    </metadataFormat>
  </ListMetadataFormats>
</OAI-PMH>
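Comparing the configuration with the response shows how the configured element names correspond to the OAI-PMH element names: <prefix> becomes <metadataPrefix>, <schema> stays <schema>, and <namespace> becomes <metadataNamespace>. The Python sketch below reproduces this renaming for illustration; the to_list_metadata_formats function is hypothetical, it omits the OAI-PMH envelope and XML namespaces, and it is not the server's actual implementation.

```python
import xml.etree.ElementTree as ET

# Configured element name -> element name in the OAI-PMH response,
# as observed in the example above.
RENAMES = {
    "prefix": "metadataPrefix",
    "schema": "schema",
    "namespace": "metadataNamespace",
}

METADATA_FORMATS = """
<metadataFormats>
  <metadataFormat>
    <prefix>oai_dc</prefix>
    <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
    <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
  </metadataFormat>
  <metadataFormat>
    <prefix>cmd</prefix>
    <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
    <namespace>http://www.clarin.eu/cmd/</namespace>
  </metadataFormat>
</metadataFormats>
"""


def to_list_metadata_formats(formats_xml):
    """Build a <ListMetadataFormats> element from <metadataFormats>."""
    result = ET.Element("ListMetadataFormats")
    for fmt in ET.fromstring(formats_xml).findall("metadataFormat"):
        out = ET.SubElement(result, "metadataFormat")
        for source_name, target_name in RENAMES.items():
            ET.SubElement(out, target_name).text = fmt.findtext(source_name)
    return result


response = to_list_metadata_formats(METADATA_FORMATS)
print(ET.tostring(response, encoding="unicode"))
```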
The object file filter should select all the content metadata files containing the references and persistent IDs (Handle PIDs, DOIs) for all data available in an OCFL repository.
Choose one of the following available object mappers:
<repository>
  <index>
    <object>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
      <mapper type="groovy">a5.mapper.de.unikoeln.dch.blam.BlamObjectCollectionMapper</mapper>
    </object>
  </index>
</repository>
The query file filter should select all metadata files that are to be added to the query index. Normally, these are the content metadata files selected by the object file filter, plus additional XML files such as metadata translations or transcriptions that should be searchable.
An XQuery mapper is a list of XML elements of any name, where each XML element contains two XML attributes @fulltext and @facet with a boolean value and a text node representing a valid XQuery script:
<mapper type="xquery">
  <{any} fulltext="{boolean}" facet="{boolean}">{xquery}</{any}>
</mapper>
accessLevel is automatically added to the query index for each indexed XML file. System properties always start with a lower case letter.
The @fulltext XML attribute defaults to true, which means that query properties are by default searchable by means of the $search and $autocomplete parameters of the Query API. It is recommended to set @fulltext to false for technical property values such as PIDs or file paths in order to avoid unexpected search results and autocomplete terms.
The @facet attribute defaults to false. Set it to true for properties that shall be available for faceted aggregation by means of the facets parameter of the Query API.
{xquery} MUST be a valid XQuery script that selects those texts and values from an XML file that shall be searchable. It is good practice for the XQuery script to return a single string value or text node, or a sequence of string values or text nodes. If an element node or a list of element nodes is returned, the XQuery mapper tries to convert the result set into a string representation.
<repository>
  <index>
    <query>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*.xml</fileFilter>
      <namespace>
        <cmd>http://www.clarin.eu/cmd/</cmd>
        <json>http://basex.org/modules/json</json>
      </namespace>
      <mapper type="xquery">
        <MetadataType fulltext="false" facet="true">
          let $element-name := (//cmd:BundleGeneralInfo/local-name(.),//cmd:CollectionGeneralInfo/local-name(.))[1]
          return
            if ($element-name = "BundleGeneralInfo") then "Bundle"
            else if ($element-name = "CollectionGeneralInfo") then "Collection"
            else "Other"
        </MetadataType>
        <Title>
          //cmd:BundleGeneralInfo/cmd:BundleDisplayTitle/text() | //cmd:CollectionGeneralInfo/cmd:CollectionDisplayTitle/text()
        </Title>
        <MetadataObject fulltext="false" facet="false">
          json:serialize(/cmd:CMD, map { 'format': 'jsonml'})
        </MetadataObject>
      </mapper>
    </query>
  </index>
</repository>
The example configuration results in the following Query API JSON response:
{
  "@odata.context": "{}",
  ...
  "value": [
    {
      "MetadataType": ["..."],
      "Title": ["..."],
      "MetadataObject": []
    }
  ]
}
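The attribute defaults described above (@fulltext defaults to true, @facet defaults to false) can be made explicit with a small Python sketch. It assumes that boolean values are written as the literal strings true and false; the property_flags function is hypothetical, and the XQuery scripts are shortened because they are not evaluated here.

```python
import xml.etree.ElementTree as ET

# Shortened version of the example mapper above.
MAPPER = """
<mapper type="xquery">
  <MetadataType fulltext="false" facet="true">"Bundle"</MetadataType>
  <Title>//cmd:BundleGeneralInfo/cmd:BundleDisplayTitle/text()</Title>
  <MetadataObject fulltext="false" facet="false">/cmd:CMD</MetadataObject>
</mapper>
"""


def property_flags(mapper_xml):
    """Resolve the effective fulltext and facet flags per query property."""
    flags = {}
    for prop in ET.fromstring(mapper_xml):
        flags[prop.tag] = {
            # Documented defaults: fulltext -> true, facet -> false.
            "fulltext": prop.get("fulltext", "true") == "true",
            "facet": prop.get("facet", "false") == "true",
        }
    return flags


flags = property_flags(MAPPER)
for name, value in flags.items():
    print(name, value)
```

With these defaults, Title is searchable but not available as a facet, because it sets neither attribute.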
The annotation file filter should select all OCFL content files of an OCFL repository that contain annotations and that are to be added to the annotation index behind the IIIF-compatible Annotation API.
Choose one of the following available annotation mappers:
<repository>
  <index>
    <annotation>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\.eaf</fileFilter>
      <mapper type="groovy">a5.mapper.nl.mpi.eaf.EafAnnotationMapper</mapper>
    </annotation>
  </index>
</repository>
The oai file filter should select all metadata files of an OCFL repository that shall go into the OAI index. Normally, the oai file filter is the same as the object file filter.
An xslt mapper is a list of XML elements, where the name of each XML element represents a namespace prefix and each XML element has an XML attribute @script referencing an XSLT script.
<mapper type="xslt">
  <{any} script="{path}"/>
</mapper>
<metadataFormat> element of the oai index configuration. There MUST be an XSLT script for each <metadataFormat> defined.
{path} MUST be a relative path, relative to the repository.xml configuration file.
The output of the XSLT script MUST be valid against the XML Schema defined in the <metadataFormat> section of the OAI index configuration.
<repository>
  <index>
    <oai>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
      <repositoryName>Language Archive Cologne</repositoryName>
      <adminEmails>
        <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
      </adminEmails>
      <metadataFormats>
        <metadataFormat>
          <prefix>oai_dc</prefix>
          <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
          <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
        </metadataFormat>
        <metadataFormat>
          <prefix>cmd</prefix>
          <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
          <namespace>http://www.clarin.eu/cmd/</namespace>
        </metadataFormat>
      </metadataFormats>
      <mapper type="xslt">
        <oai_dc script="./blam-cmdi2oai_dc.xsl"/>
        <cmd script="./identity.xsl"/>
      </mapper>
    </oai>
  </index>
</repository>
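A configuration can be checked against the two rules that every <metadataFormat> needs an XSLT script and that script paths are relative. The Python sketch below assumes that a mapper element is associated with a metadata format when its name equals the format's <prefix>, as the example configuration suggests; this association and the check_oai_mapper function are assumptions made for illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

# Reduced version of the example oai configuration above.
OAI = """
<oai>
  <metadataFormats>
    <metadataFormat><prefix>oai_dc</prefix></metadataFormat>
    <metadataFormat><prefix>cmd</prefix></metadataFormat>
  </metadataFormats>
  <mapper type="xslt">
    <oai_dc script="./blam-cmdi2oai_dc.xsl"/>
    <cmd script="./identity.xsl"/>
  </mapper>
</oai>
"""


def check_oai_mapper(oai_xml):
    """Return a list of problems; an empty list means none were found."""
    oai = ET.fromstring(oai_xml)
    prefixes = [
        p.text.strip()
        for p in oai.findall("./metadataFormats/metadataFormat/prefix")
    ]
    # Assumption: mapper element name == metadata format prefix.
    scripts = {el.tag: el.get("script") for el in oai.find("mapper")}
    problems = []
    for prefix in prefixes:
        script = scripts.get(prefix)
        if script is None:
            problems.append(f"no XSLT script for metadata format '{prefix}'")
        elif posixpath.isabs(script):
            problems.append(f"script path for '{prefix}' is not relative")
    return problems


print(check_oai_mapper(OAI))
```

The sketch does not run the XSLT scripts or validate their output against the configured XML Schema.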
<repository>
  <index>
    <object>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
      <mapper type="groovy">a5.mapper.de.unikoeln.dch.blam.BlamObjectCollectionMapper</mapper>
    </object>
    <query>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*.xml</fileFilter>
      <namespace>
        <cmd>http://www.clarin.eu/cmd/</cmd>
        <json>http://basex.org/modules/json</json>
      </namespace>
      <mapper type="xquery">
        <id fulltext="false" facet="true">
          /cmd:CMD/cmd:Header/cmd:MdSelfLink
        </id>
        <MetadataType fulltext="false" facet="true">
          let $element-name := (/cmd:CMD/cmd:Components/*/cmd:BundleGeneralInfo/local-name(.),/cmd:CMD/cmd:Components/*/cmd:CollectionGeneralInfo/local-name(.))[1]
          return
            if ($element-name = "BundleGeneralInfo" and not(exists(/cmd:CMD/cmd:Resources/cmd:ResourceProxyList/*))) then "BundleTranslation"
            else if ($element-name = "BundleGeneralInfo") then "Bundle"
            else if ($element-name = "CollectionGeneralInfo") then "Collection"
            else "Other"
        </MetadataType>
        <Title>
          /cmd:CMD/cmd:Components/*/cmd:BundleGeneralInfo/cmd:BundleDisplayTitle/text() | /cmd:CMD/cmd:Components/*/cmd:CollectionGeneralInfo/cmd:CollectionDisplayTitle/text()
        </Title>
        <Creator>
          concat(
            string-join(/cmd:CMD/cmd:Components/*/cmd:BundlePublicationInfo/cmd:BundleCreators/cmd:BundleCreator/cmd:CreatorName/cmd:CreatorGivenName, ' '), " ",
            string-join(/cmd:CMD/cmd:Components/*/cmd:BundlePublicationInfo/cmd:BundleCreators/cmd:BundleCreator/cmd:CreatorName/cmd:CreatorFamilyName, ' '), " ",
            string-join(/cmd:CMD/cmd:Components/*/cmd:CollectionPublicationInfo/cmd:CollectionCreators/cmd:CollectionCreator/cmd:CreatorName/cmd:CreatorGivenName, ' '), " ",
            string-join(/cmd:CMD/cmd:Components/*/cmd:CollectionPublicationInfo/cmd:CollectionCreators/cmd:CollectionCreator/cmd:CreatorName/cmd:CreatorFamilyName, ' ')
          )
        </Creator>
        <ObjectLanguage facet="true">
          for $language in /cmd:CMD/cmd:Components//(cmd:BundleObjectLanguage | cmd:CollectionObjectLanguage)
          return ($language/cmd:ObjectLanguageDisplayName/text()|$language/cmd:ObjectLanguageName/text())[1]
        </ObjectLanguage>
        <Description fulltext="true" facet="false">
          /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleDescription/text() | cmd:CollectionGeneralInfo/cmd:CollectionDescription/text())
        </Description>
        <Keywords fulltext="true" facet="true">
          /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleKeywords/cmd:BundleKeyword/text() | cmd:CollectionGeneralInfo/cmd:CollectionKeywords/cmd:CollectionKeyword/text())
        </Keywords>
        <RecordingDate>
          /cmd:CMD/cmd:Components/*/cmd:BundleGeneralInfo/cmd:BundleRecordingDate
        </RecordingDate>
        <GeoLocation fulltext="false" facet="true">
          /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleGeoLocation/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionGeoLocation/text())
        </GeoLocation>
        <Location fulltext="true" facet="true">
          /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleLocationFacet/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionLocationFacet/text())
        </Location>
        <Region fulltext="true" facet="true">
          /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleRegionFacet/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionRegionFacet/text())
        </Region>
        <Country fulltext="true" facet="true">
          /cmd:CMD/cmd:Components/*/(cmd:BundleGeneralInfo/cmd:BundleLocation/cmd:BundleCountryFacet/text() | cmd:CollectionGeneralInfo/cmd:CollectionLocation/cmd:CollectionCountryFacet/text())
        </Country>
        <ProjectName fulltext="true" facet="false">
          /cmd:CMD/cmd:Components/*/cmd:ProjectInfo/cmd:Project/cmd:ProjectDisplayName/text()
        </ProjectName>
        <ProjectDescription fulltext="true" facet="false">
          /cmd:CMD/cmd:Components/*/cmd:ProjectInfo/cmd:Project/cmd:ProjectDescription/text()
        </ProjectDescription>
        <ResourceMimeType fulltext="false" facet="true">
          /cmd:CMD/cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy/cmd:ResourceType/data(@mimetype)
        </ResourceMimeType>
        <ResourceType fulltext="false" facet="true">
          /cmd:CMD/cmd:Resources/cmd:ResourceProxyList/cmd:ResourceProxy/cmd:ResourceType/text()
        </ResourceType>
        <IsPartOf>
          /cmd:CMD/cmd:Resources/cmd:IsPartOfList/cmd:IsPartOf
        </IsPartOf>
        <MetadataObject fulltext="false" facet="false">
          json:serialize(/cmd:CMD, map { 'format': 'jsonml'})
        </MetadataObject>
      </mapper>
    </query>
    <annotation>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\.eaf</fileFilter>
      <mapper type="groovy">a5.mapper.nl.mpi.eaf.EafAnnotationMapper</mapper>
    </annotation>
    <oai>
      <fileFilter exclude="\.snapshot|TRASH|replaced-files">.*\/content\/[^\/]*\.xml</fileFilter>
      <repositoryName>Language Archive Cologne</repositoryName>
      <adminEmails>
        <adminEmail>dev-ka3-uzk@uni-koeln.de</adminEmail>
      </adminEmails>
      <metadataFormats>
        <metadataFormat>
          <prefix>oai_dc</prefix>
          <schema>http://www.openarchives.org/OAI/2.0/oai_dc.xsd</schema>
          <namespace>http://www.openarchives.org/OAI/2.0/oai_dc/</namespace>
        </metadataFormat>
        <metadataFormat>
          <prefix>cmd</prefix>
          <schema>http://infra.clarin.eu/cmd/xsd/minimal-cmdi.xsd</schema>
          <namespace>http://www.clarin.eu/cmd/</namespace>
        </metadataFormat>
      </metadataFormats>
      <mapper type="xslt">
        <oai_dc script="./blam-cmdi2oai_dc.xsl"/>
        <cmd script="./identity.xsl"/>
      </mapper>
    </oai>
  </index>
</repository>