Ingest by File System

Ingest by File System is a way to ingest hundreds of data collections, or whole data repositories, in one pass. Use it for automated ingests where the manual Ingest by WebDAV would be too time-consuming.

Pre-OCFL Structure (Before Reindex)

To ingest data collections, the file system MUST conform to the following pre-OCFL structure:

[root]
└── {repository}
    └── data
        ├── 0=ocfl_1.0
        ├── {collection} 1
        │   ├── {bundle} 2
        │   │   ├── 0=ocfl_object_1.0 3
        │   │   ├── 4
        │   │   ├── 5
        │   │   └── v1 6
        │   │       ├── 7
        │   │       ├── 8
        │   │       └── content 9
        │   │           ├── dir1 10
        │   │           │   ├── file1 10
        │   │           │   ├── file2 10
        │   │           │   ├── ...
        │   │           ├── dir2 10
        │   │           │   ├── file1 10
        │   │           │   ├── ...
        │   │           ├── file1 10
        │   │           ├── file2 10
        │   │           ├── ...
        │   │
        │   ├── {bundle}
        │   │   ├── 0=ocfl_object_1.0
        │   │   └── v1
        │   │       └── content
        │   │           ├── ...
        │   ├── ...
        │
        ├── {collection}
        │   ├── {bundle}
        │   │   ├── 0=ocfl_object_1.0
        │   │   ├── ...
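
Before starting a reindex, it can help to confirm that every bundle follows the layout above. The following is a minimal sketch, not part of the product: the function name `check_pre_ocfl` is hypothetical, and it assumes that bundles sit exactly two directory levels below `data` and that only the marker files and the `v1/content` directory are required.

```shell
# check_pre_ocfl DATA_DIR   (hypothetical helper, not a shipped tool)
# Prints one line per problem and returns non-zero if the storage root
# or any bundle deviates from the pre-OCFL layout.
check_pre_ocfl() {
    data_dir=$1
    status=0

    # The storage root must carry its marker file.
    if [ ! -f "$data_dir/0=ocfl_1.0" ]; then
        echo "missing storage root marker: $data_dir/0=ocfl_1.0"
        status=1
    fi

    # Assumption: bundles live at {collection}/{bundle} below DATA_DIR.
    for bundle in "$data_dir"/*/*/; do
        [ -d "$bundle" ] || continue
        bundle=${bundle%/}
        if [ ! -f "$bundle/0=ocfl_object_1.0" ]; then
            echo "missing object marker: $bundle"
            status=1
        fi
        if [ ! -d "$bundle/v1/content" ]; then
            echo "missing v1/content: $bundle"
            status=1
        fi
    done
    return "$status"
}
```

Called as `check_pre_ocfl "[root]/{repository}/data"`, it prints nothing and returns 0 when every bundle has its marker file and a `v1/content` directory.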

Reindex

Use the Ingest API to transform the pre-OCFL structure into a full OCFL structure.

It is also possible to mix pre-OCFL objects with full OCFL objects under one OCFL storage root. In this case, the Ingest API ignores the objects that are already full OCFL objects.
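
To see in advance which objects in a mixed storage root are likely to be skipped, the objects can be listed by state. This is a rough sketch under one explicit assumption: an object counts as "full" when an `inventory.json` file exists in its object root. The text above does not say how the Ingest API makes this decision, and the function name `classify_objects` is hypothetical.

```shell
# classify_objects DATA_DIR   (hypothetical helper)
# Prints "full <path>" or "pre-ocfl <path>" for every object root found.
classify_objects() {
    # Object roots are located through their marker file.
    find "$1" -type f -name '0=ocfl_object_1.0' | sort |
    while IFS= read -r marker; do
        object=$(dirname "$marker")
        # Assumption: a root inventory.json means the object is already full.
        if [ -f "$object/inventory.json" ]; then
            echo "full $object"
        else
            echo "pre-ocfl $object"
        fi
    done
}
```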

If pre-OCFL content files are added, deleted, or changed at a later time, you MUST restart the file system ingest from the beginning by first deleting all OCFL inventory files and OCFL inventory digest files. For example, on Linux:

linux:[root]/{repository}/data$ find -name 'inventory.json*' -delete

To reindex only the OCFL objects of one collection, or a single OCFL object, run the same command from the corresponding directory:

linux:[root]/{repository}/data/{collection}$ find -name 'inventory.json*' -delete
linux:[root]/{repository}/data/{collection}/{bundle}$ find -name 'inventory.json*' -delete
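
Because `-delete` removes files without confirmation, it can be useful to preview the matches first. The wrapper below is a small sketch around the same `find` expression. The function name `reset_inventories` and its `--delete` argument are hypothetical, and the added `-type f` restricts matches to regular files.

```shell
# reset_inventories DIR [--delete]   (hypothetical helper)
# Lists every OCFL inventory file and inventory digest file below DIR.
# The files are removed only when --delete is passed explicitly.
reset_inventories() {
    dir=$1
    mode=${2:-}
    if [ "$mode" = "--delete" ]; then
        # Print each file as it is deleted, to leave a trace in the log.
        find "$dir" -type f -name 'inventory.json*' -print -delete
    else
        # Dry run: show what would be deleted, change nothing.
        find "$dir" -type f -name 'inventory.json*' -print
    fi
}
```

Running `reset_inventories .` from a collection or bundle directory shows the files that the commands above would delete. Adding `--delete` performs the deletion.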

Full OCFL Structure (After Reindex)

After the reindex, each OCFL object contains OCFL object inventory files with information about identity, fixity, and paths, as well as the creator and creation time of all OCFL content files. For each OCFL inventory file there is an OCFL inventory digest file. Use the WebDAV API to browse and check the resulting OCFL structure.
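
With direct file system access, the pairing of inventory files and digest files can also be checked locally. The sketch below rests on two assumptions that the text above does not confirm: the digest files are named `inventory.json.sha512`, and each one holds the hex digest as its first whitespace-separated field, as the OCFL specification describes for sidecar files. It also requires `sha512sum` from GNU coreutils, and the function name `verify_inventory_digests` is hypothetical.

```shell
# verify_inventory_digests DIR   (hypothetical helper)
# Recomputes the SHA-512 digest of every inventory.json below DIR and
# compares it with the digest stored in the neighbouring .sha512 file.
verify_inventory_digests() {
    problems=$(
        find "$1" -type f -name 'inventory.json.sha512' |
        while IFS= read -r sidecar; do
            inventory=${sidecar%.sha512}
            # Assumption: the first field of the sidecar is the hex digest.
            expected=$(awk '{print $1; exit}' "$sidecar")
            actual=$(sha512sum "$inventory" 2>/dev/null | awk '{print $1}')
            [ "$expected" = "$actual" ] || echo "digest mismatch: $inventory"
        done
    )
    if [ -n "$problems" ]; then
        echo "$problems"
        return 1
    fi
}
```

The comparison is case-sensitive, so a digest file written with uppercase hex digits would be reported as a mismatch.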

Remarks

It is not possible to ingest versioned data collections with this file system ingest. To ingest versioned data collections via the file system, use one of the many available OCFL clients and OCFL validators, and make sure that your OCFL structure conforms to the OCFL specification.