From Mary Willoughby, a DLG staff member who has worked on several of DLG's "Troup" method projects.
"The folder-level digitization method used for the NHPRC Civil War grant project relies on the collation of tens of thousands of scans into separate folders of items. These are transformed into pdf and djvu files that are linked with the existing finding aid for each collection at the folder level. How do we make sure that the right scans end up in the right folders? We use a strict naming system to recreate the collection, box, and folder hierarchy within the file name of each scan. The name also conveys information about the institution, item, and scan number (usually, but not always, equivalent to the page number) of an item.
To state the rules generally:
Master tif image file names consist of a combination of five elements that reflect the structure of the collection. They are:
1. A collection identifier consisting of an institutional prefix and the existing numeric portion of the collection ID.
2. The box number (where applicable) padded with zeros to three digits.
3. The folder number (where applicable) padded with zeros to three digits.
4. The item number (numbered according to position in folder) padded with zeros to three digits.
5. The scan number (numbered according to how many scans it takes to present an item) padded with zeros to three digits.
These segments are separated with hyphens to enhance readability and aid in visual evaluation of file lists for quality control purposes.
Derivative names (which will be used to link the presentation images to the EAD) omit the item and scan numbers, since there is only one derivative file per folder.
To give a specific example:
Consider the Francis Marion Coker papers from the Hargrett Library (ms15). The fourth page of the third item of folder 2 of box 1 would be:
harg15-001-002-003-004.tif
The derivative file for this folder would be:
harg15-001-002.pdf
It would be linked to the EAD at the section identifying folder 2 of box 1.
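As an editorial illustration (not part of the quoted description, and not DLG's actual tooling), the naming rules can be sketched in a few lines of Python. The function names `master_name` and `derivative_name` are hypothetical, and the sketch assumes every file has both a box and a folder number; the description does not say how names are formed when a box or folder is not applicable.

```python
def master_name(prefix, collection, box, folder, item, scan, ext="tif"):
    """Build a master file name from the five elements.

    prefix + collection form the collection identifier (e.g. "harg" + 15);
    box, folder, item, and scan are zero-padded to three digits.
    Assumption: box and folder are always present.
    """
    parts = [f"{prefix}{collection}"]
    parts += [f"{n:03d}" for n in (box, folder, item, scan)]
    return "-".join(parts) + "." + ext


def derivative_name(prefix, collection, box, folder, ext="pdf"):
    """Build a derivative name, which stops at the folder level."""
    return f"{prefix}{collection}-{box:03d}-{folder:03d}.{ext}"


# The Coker papers example: box 1, folder 2, item 3, scan 4.
print(master_name("harg", 15, 1, 2, 3, 4))
print(derivative_name("harg", 15, 1, 2))
```

With the values from the example, the two calls reproduce the file names given above.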
It’s a little confusing at first, but once you learn what the different parts mean you can immediately identify where the original object that corresponds to a scan is located. This is essential for quality control purposes in case we need to rescan an item or verify its location. It also is the means by which the files are divided up for derivative creation so that the correct images for a collection folder all ultimately appear in the same pdf or djvu file."
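The last point, dividing files up for derivative creation, can be sketched the same way. The following is a minimal illustration, not DLG's actual workflow: the pattern `MASTER_RE` and the functions `parse_master` and `group_by_folder` are hypothetical names, and the pattern assumes a lowercase alphabetic prefix followed by digits, with box and folder always present.

```python
import re
from collections import defaultdict

# Assumed shape of a master name: prefix+digits, then four 3-digit segments.
MASTER_RE = re.compile(
    r"^(?P<collection>[a-z]+\d+)"
    r"-(?P<box>\d{3})-(?P<folder>\d{3})"
    r"-(?P<item>\d{3})-(?P<scan>\d{3})\.tif$"
)


def parse_master(name):
    """Split a master file name into its five elements."""
    m = MASTER_RE.match(name)
    if m is None:
        raise ValueError(f"not a valid master file name: {name!r}")
    fields = m.groupdict()
    return {
        "collection": fields["collection"],
        "box": int(fields["box"]),
        "folder": int(fields["folder"]),
        "item": int(fields["item"]),
        "scan": int(fields["scan"]),
    }


def group_by_folder(names):
    """Group master names under the derivative stem they belong to.

    Because every segment is zero-padded, a plain lexical sort already
    puts the scans in item order, then scan order, within each folder.
    """
    groups = defaultdict(list)
    for name in sorted(names):
        m = MASTER_RE.match(name)
        if m is None:
            raise ValueError(f"not a valid master file name: {name!r}")
        stem = f"{m['collection']}-{m['box']}-{m['folder']}"
        groups[stem].append(name)
    return dict(groups)


scans = [
    "harg15-001-002-003-004.tif",
    "harg15-001-002-001-001.tif",
    "harg15-001-003-001-001.tif",
]
for stem, files in group_by_folder(scans).items():
    print(stem, files)
```

Each key is the derivative name without its extension, so the list under `harg15-001-002` holds the scans that would go into `harg15-001-002.pdf`. A name that fails the pattern raises an error rather than being silently skipped, which fits the quality control role the naming system plays.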