Monday, April 23, 2012

Transforming metadata records for small collections into OAIDC for import into DLG union catalog

America's Turning Point is a collaborative project that brings together the Civil War resources of three Georgia institutions--Atlanta History Center, Georgia Historical Society, and the Hargrett Rare Book and Manuscript Library at UGA. Each institution will maintain access to its finding aids (which will have links to the scanned documents), but the project will also bring together descriptive records for all three institutions' digitized materials in the DLG union catalog, http://dlg.galileo.usg.edu.

One of the basic requirements of the NHPRC's Digitizing Historical Records program is that funded projects re-use existing descriptive metadata. All three institutions already have EAD-encoded finding aids and MARC. These will be used as the basis for the Dublin Core records required by the DLG union catalog.

Many of the collections to be included in the project are small, consisting of a single folder. Dublin Core records for these collections can be easily created by transforming the MARC records into DC. For those managed in the Archivists' Toolkit, or AT (AHC and Hargrett), we export MARC21slim XML records from AT, run them through a basic clean-up script, and transform them with a modified version of the MARC21slim2OAIDC XSLT stylesheet from the Library of Congress.

The basic clean-up script takes care of a few simple issues:

  • When exporting MARC records from AT, the root element is <collection>. We want it to be <record>.
  • We create an 856 field and normalize the collection names to correspond with our naming scheme.
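The two clean-up steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the script we actually run: the function name, the 856 indicator values, and the placeholder URL are all assumptions, and the collection-name normalization is left out because it depends on a local naming scheme.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def clean_marc_export(xml_text, url):
    """Return a <record>-rooted MARCXML string with an 856 field added.

    Hypothetical helper: `url` is whatever link the 856 should carry.
    """
    root = ET.fromstring(xml_text)
    # If the export is wrapped in <collection>, promote its first <record>.
    if root.tag == f"{{{MARC_NS}}}collection":
        record = root.find(f"{{{MARC_NS}}}record")
        if record is None:
            raise ValueError("no <record> inside <collection>")
    else:
        record = root
    record.tail = None  # drop whitespace that followed the record in the wrapper

    # Append an 856 field with the URL in subfield u.
    # The indicator values here are assumptions, not taken from the project.
    field = ET.SubElement(
        record,
        f"{{{MARC_NS}}}datafield",
        {"tag": "856", "ind1": "4", "ind2": "2"},
    )
    subfield = ET.SubElement(field, f"{{{MARC_NS}}}subfield", {"code": "u"})
    subfield.text = url
    return ET.tostring(record, encoding="unicode")
```

The output is serialized with a marc: prefix bound to the MARC21slim namespace, which a namespace-aware stylesheet should treat the same as the default-namespace form.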

The modified XSLT stylesheet adds new fields (<dc:contributor>, <dc:publisher>, <dc:date>, and a <dc:description> with sponsorship details). It also creates <dc:rights>, <dc:source>, and <dc:coverage.temporal> fields. Even after transformation into OAIDC, we do a minor amount of tweaking--adding <dc:coverage.spatial> and correcting punctuation in and .

Once the OAIDC records have been finalized, we use the DLG's importer to add the records to our union catalog. At this point, using a local Perl script, we also create a tab-delimited file to import the digital objects into AT. The script captures the following data:

  • DigitalObjectID
  • dateExpression
  • objectType (provided by user)
  • title
  • uri
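As a rough illustration of that last step, here is a short Python sketch (not the Perl script itself) that pulls those five columns out of an OAIDC record and writes them as tab-delimited text. The field mapping is an assumption made for the example: uri comes from the first dc:identifier, DigitalObjectID is taken to be the last path segment of that URI, and objectType is passed in by the user.

```python
import csv
import io
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"
COLUMNS = ["DigitalObjectID", "dateExpression", "objectType", "title", "uri"]


def dc_to_row(record_xml, object_type):
    """Build one row from an OAIDC record (the mapping is hypothetical)."""
    rec = ET.fromstring(record_xml)

    def first(name):
        # Text of the first matching dc:* element, or "" if it is absent.
        el = rec.find(f".//{DC}{name}")
        return (el.text or "").strip() if el is not None else ""

    uri = first("identifier")
    return {
        # Assumption: the ID is the last path segment of the URI.
        "DigitalObjectID": uri.rstrip("/").rsplit("/", 1)[-1],
        "dateExpression": first("date"),
        "objectType": object_type,  # provided by the user
        "title": first("title"),
        "uri": uri,
    }


def rows_to_tsv(rows):
    """Serialize rows as tab-delimited text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(
        buf, fieldnames=COLUMNS, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()
```

Whether the import expects a header row, and in what column order, should be checked against the import template before using anything like this.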

Friday, March 23, 2012

Packing Up

On February 23, Atlanta History Center vice president Paul Crater and I began pulling collections from the shelves at the Kenan Research Center to pack and deliver to the Digital Library of Georgia for digitization. The History Center is contributing approximately 35,000 pages of material from seventy-four collections to the project.

To reduce the amount of time collections will be at DLG and thus inaccessible to researchers, the project staff decided to deliver the content in three installments, the first of which occurred on March 19, 2012. To illustrate the volume of materials to be digitized, AHC staff pulled half of the collections set to be digitized for the grant.



Since the collections will be inaccessible to researchers until they are returned to the Atlanta History Center and reshelved in their original containers, inventories and catalog records for these collections have been temporarily removed from our online catalogs.  

For the first installment, we chose forty-six collections, many of which are relatively small in size with only a few documents in each. However, the Wilber Kurtz Jr. visual arts collection contains over 6,000 images to be digitized. We carefully rehoused all the collections in record boxes for the trip and created spreadsheets that include an accurate account of the scans required for the contents of each collection.

Which brings me to an occupational hazard: reading the material with which you are working.

Found in Carrie Berry’s diary on 3 August 1864 (MSS 29f, spelling unchanged):

This was my birthday I was ten years old but I did not hav (sic) a cake times was too hard so I celebrated by ironing I hope by my next birthday we will have peace in our land so I can have a nice dinner


And from the 1864 diary of George J. Johnston, Co. F, 60th Alabama, Gracie’s Brigade (MSS 187f, spelling unchanged):

Should I be killed in battle or die in hospital be kind enough my dear soldier friend to inform Mrs Mary L McGibbon Opelika, Ala. where my remains may be found And if you saw me fall on the battlefield tell her how I behaved myself in the presence of the enemy.

On the opposite page from the entry listed above, Private Johnston left a note for any Federal soldiers who might have stumbled on his remains:

Pay respect to my body you infernal thievish Yankee scoundrel.  I have someone at home to read this book.

Yes, reading what we're processing does tend to slow things down a tad.  But who can resist taking a peek into the past?  We're very glad to be a part of the process.

Tuesday, March 13, 2012

Training the student scanners at DLG

From Donnie Summerlin, one of the student supervisors at the DLG and a project team member.
 
"On March 8, 2012 the Digital Library of Georgia began training their student assistants in the process of scanning, naming, and cropping materials received from the Atlanta History Center.

We first taught each student how to properly prepare for handling the materials. This preparation included putting their bags in a locker, washing their hands, covering their work table with fresh paper, and cleaning their scanner to ensure the scans are of the highest quality.

Once their workspace was prepared, we trained the students to properly handle the materials during the scanning process. We felt it important to convey to them the significance of the collections they would be working with and the importance of following best practices to ensure their protection, because the safety of the documents is the most important consideration. We taught them to provide support for the documents when moving them and leave the materials in the order in which they found them. We also stressed the importance of identifying concerns with the documents throughout the process and notifying one of their supervisors to ensure that any problems are handled promptly.

The students also learned how to organize the files created during the digitization process. We taught them how to correctly establish a folder hierarchy and appropriately name the files they place in those folders to ensure that their work is placed in the proper order and context, which in turn will preserve the digital integrity of the collections. As the project continues, we plan to regularly review the process with the students to make certain that their work maintains the standards set forth in the project guidelines."





Wednesday, February 29, 2012

Filenaming

From Mary Willoughby, a DLG staff member who has worked on several of DLG's "Troup" method projects.

"The folder-level digitization method used for the NHPRC Civil War grant project relies on the collation of tens of thousands of scans into separate folders of items. These are transformed into pdf and djvu files that are linked with the existing finding aid for each collection at the folder level. How do we make sure that the right scans end up in the right folders? We use a strict naming system to recreate the collection, box, and folder hierarchy within the file name of each scan. The name also conveys information about the institution, item, and scan number (usually, but not always, equivalent to the page number) of an item.

To state the rules generally:

Master tif image file names consist of a combination of five elements that reflect the structure of the collection. They are:

1. A collection identifier consisting of an institutional prefix and the existing numeric portion of the collection ID.
2. The box number (where applicable) padded with zeros to three digits.
3. The folder number (where applicable) padded with zeros to three digits.
4. The item number (numbered according to position in folder) padded with zeros to three digits.
5. The scan number (numbered according to how many scans it takes to present an item) padded with zeros to three digits.

These segments are separated with hyphens to enhance readability and aid in visual evaluation of file lists for quality control purposes.

Derivative names (which will be used to link the presentation images to the EAD) omit the item and scan numbers since they will be one per folder.

To give a specific example:

Considering the Francis Marion Coker papers from the Hargrett Library (ms15), the fourth page of the third item of folder 2 of box 1 would be:
harg15-001-002-003-004.tif

The derivative file for this folder would be:
harg15-001-002.pdf

It would be linked to the EAD at the section identifying folder 2 of box 1.

It’s a little confusing at first, but once you learn what the different parts mean you can immediately identify where the original object that corresponds to a scan is located. This is essential for quality control purposes in case we need to rescan an item or verify its location. It also is the means by which the files are divided up for derivative creation so that the correct images for a collection folder all ultimately appear in the same pdf or djvu file."
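The naming rules described above can be sketched in a few lines of Python. This is an illustration only, not DLG's production tooling; the function names are hypothetical. Because the derivative name simply drops the last two segments (item and scan), the grouping works whether or not a collection has box and folder numbers.

```python
from collections import defaultdict


def pad(number):
    """Zero-pad a number to three digits."""
    return f"{number:03d}"


def master_name(prefix, collection, item, scan, box=None, folder=None):
    """Build a master tif name; box and folder are included where applicable."""
    parts = [f"{prefix}{collection}"]
    if box is not None:
        parts.append(pad(box))
    if folder is not None:
        parts.append(pad(folder))
    parts += [pad(item), pad(scan)]
    return "-".join(parts) + ".tif"


def derivative_name(master, ext="pdf"):
    """Drop the item and scan segments to get the per-folder derivative name."""
    stem = master.rsplit(".", 1)[0]
    return "-".join(stem.split("-")[:-2]) + "." + ext


def group_by_folder(masters):
    """Map each derivative name to its master scans, in sorted order."""
    groups = defaultdict(list)
    for name in sorted(masters):
        groups[derivative_name(name)].append(name)
    return dict(groups)


# The example from the post: box 1, folder 2, item 3, scan 4 of harg15.
name = master_name("harg", 15, item=3, scan=4, box=1, folder=2)
print(name)                   # harg15-001-002-003-004.tif
print(derivative_name(name))  # harg15-001-002.pdf
```

Zero-padding is what makes a plain alphabetical sort of the file list match the physical order of the scans, which is why `group_by_folder` can rely on `sorted`.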

Wednesday, February 22, 2012

Ramping up at UGA

Staff at DLG and Hargrett Library began ramping up for the project in early February. The first collections will arrive from the Atlanta History Center in mid-March.

Since all the imaging work will be done in two different buildings (the Main Library and the new Special Collections Library), staff in both locations are keeping in close contact. As project needs change, the students may change their home base from one unit to another. As such, the three student supervisors are working collaboratively to hire our student scanners. They've advertised the position on DawgTrak, UGA's student job bulletin board, drawn up expectation documents, and fine-tuned training documents. Student interviews are being scheduled, and we'll be doing a mock training session next week.

On the equipment front, we've rearranged DLG's workspace to accommodate our five additional scanning students. Given the volume of imaging work, we also needed to ensure that we had enough storage for both working files and archival copies.

We're looking forward to the project and welcoming our new student assistants.

Monday, February 20, 2012

We've been funded!

After many months of (not so) patiently waiting, we are thrilled to announce that the National Historical Publications and Records Commission (NHPRC) has agreed to fund a project that will result in free online access to over 81,000 digital surrogates of letters, diaries, military records, account books, poetry, photographs, and maps that document the American Civil War in Georgia!

NHPRC February 2012 Newsletter

In June of 2011, the Kenan Research Center at the Atlanta History Center, in partnership with the Digital Library of Georgia, the Hargrett Rare Book and Manuscript Library of the University of Georgia, and the Georgia Historical Society, submitted a grant proposal to the NHPRC for funding to incorporate economical solutions to create, preserve, and provide free online access to these extraordinary materials.

Staff at each of the partnering libraries selected collections based on the strengths of their institution. These include the Atlanta Campaign and the defense of Savannah; the Eastern Theater and Western Theater outside of Georgia; Confederate government records and correspondence of its prominent officials; life on the homefront; slavery; and the Civil War in memory. The records include the diverse experiences and perspectives of military leaders, soldiers, and civilians whose lives were directly impacted by the Civil War. Thousands of first-hand accounts of Union and Confederate soldiers and officers document their hardships and opinions of the war and national politics. Military documents, including orders issued by William T. Sherman, describe the strategy of the Atlanta Campaign. Letters and diaries from Georgia civilians, young and old, male and female, describe in compelling detail the anxiety leading up to the war, the blockade of Georgia’s coast, the siege of Atlanta, and General Sherman’s subsequent march through Georgia. Financial and military documents reveal details of the buying and selling of slaves by private parties and by governments in the defense of the Confederacy. Letters, questionnaires, and 20th-century photograph collections capture the memories of Civil War veterans and document important Georgia Civil War landmarks a few decades after the conflict.

Now that we've got the official go-ahead, we'll be posting about the process in hopes that our experience will help other repositories as they seek to make their collections more accessible.  We look forward to the journey--stay tuned to America's Turning Point: Documenting the Civil War Experience in Georgia!