Image of paleobio ledger

Mass Digitization, Paleobiology Locality Ledgers, and Transcription

By Nathan Anderson on Thu, 07/01/2021 - 13:00

From November 2019 to March 2020, the Smithsonian Mass Digitization Program, part of the Digitization Program Office of the OCIO, carried out the second phase of digitization production capturing fossil collections, in partnership with the Department of Paleobiology at the Smithsonian National Museum of Natural History (NMNH) and with support from the Smithsonian National Collections Program. The project’s goal was to digitize 38,000 Cenozoic marine invertebrate fossils found in the eastern Pacific. The bivalves, gastropods, and other invertebrates in this collection span the last 66 million years of the Earth’s history and are crucial to understanding how marine ecosystems responded to intense periods of environmental change. Transforming handwritten, analog specimen records into georeferenced, interoperable online data is a major challenge in mobilizing these collections and making them more accessible. This project sought to develop and refine methods for imaging and data enrichment to help unlock this information for greater scientific use.

 

Figure 1: NMNH Paleobio digitization prior to March 2020 shutdown & example specimen photography.

 

After the pandemic halted our mass digitization imaging project, we turned our resources to enriching the collection’s digital records. For natural history collections, a large portion of the scientific value of a specimen is tied to its associated data, which is best preserved in the form of a locality ledger. These ledgers range in creation date from as early as 1910 to as late as the 1970s. They follow a standard layout, with some variation in how the information is presented, making them ideal candidates for structured transcription.

Figure 2: Digitized page from the National Museum of Natural History’s USGS Cenozoic Locality Ledgers.

 

Each ledger page contains entries describing the place where a specimen was collected (its locality). These records provide a variety of details about a locality, all tied to a locality number used to identify the record. A record usually includes geographic information, such as the country, state, and county where the locality is situated. Sometimes there is more detailed information, such as township and range, or coordinates, which may be given as latitude and longitude, UTM, or other coordinate types. More descriptive information intended to help find the site, such as a narrative description (e.g., “100 yards upstream from highway bridge, at base of bluff”), is often recorded as well.

An important aspect of a locality for fossil specimens is the stratigraphic data, which is information about the geological context of the specimen. It might include the geological age of the rocks in which the fossil was found, the name of the rock formation, or more specific, narrative information about the position of the fossil within the site (e.g., “2 meters below the purple layer”). Stratigraphic data is important because it allows researchers to relate the fossil and the site where it was found to other fossils and fossil localities locally, nationally, or internationally.

Another important piece of information is the provenance, or how the specimen was obtained. It could include the name of the collector or collectors and the date when they found the specimen. Provenance data can provide a link between the specimen and archival material such as field notes, journals, or correspondence, which may be an important source of additional information about the fossil.

Taxonomic types or names of the specimens are sometimes present as well. This information relates to the identification of the fossil, that is, what type of organism it is. The ledgers also contain numerous illustrations, maps, diagrams, and other remarks or anecdotes.
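The kinds of information described above could be gathered into a single record structure. The following is a minimal sketch in Python, not the project's actual schema: the class name, every field name, and the example locality number are hypothetical, chosen only to mirror the categories mentioned in the text. The source does not say which six fields were actually transcribed.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LocalityRecord:
    """Hypothetical container for one ledger entry (field names are assumptions)."""

    locality_number: str                  # identifier that ties the record together
    country: str = ""
    state: str = ""
    county: str = ""
    coordinates: Optional[str] = None     # lat/long, UTM, or township and range, as written
    site_description: str = ""            # narrative directions to the site
    stratigraphy: str = ""                # geological age, formation, position within the site
    collectors: List[str] = field(default_factory=list)
    collection_date: str = ""             # kept as text, since ledger dates vary in form
    taxa: List[str] = field(default_factory=list)

    def has_coordinates(self) -> bool:
        """True when the ledger entry recorded any kind of coordinates."""
        return bool(self.coordinates)


# Illustrative values only; the two quoted phrases are the examples given in the text.
example = LocalityRecord(
    locality_number="12345",
    site_description="100 yards upstream from highway bridge, at base of bluff",
    stratigraphy="2 meters below the purple layer",
)
```

Keeping free-text fields such as the date and coordinates as strings is a deliberate choice in this sketch: it records what the ledger says and leaves interpretation to a later parsing or georeferencing step.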

Figure 3: Section of a ledger page with transcribed text, showing where different types of descriptive information can be identified.

 

Reading and interpreting all the handwritten data takes a skilled and trained team of transcription experts accustomed to working with natural history collections. Under a two-transcriber verification system (two individuals each transcribe the same record, and both transcriptions must match), all batches of imaged ledgers are carefully transcribed before being sent back to the museum for further review. Paleobiology informatics staff then import the records following established data cleaning protocols, including quality control review scripts and data parsing tools developed specifically for the project. Validation of new locality and collecting event data, record enhancement, and georeferencing all help provide further context for these specimens.
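The two-transcriber check can be pictured as a field-by-field comparison of two independent transcriptions of the same record. The sketch below is an illustration under stated assumptions, not the project's actual tooling: the function names, the field names, and the sample values are hypothetical, and ignoring differences in case and spacing is an assumption about what counts as a match.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    """Collapse runs of whitespace and ignore case (an assumed matching rule)."""
    return re.sub(r"\s+", " ", text).strip().casefold()


def mismatched_fields(first: Dict[str, str], second: Dict[str, str]) -> List[str]:
    """Return the sorted names of fields on which two transcriptions disagree."""
    names = sorted(set(first) | set(second))
    return [
        name
        for name in names
        if normalize(first.get(name, "")) != normalize(second.get(name, ""))
    ]


# Two hypothetical transcriptions of the same ledger entry.
transcriber_a = {
    "locality_number": "12345",
    "site_description": "100 yards upstream from highway bridge, at base of bluff",
}
transcriber_b = {
    "locality_number": "12346",  # disagrees with transcriber A
    "site_description": "100 yards upstream from  Highway bridge, at base of bluff",
}

# Fields listed here would need a second look before the record is accepted.
needs_review = mismatched_fields(transcriber_a, transcriber_b)
```

In this example only the locality number is flagged; the descriptions differ in spacing and capitalization alone, which the assumed matching rule treats as equal.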

 

Once the data has been accepted by NMNH Department of Paleobiology staff, it resides in their collection information system (EMu) and is shared publicly online at https://collections.nmnh.si.edu/search/paleo/.

Data will also be compiled and shared with the greater scientific community via such resources as https://www.idigbio.org and can be related to the Paleobiology Database navigator https://paleobiodb.org/navigator/

 

 

Figure 4: Interactive map of georeferenced fossil marine invertebrate data online at PBDB.

 

Project Details

  • 38,000 Digitized specimens w/multiple diagnostic specimen views
  • 13 Bound and imaged USGS Cenozoic locality ledgers
  • ~26,000 record localities + additional suffixed listings for some record numbers
  • Six transcribed fields per locality record
  • 3,627 pages with locality information
  • ~20-25 lines per page
  • ~5-7 localities per page