How we kept track of the names of 30,000 bees during digitization

By Luis J. Villanueva on Thu, 01/19/2023 - 15:00

One of the recent mass digitization projects of the Digitization Program Office (DPO) posed a particular challenge: we needed to keep track of scientific names. The project digitized more than thirty thousand specimens of bumblebees (genus Bombus) and carpenter bees (genus Xylocopa) from the Entomology Department of the National Museum of Natural History (NMNH).

Fig. 1. Bee specimens from the collection organized by scientific name in the trays.

Scientific names are the primary way biologists identify each group of organisms, but they also reflect how taxonomists understand species to be related to one another. These names may change as new information or new insights emerge. To avoid a possible source of confusion, NMNH did not want the species identity retained in the images. The names are stored only physically, in the specimen trays, and digitally, in the Collection Information System. We developed a workflow that let us track the current name of each specimen without permanently including it in the image or its metadata.

Fig. 2. Bee specimens classified under a different name in a tray.

The workflow during digitization can be summarized in these steps:

1. Vendor staff searched Virtual Barcodes, a DPO-developed system, for the scientific name of the next batch of specimens to be digitized. The system returned a barcode encoding the numerical ID of that scientific name in the collection database.
2. The vendor scanned this barcode, and that ID number was saved in the file metadata of all following specimens.
3. When a group of specimens with a different scientific name was next in line for digitization, the vendor repeated step 1.
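
The logic of these steps can be illustrated with a minimal Python sketch. This is not the vendor's capture software or the Virtual Barcodes code. The `TAXON:` prefix, the function name, and the sample IDs are all hypothetical, chosen only to show how one scanned name barcode applies to every specimen that follows it until the next name barcode is scanned.

```python
from typing import Iterable, Iterator, Optional, Tuple

# Hypothetical convention: a scientific-name barcode starts with this prefix.
# The real barcode format is not described in this post.
NAME_PREFIX = "TAXON:"


def tag_specimens(scans: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Pair each specimen scan with the most recently scanned name ID."""
    current_name_id: Optional[int] = None
    for code in scans:
        if code.startswith(NAME_PREFIX):
            # A name barcode: remember its ID for the specimens that follow.
            current_name_id = int(code[len(NAME_PREFIX):])
        else:
            # A specimen: it inherits the name ID currently in effect.
            if current_name_id is None:
                raise ValueError(f"specimen {code!r} scanned before any name barcode")
            yield code, current_name_id


if __name__ == "__main__":
    # Made-up scan sequence: two name barcodes and three specimens.
    scans = ["TAXON:101", "SPEC-A", "SPEC-B", "TAXON:202", "SPEC-C"]
    for specimen_id, name_id in tag_specimens(scans):
        print(specimen_id, name_id)
```

Raising an error when a specimen arrives before any name barcode is one possible design choice. It surfaces a skipped step 1 immediately instead of producing images with no name ID.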

Upon delivery of the images to DPO, we extracted the ID from the metadata, saved the specimen ID and the scientific name ID to a comma-separated values (CSV) file, and cleared the metadata field. As a result, the images carried no identifiers for the scientific name, as the museum required. Afterwards, we delivered the images to the Smithsonian Digital Asset Management System (DAMS).
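
A minimal sketch of this extract-and-clear step follows, written in Python. It is an illustration under stated assumptions, not the script DPO used: image metadata is modeled as a plain dictionary rather than read from real image files, and the field name `name_id`, the function name, and the sample values are hypothetical.

```python
import csv
import io
from typing import Dict, List, TextIO

# Hypothetical name of the metadata field that held the scientific name ID.
NAME_ID_FIELD = "name_id"


def extract_and_clear(images: List[Dict], out: TextIO) -> None:
    """Write specimen ID / name ID pairs to CSV, then drop the ID from metadata."""
    writer = csv.writer(out)
    writer.writerow(["specimen_id", "scientific_name_id"])
    for image in images:
        # pop() returns the stored ID and removes the field in one step,
        # so the image record no longer carries the name identifier.
        # A missing field raises KeyError, flagging an untagged image.
        name_id = image["metadata"].pop(NAME_ID_FIELD)
        writer.writerow([image["specimen_id"], name_id])


if __name__ == "__main__":
    # Made-up records standing in for delivered images.
    images = [
        {"specimen_id": "SPEC-A", "metadata": {NAME_ID_FIELD: 101, "width": 4000}},
        {"specimen_id": "SPEC-B", "metadata": {NAME_ID_FIELD: 202, "width": 4000}},
    ]
    buffer = io.StringIO()
    extract_and_clear(images, buffer)
    print(buffer.getvalue())
```

Writing the CSV row and removing the field in the same loop keeps the two actions paired, so an image is only cleared once its IDs have been recorded.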

Fig. 3. Bee specimen mounted and digitized. The record of this individual can be seen at the NMNH website.

After the digitization project was completed, we delivered all the CSV files to the data manager of the collection, who imported the IDs into the database. This relatively simple workflow allowed us to meet the needs of NMNH while reducing the manual work required. Virtual Barcodes gave the vendor the scientific name so they could verify that they had the correct one. In addition, by encoding the scientific name in a barcode that was easily scanned, we reduced the possibility of data-entry errors. Because the system was database-driven, we could also add a missing name within a few minutes of discovering the issue.

By the end of the project, we had digitized 30,020 specimens, and the current scientific name of each was saved in the database with just a few clicks by the vendor and the data manager. We now approach every new digitization project looking for other data-entry or data-migration steps that can be fully or mostly automated.