Spotlight on Office of Research Computing - Digitization and Deep Learning

By Jessica Warner on Wed, 12/19/2018 - 11:09

At the Smithsonian Institution’s (SI) Digitization Program Office (DPO), our mission is to partner with others to increase the quantity, quality, and impact of digitized collections. This blog post is the first in a series highlighting the outstanding work of our partners from around the Smithsonian who help us do just that. 

Since 2015, the DPO’s Mass Digitization Program has teamed up with staff in the Botany Department of the Smithsonian’s National Museum of Natural History (NMNH) to image more than 2 million of the 5 million botanical specimens held in the United States National Herbarium, so that they can be made available to researchers around the world. This is where our colleagues in the Smithsonian’s Office of Research Computing (RC) and the Data Science Lab come in!

RC works with Smithsonian researchers as a collaborator, partner, and advocate. One of RC’s primary objectives is to use data science to help researchers develop innovative approaches that move their research forward and make it more widely accessible to others. This was evident in a recent collaboration in which Adam Metallo of the DPO worked closely with colleagues in the Data Science Lab (Rebecca Dikow and Paul Frandsen) and NMNH (Laurence Dorr, Sylvia Orli, and Eric Schuettpelz) to explore how applying deep learning to such a massive quantity of digital assets could reveal hidden information and raise new questions.

The first deep learning experiment investigated whether computer vision could be used to detect mercury staining on the botanical specimens. The team trained a convolutional neural network that was 91% accurate in detecting mercury staining (Schuettpelz et al., 2017). The team’s current work, led by post-doctoral fellow Alex White, aims to extend the initial model to distinguish among genera of ferns and fern allies.

Botanical specimen, fern

With access to a vast data set of digitized botanical specimens, RC’s Data Science team has been able to use deep learning to uncover information about the specimens in a whole new way.

The Research Computing group, part of the Smithsonian’s Office of the Chief Information Officer, consists of:

Beth Stern – Director
     Data Science Lab:
          Dr. Rebecca Dikow – Research Data Scientist
          Mike Trizna – Data Scientist
          Mirian Tsuchiya – Post-doctoral Fellow (Genomics)
          Alex White – Post-doctoral Fellow (Machine Learning)
Keri Thompson – Research Data Management
Dan Davis – Technical Manager
Adam Soroka – Senior Solutions Architect

The work of the DPO is enabled and enriched by our enterprising co-workers, whether we’re working with SI museum, archive or library staff who select and prepare collections for digital capture,

Mass digitization workstation and technician

or with IT professionals who ensure that digitized assets move smoothly and safely through our data ecosystem,

OCIO PRISM homescreen

or with educators who use digital assets to bring collections into classrooms around the world.

SCLDA Learning Lab homescreen slideshow