Digital Commons Harvesting Tool: Automatically Populating the IR with Faculty Records

DC Harvesting Tool Overview

The Digital Commons (DC) Harvesting Tool with Scopus integration automates many time-consuming faculty publication workflow steps to populate your IR more comprehensively.

Adding faculty publications can fulfill core institutional goals for the IR: providing a service to faculty, showcasing research discoveries, representing an institution’s scholarly outputs, and helping to reach open access targets. However, typical paths to find and ingest faculty publications include many laborious manual steps for each record.

With the Digital Commons Harvesting Tool, you can automate the following workflow steps, whether your IR hosts metadata-only records or requires a full-text copy with every record:

  • Find all your institution’s works
  • Identify OA content
  • Map and prepopulate high-quality metadata for upload
  • Check for duplicate records already in your IR

Applications of the Harvesting Tool

The Harvesting Tool is particularly helpful to institutions that are:

  • Populating a brand-new or recently migrated IR
  • Standing up faculty publication workflows for the first time
  • Seeking even more efficiency with existing faculty publication workflows
  • Looking to improve departments’ and faculty members’ engagement with the IR
  • Seeking improved ways to support the research enterprise on campus
  • Wanting to identify more open access content to host

Have questions or need assistance before getting started? Contact Consulting Services by phone at 510-665-1200, option 2, weekdays 6:30 a.m.–7:30 p.m. North America Pacific Time.

How It Works

Administrators of Digital Commons with appropriate permissions can access the DC Harvesting Tool via their My Account page.

Access to Scopus data is integrated into the DC Harvesting Tool and is included in your Digital Commons subscription as part of a pilot through December 2021. Scopus is a leading subscription abstract and index database. It has extensive breadth and coverage with close to 80 million records from ~25,000 journals, books and book series, conference proceedings, and trade publications across all disciplines. Using robust machine learning algorithms, it delivers high-quality metadata with disambiguation at author and institution levels.

When you perform a search, the Harvesting Tool uses the Scopus API to locate relevant author and institutional records, which you can then easily export in prepopulated batch upload spreadsheets for ingest into DC.

Below is an overview of the basic workflow when using the Harvesting Tool. See the Digital Commons Harvesting Tool: Step-by-Step Guide (PDF) for detailed instructions, including tips for preparation.

1. Search Scopus data

Search by author and/or affiliation, with the option to include parameters for publication date. Author search is flexible to work with the information you have available, such as last name + affiliation, last name + first name/initial, or author IDs like ORCID and Scopus ID.
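
The input combinations above can be pictured as a small query builder. The sketch below is illustrative only and does not describe how the Harvesting Tool is implemented: the `build_query` helper is hypothetical, and the field codes (`AUTHLASTNAME`, `AUTHFIRST`, `AFFIL`, `ORCID`, `AU-ID`, `PUBYEAR`) are assumed to follow Scopus advanced-search syntax.

```python
from typing import Optional


def build_query(
    last_name: Optional[str] = None,
    first_name: Optional[str] = None,
    affiliation: Optional[str] = None,
    orcid: Optional[str] = None,
    scopus_id: Optional[str] = None,
    year_from: Optional[int] = None,
    year_to: Optional[int] = None,
) -> str:
    """Combine whichever search inputs are available into one query string.

    Field codes are an assumption based on Scopus advanced-search syntax.
    """
    clauses = []
    if last_name:
        clauses.append(f"AUTHLASTNAME({last_name})")
    if first_name:
        clauses.append(f"AUTHFIRST({first_name})")
    if affiliation:
        clauses.append(f"AFFIL({affiliation})")
    if orcid:
        clauses.append(f"ORCID({orcid})")
    if scopus_id:
        clauses.append(f"AU-ID({scopus_id})")

    # Dates only narrow a search; an author or affiliation is still required.
    if not clauses:
        raise ValueError("Provide at least one author or affiliation parameter")

    # Express an inclusive year range with strict comparisons.
    if year_from is not None:
        clauses.append(f"PUBYEAR > {year_from - 1}")
    if year_to is not None:
        clauses.append(f"PUBYEAR < {year_to + 1}")

    return " AND ".join(clauses)
```

For example, `build_query(last_name="Rivera", affiliation="Example University")` combines a last name with an affiliation, while passing only `orcid` searches by author ID alone.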

2. Review search results

Confirm results are relevant and have the correct author and other metadata. Narrow or broaden your search if needed to refine results. Where possible, supply as many search parameters as you have available to obtain the most relevant results.

An Open Access flag indicates that a work was originally published in open access form.

3. Export records to a prepopulated spreadsheet & check for duplicates

When search results contain the records of interest, export a spreadsheet containing the Scopus metadata that maps to the schema of the target publication/collection in the IR.
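
Conceptually, this export renames source metadata fields to the column headers the target collection expects. The sketch below is a rough illustration under stated assumptions, not the tool's actual mapping: the source field names, the column headers in `FIELD_MAP`, and the `to_rows` and `to_csv` helpers are all hypothetical, and CSV stands in for the spreadsheet format.

```python
import csv
import io

# Hypothetical mapping from source field names to spreadsheet column headers.
# Real headers depend on the schema of the target publication/collection.
FIELD_MAP = {
    "title": "title",
    "doi": "doi",
    "journal": "source_publication",
    "cover_date": "publication_date",
    "abstract": "abstract",
}


def to_rows(records, field_map=FIELD_MAP):
    """Rename source fields to target columns; missing values become blank cells."""
    return [
        {column: record.get(source, "") for source, column in field_map.items()}
        for record in records
    ]


def to_csv(rows, field_map=FIELD_MAP):
    """Serialize mapped rows with a header row, in the mapping's column order."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(field_map.values()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()
```

Leaving unmapped or missing values blank keeps every row aligned with the header, so gaps are easy to spot and fill in during review.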

Export options include a duplication check function that identifies likely duplicates either in the selected publication or in all publications of a given type.

If one of the duplication check options is chosen, a column appears in the prepopulated spreadsheet with a “Likely Duplicate” flag for matching works in the IR (based on metadata such as title, first author name, DOI, and journal title). A second column provides the URLs of the detected duplicates for easy inspection.
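
The tool's actual matching rules are not documented here, but the idea of flagging likely duplicates from title, first author name, DOI, and journal title can be sketched as follows. Everything in this example is an assumption made for illustration: the record field names, the normalization, the rule that a shared DOI is sufficient on its own, and the `flag_duplicates` helper.

```python
import re


def normalize(text):
    """Lowercase and collapse punctuation and spacing so minor differences still match."""
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()


def is_likely_duplicate(candidate, existing):
    """Compare one harvested record against one record already in the IR."""
    # Assumption: a shared, non-empty DOI is treated as a match on its own.
    candidate_doi = normalize(candidate.get("doi"))
    if candidate_doi and candidate_doi == normalize(existing.get("doi")):
        return True
    # Otherwise require title, first author, and journal title to all agree.
    keys = ("title", "first_author", "journal")
    return all(
        bool(normalize(candidate.get(key)))
        and normalize(candidate.get(key)) == normalize(existing.get(key))
        for key in keys
    )


def flag_duplicates(candidates, ir_records):
    """Return one (flag, urls) pair per candidate, mirroring the two spreadsheet columns."""
    results = []
    for candidate in candidates:
        urls = [
            record["url"]
            for record in ir_records
            if is_likely_duplicate(candidate, record)
        ]
        results.append(("Likely Duplicate" if urls else "", urls))
    return results
```

Because a flag like this is only a likelihood, each flagged row still needs a manual check against the listed URLs before it is removed from the upload.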

The spreadsheet format allows you to review, add, and fine-tune the metadata, if needed, before importing to DC. Whether your IR hosts metadata-only records or requires full-text copies with records, the spreadsheet can also serve as a “worklist document” for seeking faculty/researcher approvals, checking publisher rights and permissions, and requesting permitted full-text copies of works. Results from those optional steps can then be incorporated into the batch upload spreadsheet.

4. Upload the spreadsheet using DC batch import

Import records into the selected DC publication/collection using that publication’s batch import tool, skipping straight to the “Upload spreadsheet” step.

For detailed instructions on using the DC Harvesting Tool, please see the step-by-step guide (PDF). If you have any questions or would like recommendations on how the Harvesting Tool can help with your institution’s specific needs, please contact your consultant for more information.
