How to migrate 1800 Drupal 7 sites

Benji Fisher

May 13, 2026 - MidCamp

Introduction

About me

Yellow Pig 

Usability group, Migration subsystem, Security team

Harvard Web Publishing since Dec. 2023

Follow along

QR code for https://slides.benjifisher.info 

Outline

  • Introduction
  • How did we get here?
  • The Plan
  • How to do it
  • Two-week sprint: 100 sites
  • One Step: Provisioning
  • Iterate
  • Conclusion

How did we get here?

OpenScholar

The easiest way to power all of your institution’s research websites.

  • Drupal 7
  • Media module
  • Control panel
  • Virtual site
  • Custom modules

OpenScholar timeline

  • 2009: Open-source research program - Gary King at Harvard’s Institute for Quantitative Social Science (IQSS)
  • 2010: https://www.drupal.org/project/openscholar
  • 2017: OpenScholar becomes a private company
  • 2025: Drupal 10 version, with hosting and support

Reference: https://theopenscholar.com/about-us

OpenScholar at Harvard

  • The “Oprah effect”
  • No governance
  • 12,000 sites
  • 90% or more are tiny
  • Many are important, high-traffic sitess
    • HUPD
    • HR department

The Plan

OpenScholar or not OpenScholar?

Before my time: decide to move off the platform.

Going through a phase

  • Phase 1: Build a platform on Drupal 10. (Iterate, refine.)
    • Features
    • Design system (UX and a11y)
    • Editorial experience
  • Phase 2: Develop the migrations (Migrate API)
  • Phase 3: Migrate all the sites.

Just kidding!

Do it all at once:

  • Design and implement features and components.
  • Develop migrations.
  • Migrate all the sites.

How to do it

How to migrate one site

  • DNS (and Akamai)
  • Provision (Acquia SiteFactory)
  • SSO
  • Search (Solr)
  • Analytics (GTM/GA)
  • Theme variation
  • Tracking issue (Jira)
  • Migrate (down, across, up)
  • Cleanup
  • Review
  • Launch

How to migrate 1800 sites

One at a time.

How to migrate 1800 sites

One at a time.

Just kidding!

How many sites per sprint?

Chart of sites migrated in each 2-week sprint 

33 two-week sprints

1859 total, up to 119 sites per sprint

Communication

12,000 sites means 12,000 site owners

Kill, Keep, Combine:

  • Kill 10,000 sites
  • Keep (migrate) almost 2,000 sites
  • Combine or split: only a handful

Two-week sprint: 100 sites

Google Sheets

  • Dump data from the Drupal 7 multisites
  • Enter Kill/Keep decisions
  • Choose theme options

Problem: too many cooks, not enough validation

Jira

  • One story per site
  • Track status:
    • Provisioned
    • Migrated
    • Cleaned up
    • In review
    • Approved
    • Launched
  • Track who owns the next step

Common denominator

What is the common format for Google Sheets and Jira?

Common denominator: CSV

  • Import/export into/from Google Sheets/Jira
  • Save and track in Git
  • Add/update and review with pull requests

Google Sheets and Jira are too mutable. CSV files in a Git repository are reliable.

Anatomy of a CSV file

One row per site. Some of the columns:

  • OS URL
  • SF Prod Domain
  • Backup URL
  • SF Group
  • vSite Name
  • SF Machine Name
  • PM Email
  • Stack ID
  • Theme Recipe

Scripts

  • Prepare for a sprint
    • Validate
    • Check for duplicates
    • DNS request
    • Jira import
  • Set up analytics
  • Provision (2 scripts)
  • Launch (4 scripts)

New CSV for Old

Every site that is migrated in one sprint gets launched at the same time.

Just kidding!

  1. Export from Jira: sites ready to launch
  2. Extract a list of site “machine names”
  3. Look up data in provisioning CSV files
  4. Create new CSV with one row for each site

One Step: Provisioning

Provision one site

Form to create a new site in Acquia Cloud SiteFactory 

Provision a few sites

Acquia provides an API and a CLI tool:

acli acsf:sites:create \
  --install_profile=hwp_dstan \
  --stack_id=2 \
  mynewsite \
  206

The group “Demo Sites” has numeric ID 206.

Provision a lot of sites

sites.csv:

name, stack, group
mynewsite, 2, 206
anothersite, 3, 576

Run a script:

./scripts/provision-site.php sites.csv

Automation: Scripts and CSV files

  • Efficiency
    • One script does more than provision.
  • Reliability
    • People review the CSV files.
    • Scripts validate the data.
  • Recovery
    • There is an audit trail.

Iterate

Start small

Migrate less than 25 sites in each of the first 10 sprints.

  • Develop the scripts
  • Refine the process
  • Discover what goes wrong

Bottlenecks

  • Cleanup of migrated sites
  • Finding enough sites approved for migration
  • Communication with site owners
    • Keep/Kill
    • Approve to start
    • Approve to launch
  • Launch (cap: 65/week)

Conclusion

Outline

  • Introduction
  • How did we get here?
  • The Plan
  • How to do it
  • Two-week sprint: 100 sites
  • One Step: Provisioning
  • Iterate
  • Conclusion

References

Questions

Copyleft

Creative Commons License
This slide deck by Benji Fisher is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/benjifisher/slide-decks.