How do you migrate 1800 Drupal 7 sites?

Benji Fisher

June 6, 2025 - Evolve Drupal

Introduction

About me

Yellow Pig 

Usability group, Migration subsystem, Security team

Harvard Web Publishing since Dec. 2023

Follow along

QR code for https://slides.benjifisher.info 

Outline

  • Introduction
  • How did we get here?
  • The Plan
  • How to do it
  • Two-week sprint: 100 sites
  • Iterate
  • Conclusion

How did we get here?

OpenScholar

The easiest way to power all of your institution’s research websites.

  • Drupal 7
  • Media module
  • Control panel
  • Virtual site
  • Custom modules

OpenScholar timeline

  • 2009: Open-source research program - Gary King at Harvard’s Institute for Quantitative Social Science (IQSS)
  • 2010: https://www.drupal.org/project/openscholar
  • 2017: OpenScholar becomes a private company
  • 2025: Drupal 10 version, with hosting and support

Reference: https://theopenscholar.com/about-us

OpenScholar at Harvard

  • The “Oprah effect”
  • No governance
  • 12,000 sites
  • 90% or more are tiny
  • Many are important, high-traffic sites
    • HUPD
    • HR department

The Plan

OpenScholar or not OpenScholar?

Before my time: the decision was made to move off the platform.

Going through a phase

  • Phase 1: Build a platform on Drupal 10. (Iterate, refine.)
    • Features
    • Design system (UX and a11y)
    • Editorial experience
  • Phase 2: Develop the migrations (Migrate API)
  • Phase 3: Migrate all the sites.

Just kidding!

Do it all at once:

  • Still designing and implementing features, components
  • Developing migrations until a few weeks ago
  • In a few weeks, all our sites will be migrated

How to do it

How to migrate one site

  • DNS (and Akamai)
  • Provision (Acquia Site Factory)
  • Analytics (GTM/GA)
  • Theme variation
  • Tracking issue (Jira)
  • Migrate (down, across, up)
  • Cleanup
  • Review
  • SSO
  • Launch

How to migrate 1800 sites

One at a time.

Just kidding!

How many sites per sprint?

Chart of sites migrated in each 2-week sprint 

33 two-week sprints

1859 total, up to 119 sites per sprint
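
For scale, the totals above can be turned into an average. This is only a sketch of the arithmetic, using the three figures on this slide; the function name is made up for illustration.

```python
# Figures taken from the slide: 33 two-week sprints, 1859 sites, peak of 119.
SPRINTS = 33
TOTAL_SITES = 1859
PEAK_PER_SPRINT = 119


def average_per_sprint(total: int, sprints: int) -> float:
    """Mean number of sites migrated per sprint."""
    return total / sprints


avg = average_per_sprint(TOTAL_SITES, SPRINTS)
print(f"average: {avg:.1f} sites per sprint")
print(f"peak / average: {PEAK_PER_SPRINT / avg:.1f}")
```

The average comes out at roughly 56 sites per sprint, so the busiest sprint ran at about twice the average pace.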

Communication

12,000 sites means 12,000 site owners

Kill, Keep, Combine:

  • Kill 10,000 sites
  • Keep (migrate) almost 2,000 sites
  • Combine or split: only a handful

Two-week sprint: 100 sites

Google Sheets

  • Dump data from the Drupal 7 multisites
  • Enter Kill/Keep decisions
  • Choose theme options

Problem: too many cooks, not enough validation

Jira

  • One story per site
  • Track status:
    • Provisioned
    • Migrated
    • Cleaned up
    • In review
    • Approved
    • Launched
  • Track who owns the next step
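
One way to picture the status track is as an ordered list of states. The sketch below assumes the statuses advance in the order shown on the slide; the class and function names are hypothetical and have nothing to do with Jira's own API.

```python
from enum import IntEnum
from typing import Optional


class SiteStatus(IntEnum):
    """Statuses in the order listed on the slide (assumed to be sequential)."""
    PROVISIONED = 1
    MIGRATED = 2
    CLEANED_UP = 3
    IN_REVIEW = 4
    APPROVED = 5
    LAUNCHED = 6


def next_status(status: SiteStatus) -> Optional[SiteStatus]:
    """Return the status that follows, or None once the site is launched."""
    if status is SiteStatus.LAUNCHED:
        return None
    return SiteStatus(status + 1)


print(next_status(SiteStatus.MIGRATED).name)
```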

Common denominator

What is the common format for Google Sheets and Jira?

Common denominator: CSV

  • Import into and export from both Google Sheets and Jira
  • Save and track in Git
  • Add/update and review with pull requests

Google Sheets and Jira are too mutable. CSV files in a Git repository are reliable.

Anatomy of a CSV file

One row per site. Some of the columns:

  • OS URL
  • SF Prod Domain
  • Backup URL
  • SF Group
  • vSite Name
  • SF Machine Name
  • PM Email
  • Stack ID
  • Theme Recipe
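
A minimal sketch of reading such a file with Python's standard csv module. The column names are the ones listed above (the real files have more); every value in the sample row is invented for illustration.

```python
import csv
import io

# Column names from the slide. The real files contain additional columns.
COLUMNS = [
    "OS URL", "SF Prod Domain", "Backup URL", "SF Group", "vSite Name",
    "SF Machine Name", "PM Email", "Stack ID", "Theme Recipe",
]

# Hypothetical sample row: none of these values come from the real data.
SAMPLE_CSV = (
    ",".join(COLUMNS) + "\n"
    "https://old.example.edu/lab,lab.example.edu,"
    "https://backup.example.edu/lab,example-group,lab,lab,"
    "pm@example.edu,1,example-recipe\n"
)


def read_sites(text: str) -> list:
    """Parse CSV text into one dict per site, keyed by column name."""
    return list(csv.DictReader(io.StringIO(text)))


sites = read_sites(SAMPLE_CSV)
print(sites[0]["SF Machine Name"], "->", sites[0]["SF Prod Domain"])
```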

Scripts

  • Prepare for a sprint
    • Validate
    • Check for duplicates
    • DNS request
    • Jira import
  • Set up analytics
  • Provision (2 scripts)
  • Launch (4 scripts)
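
The "Validate" and "Check for duplicates" steps might look something like the sketch below. It is not the actual script: which columns are required and which must be unique are assumptions, and the function names are hypothetical.

```python
import csv
import io
from collections import Counter

# Assumptions for illustration: the real rules are not given in the slides.
REQUIRED = ["OS URL", "SF Prod Domain", "SF Machine Name"]


def validate(rows: list) -> list:
    """Return one message for every required cell that is empty."""
    errors = []
    # Data rows start on line 2 of the file; line 1 is the header.
    for line_number, row in enumerate(rows, start=2):
        for column in REQUIRED:
            if not (row.get(column) or "").strip():
                errors.append(f"line {line_number}: missing {column}")
    return errors


def find_duplicates(rows: list, column: str) -> list:
    """Return the values that appear more than once in the given column."""
    counts = Counter((row.get(column) or "").strip() for row in rows)
    return sorted(value for value, n in counts.items() if value and n > 1)


# Hypothetical input with one empty cell and one repeated machine name.
text = (
    "OS URL,SF Prod Domain,SF Machine Name\n"
    "https://old.example.edu/a,a.example.edu,a\n"
    ",b.example.edu,a\n"
)
rows = list(csv.DictReader(io.StringIO(text)))
print(validate(rows))
print(find_duplicates(rows, "SF Machine Name"))
```

Running checks like these before the DNS request and Jira import is one way to address the "not enough validation" problem with hand-edited spreadsheets.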

New CSV for Old

Every site that is migrated in one sprint gets launched at the same time.

Just kidding!

  1. Export from Jira: sites ready to launch
  2. Extract a list of site “machine names”
  3. Look up data in provisioning CSV files
  4. Create new CSV with one row for each site
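
The four steps above can be sketched as follows. This is an illustration under assumptions, not the real launch script: the Jira column name "Machine Name" is hypothetical, the provisioning files are assumed to be keyed by "SF Machine Name", and all sample data is invented.

```python
import csv
import io


def machine_names(jira_export: str, column: str = "Machine Name") -> list:
    """Steps 1-2: read the Jira export and pull out the machine names."""
    reader = csv.DictReader(io.StringIO(jira_export))
    return [row[column].strip() for row in reader if row[column].strip()]


def index_provisioning(csv_texts: list, key: str = "SF Machine Name") -> dict:
    """Step 3: index every row of every provisioning CSV by machine name."""
    index = {}
    for text in csv_texts:
        for row in csv.DictReader(io.StringIO(text)):
            index[row[key]] = row
    return index


def build_launch_csv(names: list, index: dict) -> str:
    """Step 4: write one row per site that is ready to launch."""
    missing = [name for name in names if name not in index]
    if missing:
        raise KeyError(f"not found in provisioning data: {missing}")
    rows = [index[name] for name in names]
    fieldnames = []
    for row in rows:  # union of columns, in first-seen order
        for column in row:
            if column not in fieldnames:
                fieldnames.append(column)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, restval="",
                            lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


# Hypothetical data: two sites ready to launch, from two different sprints.
jira = "Issue key,Machine Name\nABC-1,lab\nABC-2,dept\n"
sprint_a = ("SF Machine Name,SF Prod Domain\n"
            "lab,lab.example.edu\nother,other.example.edu\n")
sprint_b = "SF Machine Name,SF Prod Domain\ndept,dept.example.edu\n"

names = machine_names(jira)
launch_csv = build_launch_csv(names, index_provisioning([sprint_a, sprint_b]))
print(launch_csv)
```

Failing loudly on a machine name that is missing from the provisioning data is a design choice in this sketch: a site silently dropped from a launch list is harder to notice than a script that stops.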

Iterate

Start small

Migrate fewer than 25 sites in the first 10 sprints.

  • Develop the scripts
  • Refine the process
  • Discover what goes wrong

Bottlenecks

  • Cleanup of migrated sites
  • Finding enough sites approved for migration
  • Communication with site owners
    • Keep/Kill
    • Approve to start
    • Approve to launch
  • Launch (current cap: 65/week)

Conclusion

Outline

  • Introduction
  • How did we get here?
  • The Plan
  • How to do it
  • Two-week sprint: 100 sites
  • Iterate
  • Conclusion

References

Questions

Copyleft

Creative Commons License
This slide deck by Benji Fisher is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/benjifisher/slide-decks.