How to migrate 1800 Drupal 7 sites

Benji Fisher

May 13, 2026 - MidCamp

Introduction

About me

Yellow Pig 

Usability group, Migration subsystem, Security team

Harvard Web Publishing since Dec. 2023

Follow along

QR code for https://slides.benjifisher.info 

Outline

  • Introduction
  • How did we get here?
  • The Plan
  • How to do it
  • Two-week sprint: 100 sites
  • One Step: Provisioning
  • Iterate
  • Database Deep Dive
  • Conclusion

How did we get here?

OpenScholar

The easiest way to power all of your institution’s research websites.

  • Drupal 7
  • Media module
  • Control panel
  • Virtual site
  • Custom modules

OpenScholar timeline

  • 2009: Open-source research program - Gary King at Harvard’s Institute for Quantitative Social Science (IQSS)
  • 2010: https://www.drupal.org/project/openscholar
  • 2017: OpenScholar becomes a private company
  • 2025: Drupal 10 version, with hosting and support

Reference: https://theopenscholar.com/about-us

OpenScholar at Harvard

  • The “Oprah effect”
  • No governance
  • 12,000 sites
  • 90% or more are tiny
  • Many are important, high-traffic sites
    • HUPD
    • HR department

The Plan

OpenScholar or not OpenScholar?

Before my time: decide to move off the platform.

Going through a phase

  • Phase 1: Build a platform on Drupal 10. (Iterate, refine.)
    • Features
    • Design system (UX and a11y)
    • Editorial experience
  • Phase 2: Develop the migrations (Migrate API)
  • Phase 3: Migrate all the sites.

Just kidding!

Do it all at once:

  • Design and implement features and components.
  • Develop migrations.
  • Migrate all the sites.

How to do it

The team

Harvard Web Publishing (HWP)

  • Client engagement: 6.5 people
  • Engineering: 5 people (2 back-end developers)
  • UX research: 1.5 people

FFW/Jakala:

  • Tech leads: 2
  • BE devs: 2 or 3
  • FE devs: 2 or 3
  • Manual cleanup: 5 or 6

How to migrate one site

  • DNS (and Akamai)
  • Provision (Acquia SiteFactory)
  • SSO
  • Search (Solr)
  • Analytics (GTM/GA)
  • Theme variation
  • Tracking issue (Jira)
  • Migrate (down, across, up)
  • Cleanup
  • Review
  • Launch

How to migrate 1800 sites

One at a time.

How to migrate 1800 sites

One at a time.

Just kidding!

How many sites per sprint?

Chart of sites migrated in each 2-week sprint 

33 two-week sprints: February, 2024 to June, 2025

1856 total, up to 118 sites per sprint

Communication

12,000 sites means 12,000 site owners

Kill, Keep, Combine:

  • Kill 10,000 sites
  • Keep (migrate) almost 2,000 sites
  • Combine or split: only a handful

Two-week sprint: 100 sites

Google Sheets

  • Dump data from the Drupal 7 multisites
  • Enter Kill/Keep decisions
  • Choose theme options

Problem: too many cooks, not enough validation

Jira

  • One story per site
  • Track status:
    • Provisioned
    • Migrated
    • Cleaned up
    • In review
    • Approved
    • Launched
  • Track who owns the next step

Common denominator

What is the common format for Google Sheets and Jira?

Common denominator: CSV

  • Import/export into/from Google Sheets/Jira
  • Save and track in Git
  • Add/update and review with pull requests

Google Sheets and Jira are too mutable. CSV files in a Git repository are reliable.

Anatomy of a CSV file

One row per site. Some of the columns:

  • OS URL
  • SF Prod Domain
  • Backup URL
  • SF Group
  • vSite Name
  • SF Machine Name
  • PM Email
  • Stack ID
  • Theme Recipe

Scripts

  • Prepare for a sprint
    • Validate
    • Check for duplicates
    • DNS request
    • Jira import
  • Set up analytics
  • Provision (2 scripts)
  • Launch (4 scripts)

New CSV for Old

Every site that is migrated in one sprint gets launched at the same time.

Just kidding!

  1. Export from Jira: sites ready to launch
  2. Extract a list of site “machine names”
  3. Look up data in provisioning CSV files
  4. Create new CSV with one row for each site

One Step: Provisioning

Provision one site

Form to create a new site in Acquia Cloud SiteFactory 

Provision a few sites

Acquia provides an API and a CLI tool:

acli acsf:sites:create \
  --install_profile=hwp_dstan \
  --stack_id=2 \
  mynewsite \
  206

The group “Demo Sites” has numeric ID 206.

Provision a lot of sites

sites.csv:

name, stack, group
mynewsite, 2, 206
anothersite, 3, 576

Run a script:

./scripts/provision-site.php sites.csv

Automation: Scripts and CSV files

  • Efficiency
    • One script does more than provision.
  • Reliability
    • People review the CSV files.
    • Scripts validate the data.
  • Recovery
    • There is an audit trail.

Iterate

Start small

Migrate less than 25 sites in each of the first 10 sprints.

  • Develop the scripts
  • Refine the process
  • Discover what goes wrong

Bottlenecks

  • Cleanup of migrated sites
  • Finding enough sites approved for migration
  • Communication with site owners
    • Keep/Kill
    • Approve to start
    • Approve to launch
  • Launch (cap: 65/week)

Database Deep Dive

The problem with widgets

  • Good news: lots of consistency
    • All 1,800 sites have the same codebase, same database structure.
  • Bad news: some curious structure

OpenScholar started in 2009, while Drupal 7 was still in development.

It used “widgets”: sort of like content blocks (Drupal 8+) or Bricks (Drupal 7 contrib module, 2015).

Sample query

MySQL [db]> SELECT s.value
    FROM node n
    JOIN vsite_layout_block v ON v.context = CONCAT('os_pages-page-', n.nid)
    JOIN spaces_overrides s ON s.object_id = v.delta
    WHERE n.nid = 13311\G
*************************** 1. row ***************************
value: O:8:"stdClass":7:{
  s:8:"embed_as";a:2:{...}
  s:7:"changed";N;
  s:5:"delta";i:1362519977;
  s:5:"title";s:15:"Graduate Office";
  s:11:"description";s:27:"Graduate Office Information";
  s:7:"options";a:5:{...}
  s:10:"plugin_key";s:13:"os_boxes_html";
}
1 row in set (0.093 sec)

Fast and slow

In a database (like MySQL):

  • String functions are fast: SUBSTRING_INDEX()
  • Joins are fast: JOIN spaces_overrides s ON s.object_id = v.delta
  • Joins on substrings/concatenations are SLOW

MySQL slow query

MySQL [db]> SELECT v.sid, ogm.gid
    FROM vsite_layout_block v
    JOIN og_membership ogm ON CONCAT('os_pages-page-', ogm.etid) = v.context
    WHERE v.sid <> ogm.gid
    AND ogm.entity_type = 'node'
    LIMIT 0,10;
+------+-----+
| sid  | gid |
+------+-----+
| 3111 |   6 |
| 3111 |   6 |
+------+-----+
2 rows in set (55 min 40.365 sec)

Solution: pre-compute the plugin key

Add a column to that table:

ALTER TABLE spaces_overrides
ADD COLUMN plugin_key varchar(64);

Now populate that column:

UPDATE spaces_overrides s
SET plugin_key = SUBSTRING_INDEX(SUBSTRING_INDEX(SUBSTRING_INDEX(s.value, 's:10:"plugin_key";', -1), '";', 1), ':"', -1)
WHERE s.object_type = 'boxes'
AND plugin_key IS NULL;
Query OK, 106770 rows affected (1.801 sec)
Rows matched: 106770  Changed: 106770  Warnings: 0

Conclusion

Outline

  • Introduction
  • How did we get here?
  • The Plan
  • How to do it
  • Two-week sprint: 100 sites
  • One Step: Provisioning
  • Iterate
  • Database Deep Dive
  • Conclusion

Please Provide Your Feedback!

mid.camp/50

Contribution Day

Thursday 10:00am to 4:00pm

  • You don’t have to know code to give back!
  • New Contributor training 9:00am to 10:00am

References

Questions

Copyleft

Creative Commons License
This slide deck by Benji Fisher is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/benjifisher/slide-decks.