Benji Fisher
May 27, 2022 - DrupalCamp NJ
Migration subsystem, Usability group, Security team (provisional member)
Build. Grow. Protect.
Find a link to this presentation on my GitLab Pages:
I need to update my Drupal 6 site. Better late than never!
I am getting tired of WordPress, but I have all these posts that I want to keep. How can I switch to Drupal?
I need to create Drupal content every hour from an external Atom feed (or XML/JSON/SOAP/CSV)
Did anyone say “Feeds module”?
I need to change the structure of my live Drupal site: add/remove a field, move field data to linked Paragraphs, …
Three stages: Extract, Transform, Load (ETL)
Quiz: which stage is the most fun?
Filter pipelines:
git branch --merged | grep feature | xargs git branch -d
list | map(item => item|lower) | join(', ')
Each step gets its input from the previous one.
The Transform/Process stage of the Migrate API works the same way.
Drupal core and contrib modules provide many filters, or process plugins.
Most are configurable.
Learning to use them and combine them into pipelines takes some practice.
The Migrate API uses YAML to describe pipelines. (explanation of this example)
Convert a text field (HTML string) to a DOMDocument object, process it, and save it as a string:
process:
  'body/value':
    - plugin: dom
      method: import
      source: 'body/0/value'
    # Other plugins do their work here.
    - plugin: dom
      method: export
The body/0/value bit is a shortcut. It is more complicated for multi-valued fields.
Use an XPath selector to identify one or more elements in a DOMDocument object:
Selector | Matches |
---|---|
//a | all <a> elements |
//a[@class="external"] | all <a> elements with class="external" |
//li[@class="nav"]/a | all <a> elements that are direct children of <li class="nav"> |
I have to import documentation pages from an external system. The documentation is formatted as HTML, but it does not have the magic CSS classes that my theme uses. How can I make it match the site style guide?
Let’s hope the source HTML has some consistency. Then we can identify elements we want to style with an XPath expression and apply configured styles:
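As a rough illustration of that idea, here is a minimal sketch using the dom_apply_styles plugin from Migrate Plus. The text format (full_html), the XPath rules, and the style names (Bold, Italic) are assumptions chosen for illustration; the style names would have to match styles actually configured for the editor on that text format.

```yaml
process:
  'body/value':
    - plugin: dom
      method: import
      source: 'body/0/value'
    # Assumed rules: each one pairs an XPath expression with the label of a
    # style configured for the full_html text format (hypothetical labels).
    - plugin: dom_apply_styles
      format: full_html
      rules:
        - xpath: '//b'
          style: Bold
        - xpath: '//span/i'
          style: Italic
    - plugin: dom
      method: export
```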
Every Person page starts with a job title in an <h4> tag and a photo. How can I move those into separate fields, and keep the rest in the Body field?
Example:
When processing a Person page, use the dom_select plugin:
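A minimal sketch of what such a pipeline might look like. The destination field name field_job_title is hypothetical, and the sketch assumes that dom_select returns a list of matching strings, so the core extract plugin is used to take the first one.

```yaml
process:
  # field_job_title is a hypothetical destination field.
  field_job_title:
    - plugin: dom
      method: import
      source: 'body/0/value'
    # Select the text of the first <h4> element.
    - plugin: dom_select
      selector: //h4
      limit: 1
    # Assumption: the previous step yields a list; keep the first item,
    # or fall back to an empty string if there was no <h4>.
    - plugin: extract
      index:
        - 0
      default: ''
```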
Getting the photo is similar:
Use dom_select with selector: //img/@src.
Use migration_lookup to get the File ID in the Person migration.
Once the job title and photo are in separate fields, remove them from the Body field:
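One way this could be sketched is with the dom_remove plugin from Migrate Plus. The selectors and the limit of one element each are assumptions based on the Person page layout described above (one <h4> and one photo at the start of the page).

```yaml
process:
  'body/value':
    - plugin: dom
      method: import
      source: 'body/0/value'
    # Remove the first <h4> (the job title).
    - plugin: dom_remove
      selector: //h4
      mode: element
      limit: 1
    # Remove the first <img> (the photo).
    - plugin: dom_remove
      selector: //img
      mode: element
      limit: 1
    - plugin: dom
      method: export
```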
In my Drupal 7 site, “About us” was /node/6, but in the new site it is /node/136. A lot of Body fields have <a href="/node/6">About Us</a>. What can I do?
This is why Marco Villegas (@marvil07) and I wrote the DOM process plugins. Thanks to Isovera and Pega Systems for letting us donate the code to the Migrate Plus module.
The Migrate API keeps track of source and destination IDs. Use migration_lookup to handle entity-reference fields:
process:
  field_related_content:
    - plugin: migration_lookup
      source: field_related_content
      migration:
        - this_migration
        - that_migration
Pause and reflect.
Use dom_migration_lookup to handle text fields:
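A minimal sketch for the /node/6 problem, assuming the links to rewrite are href attributes of <a> elements and that the nodes were migrated by this_migration and that_migration (the same placeholder migration IDs used above). The search pattern and replacement strings are illustrative assumptions.

```yaml
process:
  'body/value':
    - plugin: dom
      method: import
      source: 'body/0/value'
    - plugin: dom_migration_lookup
      mode: attribute
      xpath: '//a'
      attribute_options:
        name: href
      # Capture the old node ID from links such as /node/6.
      search: '@/node/(\d+)@'
      replace: '/node/[mapped-id]'
      # Migrations to search for the old ID (placeholder IDs).
      migrations:
        this_migration:
          replace: '/node/[mapped-id]'
        that_migration:
          replace: '/node/[mapped-id]'
    - plugin: dom
      method: export
```

With a setup like this, a link to /node/6 would be rewritten to point at whatever destination ID the migration map records for source ID 6.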
I have a Drupal 7 site that uses the Media module. How do I migrate to Drupal 9?
Standard migration: migrate files to files.
Custom migration: migrate files to media.
Use migration_lookup to find the file ID from the first step.
This works great for structured data (File fields).
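For the structured case, the second (files to media) migration might look roughly like the sketch below. Everything named here is an assumption for illustration: the migration ID my_files for the first step, the source property fid, the image media bundle, and its field_media_image source field.

```yaml
# Hypothetical custom migration: files to media.
process:
  # Look up the destination file ID created by the first migration.
  'field_media_image/target_id':
    - plugin: migration_lookup
      source: fid
      migration: my_files
destination:
  plugin: 'entity:media'
  default_bundle: image
migration_dependencies:
  required:
    - my_files
```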
Media tokens in text fields (Media and WYSIWYG modules)
Look at my kitten photo:
[[{"type": "media", "fid": 1909, ... }]]
There’s a module for that: Media Migration (alpha)
My kitten is even cuter!
<img src="/images/kitten.jpg" alt="cutest!" />
There’s a module for that, too: Migrate Media Handler
DOMDocument
This slide deck by Benji Fisher is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/benjifisher/slide-decks.