31dom/02.md
Mauricio Dinarte 5c9742aef6 Update articles
2023-08-15 11:44:37 -06:00

8.7 KiB

Writing your first Drupal migration

In the previous chapter, we learned that the Migrate API is an implementation of an ETL framework. We also talked about the steps involved in writing and executing migrations. Now, let's write our first Drupal migration. We are going to start with a very basic example: creating nodes out of hardcoded data. For this, we assume a Drupal installation using the standard installation profile, which comes with the Basic Page content type. As we progress through the book, the migrations will become more complete and more complex. Ideally, only one concept will be introduced at a time. When that is not possible, we will explain how different parts work together. The focus of this chapter is learning the structure of a migration plugin and how to execute it.

Writing the migration plugin

The migration plugin is a file written in YAML formar. It needs to live in a module so let's create a custom one named first_example and set Drupal core's migrate module as dependencies in the *.info.yml file.

type: module
name: First Migration
description: 'Example of basic Drupal migration. Learn more at <a href="https://understanddrupal.com/migrations" title="Drupal Migrations">https://understanddrupal.com/migrations</a>.'
package: Migrate examples
core_version_requirement: ^10 || ^11
dependencies:
  - drupal:migrate

Now, let's create a folder called migrations and inside it, a file called first_example.yml. Note that the extension is yml not yaml. The content of the file will be:

id: first_example
label: "First migration"
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      creative_title: "The versatility of Drupal fields"
      engaging_content: "Fields are Drupal's atomic data storage mechanism..."
    - unique_id: 2
      creative_title: "What is a view in Drupal? How do they work?"
      engaging_content: "In Drupal, a view is a listing of information. It can a list of nodes, users, comments, taxonomy terms, files, etc..."
  ids:
    unique_id:
      type: integer
process:
  title: creative_title
  body: engaging_content
destination:
  plugin: entity:node
  default_bundle: page

The final folder structure will look like:

.
|-- core
|-- index.php
|-- modules
|   `-- custom
|       `-- first_example
|           `-- first_example
|               |-- migrations
|               |   `-- first_example.yml
|               `-- first_example.info.yml

YAML is a key-value format which allow nesting of elements. It is very sensitive to white spaces and indentation. For example, it requires at least one space character after the colon symbol (:) that separates the key from the value. Also, note that each level in the hierarchy is indented by two spaces exactly. A common source of errors when writing migrations is improper spacing or indentation of the YAML files. You can use the !!! plugin for VsCode or !!! for PHPStorm to validate the structure of the file is correct.

A quick glimpse at the migration plugin reveals the three major parts: source, process, and destination. Other keys provide extra information about the migration. There are more keys than the ones shown above. For example, it is possible to define dependencies among migrations. Another option is to tag migrations so they can be executed together. We are going to learn more about these options further in the book.

Let's review each key-value pair in the file. For the id key, it is customary to set its value to match the filename containing the migration plugin without the .yml extension. This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. The id value should be alphanumeric characters, optionally using underscores (_) to separate words. As for the label key, it is a human readable string used to name the migration in various interfaces.

In this example, we are using the embedded_data source plugin. It allows you to define the data to migrate right inside the plugin file. To configure it, you define a data_rows key whose value is an array of all the elements you want to migrate. Each element might contain an arbitrary number of key-value pairs representing "columns" of data to be imported.

A common use case for the embedded_data plugin is testing of the Migrate API itself. Another valid one is to create default content when the data is known in advance. I often present Drupal site building workshops. To save time, I use this source plugin to create nodes which are later used when explaining how to create Views.

For the destination, we are using the entity:node plugin which allows you to create nodes of any content type. The default_bundle key indicates that all nodes to be created will be of type Basic page, by default. It is important to note that the value of the default_bundle key is the machine name of the content type. You can find it at /admin/structure/types/manage/page. In general, the Migrate API uses machine names for the values. As we explore the system, we will point out when they are used and where to find the right ones.

In the process section, you map columns from the source to node properties and fields. The keys are entity property names or the fields' machine names. In this case, we are setting values for the title of the node and its body field. You can find the field machine names in the content type configuration page: /admin/structure/types/manage/page/fields. During the migration, values can be copied directly from the source or transformed via process plugins. This example makes a verbatim copy of the values from the source to the destination. The column names in the source are not required to match the destination property or field name. In this example, they are purposely different to make them easier to identify.

The example can be downloaded at https://www.drupal.org/project/migrate_examples Download it into the ./modules/custom directory of the Drupal installation. The example above is part of the first_example submodule so make sure to enable it.

Executing the migration

Let's use Drush built-in commands to execute migrations. Open a terminal, switch directories to Drupal's webroot, and execute the following commands.

$ drush pm:enable -y migrate first_example
$ drush migrate:status
$ drush migrate:import first_example

Note: For the curious, execute drush list --filter=migrate to get a list of other migration-related commands.

The first command enables the core migrate module and the custom module holding the migration plugin. The second command shows a list of all migrations available in the system. For now, only one should be listed with the migration ID first_example. The third command executes the migration. If all goes well, you can visit the content overview page at /admin/content and see two basic pages created. Congratulations, you have successfully executed your first Drupal migration!!!

Or maybe not? Drupal migrations can fail in many ways. Sometimes the error messages are not very descriptive. In chapters !!!X, we will talk about recommended workflows and strategies for debugging migrations. For now, let's mention a couple of things that could go wrong with this example. If after running the drush migrate:status command, you do not see the first_example migration, make sure that the first_example module is enabled. If it is enabled, and you do not see it, rebuild the cache by running drush cache:rebuild.

If you see the migration, but you get a YAML parse error when running the migrate:import command, check your indentation. Copying and pasting from GitHub to your IDE/editor might change the spacing. An extraneous space can break the whole migration so pay close attention. If the command reports that it created the nodes, but you get a fatal error when trying to view one, it is because the content type was not set properly. Remember that the machine name of the Basic page content type is page, not basic_page. This error cannot be fixed from the administration interface. What you have to do is rollback the migration issuing the following command: drush migrate:rollback first_example, then fix the default_bundle value, rebuild the cache, and import again.

Note: Migrate Tools could be used for running the migration. This module depends on Migrate Plus. For now, let's keep module dependencies to a minimum to focus on core Migrate functionality. Also, skipping them demonstrates that these modules, although quite useful, are not hard requirements to work on migration projects. If you decide to use Migrate Tools, make sure to check compatibily with the version of Drush.