31dom/01.md
Mauricio Dinarte 6006d8df2d Update articles
2023-08-05 01:00:52 -06:00

Drupal migrations: Understanding the ETL process

The Migrate API is a flexible and powerful system that lets you collect data from different locations and store it in Drupal. Its primary use is to create Drupal content and configuration entities: nodes and content types, taxonomy terms and vocabularies, users, files, etc. The API is, in fact, a full-blown extract, transform, and load (ETL) framework, so its output is not limited to Drupal entities. For instance, it could produce CSV files. The API is thoroughly documented, and its maintainers are very active in the #migration Slack channel for those needing assistance. The use cases for the Migrate API are numerous and vary greatly. This book covers different migration concepts so that you can apply them to your particular project.

Understanding the ETL process

Extract, transform, and load (ETL) is a procedure where data is collected from multiple sources, processed according to business needs, and the result is stored for later use. This paradigm is not specific to Drupal. Books and frameworks abound on the topic. Let's try to understand the general idea by following a real-life analogy: baking bread. To make some bread, you need to obtain various ingredients: wheat flour, salt, yeast, etc. (extracting). Then, you need to combine them in a process that involves mixing and baking (transforming). Finally, when the bread is ready, you put it on shelves for display in a bakery (loading). In Drupal, each step is performed by a Migrate plugin:

The extract step is provided by source plugins.
The transform step is provided by process plugins.
The load step is provided by destination plugins.

As is the case with other systems, Drupal core offers some base functionality which can be extended by contributed modules or custom code. Out of the box, Drupal can connect to SQL databases, including those of previous versions of Drupal. There are contributed modules to read from CSV files, JSON and SOAP feeds, XML documents, WordPress sites, LibreOffice Calc and Microsoft Office Excel files, Google Sheets, and many more.

The list of core process plugins is extensive. You can concatenate strings, explode or implode arrays, format dates, encode URLs, and look up already migrated data, among other transformations. The contributed Migrate Plus module offers more process plugins for DOM manipulation, string replacement, array operations, etc.
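To make this more concrete, here is a minimal sketch of what the process section of a migration definition could look like when using a few of the core plugins mentioned above (`concat`, `explode`, `format_date`, and `migration_lookup`). The source column names, the destination field names, and the `example_users` migration are assumptions made up for illustration. They are not part of any real project.

```yaml
# Fragment of a migration definition: only the process section is shown.
# Source columns (first_name, keywords, ...) and destination fields
# (field_keywords, field_published_on) are hypothetical.
process:
  # Concatenate two source values, separated by a space, into the title.
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: ' '
  # Split a comma-separated string into multiple values.
  field_keywords:
    plugin: explode
    source: keywords
    delimiter: ','
  # Convert a date like 08/05/2023 into 2023-08-05.
  field_published_on:
    plugin: format_date
    source: published
    from_format: 'm/d/Y'
    to_format: 'Y-m-d'
  # Swap a source author ID for the ID of the user created by another
  # (hypothetical) migration named example_users.
  uid:
    plugin: migration_lookup
    migration: example_users
    source: author_id
```

Each key under `process` is a destination property or field, and the plugin configuration describes how its value is derived from the source.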

Drupal core provides destination plugins for content and configuration entities. Most of the time, targets are content entities like nodes, users, taxonomy terms, comments, files, etc. It is also possible to import configuration entities like field and content type definitions. The latter is often used when upgrading sites from Drupal 6 or 7 to newer versions of Drupal. Via a combination of source, process, and destination plugins, it is possible to import Paragraphs, Commerce Product Variations, and more.

Technical note: The Migrate API defines another plugin type: id_map. Plugins of this type are used to map source IDs to destination IDs. This allows the system to keep track of records that have been imported and roll them back if needed.

Drupal migrations: a two-step process

Performing a Drupal migration is a two-step process: writing the migration definitions and executing them. Migration definitions are written in YAML format. The technical name for these files is migration plugins. They contain information on how to fetch data from the source, how to process the data, and how to store it in the destination. It is important to note that each migration file can only specify one source and one destination. That is, you cannot read from a CSV file and a JSON feed using the same migration definition file. Similarly, you cannot write to nodes and users from the same file. However, you can use as many process plugins as needed to convert your data from the format defined in the source to the format expected in the destination.
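As an illustration of this structure, the following is a minimal sketch of a migration definition with exactly one source, one process section, and one destination. It assumes the core `embedded_data` source plugin, which reads rows hardcoded in the file itself, and a `page` content type on the destination site. The migration ID, label, and source column names are hypothetical.

```yaml
# Hypothetical migration definition, for illustration only.
id: example_first
label: 'Example first migration'

# Extract: a single source plugin. Here the rows are embedded in the file.
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      creative_title: 'A first example title'
      engaging_content: 'Some example body text.'
    - unique_id: 2
      creative_title: 'A second example title'
      engaging_content: 'More example body text.'
  # Columns that uniquely identify each source record.
  ids:
    unique_id:
      type: integer

# Transform: map source columns to destination properties and fields.
process:
  title: creative_title
  body/value: engaging_content

# Load: a single destination plugin. Each row becomes a node of type page.
destination:
  plugin: 'entity:node'
  default_bundle: page
```

When a process entry is just a source column name, as in this sketch, the value is copied to the destination without further transformation.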

A typical migration project consists of several migration definition files. Although not required, it is recommended to write one migration file per entity bundle variation. If you are migrating nodes, that means writing one migration file per content type. The reason is that different content types will have different field configurations. It is easier to write and manage migrations when the destination is homogeneous. In this case, a single content type will have the same fields for all the nodes to process in a particular migration.
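The recommendation above could be sketched as two separate definition files, one per content type. Everything here is an assumption for illustration: the file names, migration IDs, source columns, and the `field_summary` and `field_subtitle` fields. The `---` separator is used only to show both files in one listing. In a real project each definition would live in its own file.

```yaml
# File 1 (hypothetical name: migrations/example_article.yml)
id: example_article
label: 'Example articles'
source:
  plugin: embedded_data
  data_rows:
    - src_id: 1
      src_title: 'An example article'
      src_summary: 'An example summary.'
  ids:
    src_id:
      type: integer
process:
  title: src_title
  # Assumed to exist only on the article content type.
  field_summary: src_summary
destination:
  plugin: 'entity:node'
  default_bundle: article
---
# File 2 (hypothetical name: migrations/example_page.yml)
id: example_page
label: 'Example pages'
source:
  plugin: embedded_data
  data_rows:
    - src_id: 1
      src_title: 'An example page'
      src_subtitle: 'An example subtitle.'
  ids:
    src_id:
      type: integer
process:
  title: src_title
  # Assumed to exist only on the page content type.
  field_subtitle: src_subtitle
destination:
  plugin: 'entity:node'
  default_bundle: page
```

Because each file targets a single bundle, its process section only needs to account for the fields of that content type.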

Once all the migration definitions have been written, you need to execute the migrations. The most common way to do this is using commands provided by Drush. The contributed Migrate Tools module provides a user interface (UI) to run migrations. At the time of this writing, the UI for running migrations only detects those that have been defined as configuration entities using the Migrate Plus module. This is a topic discussed in chapter X. For now, we are going to stick to Drupal core's mechanisms for writing migration plugins and using Drush to execute them. Contributed modules like Migrate Scheduler and Migrate Manifest offer alternatives for executing migrations.