From 6006d8df2d87c16398def0a32ffc9c3c501c226d Mon Sep 17 00:00:00 2001
From: Mauricio Dinarte
Date: Sat, 5 Aug 2023 01:00:52 -0600
Subject: [PATCH] Update articles

---
 01.md | 16 ++++++++--------
 02.md | 58 ++++++++++++++++++++++++++++------------------------
 03.md |  2 +-
 3 files changed, 37 insertions(+), 39 deletions(-)

diff --git a/01.md b/01.md
index a000dcf..67a61d8 100644
--- a/01.md
+++ b/01.md
@@ -1,27 +1,27 @@
 # Drupal migrations: Understanding the ETL process
-The Migrate API is a very flexible and powerful system that allows you to collect data from different locations and store them in Drupal. It is, in fact, a full-blown extract, transform, and load (ETL) framework. For instance, it could produce CSV files. Its primary use is to create Drupal content entities: nodes, users, files, comments, etc. The API is thoroughly [documented](https://www.drupal.org/docs/drupal-apis/migrate-api), and their maintainers are very active in the #migration [slack channel](https://www.drupal.org/slack) for those needing assistance. The use cases for the Migrate API are numerous and vary greatly. Today we are starting a blog post series that will cover different migrate concepts so that you can apply them to your particular project.
+The Migrate API is a very flexible and powerful system that allows you to collect data from different locations and store it in Drupal. Its primary use is to create Drupal content and configuration entities: nodes and content types, taxonomy terms and vocabularies, users, files, etc. The API is, in fact, a full-blown extract, transform, and load (ETL) framework. For instance, it could produce CSV files as output. The API is thoroughly [documented](https://www.drupal.org/docs/drupal-apis/migrate-api), and its maintainers are very active in the #migration [Slack channel](https://www.drupal.org/slack) for those needing assistance. The use cases for the Migrate API are numerous and vary greatly. This book covers different migrate concepts so that you can apply them to your particular project.
 ## Understanding the ETL process
-Extract, transform, and load (ETL) is a procedure where data is collected from multiple sources, processed according to business needs, and its result stored for later use. This paradigm is not specific to Drupal. Books and frameworks abound on the topic. Let's try to understand the general idea by following a real life analogy: baking bread. To make some bread, you need to obtain various ingredients: wheat flour, salt, yeast, etc. (_extracting_). Then, you need to combine them in a process that involves mixing and baking (_transforming_). Finally, when the bread is ready, you put it into shelves for display in the bakery (_loading_). In Drupal, each step is performed by a Migrate plugin:
+Extract, transform, and load (ETL) is a procedure where data is collected from multiple sources, processed according to business needs, and its result stored for later use. This paradigm is not specific to Drupal. Books and frameworks abound on the topic. Let's try to understand the general idea by following a real-life analogy: baking bread. To make some bread, you need to obtain various ingredients: wheat flour, salt, yeast, etc. (_extracting_). Then, you need to combine them in a process that involves mixing and baking (_transforming_). Finally, when the bread is ready, you put it on shelves for display in a bakery (_loading_). In Drupal, each step is performed by a Migrate plugin:
 The extract step is provided by source plugins.
 The transform step is provided by process plugins.
 The load step is provided by destination plugins.
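+To make the division of responsibilities concrete, here is a minimal sketch of how those three parts show up in a migration plugin. The `source_column` name is only a placeholder, and required keys such as `id` are omitted; a complete, working example is developed in the next chapter:
+```yaml
+source:
+  plugin: embedded_data   # where the data comes from
+process:
+  title: source_column    # how each destination property gets its value
+destination:
+  plugin: entity:node     # where the transformed data is stored
+```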
-As it is the case with other systems, Drupal core offers some base functionality which can be extended by contributed modules or custom code. Out of the box, Drupal can connect to SQL databases including previous versions of Drupal. There are contributed modules to read from CSV files, XML documents, JSON and SOAP feeds, WordPress sites, LibreOffice Calc and Microsoft Office Excel files, Google Sheets, and much more.
+As is the case with other systems, Drupal core offers some base functionality which can be extended by contributed modules or custom code. Out of the box, Drupal can connect to SQL databases including previous versions of Drupal. There are contributed modules to read from CSV files, JSON and SOAP feeds, XML documents, WordPress sites, LibreOffice Calc and Microsoft Office Excel files, Google Sheets, and much more.
-The [list of core process plugins](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins) is impressive. You can concatenate strings, explode or implode arrays, format dates, encode URLs, look up already migrated data, among other transform operations. [Migrate Plus](https://www.drupal.org/project/migrate_plus) offers more process plugins for DOM manipulation, string replacement, transliteration, etc.
+The [list of core process plugins](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins) is extensive. You can concatenate strings, explode or implode arrays, format dates, encode URLs, look up already migrated data, among other transformations. [Migrate Plus](https://www.drupal.org/project/migrate_plus) offers more process plugins for DOM manipulation, string replacement, array operations, etc.
-Drupal core provides destination plugins for content and configuration entities. Most of the time, targets are content entities like nodes, users, taxonomy terms, comments, files, etc. It is also possible to import configuration entities like field and content type definitions. This is often used when upgrading sites from Drupal 6 or 7 to Drupal 8. Via a combination of source, process, and destination plugins, it is possible to write Commerce Product Variations, Paragraphs, and more.
+Drupal core provides destination plugins for content and configuration entities. Most of the time, targets are content entities like nodes, users, taxonomy terms, comments, files, etc. It is also possible to import configuration entities like field and content type definitions. The latter is often used when upgrading sites from Drupal 6 or 7 to newer versions of Drupal. Via a combination of source, process, and destination plugins, it is possible to import Paragraphs, Commerce Product Variations, and more.
 Technical note: The Migrate API defines another plugin type: **id_map**. They are used to map source IDs to destination IDs. This allows the system to keep track of records that have been imported and roll them back if needed.
 ## Drupal migrations: a two step process
-Performing a Drupal migration is a two step process: **writing** the migration definitions and **executing** them. Migration definitions are written in YAML format. These files contain information on how to fetch data from the _source_, how to _process_ the data, and how to store it in the _destination_. It is important to note that each migration file can only specify one source and one destination. That is, you cannot read from a CSV file and a JSON feed using the same migration definition file. Similarly, you cannot write to nodes and users from the same file. However, you can use **as many process plugins as needed** to convert your data from the format defined in the source to the format expected in the destination.
+Performing a Drupal migration is a two step process: **writing** the migration definitions and **executing** them. Migration definitions are written in YAML format. The technical name for these files is **migration plugins**. They contain information on how to fetch data from the _source_, how to _process_ the data, and how to store it in the _destination_. It is important to note that each migration file can only specify one source and one destination. That is, you cannot read from a CSV file and a JSON feed using the same migration definition file. Similarly, you cannot write to nodes and users from the same file. However, you can use **as many process plugins as needed** to convert your data from the format defined in the source to the format expected in the destination.
-A typical migration project consists of several migration definition files. Although not required, it is recommended to write one migration file per entity bundle. If you are migrating nodes, that means writing one migration file per content type. The reason is that different content types will have different field configurations. It is easier to write and manage migrations when the destination is homogeneous. In this case, a single content type will have the same fields for all the elements to process in a particular migration.
+A typical migration project consists of several migration definition files. Although not required, it is recommended to write one migration file per entity bundle variation. If you are migrating nodes, that means writing one migration file per content type. The reason is that different content types will have different field configurations. It is easier to write and manage migrations when the destination is homogeneous. In this case, a single content type will have the same fields for all the nodes to process in a particular migration.
-Once all the migration definitions have been written, you need to execute the migrations. The most common way to do this is using the [Migrate Tools](https://www.drupal.org/project/migrate_tools) module which provides [Drush](https://www.drush.org/) commands and a user interface (UI) to run migrations. Note that the UI for running migrations only detect those that have been defined as configuration entities using the Migrate Plus module. This is a topic we will cover in the future. For now, we are going to stick to Drupal core's mechanisms of defining migrations. Contributed modules like Migrate Scheduler, Migrate Manifest, and Migrate Run offer alternatives for executing migrations.
+Once all the migration definitions have been written, you need to execute the migrations. The most common way to do this is using commands provided by [Drush](https://www.drush.org/). The contributed Migrate Tools module provides a user interface (UI) to run migrations. At the time of this writing, the UI for running migrations only detects those that have been defined as configuration entities using the Migrate Plus module. This is a topic discussed in !!! chapter X. For now, we are going to stick to Drupal core's mechanisms for writing migration plugins and using Drush to execute them. Contributed modules like Migrate Scheduler and Migrate Manifest offer alternatives for executing migrations.
\ No newline at end of file
diff --git a/02.md b/02.md
index 4454f14..1ed2897 100644
--- a/02.md
+++ b/02.md
@@ -1,26 +1,26 @@
 # Writing your first Drupal migration
-In the previous chapter, we learned that the Migrate API is an implementation of an ETL framework. We also talked about the steps involved in writing and running migrations. Now, let's write our first Drupal migration. We are going to start with a very basic example: creating nodes out of hardcoded data. For this, we assume a Drupal installation using the `standard` installation profile, which comes with the `Basic Page` content type. As we progress through the book, the migrations will become more complete and more complex. Ideally, only one concept will be introduced at a time. When that is not possible, we will explain how different parts work together. The focus of this chapter is learning the structure of a migration definition file and how to run it.
+In the previous chapter, we learned that the Migrate API is an implementation of an ETL framework. We also talked about the steps involved in writing and executing migrations. Now, let's write our first Drupal migration. We are going to start with a very basic example: creating nodes out of hardcoded data. For this, we assume a Drupal installation using the `standard` installation profile, which comes with the `Basic Page` content type. As we progress through the book, the migrations will become more complete and more complex. Ideally, only one concept will be introduced at a time. When that is not possible, we will explain how different parts work together. The focus of this chapter is learning the structure of a migration plugin and how to execute it.
-## Writing the migration definition file
+## Writing the migration plugin
-The migration definition file needs to live in a module. So, let's create a custom one named `ud_migrations_first` and set Drupal core's `migrate` module as dependencies in the \*.info.yml file.
+The migration plugin is a file written in YAML format. It needs to live in a module, so let's create a custom one named `first_example` and set Drupal core's `migrate` module as a dependency in the `*.info.yml` file.
 ```yaml
 type: module
-name: UD First Migration
+name: First Example
 description: 'Example of basic Drupal migration. Learn more at https://understanddrupal.com/migrations.'
-package: Understand Drupal
-core: 8.x
+package: Migrate examples
+core_version_requirement: ^10 || ^11
 dependencies:
   - drupal:migrate
 ```
-Now, let's create a folder called `migrations` and inside it, a file called `udm_first.yml`. Note that the extension is `yml` not `yaml`. The content of the file will be:
+Now, let's create a folder called `migrations` and inside it, a file called `first_example.yml`. Note that the extension is `yml`, not `yaml`. The content of the file will be:
 ```yaml
-id: udm_first
-label: "UD First migration"
+id: first_example
+label: "First migration"
 source:
   plugin: embedded_data
   data_rows:
@@ -49,47 +49,45 @@
 The final folder structure will look like:
 |-- index.php
 |-- modules
 |   `-- custom
-|       `-- ud_migrations
-|           `-- ud_migrations_first
+|       `-- first_example
+|           `-- first_example
 |               |-- migrations
-|               |   `-- udm_first.yml
-|               `-- ud_migrations_first.info.yml
+|               |   `-- first_example.yml
+|               `-- first_example.info.yml
 ```
-YAML is a key-value format with optional nesting of elements. It is **very sensitive to white spaces and indentation**. For example, it requires at least one space character after the colon symbol (**:**) that separates the key from the value. Also, note that each level in the hierarchy is indented by two spaces exactly. A common source of errors when writing migrations is improper spacing or indentation of the YAML files.
+YAML is a key-value format which allows nesting of elements. It is **very sensitive to white spaces and indentation**. For example, it requires at least one space character after the colon symbol (**:**) that separates the key from the value. Also, note that each level in the hierarchy is indented by two spaces exactly. A common source of errors when writing migrations is improper spacing or indentation of the YAML files. You can use the !!! plugin for VS Code or !!! for PHPStorm to validate that the structure of the file is correct.
-A quick glimpse at the file reveals the three major parts: source, process, and destination. Other keys provide extra information about the migration. There are more keys than the ones shown above. For example, it is possible to define dependencies among migrations. Another option is to tag migrations so they can be executed together. We are going to learn more about these options further in the book.
+A quick glimpse at the migration plugin reveals the three major parts: source, process, and destination. Other keys provide extra information about the migration. There are more keys than the ones shown above. For example, it is possible to define dependencies among migrations. Another option is to tag migrations so they can be executed together. We are going to learn more about these options further in the book.
-Let's review each key-value pair in the file. For the `id` key, it is customary to set its value to match the filename containing the migration definition, but without the `.yml` extension. This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. The `id` value should be alphanumeric characters, optionally using underscores (**\_**) to separate words. As for the `label` key, it is a human readable string used to name the migration in various interfaces.
+Let's review each key-value pair in the file. For the `id` key, it is customary to set its value to match the filename containing the migration plugin, without the `.yml` extension. This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. The `id` value should be alphanumeric characters, optionally using underscores (**\_**) to separate words. As for the `label` key, it is a human-readable string used to name the migration in various interfaces.
-In this example, we are using the [embedded_data](https://api.drupal.org/api/drupal/core!modules!migrate!src!Plugin!migrate!source!EmbeddedDataSource.php/class/EmbeddedDataSource) source plugin. It allows you to define the data to migrate right inside the definition file. To configure it, you define a `data_rows` key whose value is an array of all the elements you want to migrate. Each element might contain an arbitrary number of key-value pairs representing "columns" of data to be imported.
+In this example, we are using the [embedded_data](https://api.drupal.org/api/drupal/core!modules!migrate!src!Plugin!migrate!source!EmbeddedDataSource.php/class/EmbeddedDataSource) source plugin. It allows you to define the data to migrate right inside the plugin file. To configure it, you define a `data_rows` key whose value is an array of all the elements you want to migrate. Each element might contain an arbitrary number of key-value pairs representing "columns" of data to be imported.
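+For reference, a complete `embedded_data` configuration could look similar to the following sketch. The two sample rows, their values, and the source column names are only illustrative; what matters is the overall shape of `data_rows`, the `ids` key that tells the Migrate API which column uniquely identifies each row, and the `process` and `destination` sections discussed below:
+```yaml
+id: first_example
+label: "First migration"
+source:
+  plugin: embedded_data
+  data_rows:
+    - unique_id: 1
+      creative_title: "The versatility of Drupal fields"
+      engaging_content: "Fields are the smallest data storage unit in Drupal."
+    - unique_id: 2
+      creative_title: "What is a view in Drupal?"
+      engaging_content: "In Drupal, a view is a listing of information."
+  ids:
+    unique_id:
+      type: integer
+process:
+  title: creative_title
+  body: engaging_content
+destination:
+  plugin: entity:node
+  default_bundle: page
+```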
-A common use case for the `embedded_data` plugin is testing of the Migrate API itself. Another valid one is to create default content when the data is known in advance. I often present Drupal site building workshops. To save time, I use this plugin to create nodes which are later used when explaining how to create Views.
+A common use case for the `embedded_data` plugin is testing of the Migrate API itself. Another valid one is to create default content when the data is known in advance. I often present Drupal site building workshops. To save time, I use this source plugin to create nodes which are later used when explaining how to create Views.
 For the destination, we are using the `entity:node` plugin which allows you to create nodes of any content type. The `default_bundle` key indicates that all nodes to be created will be of type `Basic page`, by default. It is important to note that the value of the `default_bundle` key is the **machine name** of the content type. You can find it at `/admin/structure/types/manage/page`. In general, the Migrate API uses _machine names_ for the values. As we explore the system, we will point out when they are used and where to find the right ones.
 In the process section, you map columns from the source to node properties and fields. The keys are entity property names or the fields' machine names. In this case, we are setting values for the `title` of the node and its `body` field. You can find the field machine names in the content type configuration page: `/admin/structure/types/manage/page/fields`. During the migration, values can be copied directly from the source or transformed via process plugins. This example makes a verbatim copy of the values from the source to the destination. The column names in the source are not required to match the destination property or field name. In this example, they are purposely different to make them easier to identify.
-The repository, which will be used for many examples throughout the book, can be downloaded at Place it into the `./modules/custom` directory of the Drupal installation. The example above is part of the "UD First Migration" submodule so make sure to enable it.
+The example can be downloaded at Download it into the `./modules/custom` directory of the Drupal installation. The example above is part of the `first_example` submodule, so make sure to enable it.
-## Running the migration
+## Executing the migration
-Let's use Drush to run the migrations with the commands provided by [Migrate Run](https://www.drupal.org/project/migrate_run). Open a terminal, switch directories to Drupal's webroot, and execute the following commands.
+Let's use the built-in Drush commands to execute migrations. Open a terminal, switch directories to Drupal's webroot, and execute the following commands.
 ```console
-$ drush pm:enable -y migrate migrate_run ud_migrations_first
+$ drush pm:enable -y migrate first_example
 $ drush migrate:status
-$ drush migrate:import udm_first
+$ drush migrate:import first_example
 ```
-**Note**: It is assumed that the Migrate Run module has been downloaded via composer or otherwise.
+**Note**: For the curious, execute `drush list --filter=migrate` to get a list of other migration-related commands.
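+A few of them are shown below as a hint of what is available. The exact options vary between Drush versions, so treat this as a sketch and rely on `drush migrate:import --help` and its siblings for the authoritative list:
+```console
+$ drush migrate:status first_example              # status and processed counts
+$ drush migrate:import first_example --limit=1    # import at most one item
+$ drush migrate:import first_example --update     # re-import previously imported items
+$ drush migrate:rollback first_example            # undo the import
+$ drush migrate:messages first_example            # show messages logged during the import
+```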
-**Important**: All code snippets showing Drush commands assume version 10 unless otherwise noted. If you are using Drush 8 or lower, the commands' names and aliases are different. Usually, a hyphen (-) was used as delimiter in command names. For example, `pm-enable` in Drush 8 instead of `pm:enable` in Drush 10. Execute `drush list --filter=migrate` to verify the proper commands for your version of Drush.
+The first command enables the core migrate module and the custom module holding the migration plugin. The second command shows a list of all migrations available in the system. For now, only one should be listed with the migration ID `first_example`. The third command executes the migration. If all goes well, you can visit the content overview page at `/admin/content` and see two basic pages created. **Congratulations, you have successfully executed your first Drupal migration!!!**
-The first command enables the core migrate module, the runner, and the custom module holding the migration definition file. The second command shows a list of all migrations available in the system. For now, only one should be listed with the migration ID `udm_first`. The third command executes the migration. If all goes well, you can visit the content overview page at /admin/content and see two basic pages created. **Congratulations, you have successfully run your first Drupal migration!!!**
+_Or maybe not?_ Drupal migrations can fail in many ways. Sometimes the error messages are not very descriptive. In chapters !!!X, we will talk about recommended workflows and strategies for debugging migrations. For now, let's mention a couple of things that could go wrong with this example. If after running the `drush migrate:status` command, you do not see the `first_example` migration, make sure that the `first_example` module is enabled. If it is enabled, and you do not see it, rebuild the cache by running `drush cache:rebuild`.
-_Or maybe not?_ Drupal migrations can fail in many ways. Sometimes the error messages are not very descriptive. In upcoming chapters, we will talk about recommended workflows and strategies for debugging migrations. For now, let's mention a couple of things that could go wrong with this example. If after running the `drush migrate:status` command, you do not see the `udm_first` migration, make sure that the `ud_migrations_first` module is enabled. If it is enabled, and you do not see it, rebuild the cache by running `drush cache:rebuild`.
+If you see the migration, but you get a YAML parse error when running the `migrate:import` command, check your indentation. Copying and pasting from GitHub to your IDE/editor might change the spacing. An extraneous space can break the whole migration, so pay close attention. If the command reports that it created the nodes, but you get a fatal error when trying to view one, it is because the content type was not set properly. Remember that the machine name of the `Basic page` content type is `page`, not `basic_page`. This error cannot be fixed from the administration interface. What you have to do is roll back the migration by issuing the following command: `drush migrate:rollback first_example`, then fix the `default_bundle` value, rebuild the cache, and import again.
-If you see the migration, but you get a YAML parse error when running the `migrate:import` command, check your indentation. Copying and pasting from GitHub to your IDE/editor might change the spacing. An extraneous space can break the whole migration so pay close attention.
If the command reports that it created the nodes, but you get a fatal error when trying to view one, it is because the content type was not set properly. Remember that the machine name of the "Basic page" content type is `page`, not `basic_page`. This error cannot be fixed from the administration interface. What you have to do is rollback the migration issuing the following command: `drush migrate:rollback udm_first`, then fix the `default_bundle` value, rebuild the cache, and import again.
-**Note**: Migrate Tools could be used for running the migration. This module depends on Migrate Plus. For now, let's keep module dependencies to a minimum to focus on core Migrate functionality. Also, skipping them demonstrates that these modules, although quite useful, are not hard requirements to work on migration projects. If you decide to use Migrate Tools, make sure to uninstall Migrate Run. Both provide the same Drush commands and conflict with each other if the two are enabled.
+**Note**: Migrate Tools could be used for running the migration. This module depends on Migrate Plus. For now, let's keep module dependencies to a minimum to focus on core Migrate functionality. Also, skipping them demonstrates that these modules, although quite useful, are not hard requirements to work on migration projects. If you decide to use Migrate Tools, make sure to check compatibility with your version of Drush.
diff --git a/03.md b/03.md
index 5364d5d..4008c3f 100644
--- a/03.md
+++ b/03.md
@@ -1,6 +1,6 @@
 # Using process plugins for data transformation in Drupal migrations
-In the previous chapter, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination or to meet business requirements. Now we will learn more about process plugins and how they work as part of the Drupal migration pipeline.
+In the previous chapter, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination system or to meet business requirements. Now we will learn more about process plugins and how they work as part of the Drupal migration pipeline.
 ## Syntactic sugar
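+As a first taste of that pipeline, and assuming the `first_example` migration from the previous chapter, consider the shorthand used in its process section. The following mapping:
+```yaml
+process:
+  title: creative_title
+```
+is a shortened form of an explicit plugin definition like this one:
+```yaml
+process:
+  title:
+    plugin: get
+    source: creative_title
+```
+Both copy the value verbatim; the shorthand simply implies the `get` process plugin.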