Compare commits

..

10 commits

Author SHA1 Message Date
Mauricio Dinarte
378703cbd2 Update articles 2023-08-15 06:07:17 -06:00
Mauricio Dinarte
0a46f584fb Update articles 2023-08-05 08:00:35 -06:00
Mauricio Dinarte
2569fbaeea Update articles 2023-08-05 05:25:40 -06:00
Mauricio Dinarte
6006d8df2d Update articles 2023-08-05 01:00:52 -06:00
Mauricio Dinarte
fa217c896e Add script to conver MD files to TXT 2023-08-04 11:00:18 -06:00
Mauricio Dinarte
811cfcd876 Move book index to export directory 2023-08-04 10:52:04 -06:00
Mauricio Dinarte
ba253739b1 Rename files 2023-08-04 10:43:39 -06:00
Mauricio Dinarte
f5cf7db0c5 Include video course promo 2020-10-04 21:24:46 -06:00
Mauricio Dinarte
90ec6af8f9 Update image paths 2020-10-04 17:52:49 -06:00
Mauricio Dinarte
48ff8401e9 Fix parenthesis in URL 2020-10-04 17:52:49 -06:00
40 changed files with 611 additions and 545 deletions

2
.gitignore vendored Normal file

@ -0,0 +1,2 @@
export/*
!export/Book.txt

27
01.md Normal file

@ -0,0 +1,27 @@
# Drupal migrations: Understanding the ETL process
The Migrate API is a very flexible and powerful system that allows you to collect data from different locations and store it in Drupal. Its primary use is to create Drupal content and configuration entities: nodes and content types, taxonomy terms and vocabularies, users, files, etc. The API is, in fact, a full-blown extract, transform, and load (ETL) framework. For instance, it could produce CSV files. The API is thoroughly [documented](https://www.drupal.org/docs/drupal-apis/migrate-api), and its maintainers are very active in the #migration [Slack channel](https://www.drupal.org/slack) for those needing assistance. The use cases for the Migrate API are numerous and vary greatly. This book covers different migrate concepts so that you can apply them to your particular project.
## Understanding the ETL process
Extract, transform, and load (ETL) is a procedure where data is collected from multiple sources, processed according to business needs, and its result stored for later use. This paradigm is not specific to Drupal. Books and frameworks abound on the topic. Let's try to understand the general idea by following a real-life analogy: baking bread. To make some bread, you need to obtain various ingredients: wheat flour, salt, yeast, etc. (_extracting_). Then, you need to combine them in a process that involves mixing and baking (_transforming_). Finally, when the bread is ready, you put it on shelves for display in a bakery (_loading_). In Drupal, each step is performed by a Migrate plugin:
- The extract step is provided by source plugins.
- The transform step is provided by process plugins.
- The load step is provided by destination plugins.
As is the case with other systems, Drupal core offers some base functionality which can be extended by contributed modules or custom code. Out of the box, Drupal can connect to SQL databases including previous versions of Drupal. There are contributed modules to read from CSV files, JSON and SOAP feeds, XML documents, WordPress sites, LibreOffice Calc and Microsoft Office Excel files, !!!Google Sheets, and much more.
The [list of core process plugins](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins) is extensive. You can concatenate strings, explode or implode arrays, format dates, encode URLs, look up already migrated data, among other transformations. [Migrate Plus](https://www.drupal.org/project/migrate_plus) offers more process plugins for DOM manipulation, string replacement, array operations, etc.
Drupal core provides destination plugins for content and configuration entities. Most of the time, targets are content entities like nodes, users, taxonomy terms, comments, files, etc. It is also possible to import configuration entities like field and content type definitions. The latter is often used when upgrading sites from Drupal 6 or 7 to newer versions of Drupal. Via a combination of source, process, and destination plugins, it is possible to import Paragraphs, !!!Commerce Product Variations, and more.
Technical note: The Migrate API defines another plugin type: **id_map**. Plugins of this type are used to map source IDs to destination IDs. This allows the system to keep track of records that have been imported and roll them back if needed.
## Drupal migrations: a two step process
Performing a Drupal migration is a two-step process: **writing** the migration definitions and **executing** them. Migration definitions are written in YAML format. The technical name for these files is **migration plugins**. They contain information on how to fetch data from the _source_, how to _process_ the data, and how to store it in the _destination_. It is important to note that each migration file can only specify one source and one destination. That is, you cannot read from a CSV file and a JSON feed using the same migration definition file. Similarly, you cannot write to nodes and users from the same file. However, you can use **as many process plugins as needed** to convert your data from the format defined in the source to the format expected in the destination.
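The layout described above can be sketched as follows. This is a minimal outline, not a working migration: the `id`, the plugin names, and the column and field names are placeholders chosen for illustration only.

```yaml
# Hypothetical skeleton of a migration plugin. Every name below is a placeholder.
id: example_migration
label: "Example migration"
# Exactly one source: where the data is fetched from.
source:
  plugin: some_source_plugin
# As many mappings and process plugins as needed.
process:
  destination_field_1: source_column_1
  destination_field_2:
    plugin: some_process_plugin
    source: source_column_2
# Exactly one destination: where the data is stored.
destination:
  plugin: some_destination_plugin
```

Note that there is a single `source` section and a single `destination` section, while the `process` section can hold as many entries as the data requires.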
A typical migration project consists of several migration definition files. Although not required, it is recommended to write one migration file per entity bundle variation. If you are migrating nodes, that means writing one migration file per content type. The reason is that different content types will have different field configurations. It is easier to write and manage migrations when the destination is homogeneous. In this case, a single content type will have the same fields for all the nodes to process in a particular migration.
Once all the migration definitions have been written, you need to execute the migrations. The most common way to do this is using commands provided by [Drush](https://www.drush.org/). The contributed Migrate Tools module provides a user interface (UI) to run migrations. At the time of this writing, the UI for running migrations only detects those that have been defined as configuration entities using the Migrate Plus module. This is a topic discussed in !!! chapter X. For now, we are going to stick to Drupal core's mechanisms for writing migration plugins and using Drush to execute them. !!!Contributed modules like Migrate Scheduler and Migrate Manifest offer alternatives for executing migrations.

27
01.txt

@ -1,27 +0,0 @@
# Drupal migrations: Understanding the ETL process
The Migrate API is a very flexible and powerful system that allows you to collect data from different locations and store them in Drupal. It is, in fact, a full-blown extract, transform, and load (ETL) framework. For instance, it could produce CSV files. Its primary use is to create Drupal content entities: nodes, users, files, comments, etc. The API is thoroughly [documented](https://www.drupal.org/docs/drupal-apis/migrate-api), and their maintainers are very active in the #migration [slack channel](https://www.drupal.org/slack) for those needing assistance. The use cases for the Migrate API are numerous and vary greatly. Today we are starting a blog post series that will cover different migrate concepts so that you can apply them to your particular project.
## Understanding the ETL process
Extract, transform, and load (ETL) is a procedure where data is collected from multiple sources, processed according to business needs, and its result stored for later use. This paradigm is not specific to Drupal. Books and frameworks abound on the topic. Let's try to understand the general idea by following a real life analogy: baking bread. To make some bread, you need to obtain various ingredients: wheat flour, salt, yeast, etc. (_extracting_). Then, you need to combine them in a process that involves mixing and baking (_transforming_). Finally, when the bread is ready, you put it into shelves for display in the bakery (_loading_). In Drupal, each step is performed by a Migrate plugin:
The extract step is provided by source plugins.
The transform step is provided by process plugins.
The load step is provided by destination plugins.
As it is the case with other systems, Drupal core offers some base functionality which can be extended by contributed modules or custom code. Out of the box, Drupal can connect to SQL databases including previous versions of Drupal. There are contributed modules to read from CSV files, XML documents, JSON and SOAP feeds, WordPress sites, LibreOffice Calc and Microsoft Office Excel files, Google Sheets, and much more.
The [list of core process plugins](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins) is impressive. You can concatenate strings, explode or implode arrays, format dates, encode URLs, look up already migrated data, among other transform operations. [Migrate Plus](https://www.drupal.org/project/migrate_plus) offers more process plugins for DOM manipulation, string replacement, transliteration, etc.
Drupal core provides destination plugins for content and configuration entities. Most of the time, targets are content entities like nodes, users, taxonomy terms, comments, files, etc. It is also possible to import configuration entities like field and content type definitions. This is often used when upgrading sites from Drupal 6 or 7 to Drupal 8. Via a combination of source, process, and destination plugins, it is possible to write Commerce Product Variations, Paragraphs, and more.
Technical note: The Migrate API defines another plugin type: **id_map**. They are used to map source IDs to destination IDs. This allows the system to keep track of records that have been imported and roll them back if needed.
## Drupal migrations: a two step process
Performing a Drupal migration is a two step process: **writing** the migration definitions and **executing** them. Migration definitions are written in YAML format. These files contain information on how to fetch data from the _source_, how to _process_ the data, and how to store it in the _destination_. It is important to note that each migration file can only specify one source and one destination. That is, you cannot read from a CSV file and a JSON feed using the same migration definition file. Similarly, you cannot write to nodes and users from the same file. However, you can use **as many process plugins as needed** to convert your data from the format defined in the source to the format expected in the destination.
A typical migration project consists of several migration definition files. Although not required, it is recommended to write one migration file per entity bundle. If you are migrating nodes, that means writing one migration file per content type. The reason is that different content types will have different field configurations. It is easier to write and manage migrations when the destination is homogeneous. In this case, a single content type will have the same fields for all the elements to process in a particular migration.
Once all the migration definitions have been written, you need to execute the migrations. The most common way to do this is using the [Migrate Tools](https://www.drupal.org/project/migrate_tools) module which provides [Drush](https://www.drush.org/) commands and a user interface (UI) to run migrations. Note that the UI for running migrations only detect those that have been defined as configuration entities using the Migrate Plus module. This is a topic we will cover in the future. For now, we are going to stick to Drupal core's mechanisms of defining migrations. Contributed modules like Migrate Scheduler, Migrate Manifest, and Migrate Run offer alternatives for executing migrations.

93
02.md Normal file

@ -0,0 +1,93 @@
# Writing your first Drupal migration
In the previous chapter, we learned that the Migrate API is an implementation of an ETL framework. We also talked about the steps involved in writing and executing migrations. Now, let's write our first Drupal migration. We are going to start with a very basic example: creating nodes out of hardcoded data. For this, we assume a Drupal installation using the `standard` installation profile, which comes with the `Basic Page` content type. As we progress through the book, the migrations will become more complete and more complex. Ideally, only one concept will be introduced at a time. When that is not possible, we will explain how different parts work together. The focus of this chapter is learning the structure of a migration plugin and how to execute it.
## Writing the migration plugin
The migration plugin is a file written in YAML format. It needs to live in a module, so let's create a custom one named `first_example` and set Drupal core's `migrate` module as a dependency in the `*.info.yml` file.
```yaml
type: module
name: First Migration
description: 'Example of basic Drupal migration. Learn more at <a href="https://understanddrupal.com/migrations" title="Drupal Migrations">https://understanddrupal.com/migrations</a>.'
package: Migrate examples
core_version_requirement: ^10 || ^11
dependencies:
- drupal:migrate
```
Now, let's create a folder called `migrations` and inside it, a file called `first_example.yml`. Note that the extension is `yml` not `yaml`. The content of the file will be:
```yaml
id: first_example
label: "First migration"
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      creative_title: "The versatility of Drupal fields"
      engaging_content: "Fields are Drupal's atomic data storage mechanism..."
    - unique_id: 2
      creative_title: "What is a view in Drupal? How do they work?"
      engaging_content: "In Drupal, a view is a listing of information. It can be a list of nodes, users, comments, taxonomy terms, files, etc..."
  ids:
    unique_id:
      type: integer
process:
  title: creative_title
  body: engaging_content
destination:
  plugin: "entity:node"
  default_bundle: page
```
The final folder structure will look like:
```yaml
.
|-- core
|-- index.php
|-- modules
|   `-- custom
|       `-- first_example
|           `-- first_example
|               |-- migrations
|               |   `-- first_example.yml
|               `-- first_example.info.yml
```
YAML is a key-value format which allows nesting of elements. It is **very sensitive to whitespace and indentation**. For example, it requires at least one space character after the colon symbol (**:**) that separates the key from the value. Also, note that each level in the hierarchy is indented by exactly two spaces. A common source of errors when writing migrations is improper spacing or indentation of the YAML files. You can use the !!! plugin for VS Code or !!! for PhpStorm to validate that the structure of the file is correct.
A quick glimpse at the migration plugin reveals the three major parts: source, process, and destination. Other keys provide extra information about the migration. There are more keys than the ones shown above. For example, it is possible to define dependencies among migrations. Another option is to tag migrations so they can be executed together. We are going to learn more about these options further in the book.
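As a brief preview, the fragment below sketches how those two options are expressed with the `migration_tags` and `migration_dependencies` keys. The `second_example` migration and the tag name are hypothetical and are not part of the example module. The source, process, and destination sections are omitted.

```yaml
# Hypothetical migration that must run after `first_example`.
id: second_example
label: "Second migration"
# Migrations sharing a tag can be executed together.
migration_tags:
  - example_content
# IDs of the migrations that need to be imported before this one.
migration_dependencies:
  required:
    - first_example
  optional: []
```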
Let's review each key-value pair in the file. For the `id` key, it is customary to set its value to match the filename containing the migration plugin without the `.yml` extension. This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. The `id` value should be alphanumeric characters, optionally using underscores (**\_**) to separate words. As for the `label` key, it is a human readable string used to name the migration in various interfaces.
In this example, we are using the [embedded_data](https://api.drupal.org/api/drupal/core!modules!migrate!src!Plugin!migrate!source!EmbeddedDataSource.php/class/EmbeddedDataSource) source plugin. It allows you to define the data to migrate right inside the plugin file. To configure it, you define a `data_rows` key whose value is an array of all the elements you want to migrate. Each element might contain an arbitrary number of key-value pairs representing "columns" of data to be imported.
A common use case for the `embedded_data` plugin is testing of the Migrate API itself. Another valid one is to create default content when the data is known in advance. I often present Drupal site building workshops. To save time, I use this source plugin to create nodes which are later used when explaining how to create Views.
For the destination, we are using the `entity:node` plugin which allows you to create nodes of any content type. The `default_bundle` key indicates that all nodes to be created will be of type `Basic page`, by default. It is important to note that the value of the `default_bundle` key is the **machine name** of the content type. You can find it at `/admin/structure/types/manage/page`. In general, the Migrate API uses _machine names_ for the values. As we explore the system, we will point out when they are used and where to find the right ones.
In the process section, you map columns from the source to node properties and fields. The keys are entity property names or the fields' machine names. In this case, we are setting values for the `title` of the node and its `body` field. You can find the field machine names in the content type configuration page: `/admin/structure/types/manage/page/fields`. During the migration, values can be copied directly from the source or transformed via process plugins. This example makes a verbatim copy of the values from the source to the destination. The column names in the source are not required to match the destination property or field name. In this example, they are purposely different to make them easier to identify.
The example can be downloaded at <https://www.drupal.org/project/migrate_examples>. Download it into the `./modules/custom` directory of the Drupal installation. The example above is part of the `first_example` submodule, so make sure to enable it.
## Executing the migration
Let's use Drush's built-in commands to execute migrations. Open a terminal, switch directories to Drupal's webroot, and execute the following commands.
```console
$ drush pm:enable -y migrate first_example
$ drush migrate:status
$ drush migrate:import first_example
```
**Note**: For the curious, execute `drush list --filter=migrate` to get a list of other migration-related commands.
The first command enables the core migrate module and the custom module holding the migration plugin. The second command shows a list of all migrations available in the system. For now, only one should be listed with the migration ID `first_example`. The third command executes the migration. If all goes well, you can visit the content overview page at /admin/content and see two basic pages created. **Congratulations, you have successfully executed your first Drupal migration!!!**
_Or maybe not?_ Drupal migrations can fail in many ways. Sometimes the error messages are not very descriptive. In chapters !!!X, we will talk about recommended workflows and strategies for debugging migrations. For now, let's mention a couple of things that could go wrong with this example. If after running the `drush migrate:status` command, you do not see the `first_example` migration, make sure that the `first_example` module is enabled. If it is enabled, and you do not see it, rebuild the cache by running `drush cache:rebuild`.
If you see the migration, but you get a YAML parse error when running the `migrate:import` command, check your indentation. Copying and pasting from GitHub to your IDE/editor might change the spacing. An extraneous space can break the whole migration, so pay close attention. If the command reports that it created the nodes, but you get a fatal error when trying to view one, it is because the content type was not set properly. Remember that the machine name of the `Basic page` content type is `page`, not `basic_page`. This error cannot be fixed from the administration interface. What you have to do is roll back the migration by issuing the following command: `drush migrate:rollback first_example`, then fix the `default_bundle` value, rebuild the cache, and import again.
**Note**: Migrate Tools could be used for running the migration. This module depends on Migrate Plus. For now, let's keep module dependencies to a minimum to focus on core Migrate functionality. Also, skipping them demonstrates that these modules, although quite useful, are not hard requirements to work on migration projects. If you decide to use Migrate Tools, make sure to check compatibility with the version of Drush.

95
02.txt

@ -1,95 +0,0 @@
# Writing your first Drupal migration
In the previous chapter, we learned that the Migrate API is an implementation of an ETL framework. We also talked about the steps involved in writing and running migrations. Now, let's write our first Drupal migration. We are going to start with a very basic example: creating nodes out of hardcoded data. For this, we assume a Drupal installation using the `standard` installation profile, which comes with the `Basic Page` content type. As we progress through the book, the migrations will become more complete and more complex. Ideally, only one concept will be introduced at a time. When that is not possible, we will explain how different parts work together. The focus of this chapter is learning the structure of a migration definition file and how to run it.
## Writing the migration definition file
The migration definition file needs to live in a module. So, let's create a custom one named `ud_migrations_first` and set Drupal core's `migrate` module as dependencies in the \*.info.yml file.
```yaml
type: module
name: UD First Migration
description: 'Example of basic Drupal migration. Learn more at <a href="https://understanddrupal.com/migrations" title="Drupal Migrations">https://understanddrupal.com/migrations</a>.'
package: Understand Drupal
core: 8.x
dependencies:
- drupal:migrate
```
Now, let's create a folder called `migrations` and inside it, a file called `udm_first.yml`. Note that the extension is `yml` not `yaml`. The content of the file will be:
```yaml
id: udm_first
label: "UD First migration"
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      creative_title: "The versatility of Drupal fields"
      engaging_content: "Fields are Drupal's atomic data storage mechanism..."
    - unique_id: 2
      creative_title: "What is a view in Drupal? How do they work?"
      engaging_content: "In Drupal, a view is a listing of information. It can a list of nodes, users, comments, taxonomy terms, files, etc..."
  ids:
    unique_id:
      type: integer
process:
  title: creative_title
  body: engaging_content
destination:
  plugin: "entity:node"
  default_bundle: page
```
The final folder structure will look like:
```yaml
.
|-- core
|-- index.php
|-- modules
|   `-- custom
|       `-- ud_migrations
|           `-- ud_migrations_first
|               |-- migrations
|               |   `-- udm_first.yml
|               `-- ud_migrations_first.info.yml
```
YAML is a key-value format with optional nesting of elements. It is **very sensitive to white spaces and indentation**. For example, it requires at least one space character after the colon symbol (**:**) that separates the key from the value. Also, note that each level in the hierarchy is indented by two spaces exactly. A common source of errors when writing migrations is improper spacing or indentation of the YAML files.
A quick glimpse at the file reveals the three major parts: source, process, and destination. Other keys provide extra information about the migration. There are more keys than the ones shown above. For example, it is possible to define dependencies among migrations. Another option is to tag migrations so they can be executed together. We are going to learn more about these options further in the book.
Let's review each key-value pair in the file. For the `id` key, it is customary to set its value to match the filename containing the migration definition, but without the `.yml` extension. This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. The `id` value should be alphanumeric characters, optionally using underscores (**\_**) to separate words. As for the `label` key, it is a human readable string used to name the migration in various interfaces.
In this example, we are using the [embedded_data](https://api.drupal.org/api/drupal/core!modules!migrate!src!Plugin!migrate!source!EmbeddedDataSource.php/class/EmbeddedDataSource) source plugin. It allows you to define the data to migrate right inside the definition file. To configure it, you define a `data_rows` key whose value is an array of all the elements you want to migrate. Each element might contain an arbitrary number of key-value pairs representing "columns" of data to be imported.
A common use case for the `embedded_data` plugin is testing of the Migrate API itself. Another valid one is to create default content when the data is known in advance. I often present Drupal site building workshops. To save time, I use this plugin to create nodes which are later used when explaining how to create Views.
For the destination, we are using the `entity:node` plugin which allows you to create nodes of any content type. The `default_bundle` key indicates that all nodes to be created will be of type `Basic page`, by default. It is important to note that the value of the `default_bundle` key is the **machine name** of the content type. You can find it at `/admin/structure/types/manage/page`. In general, the Migrate API uses _machine names_ for the values. As we explore the system, we will point out when they are used and where to find the right ones.
In the process section, you map columns from the source to node properties and fields. The keys are entity property names or the fields' machine names. In this case, we are setting values for the `title` of the node and its `body` field. You can find the field machine names in the content type configuration page: `/admin/structure/types/manage/page/fields`. During the migration, values can be copied directly from the source or transformed via process plugins. This example makes a verbatim copy of the values from the source to the destination. The column names in the source are not required to match the destination property or field name. In this example, they are purposely different to make them easier to identify.
The repository, which will be used for many examples throughout the book, can be downloaded at <https://github.com/dinarcon/ud_migrations> Place it into the `./modules/custom` directory of the Drupal installation. The example above is part of the "UD First Migration" submodule so make sure to enable it.
## Running the migration
Let's use Drush to run the migrations with the commands provided by [Migrate Run](https://www.drupal.org/project/migrate_run). Open a terminal, switch directories to Drupal's webroot, and execute the following commands.
```console
$ drush pm:enable -y migrate migrate_run ud_migrations_first
$ drush migrate:status
$ drush migrate:import udm_first
```
**Note**: It is assumed that the Migrate Run module has been downloaded via composer or otherwise.
**Important**: All code snippets showing Drush commands assume version 10 unless otherwise noted. If you are using Drush 8 or lower, the commands' names and aliases are different. Usually, a hyphen (-) was used as delimiter in command names. For example, `pm-enable` in Drush 8 instead of `pm:enable` in Drush 10. Execute `drush list --filter=migrate` to verify the proper commands for your version of Drush.
The first command enables the core migrate module, the runner, and the custom module holding the migration definition file. The second command shows a list of all migrations available in the system. For now, only one should be listed with the migration ID `udm_first`. The third command executes the migration. If all goes well, you can visit the content overview page at /admin/content and see two basic pages created. **Congratulations, you have successfully run your first Drupal migration!!!**
_Or maybe not?_ Drupal migrations can fail in many ways. Sometimes the error messages are not very descriptive. In upcoming chapters, we will talk about recommended workflows and strategies for debugging migrations. For now, let's mention a couple of things that could go wrong with this example. If after running the `drush migrate:status` command, you do not see the `udm_first` migration, make sure that the `ud_migrations_first` module is enabled. If it is enabled, and you do not see it, rebuild the cache by running `drush cache:rebuild`.
If you see the migration, but you get a YAML parse error when running the `migrate:import` command, check your indentation. Copying and pasting from GitHub to your IDE/editor might change the spacing. An extraneous space can break the whole migration so pay close attention. If the command reports that it created the nodes, but you get a fatal error when trying to view one, it is because the content type was not set properly. Remember that the machine name of the "Basic page" content type is `page`, not `basic_page`. This error cannot be fixed from the administration interface. What you have to do is rollback the migration issuing the following command: `drush migrate:rollback udm_first`, then fix the `default_bundle` value, rebuild the cache, and import again.
**Note**: Migrate Tools could be used for running the migration. This module depends on Migrate Plus. For now, let's keep module dependencies to a minimum to focus on core Migrate functionality. Also, skipping them demonstrates that these modules, although quite useful, are not hard requirements to work on migration projects. If you decide to use Migrate Tools, make sure to uninstall Migrate Run. Both provide the same Drush commands and conflict with each other if the two are enabled.

123
03.md Normal file

@ -0,0 +1,123 @@
# Using process plugins for data transformation in Drupal migrations
In the previous chapter, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination system or to meet business requirements. Now we will learn more about process plugins and how they work as part of the Drupal migration pipeline.
## Syntactic sugar
The Migrate API offers a lot of syntactic sugar to make it easier to write migration plugins. Field mappings in the process section are an example of this. Each of them requires a process plugin to be defined. If none is manually set, then the `get` plugin is assumed. The following two code snippets are equivalent in functionality.
```yaml
process:
  title: creative_title
```
```yaml
process:
  title:
    plugin: get
    source: creative_title
```
The `get` process plugin copies a value from the source to the destination without making any changes. Because this is a common operation, `get` is considered the default. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:
```yaml
process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3
```
The process plugin is configured within an extra level of indentation under the destination field. The `plugin` key is required and determines which plugin to use. Then, a list of configuration options follows. Refer to the documentation of each plugin to know what options are available. Some configuration options will be required while others will be optional. For example, the `concat` plugin requires a `source`, but the `delimiter` is optional. An example of its use appears later in this chapter.
## Providing default values
Sometimes the destination requires a property or field to be set, but that information is not present in the source. Imagine you are migrating nodes. It is recommended to write one migration per content type. If you have a migration to nodes of type `Basic page`, it would be redundant to have a column in the source with the same value for every row. The data might not be needed. Or it might not exist. In any case, the `default_value` plugin can be used to provide a value when the data is not available in the source.
```yaml
source: ...
process:
  type:
    plugin: default_value
    default_value: page
destination:
  plugin: 'entity:node'
```
The above example sets the `type` property for all nodes in this migration to `page`, which is the machine name of the `Basic page` content type. Do not confuse the name of the plugin with the name of its configuration property; they happen to be the same: `default_value`. Also note that because `type` is manually set in the process section, the `default_bundle` key in the destination section is no longer required. You can see the latter being used in the example in chapter 2. If `type` is defined in the `process` section and `default_bundle` in the `destination` section, the former takes precedence.
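For comparison, here is a minimal sketch of the alternative mentioned above: `type` is left out of the process section and the bundle is set with `default_bundle` in the destination. The ellipses stand for configuration omitted for brevity.

```yaml
source: ...
process: ...
destination:
  plugin: 'entity:node'
  # Used as the bundle when the process section does not set `type`.
  default_bundle: page
```

Either approach should produce nodes of type `Basic page`. Pick one per migration to avoid confusion about which value applies.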
## Concatenating values
Consider the following migration request: you have a source listing people with first and last name in separate columns. Both are capitalized. The two values need to be put together (concatenated) and used as the title of nodes of type `Basic page`. The character casing needs to be changed so that only the first letter of each word is capitalized. If there is a need to display them in all caps, CSS can be used for presentation. For example: `FELIX DELATTRE` would be transformed to `Felix Delattre`.
_Tip_: Question business requirements when they might produce undesired results. For instance, if you were to implement this feature as requested `DAMIEN MCKENNA` would be transformed to `Damien Mckenna`. That is not the correct capitalization for the last name `McKenna`. If automatic transformation is not possible or feasible for all variations of the source data, take note and perform manual updates after the initial migration. Evaluate as many use cases as possible and bring them to the client's attention.
To implement this feature, let's create a new module named `process_example`. Inside its `migrations` folder, write a migration plugin called `process_example.yml`. Download the sample module from <https://www.drupal.org/project/migrate_examples>. For this example, we assume a Drupal installation using the `standard` installation profile, which comes with the `Basic page` content type. Let's see how to handle the concatenation of first and last name.
```yaml
id: process_example
label: Process Plugins Example
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: FELIX
      last_name: DELATTRE
    - unique_id: 2
      first_name: BENJAMIN
      last_name: MELANÇON
    - unique_id: 3
      first_name: STEFAN
      last_name: FREUDENBERG
  ids:
    unique_id:
      type: integer
process:
  type:
    plugin: default_value
    default_value: page
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: ' '
destination:
  plugin: entity:node
```
The [concat](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Concat.php/class/Concat) plugin can be used to glue together an arbitrary number of strings. Its `source` property contains an array of all the values that you want to put together. The `delimiter` is an optional parameter that defines a string to add between the elements as they are concatenated. If not set, there will be no separation between the elements in the concatenated result. This plugin has an **important limitation**. You cannot use string literals as part of what you want to concatenate. For example, you cannot directly join the string `Hello` with the value of the `first_name` column. All the values to concatenate need to be provided by the source plugin or be fields already available in the process pipeline. It is possible to leverage a feature called source constants to concatenate string literals. We will talk about this in chapter !!!.
To execute the above migration, you need to enable the `process_example` module. Open a terminal, switch directories to your Drupal's webroot, and execute the following command: `drush migrate:import process_example`. If the migration fails, refer to the end of chapter 2 for debugging information. If it works, you will see three basic pages whose titles contain the names of some of my Drupal mentors. #DrupalThanks
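As a preview, the following is a minimal sketch of how source constants might be used to work around this limitation. The constant name `greeting` is an assumption made up for illustration; it is not part of the example module. Values under the `constants` key of the source section are referenced in the process section with the `constants/` prefix.

```yaml
source:
  plugin: embedded_data
  data_rows: ...
  ids: ...
  # Hypothetical constant holding the string literal to prepend.
  constants:
    greeting: Hello
process:
  title:
    plugin: concat
    source:
      - constants/greeting
      - first_name
    delimiter: ' '
```

With the sample data, the first row would be expected to produce the title `Hello FELIX`.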
## Chaining process plugins
Good progress so far, but the feature has not been fully implemented. You still need to change the capitalization so that only the first letter of each word in the resulting title is uppercase. Thankfully, the Migrate API allows [**chaining of process plugins**](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/migrate-process-overview#full-pipeline). This works similarly to Unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned to the destination field. Let's see this in action:
```yaml
id: process_example
label: Process Plugins Example
source: ...
process:
  type: ...
  title:
    - plugin: concat
      source:
        - first_name
        - last_name
      delimiter: ' '
    - plugin: callback
      callable: mb_strtolower
    - plugin: callback
      callable: ucwords
destination: ...
```
The [callback](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Callback.php/class/Callback) process plugin passes a value to a PHP function and returns its result. The function to call is specified in the `callable` configuration option. The `source` option holds a value provided by the source plugin or one from the process pipeline. That value is sent as the first argument to the function. Because we are using the `callback` plugin as part of a chain, the source is assumed to be the output of the previous plugin. Hence, there is no need to define a `source`. The example concatenates the first and last names, makes them all lowercase, and then capitalizes each word.
Relying on direct PHP function calls should be a last resort. Better alternatives include writing your own process plugins which encapsulate your business logic. The `callback` plugin is versatile. It can call public methods in a PHP class. If the `source` is an array, it can be expanded so that each element is passed as a different argument to the callable. Refer to the [plugin documentation](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Callback.php/class/Callback) for various examples.
_Tip_: You should have a good understanding of your source and destination formats. In this example, one of the values to transform is `MELANÇON`. Because of the cedilla (**ç**), using `strtolower` is not adequate in this case. It would leave that character uppercase (`melanÇon`). [Multibyte string functions](https://www.php.net/manual/en/ref.mbstring.php) (mb\_\*) are required for proper transformation. `ucwords` is not one of them and would present similar issues if the first letter of a word is a special character. Also, attention should be given to the character encoding of the tables in your destination database.
_Technical note_: `mb_strtolower` is a function provided by the [mbstring](https://www.php.net/manual/en/mbstring.installation.php) PHP extension. The extension might not be enabled by default, or it might not be installed at all. In those cases, the function would not be available when Drupal tries to call it. The following error is produced when trying to call a function that is not available: `The "callable" must be a valid function or method`. For this example, the error would never be triggered even if the extension is missing. That is because Drupal core depends on some Symfony packages which in turn depend on the `symfony/polyfill-mbstring` package. The latter provides a polyfill for mb\_\* functions that has been leveraged since version 8.6.x of Drupal.
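One possible way to address the `ucwords` caveat is to point the `callback` plugin at a multibyte-aware class method instead of the plain PHP function. The sketch below assumes that Drupal core's `\Drupal\Component\Utility\Unicode` class is available in your Drupal version and exposes a static `ucwords()` method. Verify this against the API documentation before relying on it.

```yaml
process:
  title:
    - plugin: concat
      source:
        - first_name
        - last_name
      delimiter: ' '
    - plugin: callback
      callable: mb_strtolower
    - plugin: callback
      # Assumption: a [class, method] pair referencing a static,
      # multibyte-aware replacement for PHP's ucwords().
      callable:
        - '\Drupal\Component\Utility\Unicode'
        - ucwords
```

The chain is otherwise unchanged. Only the last step differs from the example above.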

123
03.txt

@ -1,123 +0,0 @@
# Using process plugins for data transformation in Drupal migrations
In the previous chapter, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination or to meet business requirements. Now we will learn more about process plugins and how they work as part of the Drupal migration pipeline.
## Syntactic sugar
The Migrate API offers a lot of syntactic sugar to make it easier to write migration definition files. Field mappings in the process section are an example of this. Each of them requires a process plugin to be defined. If none is manually set, then the `get` plugin is assumed. The following two code snippets are equivalent in functionality.
```yaml
process:
  title: creative_title
```
```yaml
process:
  title:
    plugin: get
    source: creative_title
```
The `get` process plugin simply copies a value from the source to the destination without making any changes. Because this is a common operation, `get` is considered the default. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:
```yaml
process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3
```
The process plugin is configured within an extra level of indentation under the destination field. The `plugin` key is required and determines which plugin to use. Then, a list of configuration options follows. Refer to the documentation of each plugin to know what options are available. Some configuration options will be required while others will be optional. For example, the `concat` plugin requires a `source`, but the `delimiter` is optional. An example of its use appears later in this chapter.
## Providing default values
Sometimes, the destination requires a property or field to be set, but that information is not present in the source. Imagine you are migrating nodes. As we have mentioned, it is recommended to write one migration file per content type. If you know in advance that for a particular migration you will always create nodes of type `Basic page`, then it would be redundant to have a column in the source with the same value for every row. The data might not be needed. Or it might not exist. In any case, the `default_value` plugin can be used to provide a value when the data is not available in the source.
```yaml
source: ...
process:
  type:
    plugin: default_value
    default_value: page
destination:
  plugin: "entity:node"
```
The above example sets the `type` property for all nodes in this migration to `page`, which is the machine name of the `Basic page` content type. Do not confuse the name of the plugin with the name of its configuration property as they happen to be the same: `default_value`. Also note that because a (content) `type` is manually set in the process section, the `default_bundle` key in the destination section is no longer required. You can see the latter being used in the example of the chapter _Writing your Drupal migration_.
## Concatenating values
Consider the following migration request: you have a source listing people with first and last name in separate columns. Both are capitalized. The two values need to be put together (concatenated) and used as the title of nodes of type `Basic page`. The character casing needs to be changed so that only the first letter of each word is capitalized. If there is a need to display them in all caps, CSS can be used for presentation. For example: `FELIX DELATTRE` would be transformed to `Felix Delattre`.
_Tip_: Question business requirements when they might produce undesired results. For instance, if you were to implement this feature as requested `DAMIEN MCKENNA` would be transformed to `Damien Mckenna`. That is not the correct capitalization for the last name `McKenna`. If automatic transformation is not possible or feasible for all variations of the source data, take notes and perform manual updates after the initial migration. Evaluate as many use cases as possible and bring them to the client's attention.
To implement this feature, let's create a new module `ud_migrations_process_intro`, create a `migrations` folder, and write a migration definition file called `udm_process_intro.yml` inside it. Download the sample module from <https://github.com/dinarcon/ud_migrations>. It is the one named `UD Process Plugins Introduction` and machine name `udm_process_intro`. For this example, we assume a Drupal installation using the `standard` installation profile which comes with the `Basic Page` content type. Let's see how to handle the concatenation of first and last name.
```yaml
id: udm_process_intro
label: "UD Process Plugins Introduction"
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: "FELIX"
      last_name: "DELATTRE"
    - unique_id: 2
      first_name: "BENJAMIN"
      last_name: "MELANÇON"
    - unique_id: 3
      first_name: "STEFAN"
      last_name: "FREUDENBERG"
  ids:
    unique_id:
      type: integer
process:
  type:
    plugin: default_value
    default_value: page
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: " "
destination:
  plugin: "entity:node"
```
The [concat](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Concat.php/class/Concat) plugin can be used to glue together an arbitrary number of strings. Its `source` property contains an array of all the values that you want put together. The `delimiter` is an optional parameter that defines a string to add between the elements as they are concatenated. If not set, there will be no separation between the elements in the concatenated result. This plugin has an **important limitation**. You cannot use strings literals as part of what you want to concatenate. For example, joining the string `Hello` with the value of the `first_name` column. All the values to concatenate need to be columns in the source or fields already available in the process pipeline. We will talk about the latter in a later chapter.
To execute the above migration, you need to enable the `ud_migrations_process_intro` module. Assuming you have `Migrate Run` installed, open a terminal, switch directories to your Drupal docroot, and execute the following command: `drush migrate:import udm_process_intro`. Refer to the end of the _Writing your first Drupal migration_ chapter if it fails. If it works, you will see three basic pages whose titles contain the names of some of my Drupal mentors. #DrupalThanks
## Chaining process plugins
Good progress so far, but the feature has not been fully implemented. You still need to change the capitalization so that only the first letter of each word in the resulting title is uppercase. Thankfully, the Migrate API allows [**chaining of process plugins**](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/migrate-process-overview#full-pipeline). This works similarly to unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned to the destination field. Let's see this in action:
```yaml
id: udm_process_intro
label: "UD Process Plugins Introduction"
source: ...
process:
  type: ...
  title:
    - plugin: concat
      source:
        - first_name
        - last_name
      delimiter: " "
    - plugin: callback
      callable: mb_strtolower
    - plugin: callback
      callable: ucwords
destination: ...
```
The [callback](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Callback.php/class/Callback) process plugin passes a value to a PHP function and returns its result. The function to call is specified in the `callable` configuration option. Note that this plugin expects a `source` option containing a column from the source or value of the process pipeline. That value is sent as the first argument to the function. Because we are using the `callback` plugin as part of a chain, the source is assumed to be the last output of the previous plugin. Hence, there is no need to define a `source`. So, we concatenate the columns, make them all lowercase, and then capitalize each word.
Relying on direct PHP function calls should be a last resort. Better alternatives include writing your own process plugins which encapsulate your business logic separately from the migration definition. The `callback` plugin comes with its own **limitation**. For example, you cannot pass extra parameters to the `callable` function. It will receive the specified value as its first argument and nothing else. In the above example, we could combine the calls to `mb_strtolower` and `ucwords` into a single call to `mb_convert_case($source, MB_CASE_TITLE)` if passing extra parameters were allowed.
_Tip_: You should have a good understanding of your source and destination formats. In this example, one of the values you want to transform is `MELANÇON`. Because of the cedilla (**ç**), using `strtolower` is not adequate in this case since it would leave that character uppercase (`melanÇon`). [Multibyte string functions](https://www.php.net/manual/en/ref.mbstring.php) (mb\_\*) are required for proper transformation. `ucwords` is not one of them and would present similar issues if the first letter of a word is a special character. Attention should be given to the character encoding of the tables in your destination database.
_Technical note_: `mb_strtolower` is a function provided by the [mbstring](https://www.php.net/manual/en/mbstring.installation.php) PHP extension. It does not come enabled by default or you might not have it installed altogether. In those cases, the function would not be available when Drupal tries to call it. The following error is produced when trying to call a function that is not available: `The "callable" must be a valid function or method`. For Drupal and this particular function that error would never be triggered, even if the extension is missing. That is because Drupal core depends on some Symfony packages which in turn depend on the `symfony/polyfill-mbstring` package. The latter provides a polyfill for mb\_\* functions that has been leveraged since version 8.6.x of Drupal.

126
04.md Normal file

@ -0,0 +1,126 @@
# Migrating data into Drupal subfields
In the previous chapter, we learned how to use process plugins to transform data between source and destination. Some Drupal fields have multiple components. For example, formatted text fields store the text to display and the text format to apply. Image fields store a reference to the file, alternative and title text, width, and height. The Migrate API refers to a field's component as a **subfield**. In this chapter we will learn how to migrate into them and how to find out which subfields are available.
Today's example will consist of migrating data into the `Body` and `Image` fields of the `Article` content type, which is provided by the `standard` installation profile. As in previous examples, we will create a new module and write a migration plugin. The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://www.drupal.org/project/migrate_examples>. The module name is `Migration Subfields Example` and its machine name is `subfields_example`. This example uses the [Migrate Files](https://www.drupal.org/project/migrate_file) module (explained later). Make sure to download and enable it. Otherwise, you will get an error like: `In DiscoveryTrait.php line 53: The "file_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are: ...`. Let's see part of the *source* definition:
```yaml
source:
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      name: Micky Metts
      profile: <a href="https://www.drupal.org/u/freescholar" title="Micky on Drupal.org">freescholar</a> on Drupal.org
      photo_url: https://udrupal.com/photos/freescholar.jpg
      photo_description: Photo of Micky Metts
      photo_width: 587
      photo_height: 657
```
Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image.
## Migrating formatted text
The `Body` field is of type `Text (formatted, long, with summary)`. This type of field has three components: the text *value* to present, a *summary* text, and the text *format* to use. The Migrate API allows you to write to each component separately by defining subfield targets.
```yaml
process:
  field_text_with_summary/value: source_value
  field_text_with_summary/summary: source_summary
  field_text_with_summary/format: source_format
```
The syntax to migrate into subfields is the machine name of the field and the subfield name separated by a *slash* (**/**). Then, a *colon* (**:**), a *space*, and the *value* to assign. You can set the value to a source field name for a verbatim copy or use any combination of process plugins in a chain. It is not required to migrate into all subfields. Each field determines what components are required so it is possible that not all subfields are set. In this example, only the value and text format will be set.
```yaml
process:
  body/value: profile
  body/format:
    plugin: default_value
    default_value: restricted_html
```
The `value` subfield is set to the `profile` source field. As you can see in the first snippet, it contains HTML markup. An `a` tag to be precise. Because we want the tag to be rendered as a link, a text format that allows such tag needs to be specified. There is no information about text formats in the source, but the `standard` installation of Drupal comes with a couple we can choose from. In this case, we use the `Restricted HTML` text format. The `default_value` plugin is used and set to `restricted_html`. When setting text formats, it is necessary to use their machine name. You can find them in the configuration page for each text format. For `Restricted HTML` that is /admin/config/content/formats/manage/restricted_html.
*Note*: Text formats are a whole different subject that even has security implications. To stay on topic, we will only give some recommendations. When you need to migrate HTML markup, you need to know which tags appear in your source and which ones you want to allow in Drupal. Then, select a text format that accepts what you have allowed and filters out any dangerous tag like `script`. As a general rule, you should avoid setting the `format` subfield to use the `Full HTML` text format.
## Migrating images
There are [different approaches to migrating images](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-files-and-images). In this example we use the Migrate Files module. It is important to note that Drupal treats images as files with extra properties and behavior. Any approach used to migrate files can be adapted to migrate images.
```yaml
process:
  field_image/target_id:
    plugin: file_import
    source: photo_url
    file_exists: rename
    id_only: TRUE
  field_image/alt: photo_description
  field_image/title: photo_description
  field_image/width: photo_width
  field_image/height: photo_height
```
When migrating any field, you have to use their *machine name* in the mapping section. For the `Image` field, the machine name is `field_image`. Knowing that, you set each of its subfields:
* `target_id` stores an integer number which Drupal uses as a reference to the file.
* `alt` stores a string that represents the alternative text. Always set one for better accessibility.
* `title` stores a string that represents the title attribute.
* `width` stores an integer number which represents the width in pixels.
* `height` stores an integer number which represents the height in pixels.
For the `target_id`, the plugin `file_import` is used. This plugin requires a `source` configuration value with a URL to the file. In this case, the `photo_url` field from the *source* section is used. The `file_exists` configuration dictates what to do in case a file with the same name already exists. Valid options are `replace` to replace the existing file, `use existing` to reuse the file, and `rename` to append `_N` to the file name (where `N` is an incrementing number) until the filename is unique. When working on migrations, it is common to run them over and over until you get the expected results. Using the `use existing` option will avoid downloading multiple copies of the image file. The `id_only` flag is set so that the plugin only returns the file identifier used by Drupal instead of an entity reference array. This is done because each subfield is being set manually. For the rest of the subfields (`alt`, `title`, `width`, and `height`) the value is a verbatim copy from the *source*.
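As a sketch based only on the options described above, this is how the `target_id` mapping might look when reusing files across repeated migration runs instead of renaming them. The value is quoted because it contains a space.

```yaml
process:
  field_image/target_id:
    plugin: file_import
    source: photo_url
    # Reuse a previously imported file with the same name, if any.
    file_exists: 'use existing'
    id_only: TRUE
```

The remaining subfield mappings would stay the same as in the previous snippet.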
!!! use existing vs replace
*Note*: The Migrate Files module offers another plugin named `image_import`. That one allows you to set all the subfields as part of the plugin configuration. An example of its use will be shown in the chapter !!!. This example uses the `file_import` plugin to emphasize the configuration of the image subfields.
## Which subfields are available?
Some fields have many subfields. [Address fields](https://www.drupal.org/project/address), for example, have 14 subfields. How can you know which ones are available? You can look for an !!!online reference or search for the information yourself by reviewing Drupal's source code. The subfields are defined in the class that provides the field type. Once you find the class, look for the `schema` method. The subfields are contained in the `columns` array of the value returned by that method. Let's see some examples:
* The `Text (plain)` field is provided by the StringItem class.
* The `Number (integer)` field is provided by the IntegerItem class.
* The `Text (formatted, long, with summary)` field is provided by the TextWithSummaryItem class.
* The `Image` field is provided by the ImageItem class.
The `schema` method defines the database columns used by the field to store its data. When migrating into subfields, processed data will ultimately be written into those database columns. Any restriction set by the database schema needs to be respected. That is why you do not use units when migrating width and height for images. The database only expects an integer number representing the corresponding values in pixels. Because of object oriented practices, sometimes you need to look at the parent class to know all the subfields that are available.
*Technical note*: By default, the Migrate API bypasses [Form API](https://api.drupal.org/api/drupal/elements/8.8.x) validations. For example, it is possible to migrate images without setting the `alt` subfield even if it is marked as required in the field's configuration. If you try to edit a node that was created this way, you will get a field error indicating that the alternative text is required. Similarly, it is possible to write the `title` subfield even when the field is not expecting it, just like in today's example. If you were to enable the `title` text later, the information would already be there. For content migrations, you can enable validation by setting the `validate` configuration in the destination plugin:
```yaml
destination:
  plugin: entity:node
  validate: true
```
Another option is to connect to the database and check the table structures. For example, the `Image` field stores its data in the `node__field_image` table. Among others, this table has five columns named after the field's machine name and the subfield:
* field_image_target_id
* field_image_alt
* field_image_title
* field_image_width
* field_image_height
Looking at the source code or the database schema is arguably not straightforward. This information is included for reference to those who want to explore the Migrate API in more detail. You can look for migration examples to see what subfields are available.
*Tip*: Many plugins are defined by classes whose name ends with the string `Item`. You can use your IDE's search feature to find the class using the name of the field as a hint. Those classes live in the `src/Plugin/Field/FieldType` folder of the module.
## Default subfields
Every Drupal field has at least one subfield. For example, `Text (plain)` and `Number (integer)` define only the `value` subfield. The following code snippets are equivalent:
```yaml
process:
  field_string/value: source_value_string
  field_integer/value: source_value_integer
```
```yaml
process:
  field_string: source_value_string
  field_integer: source_value_integer
```
In previous chapters no subfield has been manually set, but Drupal knows what to do. The Migrate API offers syntactic sugar to write shorter migration plugins. This is another example. You can safely skip the default subfield and manually set the others as needed. For `File` and `Image` fields, the default subfield is `target_id`. How does the Migrate API know what subfield is the default? You need to check the code again.
The default subfield is determined by the return value of the `mainPropertyName` method of the class providing the field type. Again, object oriented practices might require looking at parent classes to find this method. The `Image` field is provided by `ImageItem` which extends `FileItem` which itself extends `EntityReferenceItem`. It is the latter that contains the `mainPropertyName` method returning the string `target_id`.

122
04.txt

@ -1,122 +0,0 @@
# Migrating data into Drupal subfields
In the previous chapter, we learned how to use process plugins to transform data between source and destination. Some Drupal fields have multiple components. For example, formatted text fields store the text to display and the text format to apply. Image fields store a reference to the file, alternative and title text, width, and height. The Migrate API refers to a field's component as a **subfield**. In this chapter we will learn how to migrate into them and how to find out which subfields are available.
## Getting the example code
Today's example will consist of migrating data into the `Body` and `Image` fields of the `Article` content type that are available out of the box. This assumes that Drupal was installed using the `standard` installation profile. As in previous examples, we will create a new module and write a migration definition file to perform the migration. The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://github.com/dinarcon/ud_migrations> The module name is `UD Migration Subfields` and its machine name is `ud_migrations_subfields`. The `id` of the example migration is `udm_subfields`. This example uses the [Migrate Files](https://www.drupal.org/project/migrate_file) module (explained later). Make sure to download and enable it. Otherwise, you will get an error like: `In DiscoveryTrait.php line 53: The "file_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:...`. Let's see part of the *source* definition:
```yaml
source:
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      name: 'Michele Metts'
      profile: '<a href="https://www.drupal.org/u/freescholar" title="Michele on Drupal.org">freescholar</a> on Drupal.org'
      photo_url: 'https://agaric.coop/sites/default/files/2018-12/micky-cropped.jpg'
      photo_description: 'Photo of Michele Metts'
      photo_width: '587'
      photo_height: '657'
```
Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image.
## Migrating formatted text
The `Body` field is of type `Text (formatted, long, with summary)`. This type of field has three components: the full text (*value*) to present, a *summary* text, and a text *format*. The Migrate API allows you to write to each component separately by defining subfield targets. The next code snippet shows how to do it:
```yaml
process:
  field_text_with_summary/value: source_value
  field_text_with_summary/summary: source_summary
  field_text_with_summary/format: source_format
```
The syntax to migrate into subfields is the machine name of the field and the subfield name separated by a *slash* (/). Then, a *colon* (:), a *space*, and the *value*. You can set the value to a source column name for a verbatim copy or use any combination of process plugins. It is not required to migrate into all subfields. Each field determines what components are required so it is possible that not all subfields are set. In this example, only the value and text format will be set.
```yaml
process:
  body/value: profile
  body/format:
    plugin: default_value
    default_value: restricted_html
```
The `value` subfield is set to the `profile` source column. As you can see in the first snippet, it contains HTML markup. An `a` tag to be precise. Because we want the tag to be rendered as a link, a text format that allows such a tag needs to be specified. There is no information about text formats in the source, but Drupal comes with a couple we can choose from. In this case, we use the `Restricted HTML` text format. Note that the `default_value` plugin is used and set to `restricted_html`. When setting text formats, it is necessary to use their machine name. You can find them in the configuration page for each text format. For `Restricted HTML` that is /admin/config/content/formats/manage/restricted_html.
*Note*: Text formats are a whole different subject that even has security implications. To keep the discussion on topic, we will only give some recommendations. When you need to migrate HTML markup, you need to know which tags appear in your source, which ones you want to allow in Drupal, and select a text format that accepts what you have whitelisted and filters out any dangerous tags like `script`. As a general rule, you should avoid setting the `format` subfield to use the `Full HTML` text format.
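If your source did carry text format information, you would not hardcode a single value. The following is a minimal sketch, not part of the example module, that uses the core `static_map` process plugin to translate source values into Drupal text format machine names. The `legacy_format` column and its `plain` and `rich` values are hypothetical. `basic_html` and `restricted_html` are assumed to exist because the `standard` installation profile provides them.

```yaml
process:
  body/value: profile
  body/format:
    plugin: static_map
    # Hypothetical source column holding the legacy format name.
    source: legacy_format
    map:
      # Hypothetical legacy value: Drupal text format machine name.
      plain: restricted_html
      rich: basic_html
    # Fall back to the most restrictive format for unmapped values.
    default_value: restricted_html
```

Falling back to a restrictive format keeps unexpected source values from ending up in a format that allows more markup than intended.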
## Migrating images
There are [different approaches to migrating images](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-files-and-images). Today, we are going to use the Migrate Files module. It is important to note that Drupal treats images as files with extra properties and behavior. Any approach used to migrate files can be adapted to migrate images.
```yaml
process:
  field_image/target_id:
    plugin: file_import
    source: photo_url
    reuse: TRUE
    id_only: TRUE
  field_image/alt: photo_description
  field_image/title: photo_description
  field_image/width: photo_width
  field_image/height: photo_height
```
When migrating any field, you have to use its *machine name* in the mapping section. For the `Image` field, the machine name is `field_image`. Knowing that, you set each of its subfields:
* `target_id` stores an integer number which Drupal uses as a reference to the file.
* `alt` stores a string that represents the alternative text. Always set one for better accessibility.
* `title` stores a string that represents the title attribute.
* `width` stores an integer number which represents the width in pixels.
* `height` stores an integer number which represents the height in pixels.
For the `target_id`, the plugin `file_import` is used. This plugin requires a `source` configuration value with a URL to the file. In this case, the `photo_url` column from the *source* section is used. The `reuse` flag indicates that if a file with the same location and name exists, it should be used instead of downloading a new copy. When working on migrations, it is common to run them over and over until you get the expected results. Using the `reuse` flag will avoid creating multiple references or copies of the image file, depending on the plugin configuration. The `id_only` flag is set so that the plugin only returns the file identifier used by Drupal instead of an entity reference array. This is done because each subfield is being set manually. For the rest of the subfields (`alt`, `title`, `width`, and `height`) the value is a verbatim copy from the *source*.
*Note*: The Migrate Files module offers another plugin named `image_import`. That one allows you to set all the subfields as part of the plugin configuration. An example of its use will be shown in the next article. This example uses the `file_import` plugin to emphasize the configuration of the image subfields.
## Which subfields are available?
Some fields have many subfields. [Address fields](https://www.drupal.org/project/address), for example, have 13 subfields. How can you know which ones are available? The answer is found in the class that provides the field type. Once you find the class, look for the `schema` method. The subfields are contained in the `columns` array of the value returned by that method. Let's see some examples:
* The `Text (plain)` field is provided by the StringItem class.
* The `Number (integer)` field is provided by the IntegerItem class.
* The `Text (formatted, long, with summary)` field is provided by the TextWithSummaryItem class.
* The `Image` field is provided by the ImageItem class.
The `schema` method defines the database columns used by the field to store its data. When migrating into subfields, you are actually migrating into those particular database columns. Any restriction set by the database schema needs to be respected. That is why you do not use units when migrating width and height for images. The database only expects an integer number representing the corresponding values in pixels. Because of object oriented practices, sometimes you need to look at the parent class to know all the subfields that are available.
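As an illustration of respecting the schema, suppose the source stored dimensions with units, for example `587px`. That is not the case in today's example, so treat the following as a sketch under that assumption. The core `callback` process plugin can pass the value through PHP's `intval` function, which keeps the leading integer and drops the trailing unit before the value reaches the integer column:

```yaml
process:
  field_image/width:
    plugin: callback
    # PHP function applied to the source value; intval('587px') gives 587.
    callable: intval
    # Assumed here to contain a value with units, unlike the real example.
    source: photo_width
  field_image/height:
    plugin: callback
    callable: intval
    source: photo_height
```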
*Technical note*: The Migrate API bypasses [Form API](https://api.drupal.org/api/drupal/elements/8.8.x) validations. For example, it is possible to migrate images without setting the `alt` subfield even if that is set as required in the field's configuration. If you try to edit a node that was created this way, you will get a field error indicating that the alternative text is required. Similarly, it is possible to write the `title` subfield even when the field is not expecting it, just like in today's example. If you were to enable the `title` text later, the information will be there already. Remember that when using the Migrate API you are writing directly to the database.
Another option is to connect to the database and check the table structures. For example, the `Image` field stores its data in the `node__field_image` table. Among others, this table has five columns named after the field's machine name and the subfield:
* field_image_target_id
* field_image_alt
* field_image_title
* field_image_width
* field_image_height
Looking at the source code or the database schema is arguably not straightforward. This information is included for reference to those who want to explore the Migrate API in more detail. You can look for migration examples to see what subfields are available.
*Tip*: You can use [Drupal Console](https://drupalconsole.com/) for code introspection and analysis of database table structure. Also, many field types are defined by classes that end with the string `Item`. You can use your IDE's search feature to find the class using the name of the field as a hint.
## Default subfields
Every Drupal field has at least one subfield. For example, `Text (plain)` and `Number (integer)` define only the `value` subfield. The following code snippets are equivalent:
```yaml
process:
  field_string/value: source_value_string
  field_integer/value: source_value_integer
```
```yaml
process:
  field_string: source_value_string
  field_integer: source_value_integer
```
In examples from previous days, no subfield has been manually set, but Drupal knows what to do. As we have mentioned, the Migrate API offers syntactic sugar to write shorter migration definition files. This is another example. You can safely skip the default subfield and manually set the others as needed. For `File` and `Image` fields, the default subfield is `target_id`. How does the Migrate API know what subfield is the default? You need to check the code again.
The default subfield is determined by the return value of the `mainPropertyName` method of the class providing the field type. Again, object oriented practices might require looking at the parent classes to find this method. In the case of the `Image` field, it is provided by ImageItem which extends FileItem which extends EntityReferenceItem. It is the latter that contains the `mainPropertyName` method returning the string `target_id`.
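Applied to the `Image` field, this means that when only the file reference is being migrated, the `target_id` subfield can be left out of the destination name. The sketch below reuses the `file_import` configuration shown earlier and assumes no other image subfields are mapped in the same migration. It should be equivalent to writing `field_image/target_id`:

```yaml
process:
  # No subfield given, so the value goes to the default one: target_id.
  field_image:
    plugin: file_import
    source: photo_url
    reuse: TRUE
    # Return only the file identifier, which is what target_id stores.
    id_only: TRUE
```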

139
05.md Normal file

@ -0,0 +1,139 @@
# Using constants and pseudofields as data placeholders in the Drupal migration process pipeline
So far we have learned how to write basic Drupal migrations and use process plugins to transform data to meet the format expected by the destination. In the previous chapter we learned one of many approaches to migrating images. Now we will change it a bit to introduce two new migration concepts: **constants** and **pseudofields**. Both can be used as data placeholders in the migration timeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the **migrate process pipeline**.
## Setting and using source constants
In the Migrate API, **source constants** are _arbitrary values that can be used later in the process pipeline_. They are set as direct children of the source section. You write a `constants` key whose value is a list of name-value pairs. Even though they are defined in the _source_ section, they are independent of the source plugin in use. The following code snippet shows a generalization for setting and using _constants_:
```yaml
source:
  constants:
    MY_STRING: 'https://understanddrupal.com'
    MY_INTEGER: 31
    MY_DECIMAL: 3.1415927
    MY_ARRAY:
      - 'dinarcon'
      - 'dinartecc'
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  process_destination_1: constants/MY_INTEGER
  process_destination_2:
    plugin: concat
    source: constants/MY_ARRAY
    delimiter: ' '
```
You can set as many constants as you need. Although not required by the API, writing the constants' names in all uppercase and using _underscores_ (**\_**) to separate words makes it easy to identify them. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section you type its name, just like any other field provided by the _source_ plugin. Note that to use the constant you need to name the full hierarchy under the source section. That is, the word `constants` plus the name itself separated by a _slash_ (**/**) symbol. Their value can be used verbatim or transformed via process plugins.
_Technical note_: The word `constants` for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of the source plugin in use. A reason to use a different name is if your source actually contains a field named `constants`. In that case you could use `defaults` or something else. The one restriction is that whatever value you use, you have to use it in the process section to refer to any constant. For example:
```yaml
source:
  defaults:
    MY_VALUE: 'http://understanddrupal.com'
  plugin: source_plugin_name
  source_plugin_config: source_config_value
process:
  process_destination: defaults/MY_VALUE
```
## Setting and using pseudofields
Similar to source constants, **pseudofields** store _arbitrary values for use later in the process pipeline_. There are some key differences. Pseudofields are set in the _process_ section. The name can be arbitrary as long as it does not conflict with a property name or field name in the destination. The value can be set to a verbatim copy from the _source_ (a field or a constant) or they can use process plugins for data transformations. The following code snippet shows a generalization for setting and using _pseudofields_:
```yaml
source:
  constants:
    MY_BASE_URL: 'https://understanddrupal.com'
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  title: source_column_title
  _pseudo_field_1:
    plugin: concat
    source:
      - constants/MY_BASE_URL
      - source_column_relative_url
    delimiter: '/'
  _pseudo_field_2:
    plugin: urlencode
    source: '@_pseudo_field_1'
  field_link/uri: '@_pseudo_field_2'
  field_link/title: '@title'
```
In the above example, `_pseudo_field_1` is set to the result of a `concat` process transformation that joins a constant and a field from the source section. The result value is later used as part of a `urlencode` process transformation. Note that to use the value from `_pseudo_field_1` you have to enclose it in _quotes_ (**'**) and prepend an _at sign_ (**@**) to the name. The `_pseudo_` prefix in the name is not required. It is used to make it easier to distinguish between pseudofields and regular property or field names. The new value obtained from the URL encode operation is stored in `_pseudo_field_2`. This last pseudofield is used to set the value of the `uri` subfield for `field_link`. The example could be simplified by using a single pseudofield and chaining multiple process plugins. It is presented that way to demonstrate that a pseudofield could be used as a direct assignment or as part of process plugin configuration values.
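The simplification mentioned above could look like the following sketch. It keeps the same placeholder source column and constant names, and `_pseudo_url` is an arbitrary pseudofield name. When process plugins are chained, each plugin after the first receives the result of the previous one, so `urlencode` needs no `source` of its own:

```yaml
process:
  title: source_column_title
  _pseudo_url:
    # First plugin in the chain builds the absolute URL.
    - plugin: concat
      source:
        - constants/MY_BASE_URL
        - source_column_relative_url
      delimiter: '/'
    # Second plugin receives the concatenated string as its input.
    - plugin: urlencode
  field_link/uri: '@_pseudo_url'
  field_link/title: '@title'
```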
_Technical note_: If the name of the pseudofield can be arbitrary, how can you prevent name clashes with destination property names and field names? You can look for an online reference or review the class defining the entity and the fields attached to it. In the case of a node migration, look at the `baseFieldDefinitions` method of the `Node` class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the `Manage fields` section of the content type you are migrating into. The [Field API](https://api.drupal.org/api/drupal/core!modules!field!field.module/group/field/8.8.x) prefixes any field created via the administration interface with the string `field_`. This reduces the likelihood of name clashes. Other than these two name restrictions, _anything else can be used_. In this case, the Migrate API will eventually perform an entity save operation which will discard the pseudofields.
## Understanding Drupal Migrate API process pipeline
The migrate process pipeline is a mechanism by which the value of any **destination property**, **field**, or **pseudofield** that has been set **can be used by anything defined later in the process section**. The fact that using a pseudofield requires enclosing its name in _quotes_ and prepending an _at sign_ is actually a requirement of the process pipeline. Let's see some examples using a node migration:
- To use the `title` property of the node entity, you would write `@title`
- To use the `field_image` field of the `Article` content type, you would write `@field_image`
- To use the `_pseudo_temp_value` pseudofield, you would write `@_pseudo_temp_value`
In the process pipeline, these values can be used just like constants and fields from the source. The only restriction is that they need to be set before being used. For those familiar with the _rewrite results_ feature of Views, it follows the same idea. You have access to everything defined previously. Anytime you enclose a name in _quotes_ and prepend it with an _at sign_, you are telling the Migrate API to look for that element in the process section instead of the source section.
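The ordering restriction can be sketched as follows. The names are placeholders: `_pseudo_temp_value` is the pseudofield from the list above and `field_example` is a hypothetical destination field. Because the process section is evaluated from top to bottom, a reference only finds a value if the element appears earlier in the file:

```yaml
process:
  # Set first so that later mappings can refer to it.
  _pseudo_temp_value:
    plugin: default_value
    default_value: 'Temporary value'
  # Works: the pseudofield was set above.
  title: '@_pseudo_temp_value'
  # Also works: title is a destination property set earlier in the pipeline.
  field_example: '@title'
```

Moving the `_pseudo_temp_value` definition below `title` would leave nothing for `@_pseudo_temp_value` to resolve to at the time `title` is processed.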
## Migrating images using the image_import plugin
Let's practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous chapter. The Migrate Files module provides another process plugin named `image_import`. It allows you to directly set all the subfield values in the plugin configuration itself.
The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://www.drupal.org/project/migrate_examples>. The module name is `Constants and Pseudofields Example` and its machine name is `constants_pseudofields_example`. This example uses the [Migrate Files](https://www.drupal.org/project/migrate_file) module. Make sure to download and enable it.
Let's see part of the _source_ definition:
```yaml
source:
  constants:
    BASE_URL: 'https://udrupal.com'
    PHOTO_DESCRIPTION_PREFIX: 'Photo of'
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: Michele Metts
      photo_url: photos/freescholar.jpg
      photo_width: 587
      photo_height: 657
```
Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name and details about the image. Note that this time, the `photo_url` does not provide an absolute URL. Instead, it is a relative path from the domain hosting the images. In this example, the domain is `https://udrupal.com` so that value is stored in the BASE_URL constant. This is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant will be used to assemble a description.
Now, let's see the _process_ definition:
```yaml
process:
  title: name
  _pseudo_image_url:
    plugin: concat
    source:
      - constants/BASE_URL
      - photo_url
    delimiter: '/'
  _pseudo_image_description:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: ' '
  field_image:
    plugin: image_import
    source: '@_pseudo_image_url'
    file_exists: 'use existing'
    alt: '@_pseudo_image_description'
    title: '@title'
    width: photo_width
    height: photo_height
```
The `title` node property is set directly to the value of the `name` field from the source. `_pseudo_image_url` stores a valid absolute URL to the image using the BASE_URL constant and the `photo_url` _field_ from the _source_. `_pseudo_image_description` uses the PHOTO_DESCRIPTION_PREFIX constant and the `name` _field_ from the _source_ to store a description for the image.
For the `field_image` field, the `image_import` process plugin is used. This time, the subfields are not set manually like in the previous chapter. The absence of the `id_only` configuration key allows you to assign values to subfields via the `image_import` plugin directly. The URL to the image is set in the `source` key and uses the `_pseudo_image_url` pseudofield. The `alt` key allows you to set the alternative attribute for the image using the `_pseudo_image_description` pseudofield. The `title` key expects the text to use for the image's title. We are reusing the `title` node property which was set at the beginning of the process pipeline. Remember that destination properties, fields, and pseudofields are available as long as they were previously defined in the pipeline. Finally, the `width` and `height` keys use fields from the source.
**Important**: By default, the Migrate API will only expand the value of the `source` configuration. That is, replace its value either by a source field, source constant, or pseudofield. Any other configuration normally is not expanded and its specified value is passed verbatim to the process plugin. In the case of `image_import`, the plugin itself provides a mechanism to expand the values for the `alt`, `title`, `width`, and `height` configuration options. Most plugins do not do this and will use the configured value literally.
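To make the difference concrete, here is a minimal sketch using the core `default_value` plugin. The destination names `field_literal` and `field_expanded` are hypothetical. The `default_value` configuration key is not expanded, so the at sign has no special meaning there:

```yaml
process:
  title: name
  # Not expanded: the field receives the literal string '@title'.
  field_literal:
    plugin: default_value
    default_value: '@title'
  # Expanded: this is a source assignment, so the field receives
  # the value previously set for the title property.
  field_expanded: '@title'
```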

136
05.txt

@ -1,136 +0,0 @@
# Using constants and pseudofields as data placeholders in the Drupal migration process pipeline
So far we have learned how to write basic Drupal migrations and use process plugins to transform data to meet the format expected by the destination. In the previous chapter we learned one of many approaches to migrating images. Now we will change it a bit to introduce two new migration concepts: **constants** and **pseudofields**. Both can be used as data placeholders in the migration timeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the **migrate process pipeline**.
## Setting and using source constants
In the Migrate API, **constants** are _arbitrary values that can be used later in the process pipeline_. They are set as direct children of the source section. You write a `constants` key whose value is a list of name-value pairs. Even though they are defined in the _source_ section, they are independent of the particular source plugin in use. The following code snippet shows a generalization for setting and using _constants_:
```yaml
source:
  constants:
    MY_STRING: "http://understanddrupal.com"
    MY_INTEGER: 31
    MY_DECIMAL: 3.1415927
    MY_ARRAY:
      - "dinarcon"
      - "dinartecc"
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  process_destination_1: constants/MY_INTEGER
  process_destination_2:
    plugin: concat
    source: constants/MY_ARRAY
    delimiter: " "
```
You can set as many constants as you need. Although not required by the API, it is a common convention to write the constant names in all uppercase and use _underscores_ (**\_**) to separate words. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section you type its name, just like any other column provided by the _source_ plugin. Note that to use the constant you need to name the full hierarchy under the source section. That is, the word `constants` and the name itself separated by a _slash_ (**/**) symbol. They can be used to copy their value directly to the destination or as part of any process plugin configuration.
_Technical note_: The word `constants` for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of your particular source plugin. A reason to use a different name is that your source actually contains a column named `constants`. In that case you could use `defaults` or something else. The one restriction is that whatever value you use, you have to use it in the process section to refer to any constant. For example:
```yaml
source:
  defaults:
    MY_VALUE: "http://understanddrupal.com"
  plugin: source_plugin_name
  source_plugin_config: source_config_value
process:
  process_destination: defaults/MY_VALUE
```
## Setting and using pseudofields
Similar to constants, **pseudofields** store _arbitrary values for use later in the process pipeline_. There are some key differences. Pseudofields are set in the _process_ section. The name can be arbitrary as long as it does not conflict with a property name or field name in the destination. The value can be set to a verbatim copy from the _source_ (a column or a constant) or they can use process plugins for data transformations. The following code snippet shows a generalization for setting and using _pseudofields_:
```yaml
source:
  constants:
    MY_BASE_URL: "http://understanddrupal.com"
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  title: source_column_title
  my_pseudofield_1:
    plugin: concat
    source:
      - constants/MY_BASE_URL
      - source_column_relative_url
    delimiter: "/"
  my_pseudofield_2:
    plugin: urlencode
    source: "@my_pseudofield_1"
  field_link/uri: "@my_pseudofield_2"
  field_link/title: "@title"
```
In the above example, `my_pseudofield_1` is set to the result of a `concat` process transformation that joins a constant and a column from the source section. The result value is later used as part of a `urlencode` process transformation. Note that to use the value from `my_pseudofield_1` you have to enclose it in _quotes_ (**'**) and prepend an _at sign_ (**@**) to the name. The `my_pseudofield_` prefix in the name is not required. In this case it is used to make it easier to distinguish between pseudofields and regular property or field names. The new value obtained from the URL encode operation is stored in `my_pseudofield_2`. This last pseudofield is used to set the value of the `uri` subfield for `field_link`. The example could be simplified by using a single pseudofield and chaining process plugins. It is presented that way to demonstrate that a pseudofield could be used as a direct assignment or as part of process plugin configuration values.
_Technical note_: If the name of the pseudofield can be arbitrary, how can you prevent name clashes with destination property names and field names? You might have to look at the source for the entity and the configuration of the bundle. In the case of a node migration, look at the `baseFieldDefinitions` method of the `Node` class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the "Manage fields" section of the content type you are migrating into. The [Field API](https://api.drupal.org/api/drupal/core!modules!field!field.module/group/field/8.8.x) prefixes any field created via the administration interface with the string `field_`. This reduces the likelihood of name clashes. Other than these two name restrictions, _anything else can be used_. In this case, the Migrate API will eventually perform an entity save operation which will discard the pseudofields.
## Understanding Drupal Migrate API process pipeline
The migrate process pipeline is a mechanism by which the value of any **destination property**, **field**, or **pseudofield** that has been set **can be used by anything defined later in the process section**. The fact that using a pseudofield requires enclosing its name in quotes and prepending an at sign is actually a requirement of the process pipeline. Let's see some examples using a node migration:
- To use the `title` property of the node entity, you would write `@title`
- To use the `field_body` field of the `Basic page` content type, you would write `@field_body`
- To use the `my_temp_value` pseudofield, you would write `@my_temp_value`
In the process pipeline, these values can be used just like constants and columns from the source. The only restriction is that they need to be set before being used. For those familiar with the "_rewrite results_" feature of Views, it follows the same idea. You have access to everything defined previously. Anytime you enclose a name in _quotes_ and prepend it with an _at sign_, you are telling the Migrate API to look for that element in the process section instead of the source section.
## Migrating images using the image_import plugin
Let's practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous entry. The Migrate Files module provides another process plugin named `image_import` that allows you to directly set all the subfield values in the plugin configuration itself.
As in previous examples, we will create a new module and write a migration definition file to perform the migration. It is assumed that Drupal was installed using the `standard` installation profile. The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://github.com/dinarcon/ud_migrations>. The module name is `UD Migration constants and pseudofields` and its machine name is `ud_migrations_constants_pseudofields`. The `id` of the example migration is `udm_constants_pseudofields`. Make sure to download and enable the Migrate Files module. Otherwise, you will get an error like: "In DiscoveryTrait.php line 53: The "image_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:...".
Let's see part of the _source_ definition:
```yaml
source:
  constants:
    BASE_URL: "https://agaric.coop"
    PHOTO_DESCRIPTION_PREFIX: "Photo of"
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: "Michele Metts"
      photo_url: "sites/default/files/2018-12/micky-cropped.jpg"
      photo_width: "587"
      photo_height: "657"
```
Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image. Note that this time, the `photo_url` does not provide an absolute URL. Instead, it is a relative path from the domain hosting the images. In this example, the domain is `https://agaric.coop` so that value is stored in the BASE_URL constant which is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant stores the prefix to add to the name to create a photo description.
Now, let's see the _process_ definition:
```yaml
process:
  title: name
  pseudo_image_url:
    plugin: concat
    source:
      - constants/BASE_URL
      - photo_url
    delimiter: "/"
  pseudo_image_description:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: " "
  field_image:
    plugin: image_import
    source: "@pseudo_image_url"
    reuse: TRUE
    alt: "@pseudo_image_description"
    title: "@title"
    width: photo_width
    height: photo_height
```
The `title` node property is set directly to the value of the `name` column from the source. Then, two pseudofields are set. `pseudo_image_url` stores a valid absolute URL to the image using the BASE_URL constant and the `photo_url` _column_ from the _source_. `pseudo_image_description` uses the PHOTO_DESCRIPTION_PREFIX constant and the `name` _column_ from the _source_ to store a description for the image.
For the `field_image` field, the `image_import` process plugin is used. This time, the subfields are not set manually like in the previous chapter. The absence of the `id_only` configuration key allows you to assign values to subfields simply by configuring the `image_import` plugin. The URL to the image is set in the `source` key and uses the `pseudo_image_url` pseudofield. The `alt` key allows you to set the alternative attribute for the image and in this case the `pseudo_image_description` pseudofield is used. The `title` key sets the text of the subfield with the same name and in this case it is assigned the value of the `title` node property which was set at the beginning of the process pipeline. Remember that not only pseudofields are available. Destination properties and fields set earlier in the pipeline can be used too. Finally, the `width` and `height` configuration uses the columns from the source to set the values of the corresponding subfields.


@ -1,32 +1,30 @@
# Tips for writing Drupal migrations and understanding their workflow
We have presented several examples so far. They started very simple and have been increasing in complexity. Until now, we have been rather optimistic. Get the sample code, install any module dependency, enable the module that defines the migration, and execute it assuming everything works on the first try. But Drupal migrations often involve a bit of trial and error. At the very least, it is an iterative process. In this chapter we are going to see what happens after **import** and **rollback** operations, how to **recover from a failed migration**, and some **tips for writing definition files**.
We have presented several examples so far. They started very simple and have been increasing in complexity. Until now, we have been rather optimistic. Get the sample code, install any module dependency, enable the module that defines the migration, and execute it assuming everything works on the first try. But Drupal migrations often involve a bit of trial and error. At the very least, it is an iterative process. In this chapter we are going to see what happens after **import** and **rollback** operations, how to **recover from a failed migration**, and some **tips for writing the migration plugin files**.
## Importing and rolling back migrations
When working on a migration project, it is common to write many migration definition files. Even if you were to have only one, it is very likely that your destination will require many field mappings. Running an _import_ operation to get the data into Drupal is the first step. With so many moving parts, it is easy not to get the expected results on the first try. When that happens, you can run a _rollback_ operation. This instructs the system to revert anything that was introduced when the migration was initially imported. After rolling back, you can make changes to the migration definition file and rebuild Drupal's cache for the system to pick up your changes. Finally, you can do another _import_ operation. Repeat this process until you get the results you expect. The following code snippet shows a basic Drupal migration workflow:
When working on a migration project, it is common to write many migration plugin files. Even if you were to have only one, it is very likely that your destination will require many field mappings. Running an _import_ operation to get the data into Drupal is the first step. With so many moving parts, it is easy not to get the expected results on the first try. When that happens, you can run a _rollback_ operation. This instructs the system to revert anything that was introduced when the migration was initially imported. After rolling back, you can make changes to the migration plugin file and rebuild Drupal's cache for the system to pick up your changes. Finally, you can do another _import_ operation. Repeat this process until you get the results you expect. The following code snippet shows a basic Drupal migration workflow:
```console
# 1) Run the migration.
$ drush migrate:import udm_subfields
$ drush migrate:import first_example
# 2) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_subfields
$ drush migrate:rollback first_example
# 3) Change the migration definition file.
# 3) Change the migration plugin file.
# 4) Rebuild caches for changes to be picked up.
# 4) Rebuild caches for changes to take effect.
$ drush cache:rebuild
# 5) Run the migration again.
$ drush migrate:import udm_subfields
$ drush migrate:import first_example
```
The example above assumes you are using Drush to run the migration commands. Specifically, the commands provided by Migrate Run or Migrate Tools. You pick one or the other, but not both, because the commands provided by the two modules are the same. If you were to have both enabled, they would conflict with each other and fail. Another thing to note is that the example uses Drush 9. There were major refactorings between versions 8 and 9, which included changes to the names of the commands. Finally, `udm_subfields` is the `id` of the migration to run. You can find the full code in the Migrating data into Drupal subfields chapter.
In all cases, `first_example` is the `id` of the migration to run. You can find the full code in Chapter 2!!!. Anytime you modify the plugin files, you need to rebuild Drupal's caches for the changes to take effect. This is the procedure to follow when creating the YAML files using Migrate API core features and placing them under the `migrations` directory.
_Tip_: You can use Drush command aliases to write shorter commands. Type `drush [command-name] --help` for a list of the available aliases.
_Technical note_: To pick up changes to the definition file you need to rebuild Drupal's caches. This is the procedure to follow when creating the YAML files using Migrate API core features and placing them under the `migrations` directory. It is also possible to define migrations as configuration entities using the Migrate Plus module. In those cases, the YAML files follow a different naming convention and are placed under the `config/install` directory. For picking up changes in this case, you need to sync the YAML definition using [configuration management](https://www.drupal.org/docs/configuration-management/managing-your-sites-configuration) workflows. This will be covered in a future chapter.
_Note_: It is also possible to define migrations as configuration entities using the Migrate Plus module. In those cases, the YAML files follow a different naming convention and are placed under the `config/install` directory. For picking up changes in this case, you need to sync the YAML definition using [configuration management](https://www.drupal.org/docs/configuration-management/managing-your-sites-configuration) workflows. This will be covered in chapter !!!.
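The rollback, rebuild, and import cycle described in this section can be wrapped in a small shell helper. This is only a minimal sketch and not part of the book's example code: the function name `reimport_migration` and the `DRUSH` variable are hypothetical, and it assumes the Drush commands shown above are available in your installation.

```shell
#!/bin/bash
# Hypothetical helper: roll back, rebuild caches, and import again.
# DRUSH is an assumption that lets you point at a specific Drush binary;
# it defaults to `drush` found on the PATH.
DRUSH="${DRUSH:-drush}"

reimport_migration() {
  local migration_id="$1"
  if [ -z "$migration_id" ]; then
    echo "Usage: reimport_migration <migration-id>" >&2
    return 1
  fi
  # Stop at the first failing step so a broken rollback is not hidden.
  "$DRUSH" migrate:rollback "$migration_id" &&
    "$DRUSH" cache:rebuild &&
    "$DRUSH" migrate:import "$migration_id"
}
```

After editing the plugin file, you would call it as `reimport_migration first_example`. Chaining with `&&` is a design choice: if the rollback fails, the import is not attempted.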
## Stopping and resetting migrations
@ -36,30 +34,42 @@ You can check the state of any migration by running the `drush migrate:status` c
```console
# 1) Run the migration.
$ drush migrate:import udm_process_intro
$ drush migrate:import first_example
# 2) Some non recoverable error occurs. Check the status of the migration.
$ drush migrate:status udm_process_intro
$ drush migrate:status first_example
# 3) Stop the migration.
$ drush migrate:stop udm_process_intro
$ drush migrate:stop first_example
# 4) Reset the status to idle.
$ drush migrate:reset-status udm_process_intro
$ drush migrate:reset-status first_example
# 5) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_process_intro
$ drush migrate:rollback first_example
# 6) Change the migration definition file.
# 6) Change the migration plugin file.
# 7) Rebuild caches for changes to be picked up.
# 7) Rebuild caches for changes to take effect.
$ drush cache:rebuild
# 8) Run the migration again.
$ drush migrate:import udm_process_intro
$ drush migrate:import first_example
```
_Tip_: The errors thrown by the Migrate API might not provide enough information to determine what went wrong. An excellent way to familiarize yourself with the possible errors is by intentionally breaking working migrations. In the example repository for this book there are many migrations you can modify. Try anything that comes to mind: not leaving a space after a _colon_ (**:**) in a key-value assignment; not using proper indentation; using wrong subfield names; using invalid values in property assignments; etc. You might be surprised by how Migrate API deals with such errors. Also note that many other Drupal APIs are involved. For example, you might get a YAML file parse error or an [Entity API](https://www.drupal.org/docs/8/api/entity-api) save error. When you have seen an error before, it is usually faster to identify the cause and fix it in the future.
All the commands above are provided by Drush itself. The Migrate Tools module is a migration runner that offers additional commands and provides extra features to some of the ones listed above. It adds the `migrate:tree` command to show a tree of migration dependencies. For `migrate:import`, it adds flags like `--group` and `--continue-on-failure`. You can use the following to know which commands are available in your installation and get details on usage:
```console
# List of all migration related commands.
$ drush list --filter=migrate
# Get information on usage and available flags using --help
$ drush [command-name] --help
```
_Tip_: You can use Drush command aliases to write shorter commands. For example, `drush mim first_example` to start an import operation. When using `drush list`, the aliases appear in parentheses next to the command name.
The errors thrown by the Migrate API might not provide enough information to determine what went wrong. An excellent way to familiarize yourself with possible errors is by intentionally breaking working migrations. In the example repository for this book there are many migrations you can modify. Try anything that comes to mind: not leaving a space after a _colon_ (**:**) in a key-value assignment; not using proper indentation; using wrong subfield names; using invalid values in property assignments; etc. You might be surprised by how Migrate API deals with such errors. Also note that many other Drupal APIs are involved. For example, you might get a YAML file parse error or an [Entity API](https://www.drupal.org/docs/8/api/entity-api) save exception. When you have seen an error before, it is usually faster to identify the cause and fix it in the future. In chapters !!! we will talk about debugging migrations.
## What happens when you rollback a Drupal migration?
@ -69,21 +79,21 @@ For example, when using the `file_import` or `image_import` plugins provided by
In the next chapter we are going to start talking about migration dependencies. What happens with dependent migrations (e.g. files and paragraphs) when the migration for the host entity (e.g. node) is rolled back? In this case, the Migrate API will perform an entity delete operation on the node. When this happens, referenced files are kept in the system, but paragraphs are automatically deleted. For the curious, this behavior for paragraphs is actually determined by its module dependency: [Entity Reference Revisions](https://www.drupal.org/project/entity_reference_revisions). We will talk more about paragraphs migrations in future chapters.
The moral of the story is that the behavior of the migration system might be affected by other Drupal APIs. And in the case of _rollback_ operations, make sure to read the documentation or test manually to find out when migrations clean up after themselves and when they do not.
The moral of the story is that the resulting state of the system might be affected by other Drupal APIs. And in the case of _rollback_ operations, make sure to read the documentation or test manually to find out when migrations clean up after themselves and when they do not.
_Note_: The focus of this section was [content entity](https://www.drupal.org/docs/8/api/entity-api/content-entity) migrations. The general idea can be applied to [configuration entities](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-configuration) or any custom target of the ETL process.
## Re-import or update migrations
We just mentioned that Migrate API issues an entity delete action when rolling back a migration. This has another important side effect. Entity IDs (nid, uid, tid, fid, etc.) are going to change every time you _rollback_ an _import_ again. Depending on auto generated IDs is generally not a good idea. But keep it in mind in case your workflow might be affected. For example, if you are running migrations in a content staging environment, references to the migrated entities can break if their IDs change. Also, if you were to manually update the migrated entities to clean up edge cases, those changes would be lost if you _rollback_ and _import_ again. Finally, keep in mind test data might remain in the system, as described in the previous section, which could find its way to production environments.
We just mentioned that Migrate API triggers an entity delete action when rolling back a migration. This has another important side effect. Entity IDs (`nid`, `uid`, `tid`, `fid`, `mid`, etc.) are going to change every time you _rollback_ and _import_ again. Depending on auto-generated IDs is generally not a good idea. But keep it in mind in case your workflow might rely on them. For example, if you are running migrations in a content staging environment, references to the migrated entities can break if their IDs change. Also, if you were to manually update the migrated entities to clean up edge cases, those changes would be lost if you _rollback_ and _import_ again. As described in the previous section, test data might remain in the system after a rollback, so make sure to clean things up when deploying to production environments.
An alternative to rolling back a migration is to not execute this operation at all. Instead, you run an _import_ operation again using the `update` flag. This tells the system that in addition to migrating unprocessed items from the source, you also want to update items that were previously imported using their current values. To do this, the Migrate API relies on _source identifiers_ and _map tables_. You might want to consider this option when your source changes over time, when you have a large number of records to import, or when you want to execute the same migration many times on a schedule.
An alternative to rolling back a migration is to not execute this operation at all. Instead, you run an _import_ operation again using the `--update` flag. This tells the system that in addition to migrating unprocessed items from the source, you also want to update items that were previously imported using their current values. To do this, the Migrate API relies on _source identifiers_ and _map tables_. You might want to consider this option when your source changes over time, when you have a large number of records to import, or when you want to execute the same migration many times on a schedule.
_Note_: On import operations, the Migrate API issues an entity save action.
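As a rough illustration of the update approach, the following sketch runs an import with the `--update` flag instead of rolling back first. The function name `update_migration` and the `DRUSH` variable are hypothetical names introduced for this example; only `migrate:import` and `--update` come from the text above.

```shell
#!/bin/bash
# Hypothetical helper: re-run an import in update mode.
# DRUSH is an assumption; it defaults to `drush` found on the PATH.
DRUSH="${DRUSH:-drush}"

update_migration() {
  local migration_id="$1"
  if [ -z "$migration_id" ]; then
    echo "Usage: update_migration <migration-id>" >&2
    return 1
  fi
  # No rollback here: previously imported items are updated in place
  # and unprocessed source items are imported as usual.
  "$DRUSH" migrate:import "$migration_id" --update
}
```

For example, `update_migration first_example` could be run on a schedule when the source keeps changing.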
## Tips for writing Drupal migrations
When working on migration projects, you might end up with many migration definition files. They can set dependencies on each other. Each file might contain a significant number of field mappings. There are many things you can do to make Drupal migrations more straightforward. For example, practicing with different migration scenarios and studying working examples. As a reference to help you in the process of migrating into Drupal, consider these tips:
When working on migration projects, you might end up with many migration plugin files. They can set dependencies on each other. Each file might contain a significant number of field mappings. There are many things you can do to make Drupal migrations more straightforward. For example, practicing with different migration scenarios and studying working examples. As a reference to help you in the process of migrating into Drupal, consider these tips:
- Start from an existing migration. Look for an example online that does something close to what you need and modify it to your requirements.
- Pay close attention to the syntax of the YAML file. An extraneous space or wrong indentation level can break the whole migration.

View file

@ -20,18 +20,18 @@ The _source_ of a migration is independent of its _destination_. The following c
```yaml
source:
constants:
SOURCE_DOMAIN: "https://agaric.coop"
SOURCE_DOMAIN: "https://udrupal.com"
DRUPAL_FILE_DIRECTORY: "public://portrait/"
plugin: embedded_data
data_rows:
- photo_id: "P01"
photo_url: "sites/default/files/2018-12/micky-cropped.jpg"
photo_url: "photos/freescholar.jpg"
- photo_id: "P02"
photo_url: ""
- photo_id: "P03"
photo_url: "sites/default/files/pictures/picture-94-1480090110.jpg"
photo_url: "photos/gnuget.jpg"
- photo_id: "P04"
photo_url: "sites/default/files/2019-01/clayton-profile-medium.jpeg"
photo_url: "photos/cedewey.jpg"
ids:
photo_id:
type: string
@ -103,7 +103,7 @@ psf_source_image_path:
- plugin: urlencode
```
The end result of this operation will be something like `https://agaric.coop/sites/default/files/2018-12/micky-cropped.jpg`. Note that the `concat` and `urlencode` plugins are used just like in the previous step. A subtle difference is that a `delimiter` is specified in the concatenation step. This is because, contrary to the `DRUPAL_FILE_DIRECTORY` constant, the `SOURCE_DOMAIN` constant does not end with a _slash_ (**/**). This was done intentionally to highlight two things. First, it is important to understand your source data. Second, you can transform it as needed by using various process plugins.
The end result of this operation will be something like `https://udrupal.com/photos/freescholar.jpg`. Note that the `concat` and `urlencode` plugins are used just like in the previous step. A subtle difference is that a `delimiter` is specified in the concatenation step. This is because, contrary to the `DRUPAL_FILE_DIRECTORY` constant, the `SOURCE_DOMAIN` constant does not end with a _slash_ (**/**). This was done intentionally to highlight two things. First, it is important to understand your source data. Second, you can transform it as needed by using various process plugins.
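To see why the delimiter matters, here is a small shell illustration of joining values with and without a separator. It is not how the Migrate API implements `concat`, and it skips the URL encoding step; the function name `join_with_delimiter` and the file name `example.jpg` are hypothetical, while the constant values are taken from the source configuration above.

```shell
#!/bin/bash
# Illustration only: mimics joining values with an optional delimiter.
SOURCE_DOMAIN="https://udrupal.com"
DRUPAL_FILE_DIRECTORY="public://portrait/"
photo_url="photos/freescholar.jpg"

join_with_delimiter() {
  local delimiter="$1"
  shift
  # "$*" joins the remaining arguments using the first character of IFS,
  # or with nothing in between when IFS is empty.
  local IFS="$delimiter"
  echo "$*"
}

# SOURCE_DOMAIN has no trailing slash, so a delimiter is needed.
join_with_delimiter "/" "$SOURCE_DOMAIN" "$photo_url"
# https://udrupal.com/photos/freescholar.jpg

# DRUPAL_FILE_DIRECTORY already ends with a slash, so none is needed.
join_with_delimiter "" "$DRUPAL_FILE_DIRECTORY" "example.jpg"
# public://portrait/example.jpg
```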
## Copying the image file to Drupal

View file

@ -25,7 +25,7 @@ access: '@created'
login: '@created'
```
The `created` *entity property* stores a [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) of when the user was added to Drupal. The value itself is an integer number representing the number of seconds since the [epoch](https://en.wikipedia.org/wiki/Epoch_\(computing\)). For example, `280299600` represents `Sun, 19 Nov 1978 05:00:00 GMT`. Kudos to the readers who knew this is [Drupal's default `expire` HTTP header](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php#L291). Bonus points if you knew it was chosen in honor of [someone's birthdate](https://dri.es/about). ;-)
The `created` *entity property* stores a [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) of when the user was added to Drupal. The value itself is an integer number representing the number of seconds since the [epoch](<https://en.wikipedia.org/wiki/Epoch_(computing)>). For example, `280299600` represents `Sun, 19 Nov 1978 05:00:00 GMT`. Kudos to the readers who knew this is [Drupal's default `expire` HTTP header](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php#L291). Bonus points if you knew it was chosen in honor of [someone's birthdate](https://dri.es/about). ;-)
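The timestamp example above can be checked from the command line. This is a minimal sketch that assumes GNU `date` (the `-d @SECONDS` syntax is not available in BSD or macOS `date`); the function name `epoch_to_gmt` is hypothetical.

```shell
#!/bin/bash
# Assumes GNU date. LC_ALL=C forces English day and month names.
epoch_to_gmt() {
  LC_ALL=C date -u -d "@$1" '+%a, %d %b %Y %H:%M:%S GMT'
}

epoch_to_gmt 280299600
# Sun, 19 Nov 1978 05:00:00 GMT
```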
Back to the migration, you need to transform the provided date from `Month day, year` format to a UNIX timestamp. To do this, you use the [format_date](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21FormatDate.php/class/FormatDate) plugin. The `from_format` is set to `F j, Y` which means your source date consists of:

View file

@ -98,9 +98,9 @@ The final example will show a slight variation of the previous configuration. Th
| | |
|-|-|
| P01 | https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg |
| P02 | https://agaric.coop/sites/default/files/pictures/picture-3-1421176784.jpg |
| P03 | https://agaric.coop/sites/default/files/pictures/picture-2-1421176752.jpg |
| P01 | https://udrupal.com/photos/freescholar.jpg |
| P02 | https://udrupal.com/photos/mlncn.jpg |
| P03 | https://udrupal.com/photos/sfreudenberg.jpg |
```yaml

View file

@ -41,7 +41,7 @@ This migration will reuse the same configuration from the [introduction to parag
"udm_photos": [
{
"photo_id": "P01",
"photo_url": "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg",
"photo_url": "https://udrupal.com/photos/freescholar.jpg",
"photo_dimensions": [240, 351]
},
{...},
@ -207,7 +207,7 @@ Let's consider an example where the records to migrate have *more data than need
"udm_photos": [
{
"photo_id": "P01",
"photo_url": "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg",
"photo_url": "https://udrupal.com/photos/freescholar.jpg",
"photo_dimensions": [240, 351]
},
{...},

View file

@ -44,7 +44,7 @@ This migration will reuse the same configuration from the [introduction to parag
</udm_book_paragraph>
<udm_photos>
<photo_id>P01</photo_id>
<photo_url>https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg</photo_url>
<photo_url>https://udrupal.com/photos/freescholar.jpg</photo_url>
<photo_dimensions>
<width>240</width>
<height>351</height>
@ -229,7 +229,7 @@ Let's consider an example where the elements to migrate have *more data than nee
<data>
<udm_photos>
<photo_id>P01</photo_id>
<photo_url>https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg</photo_url>
<photo_url>https://udrupal.com/photos/freescholar.jpg</photo_url>
<photo_dimensions>
<width>240</width>
<height>351</height>

View file

@ -78,9 +78,9 @@ destination:
Now let's consider an example of a spreadsheet file that does not have a header row. This example is for the *image* migration and uses a Microsoft Excel file. The following snippets show the `UD Example Sheet` worksheet and the configuration of the *source* plugin:
```
P01, https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg
P02, https://agaric.coop/sites/default/files/pictures/picture-3-1421176784.jpg
P03, https://agaric.coop/sites/default/files/pictures/picture-2-1421176752.jpg
P01, https://udrupal.com/photos/freescholar.jpg
P02, https://udrupal.com/photos/mlncn.jpg
P03, https://udrupal.com/photos/sfreudenberg.jpg
```
```yaml

View file

@ -42,7 +42,7 @@ $ drush migrate:import udm_csv_source_image --migrate-debug --limit=1
└──────────────────────────────────────────────────────────────────────────────┘
array (10) [
'photo_id' => string (3) "P01"
'photo_url' => string (74) "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg"
'photo_url' => string (42) "https://udrupal.com/photos/freescholar.jpg"
'path' => string (76) "modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv"
'ids' => array (1) [
string (8) "photo_id"
@ -69,7 +69,7 @@ array (10) [
array (4) [
'psf_destination_filename' => string (25) "picture-15-1421176712.jpg"
'psf_destination_full_path' => string (25) "picture-15-1421176712.jpg"
'psf_source_image_path' => string (74) "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg"
'psf_source_image_path' => string (42) "https://udrupal.com/photos/freescholar.jpg"
'uri' => string (29) "./picture-15-1421176712_6.jpg"
]
┌──────────────────────────────────────────────────────────────────────────────┐


43
course.md Normal file
View file

@ -0,0 +1,43 @@
# Understand Drupal Migrations Course
![Understand Drupal Migrations Course](https://understanddrupal.com/sites/default/files/inline-images/drupal-migrations-course-banner.png)
Learn Drupal migrations efficiently through the practical examples in our video course. It distills years of experience into a series of video tutorials to help you leverage the Migrate API effectively! You will learn different migration concepts and how they work together through practical examples. Do you need to migrate data from CSV and JSON files? What about migrating into paragraphs or media entities? Are you having trouble updating previously migrated data? All of that and much more is covered. Along the way, we will give you tips for writing and executing migrations. We also explain how to debug migrations when they fail. No previous migration experience is required! And no need to be a developer either. This course was prepared with site builders in mind. That being said, tips and tricks for developers are sprinkled all over the place.
I> Get the course at <https://understanddrupal.com/migrations>
## Content
- Import data from CSV and JSON files.
- Learn to run migrations from the user interface and the command line with Drush.
- Transform the data to populate taxonomy, date, image, file, and address fields.
- Get content into media entities.
- Get content into multi value, nested paragraphs.
- Migrate URL aliases and metatags.
- Write a custom migration plugin.
- Parse HTML with custom attributes into separate Drupal fields.
- Update previously migrated data.
- Dynamically modify migrations to avoid leaking API credentials.
- Debugging procedure and recommendations.
Get the course at <https://understanddrupal.com/migrations>
## Testimonials
> Mauricio knows his stuff when it comes to migrations. He's taught lots of courses on the subject and I think you will find his course super valuable.
**Lucas Hedding** - Drupal's Migrate API co-maintainer
> Your training on migrations at @drupalconNA was one of the best trainings I have ever attended.
**Kaleem Clarkson** - Drupal front-end developer and expert site builder
> If you want to learn about the Migrate API in #Drupal, then you should sign up for @dinarcon's course! Mauricio and I both help out newbies on the migration channel in Drupal Slack. These days, I answer many questions by giving a link to his "31 days of Drupal migrations". This blog covers the basics and more, with links to working code. This series is a great resource for us all!
**Benji Fisher** - Drupal's Migrate API co-maintainer
> I have never been in a training with so much professionalism and clear instruction as I was with Mauricio in the trainings he provided. I am amazed at how organized his presentation was, and how well he was able to cover the information. I will definitely be telling all of my colleagues. Thank you so much!
**Travis Butterfield** - Web Site Technician at Arizona State University
Get the course at <https://understanddrupal.com/migrations>

View file

@ -1,3 +1,4 @@
course.txt
01.txt
02.txt
03.txt

5
scripts/export_to_txt.sh Executable file
View file

@ -0,0 +1,5 @@
#!/bin/bash
cd export || exit 1  # Abort so the cleanup below never runs outside export/.
ls * | grep -v Book.txt | xargs rm
cd ..
for old in *.md; do cp "$old" "export/$(basename "$old" .md).txt"; done