# Drupal migrations: Understanding the ETL process

The Migrate API is a very flexible and powerful system that allows you to collect data from different locations and store it in Drupal. It is, in fact, a full-blown extract, transform, and load (ETL) framework. For instance, it could produce CSV files. Its primary use, however, is to create Drupal content entities: nodes, users, files, comments, etc. The API is thoroughly [documented](https://www.drupal.org/docs/drupal-apis/migrate-api), and its maintainers are very active in the #migration [Slack channel](https://www.drupal.org/slack) for those needing assistance. The use cases for the Migrate API are numerous and vary greatly. Today we are starting a series that will cover different migrate concepts so that you can apply them to your particular project.

## Understanding the ETL process

Extract, transform, and load (ETL) is a procedure where data is collected from multiple sources, processed according to business needs, and the result stored for later use. This paradigm is not specific to Drupal. Books and frameworks abound on the topic. Let's try to understand the general idea by following a real-life analogy: baking bread. To make some bread, you need to obtain various ingredients: wheat flour, salt, yeast, etc. (_extracting_). Then, you need to combine them in a process that involves mixing and baking (_transforming_). Finally, when the bread is ready, you put it on shelves for display in the bakery (_loading_). In Drupal, each step is performed by a Migrate plugin:

- The extract step is provided by source plugins.
- The transform step is provided by process plugins.
- The load step is provided by destination plugins.

As is the case with other systems, Drupal core offers some base functionality which can be extended by contributed modules or custom code. Out of the box, Drupal can connect to SQL databases, including previous versions of Drupal. There are contributed modules to read from CSV files, XML documents, JSON and SOAP feeds, WordPress sites, LibreOffice Calc and Microsoft Office Excel files, Google Sheets, and much more.

The [list of core process plugins](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins) is impressive. You can concatenate strings, explode or implode arrays, format dates, encode URLs, and look up already migrated data, among other transform operations. [Migrate Plus](https://www.drupal.org/project/migrate_plus) offers more process plugins for DOM manipulation, string replacement, transliteration, etc.

Drupal core provides destination plugins for content and configuration entities. Most of the time, targets are content entities like nodes, users, taxonomy terms, comments, files, etc. It is also possible to import configuration entities like field and content type definitions. This is often used when upgrading sites from Drupal 6 or 7 to Drupal 8. Via a combination of source, process, and destination plugins, it is possible to write Commerce Product Variations, Paragraphs, and more.

Technical note: The Migrate API defines another plugin type: **id_map**. These plugins are used to map source IDs to destination IDs. This allows the system to keep track of records that have been imported and roll them back if needed.

## Drupal migrations: a two-step process

Performing a Drupal migration is a two-step process: **writing** the migration definitions and **executing** them. Migration definitions are written in YAML format. These files contain information on how to fetch data from the _source_, how to _process_ the data, and how to store it in the _destination_. It is important to note that each migration file can only specify one source and one destination. That is, you cannot read from a CSV file and a JSON feed using the same migration definition file. Similarly, you cannot write to nodes and users from the same file. However, you can use **as many process plugins as needed** to convert your data from the format defined in the source to the format expected in the destination.
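To make that structure concrete, here is a minimal, hypothetical sketch of a migration definition file; the plugin names, columns, and fields below are placeholders, and complete, working examples follow in the next chapters:

```yaml
id: example_migration
label: "Example migration"
source:
  plugin: source_plugin_name
  # Configuration specific to the chosen source plugin goes here.
process:
  title: source_column_title
  field_example:
    plugin: process_plugin_name
    source: source_column_example
destination:
  plugin: "entity:node"
```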
A typical migration project consists of several migration definition files. Although not required, it is recommended to write one migration file per entity bundle. If you are migrating nodes, that means writing one migration file per content type. The reason is that different content types will have different field configurations. It is easier to write and manage migrations when the destination is homogeneous. In this case, a single content type will have the same fields for all the elements to process in a particular migration.

Once all the migration definitions have been written, you need to execute the migrations. The most common way to do this is using the [Migrate Tools](https://www.drupal.org/project/migrate_tools) module, which provides [Drush](https://www.drush.org/) commands and a user interface (UI) to run migrations. Note that the UI for running migrations only detects those that have been defined as configuration entities using the Migrate Plus module. This is a topic we will cover in the future. For now, we are going to stick to Drupal core's mechanisms of defining migrations. Contributed modules like Migrate Scheduler, Migrate Manifest, and Migrate Run offer alternatives for executing migrations.
# Writing your first Drupal migration

In the previous chapter, we learned that the Migrate API is an implementation of an ETL framework. We also talked about the steps involved in writing and running migrations. Now, let's write our first Drupal migration. We are going to start with a very basic example: creating nodes out of hardcoded data. For this, we assume a Drupal installation using the `standard` installation profile, which comes with the `Basic page` content type. As we progress through the book, the migrations will become more complete and more complex. Ideally, only one concept will be introduced at a time. When that is not possible, we will explain how different parts work together. The focus of this chapter is learning the structure of a migration definition file and how to run it.

## Writing the migration definition file

The migration definition file needs to live in a module. So, let's create a custom one named `ud_migrations_first` and set Drupal core's `migrate` module as a dependency in the *.info.yml file.
```yaml
type: module
name: UD First Migration
description: 'Example of basic Drupal migration. Learn more at <a href="https://understanddrupal.com/migrations" title="Drupal Migrations">https://understanddrupal.com/migrations</a>.'
package: Understand Drupal
core: 8.x
dependencies:
  - drupal:migrate
```

Now, let's create a folder called `migrations` and inside it, a file called `udm_first.yml`. Note that the extension is `yml` not `yaml`. The content of the file will be:
```yaml
id: udm_first
label: "UD First migration"
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      creative_title: "The versatility of Drupal fields"
      engaging_content: "Fields are Drupal's atomic data storage mechanism..."
    - unique_id: 2
      creative_title: "What is a view in Drupal? How do they work?"
      engaging_content: "In Drupal, a view is a listing of information. It can be a list of nodes, users, comments, taxonomy terms, files, etc..."
  ids:
    unique_id:
      type: integer
process:
  title: creative_title
  body: engaging_content
destination:
  plugin: "entity:node"
  default_bundle: page
```
The final folder structure will look like:

```
.
|-- core
|-- index.php
|-- modules
|   `-- custom
|       `-- ud_migrations
|           `-- ud_migrations_first
|               |-- migrations
|               |   `-- udm_first.yml
|               `-- ud_migrations_first.info.yml
```

YAML is a key-value format with optional nesting of elements. It is **very sensitive to white space and indentation**. For example, it requires at least one space character after the colon symbol (**:**) that separates the key from the value. Also, note that each level in the hierarchy is indented by exactly two spaces. A common source of errors when writing migrations is improper spacing or indentation of the YAML files.
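As a quick illustration using a fragment of the file above, the following spacing is what the parser expects; removing the space after a colon or using a different number of spaces for a nesting level is a common way to break the file. The trailing comments are annotations only and are not needed in real definition files:

```yaml
process:                   # Top-level key, no indentation.
  title: creative_title    # Two-space indent, one space after the colon.
  body: engaging_content
```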
A quick glimpse at the file reveals the three major parts: source, process, and destination. Other keys provide extra information about the migration. There are more keys than the ones shown above. For example, it is possible to define dependencies among migrations. Another option is to tag migrations so they can be executed together. We are going to learn more about these options further in the book.

Let's review each key-value pair in the file. For the `id` key, it is customary to set its value to match the filename containing the migration definition, but without the `.yml` extension. This key serves as an internal identifier that Drupal and the Migrate API use to execute and keep track of the migration. The `id` value should consist of alphanumeric characters, optionally using underscores (**\_**) to separate words. As for the `label` key, it is a human-readable string used to name the migration in various interfaces.

In this example, we are using the [`embedded_data`](https://api.drupal.org/api/drupal/core!modules!migrate!src!Plugin!migrate!source!EmbeddedDataSource.php/class/EmbeddedDataSource) source plugin. It allows you to define the data to migrate right inside the definition file. To configure it, you define a `data_rows` key whose value is an array of all the elements you want to migrate. Each element might contain an arbitrary number of key-value pairs representing "columns" of data to be imported.

A common use case for the `embedded_data` plugin is testing of the Migrate API itself. Another valid one is to create default content when the data is known in advance. I often present Drupal site building workshops. To save time, I use this plugin to create nodes which are later used when explaining how to create Views.

For the destination, we are using the `entity:node` plugin which allows you to create nodes of any content type. The `default_bundle` key indicates that all nodes to be created will be of type `Basic page` by default. It is important to note that the value of the `default_bundle` key is the **machine name** of the content type. You can find it at `/admin/structure/types/manage/page`. In general, the Migrate API uses _machine names_ for the values. As we explore the system, we will point out when they are used and where to find the right ones.

In the process section, you map columns from the source to node properties and fields. The keys are entity property names or the fields' machine names. In this case, we are setting values for the `title` of the node and its `body` field. You can find the field machine names in the content type configuration page: `/admin/structure/types/manage/page/fields`. During the migration, values can be copied directly from the source or transformed via process plugins. This example makes a verbatim copy of the values from the source to the destination. The column names in the source are not required to match the destination property or field name. In this example, they are purposely different to make them easier to identify.

The repository, which will be used for many examples throughout the book, can be downloaded at <https://github.com/dinarcon/ud_migrations>. Place it into the `./modules/custom` directory of the Drupal installation. The example above is part of the "UD First Migration" submodule, so make sure to enable it.

## Running the migration

Let's use Drush to run the migrations with the commands provided by [Migrate Run](https://www.drupal.org/project/migrate_run). Open a terminal, switch directories to Drupal's webroot, and execute the following commands.

```
$ drush pm:enable -y migrate migrate_run ud_migrations_first
$ drush migrate:status
$ drush migrate:import udm_first
```

**Note**: It is assumed that the Migrate Run module has been downloaded via Composer or otherwise.

**Important**: All code snippets showing Drush commands assume version 10 unless otherwise noted. If you are using Drush 8 or lower, the commands' names and aliases are different. Usually, a hyphen (-) was used as a delimiter in command names. For example, `pm-enable` in Drush 8 instead of `pm:enable` in Drush 10. Execute `drush list --filter=migrate` to verify the proper commands for your version of Drush.

The first command enables the core migrate module, the runner, and the custom module holding the migration definition file. The second command shows a list of all migrations available in the system. For now, only one should be listed with the migration ID `udm_first`. The third command executes the migration. If all goes well, you can visit the content overview page at `/admin/content` and see two basic pages created. **Congratulations, you have successfully run your first Drupal migration!**

_Or maybe not?_ Drupal migrations can fail in many ways. Sometimes the error messages are not very descriptive. In upcoming chapters, we will talk about recommended workflows and strategies for debugging migrations. For now, let's mention a couple of things that could go wrong with this example. If after running the `drush migrate:status` command you do not see the `udm_first` migration, make sure that the `ud_migrations_first` module is enabled. If it is enabled and you still do not see it, rebuild the cache by running `drush cache:rebuild`.

If you see the migration, but you get a YAML parse error when running the `migrate:import` command, check your indentation. Copying and pasting from GitHub to your IDE/editor might change the spacing. An extraneous space can break the whole migration, so pay close attention. If the command reports that it created the nodes, but you get a fatal error when trying to view one, it is because the content type was not set properly. Remember that the machine name of the "Basic page" content type is `page`, not `basic_page`. This error cannot be fixed from the administration interface. What you have to do is roll back the migration by issuing the following command: `drush migrate:rollback udm_first`, then fix the `default_bundle` value, rebuild the cache, and import again.

**Note**: Migrate Tools could be used for running the migration. This module depends on Migrate Plus. For now, let's keep module dependencies to a minimum to focus on core Migrate functionality. Also, skipping them demonstrates that these modules, although quite useful, are not hard requirements to work on migration projects. If you decide to use Migrate Tools, make sure to uninstall Migrate Run. Both provide the same Drush commands and conflict with each other if the two are enabled.
# Using process plugins for data transformation in Drupal migrations

In the previous chapter, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination or to meet business requirements. Now we will learn more about process plugins and how they work as part of the Drupal migration pipeline.

## Syntactic sugar

The Migrate API offers a lot of syntactic sugar to make it easier to write migration definition files. Field mappings in the process section are an example of this. Each of them requires a process plugin to be defined. If none is manually set, then the `get` plugin is assumed. The following two code snippets are equivalent in functionality.
```yaml
process:
  title: creative_title
```

```yaml
process:
  title:
    plugin: get
    source: creative_title
```
The `get` process plugin simply copies a value from the source to the destination without making any changes. Because this is a common operation, `get` is considered the default. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:

```yaml
process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3
```
The process plugin is configured within an extra level of indentation under the destination field. The `plugin` key is required and determines which plugin to use. Then, a list of configuration options follows. Refer to the documentation of each plugin to know what options are available. Some configuration options will be required while others will be optional. For example, the `concat` plugin requires a `source`, but the `delimiter` is optional. An example of its use appears later in this chapter.

## Providing default values

Sometimes, the destination requires a property or field to be set, but that information is not present in the source. Imagine you are migrating nodes. As we have mentioned, it is recommended to write one migration file per content type. If you know in advance that for a particular migration you will always create nodes of type `Basic page`, then it would be redundant to have a column in the source with the same value for every row. The data might not be needed. Or it might not exist. In any case, the `default_value` plugin can be used to provide a value when the data is not available in the source.
```yaml
source: ...
process:
  type:
    plugin: default_value
    default_value: page
destination:
  plugin: "entity:node"
```
The above example sets the `type` property for all nodes in this migration to `page`, which is the machine name of the `Basic page` content type. Do not confuse the name of the plugin with the name of its configuration property as they happen to be the same: `default_value`. Also note that because a (content) `type` is manually set in the process section, the `default_bundle` key in the destination section is no longer required. You can see the latter being used in the example of the chapter _Writing your first Drupal migration_.

## Concatenating values

Consider the following migration request: you have a source listing people with first and last name in separate columns. Both are capitalized. The two values need to be put together (concatenated) and used as the title of nodes of type `Basic page`. The character casing needs to be changed so that only the first letter of each word is capitalized. If there is a need to display them in all caps, CSS can be used for presentation. For example: `FELIX DELATTRE` would be transformed to `Felix Delattre`.

_Tip_: Question business requirements when they might produce undesired results. For instance, if you were to implement this feature as requested, `DAMIEN MCKENNA` would be transformed to `Damien Mckenna`. That is not the correct capitalization for the last name `McKenna`. If automatic transformation is not possible or feasible for all variations of the source data, take notes and perform manual updates after the initial migration. Evaluate as many use cases as possible and bring them to the client's attention.

To implement this feature, let's create a new module `ud_migrations_process_intro`, create a `migrations` folder, and write a migration definition file called `udm_process_intro.yml` inside it. Download the sample module from <https://github.com/dinarcon/ud_migrations>. It is the one named `UD Process Plugins Introduction` and machine name `udm_process_intro`. For this example, we assume a Drupal installation using the `standard` installation profile which comes with the `Basic page` content type. Let's see how to handle the concatenation of first and last name.
```yaml
id: udm_process_intro
label: "UD Process Plugins Introduction"
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: "FELIX"
      last_name: "DELATTRE"
    - unique_id: 2
      first_name: "BENJAMIN"
      last_name: "MELANÇON"
    - unique_id: 3
      first_name: "STEFAN"
      last_name: "FREUDENBERG"
  ids:
    unique_id:
      type: integer
process:
  type:
    plugin: default_value
    default_value: page
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: " "
destination:
  plugin: "entity:node"
```
The [`concat`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Concat.php/class/Concat) plugin can be used to glue together an arbitrary number of strings. Its `source` property contains an array of all the values that you want to put together. The `delimiter` is an optional parameter that defines a string to add between the elements as they are concatenated. If not set, there will be no separation between the elements in the concatenated result. This plugin has an **important limitation**. You cannot use string literals as part of what you want to concatenate, for example, joining the string `Hello` with the value of the `first_name` column. All the values to concatenate need to be columns in the source or fields already available in the process pipeline. We will talk about the latter in a later chapter.
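A common way around that limitation, covered in more detail in the chapter on constants and pseudofields, is to define the literal as a source constant so that it becomes available like any other column. A sketch of the idea, using a hypothetical `GREETING` constant:

```yaml
source:
  constants:
    GREETING: "Hello"
  plugin: embedded_data
  # ...the rest of the source configuration...
process:
  title:
    plugin: concat
    source:
      - constants/GREETING
      - first_name
    delimiter: " "
```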
To execute the above migration, you need to enable the `ud_migrations_process_intro` module. Assuming you have Migrate Run installed, open a terminal, switch directories to your Drupal docroot, and execute the following command: `drush migrate:import udm_process_intro`. Refer to the end of the _Writing your first Drupal migration_ chapter if it fails. If it works, you will see three basic pages whose title contains the names of some of my Drupal mentors. #DrupalThanks

## Chaining process plugins

Good progress so far, but the feature has not been fully implemented. You still need to change the capitalization so that only the first letter of each word in the resulting title is uppercase. Thankfully, the Migrate API allows [**chaining of process plugins**](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/migrate-process-overview#full-pipeline). This works similarly to Unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned to the destination field. Let's see this in action:
```yaml
id: udm_process_intro
label: "UD Process Plugins Introduction"
source: ...
process:
  type: ...
  title:
    - plugin: concat
      source:
        - first_name
        - last_name
      delimiter: " "
    - plugin: callback
      callable: mb_strtolower
    - plugin: callback
      callable: ucwords
destination: ...
```
The [`callback`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Callback.php/class/Callback) process plugin passes a value to a PHP function and returns its result. The function to call is specified in the `callable` configuration option. Note that this plugin expects a `source` option containing a column from the source or a value from the process pipeline. That value is sent as the first argument to the function. Because we are using the `callback` plugin as part of a chain, the source is assumed to be the last output of the previous plugin. Hence, there is no need to define a `source`. So, we concatenate the columns, make them all lowercase, and then capitalize each word.
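If `callback` were used on its own, outside of a chain, you would set the `source` yourself. A minimal sketch, assuming a `first_name` source column:

```yaml
process:
  title:
    plugin: callback
    callable: ucwords
    source: first_name
```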
Relying on direct PHP function calls should be a last resort. Better alternatives include writing your own process plugins, which encapsulate your business logic separately from the migration definition. The `callback` plugin comes with its own **limitation**: you cannot pass extra parameters to the `callable` function. It will receive the specified value as its first argument and nothing else. In the above example, we could combine the calls to mb_strtolower() and ucwords() into a single call to mb_convert_case($source, MB_CASE_TITLE) if passing extra parameters were allowed.

_Tip_: You should have a good understanding of your source and destination formats. In this example, one of the values we want to transform is `MELANÇON`. Because of the cedilla (**ç**), using strtolower() is not adequate in this case since it would leave that character uppercase (`melanÇon`). [Multibyte string functions](https://www.php.net/manual/en/ref.mbstring.php) (mb\_\*) are required for proper transformation. ucwords() is not one of them and would present similar issues if the first letter of a word is a special character. Attention should be given to the character encoding of the tables in your destination database.

_Technical note_: `mb_strtolower` is a function provided by the [`mbstring`](https://www.php.net/manual/en/mbstring.installation.php) PHP extension. It is not enabled by default, and you might not have it installed at all. In those cases, the function would not be available when Drupal tries to call it. The following error is produced when trying to call a function that is not available: `The "callable" must be a valid function or method`. For Drupal and this particular function, that error would never be triggered, even if the extension is missing. That is because Drupal core depends on some Symfony packages which in turn depend on the `symfony/polyfill-mbstring` package. The latter provides a polyfill for mb\_\* functions that has been leveraged since version 8.6.x of Drupal.
# Migrating data into Drupal subfields

In the previous chapter, we learned how to use process plugins to transform data between source and destination. Some Drupal fields have multiple components. For example, formatted text fields store the text to display and the text format to apply. Image fields store a reference to the file, alternative and title text, width, and height. The Migrate API refers to a field's components as **subfields**. In this chapter we will learn how to migrate into them and find out which subfields are available.

## Getting the example code

Today's example will consist of migrating data into the `Body` and `Image` fields of the `Article` content type that are available out of the box. This assumes that Drupal was installed using the `standard` installation profile. As in previous examples, we will create a new module and write a migration definition file to perform the migration. The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://github.com/dinarcon/ud_migrations>. The module name is `UD Migration Subfields` and its machine name is `ud_migrations_subfields`. The `id` of the example migration is `udm_subfields`. This example uses the [Migrate Files](https://www.drupal.org/project/migrate_file) module (explained later). Make sure to download and enable it. Otherwise, you will get an error like: `In DiscoveryTrait.php line 53: The "file_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:...`. Let's see part of the *source* definition:
```yaml
source:
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      name: 'Michele Metts'
      profile: '<a href="https://www.drupal.org/u/freescholar" title="Michele on Drupal.org">freescholar</a> on Drupal.org'
      photo_url: 'https://agaric.coop/sites/default/files/2018-12/micky-cropped.jpg'
      photo_description: 'Photo of Michele Metts'
      photo_width: '587'
      photo_height: '657'
```
Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image.

## Migrating formatted text

The `Body` field is of type `Text (formatted, long, with summary)`. This type of field has three components: the full text (*value*) to present, a *summary* text, and a text *format*. The Migrate API allows you to write to each component separately by defining subfield targets. The next code snippet shows how to do it:
```yaml
process:
  field_text_with_summary/value: source_value
  field_text_with_summary/summary: source_summary
  field_text_with_summary/format: source_format
```

The syntax to migrate into subfields is the machine name of the field and the subfield name separated by a *slash* (/), followed by a *colon* (:), a *space*, and the *value*. You can set the value to a source column name for a verbatim copy or use any combination of process plugins. It is not required to migrate into all subfields. Each field determines what components are required, so it is possible that not all subfields are set. In this example, only the value and text format will be set.
```yaml
process:
  body/value: profile
  body/format:
    plugin: default_value
    default_value: restricted_html
```

The `value` subfield is set to the `profile` source column. As you can see in the first snippet, it contains HTML markup, an `a` tag to be precise. Because we want the tag to be rendered as a link, a text format that allows such a tag needs to be specified. There is no information about text formats in the source, but Drupal comes with a couple we can choose from. In this case, we use the `Restricted HTML` text format. Note that the `default_value` plugin is used and set to `restricted_html`. When setting text formats, it is necessary to use their machine name. You can find them in the configuration page for each text format. For `Restricted HTML` that is `/admin/config/content/formats/manage/restricted_html`.

*Note*: Text formats are a whole different subject that even has security implications. To keep the discussion on topic, we will only give some recommendations. When you need to migrate HTML markup, you need to know which tags appear in your source, decide which ones you want to allow in Drupal, and select a text format that accepts what you have whitelisted and filters out any dangerous tags like `script`. As a general rule, you should avoid setting the `format` subfield to use the `Full HTML` text format.
## Migrating images

There are [different approaches to migrating images](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-files-and-images). Today, we are going to use the Migrate Files module. It is important to note that Drupal treats images as files with extra properties and behavior. Any approach used to migrate files can be adapted to migrate images.
```yaml
process:
  field_image/target_id:
    plugin: file_import
    source: photo_url
    reuse: TRUE
    id_only: TRUE
  field_image/alt: photo_description
  field_image/title: photo_description
  field_image/width: photo_width
  field_image/height: photo_height
```
When migrating any field, you have to use its *machine name* in the mapping section. For the `Image` field, the machine name is `field_image`. Knowing that, you set each of its subfields:

* `target_id` stores an integer number which Drupal uses as a reference to the file.
* `alt` stores a string that represents the alternative text. Always set one for better accessibility.
* `title` stores a string that represents the title attribute.
* `width` stores an integer number which represents the width in pixels.
* `height` stores an integer number which represents the height in pixels.

For the `target_id`, the plugin `file_import` is used. This plugin requires a `source` configuration value with a URL to the file. In this case, the `photo_url` column from the *source* section is used. The `reuse` flag indicates that if a file with the same location and name exists, it should be used instead of downloading a new copy. When working on migrations, it is common to run them over and over until you get the expected results. Using the `reuse` flag will avoid creating multiple references or copies of the image file, depending on the plugin configuration. The `id_only` flag is set so that the plugin only returns the file identifier used by Drupal instead of an entity reference array. This is done because each subfield is being set manually. For the rest of the subfields (`alt`, `title`, `width`, and `height`) the value is a verbatim copy from the *source*.
*Note*: The Migrate Files module offers another plugin named `image_import`. That one allows you to set all the subfields as part of the plugin configuration. An example of its use will be shown in the next chapter. This example uses the `file_import` plugin to emphasize the configuration of the image subfields.

## Which subfields are available?

Some fields have many subfields. [Address fields](https://www.drupal.org/project/address), for example, have 13 subfields. How can you know which ones are available? The answer is found in the class that provides the field type. Once you find the class, look for the `schema` method. The subfields are contained in the `columns` array of the value returned by that method. Let's see some examples:

* The `Text (plain)` field is provided by the StringItem class.
* The `Number (integer)` field is provided by the IntegerItem class.
* The `Text (formatted, long, with summary)` field is provided by the TextWithSummaryItem class.
* The `Image` field is provided by the ImageItem class.

The `schema` method defines the database columns used by the field to store its data. When migrating into subfields, you are actually migrating into those particular database columns. Any restriction set by the database schema needs to be respected. That is why you do not use units when migrating width and height for images. The database only expects an integer number representing the corresponding values in pixels. Because of object-oriented practices, sometimes you need to look at the parent class to know all the subfields that are available.
*Technical note*: The Migrate API bypasses [Form API](https://api.drupal.org/api/drupal/elements/8.8.x) validations. For example, it is possible to migrate images without setting the `alt` subfield even if that is set as required in the field's configuration. If you try to edit a node that was created this way, you will get a field error indicating that the alternative text is required. Similarly, it is possible to write the `title` subfield even when the field is not expecting it, just like in today's example. If you were to enable the `title` text later, the information would already be there. Remember that when using the Migrate API you are writing directly to the database.

Another option is to connect to the database and check the table structures. For example, the `Image` field stores its data in the `node__field_image` table. Among others, this table has five columns named after the field's machine name and the subfield:

* field_image_target_id
* field_image_alt
* field_image_title
* field_image_width
* field_image_height

Looking at the source code or the database schema is arguably not straightforward. This information is included for reference to those who want to explore the Migrate API in more detail. You can look for migration examples to see what subfields are available.

*Tip*: You can use [Drupal Console](https://drupalconsole.com/) for code introspection and analysis of database table structure. Also, many field types are provided by classes whose names end with the string `Item`. You can use your IDE's search feature to find the class using the name of the field as a hint.
## Default subfields

Every Drupal field has at least one subfield. For example, `Text (plain)` and `Number (integer)` define only the `value` subfield. The following code snippets are equivalent:

```yaml
process:
  field_string/value: source_value_string
  field_integer/value: source_value_integer
```

```yaml
process:
  field_string: source_value_string
  field_integer: source_value_integer
```

In examples from previous chapters, no subfield has been manually set, but Drupal knows what to do. As we have mentioned, the Migrate API offers syntactic sugar to write shorter migration definition files. This is another example. You can safely skip the default subfield and manually set the others as needed. For `File` and `Image` fields, the default subfield is `target_id`. How does the Migrate API know what subfield is the default? You need to check the code again.
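For instance, assuming a hypothetical `source_file_id` column that already contains the ID of a file entity, the following two mappings for the `Image` field would be equivalent:

```yaml
process:
  # Shorthand: the value is assigned to the default subfield, target_id.
  field_image: source_file_id
  # Explicit equivalent of the line above:
  # field_image/target_id: source_file_id
```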
The default subfield is determined by the return value of the `mainPropertyName` method of the class providing the field type. Again, object-oriented practices might require looking at the parent classes to find this method. In the case of the `Image` field, it is provided by ImageItem, which extends FileItem, which extends EntityReferenceItem. It is the latter that defines `mainPropertyName`, returning the string `target_id`.
# Using constants and pseudofields as data placeholders in the Drupal migration process pipeline

So far we have learned how to write basic Drupal migrations and use process plugins to transform data to meet the format expected by the destination. In the previous chapter we learned one of many approaches to migrating images. Now we will change it a bit to introduce two new migration concepts: **constants** and **pseudofields**. Both can be used as data placeholders in the migration timeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the **migrate process pipeline**.

## Setting and using source constants

In the Migrate API, **constants** are _arbitrary values that can be used later in the process pipeline_. They are set as direct children of the source section. You write a `constants` key whose value is a list of name-value pairs. Even though they are defined in the _source_ section, they are independent of the particular source plugin in use. The following code snippet shows a generalization for setting and using _constants_:
```yaml
source:
  constants:
    MY_STRING: "http://understanddrupal.com"
    MY_INTEGER: 31
    MY_DECIMAL: 3.1415927
    MY_ARRAY:
      - "dinarcon"
      - "dinartecc"
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  process_destination_1: constants/MY_INTEGER
  process_destination_2:
    plugin: concat
    source: constants/MY_ARRAY
    delimiter: " "
```
You can set as many constants as you need. Although not required by the API, it is a common convention to write the constant names in all uppercase and using _underscores_ (**\_**) to separate words. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section you type its name, just like any other column provided by the _source_ plugin. Note that to use a constant you need to name the full hierarchy under the source section. That is, the word `constants` and the name itself separated by a _slash_ (**/**) symbol. Constants can be used to copy their value directly to the destination or as part of any process plugin configuration.

_Technical note_: The word `constants` for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of your particular source plugin. A reason to use a different name is that your source actually contains a column named `constants`. In that case you could use `defaults` or something else. The one restriction is that whatever name you use, you have to use it in the process section when referring to any constant. For example:
```yaml
source:
  defaults:
    MY_VALUE: "http://understanddrupal.com"
  plugin: source_plugin_name
  source_plugin_config: source_config_value
process:
  process_destination: defaults/MY_VALUE
```
## Setting and using pseudofields

Similar to constants, **pseudofields** store _arbitrary values for use later in the process pipeline_. There are some key differences. Pseudofields are set in the _process_ section. The name can be arbitrary as long as it does not conflict with a property name or field name in the destination. The value can be set to a verbatim copy from the _source_ (a column or a constant), or it can use process plugins for data transformations. The following code snippet shows a generalization for setting and using _pseudofields_:
```yaml
source:
  constants:
    MY_BASE_URL: "http://understanddrupal.com"
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  title: source_column_title
  my_pseudofield_1:
    plugin: concat
    source:
      - constants/MY_BASE_URL
      - source_column_relative_url
    delimiter: "/"
  my_pseudofield_2:
    plugin: urlencode
    source: "@my_pseudofield_1"
  field_link/uri: "@my_pseudofield_2"
  field_link/title: "@title"
```
In the above example, `my_pseudofield_1` is set to the result of a `concat` process transformation that joins a constant and a column from the source section. The resulting value is later used as part of a `urlencode` process transformation. Note that to use the value from `my_pseudofield_1` you have to enclose it in _quotes_ (**"**) and prepend an _at sign_ (**@**) to the name. A prefix in the name, like `my_` here or `pseudo_` in a later example, is not required. In this case it is used to make it easier to distinguish between pseudofields and regular property or field names. The new value obtained from the URL encoding operation is stored in `my_pseudofield_2`. This last pseudofield is used to set the value of the `uri` subfield for `field_link`. The example could be simplified, for example, by using a single pseudofield and chaining process plugins. It is presented this way to demonstrate that pseudofields can be used in direct assignments or as part of process plugin configuration values.
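For reference, a simplified version along those lines might chain `concat` and `urlencode` under a single pseudofield; this is only a sketch of the idea, not part of the example module:

```yaml
process:
  title: source_column_title
  my_encoded_url:
    - plugin: concat
      source:
        - constants/MY_BASE_URL
        - source_column_relative_url
      delimiter: "/"
    - plugin: urlencode
  field_link/uri: "@my_encoded_url"
  field_link/title: "@title"
```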
_Technical note_: If the name of the pseudofield can be arbitrary, how can you prevent name clashes with destination property names and field names? You might have to look at the source code for the entity and the configuration of the bundle. In the case of a node migration, look at the `baseFieldDefinitions()` method of the `Node` class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the "Manage fields" section of the content type you are migrating into. The [Field API](https://api.drupal.org/api/drupal/core!modules!field!field.module/group/field/8.8.x) prefixes any field created via the administration interface with the string `field_`. This reduces the likelihood of name clashes. Other than these two name restrictions, _anything else can be used_. In this case, the Migrate API will eventually perform an entity save operation which will discard the pseudofields.

## Understanding the Drupal Migrate API process pipeline

The migrate process pipeline is a mechanism by which the value of any **destination property**, **field**, or **pseudofield** that has been set **can be used by anything defined later in the process section**. The fact that using a pseudofield requires enclosing its name in quotes and prepending an at sign is actually a requirement of the process pipeline. Let’s see some examples using a node migration:
- To use the `title` property of the node entity, you would write `@title`
- To use the `field_body` field of the `Basic page` content type, you would write `@field_body`
- To use the `my_temp_value` pseudofield, you would write `@my_temp_value`

In the process pipeline, these values can be used just like constants and columns from the source. The only restriction is that they need to be set before being used. For those familiar with the "_rewrite results_" feature of Views, it follows the same idea: you have access to everything defined previously. Anytime you enclose a name in _quotes_ and prepend an _at sign_, you are telling the Migrate API to look for that element in the process section instead of the source section.
## Migrating images using the image_import plugin

Let’s practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous chapter. The Migrate Files module provides another process plugin named `image_import` that allows you to directly set all the subfield values in the plugin configuration itself.

As in previous examples, we will create a new module and write a migration definition file to perform the migration. It is assumed that Drupal was installed using the `standard` installation profile. The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://github.com/dinarcon/ud_migrations>. The module name is `UD Migration constants and pseudofields` and its machine name is `ud_migrations_constants_pseudofields`. The `id` of the example migration is `udm_constants_pseudofields`. Make sure to download and enable the Migrate Files module. Otherwise, you will get an error like: "In DiscoveryTrait.php line 53: The "image_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:...".

Let’s see part of the _source_ definition:
```yaml
source:
  constants:
    BASE_URL: "https://agaric.coop"
    PHOTO_DESCRIPTION_PREFIX: "Photo of"
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: "Michele Metts"
      photo_url: "sites/default/files/2018-12/micky-cropped.jpg"
      photo_width: "587"
      photo_height: "657"
```
Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name and details about the image. Note that this time, the `photo_url` does not provide an absolute URL. Instead, it is a relative path from the domain hosting the images. In this example, the domain is `https://agaric.coop`, so that value is stored in the BASE_URL constant, which is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant stores the prefix to add to the name to create a photo description.

Now, let’s see the _process_ definition:
```yaml
process:
  title: name
  pseudo_image_url:
    plugin: concat
    source:
      - constants/BASE_URL
      - photo_url
    delimiter: "/"
  pseudo_image_description:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: " "
  field_image:
    plugin: image_import
    source: "@pseudo_image_url"
    reuse: TRUE
    alt: "@pseudo_image_description"
    title: "@title"
    width: photo_width
    height: photo_height
```
The `title` node property is set directly to the value of the `name` column from the source. Then, two pseudofields are defined. `pseudo_image_url` stores a valid absolute URL to the image using the BASE_URL constant and the `photo_url` _column_ from the _source_. `pseudo_image_description` uses the PHOTO_DESCRIPTION_PREFIX constant and the `name` _column_ from the _source_ to store a description for the image.

For the `field_image` field, the `image_import` process plugin is used. This time, the subfields are not set manually as in the previous chapter. Because the `id_only` configuration key is absent, you can assign values to the subfields simply by configuring the `image_import` plugin. The URL to the image is set in the `source` key and uses the `pseudo_image_url` pseudofield. The `alt` key allows you to set the alternative text attribute for the image, and in this case the `pseudo_image_description` pseudofield is used. The `title` key sets the text of the subfield with the same name; in this case it is assigned the value of the `title` node property, which was set at the beginning of the process pipeline. Remember that not only pseudofields are available. Finally, the `width` and `height` configuration keys use the columns from the source to set the values of the corresponding subfields.
# Tips for writing Drupal migrations and understanding their workflow

We have presented several examples so far. They started very simple and have been increasing in complexity. Until now, we have been rather optimistic. Get the sample code, install any module dependency, enable the module that defines the migration, and execute it assuming everything works on the first try. But Drupal migrations often involve a bit of trial and error. At the very least, it is an iterative process. In this chapter we are going to see what happens after **import** and **rollback** operations, how to **recover from a failed migration**, and some **tips for writing definition files**.

## Importing and rolling back migrations

When working on a migration project, it is common to write many migration definition files. Even if you were to have only one, it is very likely that your destination will require many field mappings. Running an _import_ operation to get the data into Drupal is the first step. With so many moving parts, it is easy not to get the expected results on the first try. When that happens, you can run a _rollback_ operation. This instructs the system to revert anything that was introduced when the migration was initially imported. After rolling back, you can make changes to the migration definition file and rebuild Drupal's cache for the system to pick up your changes. Finally, you can do another _import_ operation. Repeat this process until you get the results you expect. The following code snippet shows a basic Drupal migration workflow:
```
# 1) Run the migration.
$ drush migrate:import udm_subfields

# 2) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_subfields

# 3) Change the migration definition file.

# 4) Rebuild caches for changes to be picked up.
$ drush cache:rebuild

# 5) Run the migration again.
$ drush migrate:import udm_subfields
```
The example above assumes you are using Drush to run the migration commands, specifically the commands provided by Migrate Run or Migrate Tools. You pick one or the other, but not both, as the commands provided by the two modules are the same. If you were to have both enabled, they will conflict with each other and fail. Another thing to note is that the example uses Drush 9. There were major refactorings between versions 8 and 9 which included changes to the names of the commands. Finally, `udm_subfields` is the `id` of the migration to run. You can find the full code in the _Migrating data into Drupal subfields_ chapter.

_Tip_: You can use Drush command aliases to write shorter commands. Type `drush [command-name] --help` for a list of the available aliases.

_Technical note_: To pick up changes to the definition file you need to rebuild Drupal's caches. This is the procedure to follow when creating the YAML files using Migrate API core features and placing them under the `migrations` directory. It is also possible to define migrations as configuration entities using the Migrate Plus module. In those cases, the YAML files follow a different naming convention and are placed under the `config/install` directory. For picking up changes in this case, you need to sync the YAML definitions using [configuration management](https://www.drupal.org/docs/configuration-management/managing-your-sites-configuration) workflows. This will be covered in a future chapter.
## Stopping and resetting migrations
|
||||||
|
|
||||||
|
Sometimes, you do not get the expected results due to an oversight in setting a value. On other occasions, fatal PHP errors can occur when running the migration. The Migrate API might not be able to recover from such errors. For example, using a non-existent PHP function with the `callback` plugin. When these errors happen, the migration is left in a state where no _import_ or _rollback_ operations could be performed.

You can check the state of any migration by running the `drush migrate:status` command. Ideally, you want them in the `Idle` state. When something fails during import or rollback, they are left in the `Importing` or `Rolling back` states. To get the migration back to `Idle`, you stop the migration and reset its status. The following snippet shows how to do it:

```yaml
# 1) Run the migration.
$ drush migrate:import udm_process_intro

# 2) Some non recoverable error occurs. Check the status of the migration.
$ drush migrate:status udm_process_intro

# 3) Stop the migration.
$ drush migrate:stop udm_process_intro

# 4) Reset the status to idle.
$ drush migrate:reset-status udm_process_intro

# 5) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_process_intro

# 6) Change the migration definition file.

# 7) Rebuild caches for changes to be picked up.
$ drush cache:rebuild

# 8) Run the migration again.
$ drush migrate:import udm_process_intro
```

_Tip_: The errors thrown by the Migrate API might not provide enough information to determine what went wrong. An excellent way to familiarize yourself with the possible errors is by intentionally breaking working migrations. In the example repository for this book there are many migrations you can modify. Try anything that comes to mind: not leaving a space after a _colon_ (**:**) in a key-value assignment; not using proper indentation; using wrong subfield names; using invalid values in property assignments; etc. You might be surprised by how the Migrate API deals with such errors. Also note that many other Drupal APIs are involved. For example, you might get a YAML file parse error or an [Entity API](https://www.drupal.org/docs/8/api/entity-api) save error. Once you have seen an error before, it is usually faster to identify the cause and fix it the next time.

## What happens when you roll back a Drupal migration?

In an ideal scenario, when a migration is rolled back it _cleans up after itself_. That means it removes any entity that was created during the _import_ operation: nodes, taxonomy terms, files, etc. Unfortunately, that is not always the case. It is very important to understand this when planning and executing migrations. For example, you might not want to leave behind taxonomy terms or files that are no longer in use. Whether a dependent entity is removed or not depends on how the plugins and entities involved work.

For example, when using the `file_import` or `image_import` plugins provided by [Migrate File](https://www.drupal.org/project/migrate_file), the created files and images are not removed from the system upon rollback. When using the `entity_generate` plugin from Migrate Plus, the created entity also remains in the system after a _rollback_ operation.

In the next chapter we are going to start talking about migration dependencies. What happens with dependent migrations (e.g. files and paragraphs) when the migration for the host entity (e.g. node) is rolled back? In this case, the Migrate API will perform an entity delete operation on the node. When this happens, referenced files are kept in the system, but paragraphs are automatically deleted. For the curious, this behavior for paragraphs is actually determined by its module dependency: [Entity Reference Revisions](https://www.drupal.org/project/entity_reference_revisions). We will talk more about paragraphs migrations in future chapters.

The moral of the story is that the behavior of the migration system might be affected by other Drupal APIs. And in the case of _rollback_ operations, make sure to read the documentation or test manually to find out when migrations clean up after themselves and when they do not.

_Note_: The focus of this section was [content entity](https://www.drupal.org/docs/8/api/entity-api/content-entity) migrations. The general idea can be applied to [configuration entities](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-configuration) or any custom target of the ETL process.

## Re-import or update migrations

We just mentioned that the Migrate API issues an entity delete action when rolling back a migration. This has another important side effect. Entity IDs (nid, uid, tid, fid, etc.) are going to change every time you _rollback_ and _import_ again. Depending on auto-generated IDs is generally not a good idea, but keep it in mind in case your workflow might be affected. For example, if you are running migrations in a content staging environment, references to the migrated entities can break if their IDs change. Also, if you were to manually update the migrated entities to clean up edge cases, those changes would be lost if you _rollback_ and _import_ again. Finally, keep in mind that test data might remain in the system, as described in the previous section, which could find its way to production environments.

An alternative to rolling back a migration is to not execute that operation at all. Instead, you run an _import_ operation again using the `update` flag. This tells the system that, in addition to migrating unprocessed items from the source, you also want to update items that were previously imported using their current values. To do this, the Migrate API relies on _source identifiers_ and _map tables_. You might want to consider this option when your source changes over time, when you have a large number of records to import, or when you want to execute the same migration many times on a schedule.
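
For example, a hedged sketch of this approach with the migration used earlier (the `--update` flag is provided by the Drush migrate commands):

```yaml
# Re-import without rolling back: previously imported items are updated in place.
$ drush migrate:import udm_subfields --update
```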

_Note_: On import operations, the Migrate API issues an entity save action.

## Tips for writing Drupal migrations

When working on migration projects, you might end up with many migration definition files. They can set dependencies on each other. Each file might contain a significant number of field mappings. There are many things you can do to make Drupal migrations more straightforward, for example, practicing with different migration scenarios and studying working examples. As a reference to help you in the process of migrating into Drupal, consider these tips:

- Start from an existing migration. Look for an example online that does something close to what you need and modify it to your requirements.
- Pay close attention to the syntax of the YAML file. An extraneous space or wrong indentation level can break the whole migration.
- Read the documentation to know which source, process, and destination plugins are available. One might exist already that does exactly what you need.
- Make sure to read the documentation for the specific plugins you are using. Many times a plugin offers optional configurations. Understand the tools at your disposal and find creative ways to combine them.
- Look for [contributed modules](https://www.drupal.org/project/project_module?f%5B0%5D=&f%5B1%5D=&f%5B2%5D=im_vid_3%3A64&f%5B3%5D=drupal_core%3A7234&f%5B4%5D=sm_field_project_type%3Afull&f%5B5%5D=&f%5B6%5D=&text=&solrsort=iss_project_release_usage+desc&op=Search) that might offer more plugins or upgrade paths from previous versions of Drupal. The Migrate ecosystem is vibrant and lots of people are contributing to it.
- When writing the migration pipeline, map one field at a time. Problems are easier to isolate if there is only one thing that could break at a time.
- When mapping a field, work on one subfield at a time if possible. Some field types like images and addresses offer many subfields. Again, try to isolate errors by introducing individual changes each time.
- There is no need to do every data transformation using the Migrate API. When there are edge cases, you can manually update those after the automated migration is **completed**. That is, no more rollback operations. You can also clean up the source data in advance to make it easier to process in Drupal.
- Commit to your code repository any and every change that produces the right results. That way you can go back in time and recover a partially working migration.
- Learn about [debugging migrations](https://www.drupal.org/docs/8/api/migrate-api/debugging-migrations). We will talk about this topic in a future chapter.
- Seek help from the community. Migrate maintainers and enthusiasts are very active and responsive in the #migrate channel of the Drupal Slack.
- If you feel stuck, take a break from the computer and come back to it later. Resting can do wonders in finding solutions to hard problems.

# Migrating files and images into Drupal

We have already covered two of many ways to migrate images into Drupal. One example allows you to set the image subfields manually. The other example uses a process plugin that accomplishes the same result using plugin configuration options. Although valid ways to migrate images, these approaches have an important limitation. The files and images are _not removed from the system upon rollback_. In the previous chapter, we talked further about this topic. Today, we are going to perform an image migration that will clean up after itself when it is rolled back. Note that in Drupal images are a special case of files. Even though the example will migrate images, the same approach can be used to import any type of file. This migration will also serve as the basis for explaining migration dependencies in the next chapter.

## File entity migrate destination

All the examples so far have been about creating nodes. The Migrate API is a full ETL framework able to write to different destinations. In the case of Drupal, the target can be other content entities like files, users, taxonomy terms, comments, etc. Writing to content entities is straightforward. For example, to migrate into files, the destination section is configured like this:

```yaml
destination:
  plugin: "entity:file"
```

You use a plugin whose name is `entity:` followed by the machine name of your target entity, in this case, `file`. Other possible values are `user`, `taxonomy_term`, and `comment`. Remember that each migration definition file can only write to one destination.

## Source section definition

The _source_ of a migration is independent of its _destination_. The following code snippet shows the _source_ definition for the image migration example:

```yaml
source:
  constants:
    SOURCE_DOMAIN: "https://agaric.coop"
    DRUPAL_FILE_DIRECTORY: "public://portrait/"
  plugin: embedded_data
  data_rows:
    - photo_id: "P01"
      photo_url: "sites/default/files/2018-12/micky-cropped.jpg"
    - photo_id: "P02"
      photo_url: ""
    - photo_id: "P03"
      photo_url: "sites/default/files/pictures/picture-94-1480090110.jpg"
    - photo_id: "P04"
      photo_url: "sites/default/files/2019-01/clayton-profile-medium.jpeg"
  ids:
    photo_id:
      type: string
```

Note that the source contains relative paths to the images. Eventually, we will need an absolute path to them. Therefore, the `SOURCE_DOMAIN` constant is created to assemble the absolute path in the process pipeline. Also, note that one of the rows contains an empty `photo_url`. No file can be created without a proper URL. In the _process_ section we will account for this. An alternative could be to filter out invalid data in a source clean up operation before executing the migration.

Another important thing to note is that the row identifier `photo_id` is of type _string_. You need to explicitly tell the system the name and type of the identifiers you want to use. The configuration for this varies slightly from one source plugin to another. For the `embedded_data` plugin, you do it using the `ids` configuration key. It is possible to have more than one source column as an identifier, for example, when the combination of two columns (e.g. name and date of birth) is required to _uniquely identify_ each element (e.g. person) in the _source_.
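
As a hedged sketch of that case, assuming hypothetical `name` and `date_of_birth` source columns, the `ids` configuration could look like this:

```yaml
ids:
  name:
    type: string
  date_of_birth:
    type: string
```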

You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD migration dependencies introduction` whose machine name is `ud_migrations_dependencies_intro`. The migration to run is `udm_dependencies_intro_image`.

## Process section definition

The fields to map in the _process_ section will depend on the target. For files and images, only one entity property is required: `uri`. Its value should be set to the file path within Drupal using stream wrappers. In this example, the public stream (`public://`) is used to store the images in a location that is publicly accessible by any visitor to the site. If the file were already in the system and we knew the path, the whole _process_ section for this migration could be reduced to two lines:

```yaml
process:
  uri: source_column_file_uri
```

That is rarely the case though. Fortunately, there are many process plugins that allow you to transform the available data. When combined with constants and pseudofields, you can come up with creative solutions to produce the format expected by your destination.

## Skipping invalid records

The _source_ for this migration contains one record that lacks the URL to the photo. No image can be imported without a valid path. Let’s account for this. In the same step, a pseudofield will be created to extract the name of the file out of its path:

```yaml
psf_destination_filename:
  - plugin: callback
    callable: basename
    source: photo_url
  - plugin: skip_on_empty
    method: row
    message: "Cannot import empty image filename."
```

The `psf_destination_filename` pseudofield uses the `callback` plugin to derive the filename from the relative path to the image. This is accomplished using the `basename` PHP function. Also, taking advantage of plugin chaining, the system is instructed to skip processing the row if no filename could be obtained, for example, because an empty source value was provided. This is done by the `skip_on_empty` plugin, which is also configured to log a message to indicate what happened. In this case, the message is hardcoded. You can make it dynamic to include the ID of the row that was skipped using other process plugins. This is left as an exercise to the curious reader.

_Tip_: To read the messages logged during any migration, execute the following Drush command: `drush migrate:messages [migration-id]`.

## Creating the destination URI

The next step is to create the location where the file is going to be saved in the system. For this, the `psf_destination_full_path` pseudofield is used to concatenate the value of a constant defined in the source and the filename obtained in the previous step. As explained before, order is important when using pseudofields as part of the migrate process pipeline. The following snippet shows how to do it:

```yaml
psf_destination_full_path:
  - plugin: concat
    source:
      - constants/DRUPAL_FILE_DIRECTORY
      - "@psf_destination_filename"
  - plugin: urlencode
```

The end result of this operation would be something like `public://portrait/micky-cropped.jpg`. The URI specifies that the image should be stored inside a `portrait` subdirectory inside Drupal’s public file system. Copying files to specific subdirectories is not required, but it helps with file organization. Also, some hosting providers might impose limitations on the number of files per directory. Specifying subdirectories for your file migrations is a recommended practice.

Also note that after the URI is created, it gets encoded using the [`urlencode`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21UrlEncode.php/class/UrlEncode) plugin. This will replace special characters with an equivalent string literal. For example, `é` and `ç` will be converted to `%C3%A9` and `%C3%A7` respectively. Space characters will be changed to `%20`. The end result is an equivalent URI that can be used inside Drupal, as part of an email, or via another medium. Always encode any URI when working with Drupal migrations.

## Creating the source URI

The next step is to assemble an absolute path for the source image. For this, you concatenate the domain stored in a source constant and the image's relative path stored in a source column. The following snippet shows how to do it:

```yaml
psf_source_image_path:
  - plugin: concat
    delimiter: "/"
    source:
      - constants/SOURCE_DOMAIN
      - photo_url
  - plugin: urlencode
```

The end result of this operation will be something like `https://agaric.coop/sites/default/files/2018-12/micky-cropped.jpg`. Note that the `concat` and `urlencode` plugins are used just like in the previous step. A subtle difference is that a `delimiter` is specified in the concatenation step. This is because, contrary to the `DRUPAL_FILE_DIRECTORY` constant, the `SOURCE_DOMAIN` constant does not end with a _slash_ (**/**). This was done intentionally to highlight two things. First, it is important to understand your source data. Second, you can transform it as needed by using various process plugins.

## Copying the image file to Drupal

Only two tasks remain to complete this image migration: download the image and assign the `uri` property of the file entity. Luckily, both steps can be accomplished at the same time using the [`file_copy`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21FileCopy.php/class/FileCopy) plugin. The following snippet shows how to do it:

```yaml
uri:
  plugin: file_copy
  source:
    - "@psf_source_image_path"
    - "@psf_destination_full_path"
  file_exists: "rename"
  move: FALSE
```

The source configuration of the `file_copy` plugin expects an array of two values: the URI to copy the file from and the URI to copy the file to. Optionally, you can specify what happens if a file with the same name exists in the destination directory. In this case, we are instructing the system to rename the file to prevent name clashes. This is done by appending the string `_X` to the filename, before the file extension. The `X` is a number starting with zero (0) that keeps incrementing until the filename is unique. The `move` flag is also optional. If set to `TRUE`, it tells the system that the file should be moved instead of copied. As you can guess, Drupal does not have access to the file system in the remote server. The configuration option is shown for completeness, but does not have any effect in this example.

In addition to downloading the image and placing it inside Drupal’s file system, the `file_copy` plugin also returns the destination URI. That is why this plugin can be used to assign the `uri` destination property. And that’s it, you have successfully imported images into Drupal! Clever use of the process pipeline, isn’t it? ;-)

One important thing to note is that an image’s alternative text, title, width, and height are not associated with the **file entity**. That information is actually stored in a **field of type image**. This will be illustrated in the next chapter. To reiterate, the same approach to migrate images can be used to migrate any file type.

_Technical note_: The file entity contains other properties you can write to. For a list of available options, check the `baseFieldDefinitions()` method of the `File` class defining the entity. Note that more properties can be available further up in the class hierarchy. Also, this entity does not have multiple bundles like the node entity does.
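
For instance, a minimal sketch of mapping a couple of those extra properties could look like the following; the `constants/ADMIN_UID` source constant is hypothetical, and the property names should be verified against the `File` base field definitions:

```yaml
# Hedged sketch: additional file entity properties set in the process section.
filename: "@psf_destination_filename"
uid: constants/ADMIN_UID
```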

# Introduction to migration dependencies in Drupal

One of Drupal's biggest strengths is its data modeling capabilities. You can break the information that you need to store into individual fields and group them in content types. You can also take advantage of default behavior provided by entities like nodes, users, taxonomy terms, files, etc. Once the data has been modeled and saved into the system, Drupal will keep track of the relationships among them. In this chapter we will learn about **migration dependencies** in Drupal.

As we have seen throughout the book, the Migrate API can be used to write to different entities. One restriction though is that each migration definition can only target one type of entity at a time. Sometimes, a piece of content has references to other elements. For example, a node that includes _entity reference_ fields to users, taxonomy terms, and images. The recommended way to get them into Drupal is writing one migration definition for each. Then, you specify the relationships that exist among them.

## Breaking up migrations

When you break up your migration project into multiple, smaller migrations, they are easier to manage and you have more control of the process pipeline. Depending on how you write them, you can rest assured that imported data is properly deleted if you ever have to roll back the migration. You can also enforce that certain elements exist in the system before others that depend on them can be created. In this chapter, we are going to leverage the example from the previous chapter to demonstrate this. The portraits imported in the _file_ migration will be used in the _image_ field of nodes of type _article_.

You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD migration dependencies introduction` whose machine name is `ud_migrations_dependencies_intro`. Last time, the `udm_dependencies_intro_image` migration was imported. This time, `udm_dependencies_intro_node` will be executed. Notice that both migrations belong to the same module.

## Writing the source and destination definition

To keep things simple, the example will only write the node title and assign the image field. A constant will be provided to create the alternative text for the images. The following snippet shows how the source section is configured:

```yaml
source:
  constants:
    PHOTO_DESCRIPTION_PREFIX: "Photo of"
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: "Michele Metts"
      photo_file: "P01"
    - unique_id: 2
      name: "David Valdez"
      photo_file: "P03"
    - unique_id: 3
      name: "Clayton Dewey"
      photo_file: "P04"
  ids:
    unique_id:
      type: integer
```

Remember that in this migration you want to use files that have already been imported. Therefore, no URLs to the image files are provided. Instead, you need a reference to the other migration. Particularly, you need a reference to the _unique identifiers_ for each element of the file migration. In the _process_ section, this value will be used to look up the portrait that will be assigned to the _image_ field.

The destination section is quite short. You only specify that the target is a _node_ entity and the content type is _article_. Remember that you need to use the machine name of the content type. The following snippet shows how the _destination_ section is configured:

```yaml
destination:
  plugin: "entity:node"
  default_bundle: article
```

## Using previously imported files in image fields

To be able to reuse the previously imported files, the [`migration_lookup`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21MigrationLookup.php/class/MigrationLookup) plugin is used. Additionally, an alternative text for the image is created using the `concat` plugin. The following snippet shows how the process section is configured:

```yaml
process:
  title: name
  field_image/target_id:
    plugin: migration_lookup
    migration: udm_dependencies_intro_image
    source: photo_file
  field_image/alt:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: " "
```

In Drupal, _files_ and _images_ are _entity reference_ fields. That means they only store a pointer to the file, not the file itself. The pointer is an integer number representing the **file ID** (`fid`) inside Drupal. The `migration_lookup` plugin allows you to query the _file_ migration so imported elements can be reused in the node migration.

The `migration` option indicates which migration to query, specifying its **migration id**. Additionally, you indicate which columns in your _source_ match the _unique identifiers_ of the migration to query. In this case, the values of the `photo_file` column in `udm_dependencies_intro_node` match those of the `photo_id` column in `udm_dependencies_intro_image`. If a match is found, this plugin will return the _file ID_ which can be directly assigned to the `target_id` of the image field. That is how the relationship between the two migrations is established.

_Note_: The `migration_lookup` plugin allows you to query more than one migration at a time. Refer to the documentation for details on how to set that up and why you would do it. It also offers additional configuration options.
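
As a hedged sketch only, a lookup against more than one migration can take a form along these lines; `udm_alternate_image` is a hypothetical second migration, and the exact behavior should be checked against the plugin documentation:

```yaml
field_image/target_id:
  plugin: migration_lookup
  migration:
    - udm_dependencies_intro_image
    - udm_alternate_image
  source: photo_file
```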

As a good accessibility practice, an alternative text is set for the image using the `alt` subfield. Other than that, only the node _title_ is set. And with that, you have two migrations connected to each other. If you were to roll back both of them, no file or node would remain in the system.

## Being explicit about migration dependencies

The _node_ migration depends on the _file_ migration. The _files_ need to be migrated first before they can be used as images for the _nodes_. In fact, in the provided example, if you were to import the nodes before the _files_, the migration would fail and no _node_ would be created. You can be explicit about migration dependencies. To do it, add a new configuration option to the node migration that lists which migrations it depends on. The following snippet shows how this is configured:

```yaml
migration_dependencies:
  required:
    - udm_dependencies_intro_image
  optional: []
```

The `migration_dependencies` key goes at the root level of the YAML definition file. It accepts two configuration options: `required` and `optional`. Both accept an array of **migration ids**. The `required` migrations are hard prerequisites. They need to be executed in advance or the system will refuse to import the current one. The `optional` migrations do not have to be executed in advance. But if you were to execute multiple migrations at a time, the system will run them in the order suggested by the dependency hierarchy. [^8-migdep][^8-prob]

[^8-migdep]: Learn more about migration dependencies in this article <https://www.drupal.org/docs/8/api/migrate-api/writing-migrations-for-contributed-and-custom-modules>
[^8-prob]: Check this comment on Drupal.org in case you have problems where the system reports that certain dependencies are not met: <https://www.drupal.org/project/drupal/issues/2797505#comment-12129356>

Now that the dependency among migrations has been explicitly established, you have two options. You can either import each migration manually in the expected order, or import the parent migration using the `--execute-dependencies` flag. When you do that, the system will take care of determining the order in which all migrations need to be imported. The following two snippets will produce the same result for the demo module:

```yaml
$ drush migrate:import udm_dependencies_intro_image
$ drush migrate:import udm_dependencies_intro_node
```

```yaml
$ drush migrate:import udm_dependencies_intro_node --execute-dependencies
```

In this example, there are only two migrations, but you can have as many as needed. For example, a node with references to users, taxonomy terms, paragraphs, etc. Also note that the parent entity does not have to be a node. Users, taxonomy terms, and paragraphs are all fieldable entities. They can contain references the same way nodes do. In further chapters, we will talk again about migration dependencies and provide more examples.

## Tagging migrations

The core Migrate API offers another mechanism to execute multiple migrations at a time: you can **tag** them. To do that, you add a `migration_tags` key at the root level of the YAML definition file. Its value is an array of arbitrary tag names to assign to the migration. Once set, you run them using the migrate import command with the `--tag` flag. You can also rollback migrations per tag. The first snippet shows how to set the tags and the second how to execute them:

```yaml
migration_tags:
  - UD Articles
  - UD Example
```

```yaml
$ drush migrate:import --tag='UD Articles,UD Example'
$ drush migrate:rollback --tag='UD Articles,UD Example'
```

It is important to note that _tags_ and _dependencies_ are different concepts, although both allow you to run multiple migrations at a time. It is possible that a migration definition file contains both, either, or neither. The tag system is used extensively in Drupal core for migrations related to upgrading to Drupal 8 from previous versions. For example, you might want to run all migrations tagged 'Drupal 7' if you are coming from that version. It is possible to specify more than one tag when running the migrate import command, separating each with a _comma_ (**,**).

_Note_: The Migrate Plus module offers _migration groups_ to organize migrations similarly to how _tags_ work. This will be covered in a future chapter. Just keep in mind that _tags_ are provided out of the box by the Migrate API. On the other hand, _migration groups_ depend on a contributed module.

# Migrating taxonomy terms and multivalue fields into Drupal

Today we continue the conversation about migration dependencies with a **hierarchical taxonomy terms** example. Along the way, we will present the process and syntax for migrating into **multivalue fields**. The example consists of two separate migrations: one imports taxonomy terms accounting for term hierarchy, and another imports nodes with a multivalue taxonomy term field. Following this approach, any node and taxonomy term created by the migration process will be removed from the system upon rollback.

## Getting the code

You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD multivalue taxonomy terms` whose machine name is `ud_migrations_multivalue_terms`. The two migrations to execute are `udm_dependencies_multivalue_term` and `udm_dependencies_multivalue_node`. Notice that both migrations belong to the same module.

The example assumes Drupal was installed using the `standard` installation profile. In particular, it expects a Tags (`tags`) taxonomy vocabulary, an Article (`article`) content type, and a Tags (`field_tags`) field that accepts multiple values. The words in parentheses represent the machine name of each element.

## Migrating taxonomy terms and their hierarchy

The example data for the taxonomy terms migration is fruits and fruit varieties. Each row will contain the name and description of the fruit. Additionally, it is possible to define a parent term to establish hierarchy. For example, "Red grape" is a child of "Grape". Note that no _numerical identifier_ is provided. Instead, the value of the `name` is used as a `string` _identifier_ for the migration. If term names could change over time, it is recommended to have another column that does not change (e.g., an autoincrementing number). The following snippet shows how the _source_ section is configured:

```yaml
source:
  plugin: embedded_data
  data_rows:
    - fruit_name: "Grape"
      fruit_description: "Eat fresh or prepare some jelly."
    - fruit_name: "Red grape"
      fruit_description: "Sweet grape"
      fruit_parent: "Grape"
    - fruit_name: "Pear"
      fruit_description: "Eat fresh or prepare a jam."
  ids:
    fruit_name:
      type: string
```

The destination is quite short. The target entity is set to _taxonomy terms_. Additionally, you indicate which _vocabulary_ to migrate into. If you have terms that would be stored in different vocabularies, you can use the `vid` property in the process section to assign the target vocabulary. If you write to a single one, the `default_bundle` key in the destination can be used instead. The following snippet shows how the _destination_ section is configured:

```yaml
destination:
  plugin: "entity:taxonomy_term"
  default_bundle: tags
```
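
As a hedged sketch of the `vid` alternative mentioned above, assuming a hypothetical `source_vocabulary` column that already contains valid vocabulary machine names, the target vocabulary could be assigned per row in the process section instead:

```yaml
process:
  vid: source_vocabulary
```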

For the _process_ section, three entity properties are set: _name_, _description_, and _parent_. The first two are strings copied directly from the source. In the case of `parent`, it is an _entity reference_ to another _taxonomy term_. It stores the **taxonomy term id** (`tid`) of the _parent_ term. To assign its value, the `migration_lookup` plugin is configured similarly to the example in the previous chapter. The difference is that, in this case, the migration to reference is the same one being defined. This introduces an important consideration: _parent terms should be migrated before their children_. This way, they can be found by the look up operation. Also note that the look up value is the term name itself, because that is what this migration sets as the _unique identifier_ in the _source_ section. The following snippet shows how the _process_ section is configured:

```yaml
process:
  name: fruit_name
  description: fruit_description
  parent:
    plugin: migration_lookup
    migration: udm_dependencies_multivalue_term
    source: fruit_parent
```

_Technical note_: The _taxonomy term_ entity contains other properties you can write to. For a list of available options, check the `baseFieldDefinitions()` method of the `Term` class defining the entity. Note that more properties can be available further up in the class hierarchy.

## Migrating multivalue taxonomy term fields

The next step is to create a _node_ migration that can write to a _multivalue taxonomy term field_. To stay on point, only one more field will be set: the _title_, which is required by the _node_ entity.[^9-change] The following snippet shows how the _source_ section is configured for the _node_ migration:

[^9-change]: Read this change record for more information on how the Migrate API processes Entity API validation: <https://www.drupal.org/node/3073707>

```yaml
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      thoughtful_title: "Amazing recipe"
      fruit_list: "Green apple, Banana, Pear"
    - unique_id: 2
      thoughtful_title: "Fruit-less recipe"
  ids:
    unique_id:
      type: integer
```

The `fruit_list` column contains a comma-separated list of taxonomy terms to apply. Note that the values match the _unique identifiers_ of the _taxonomy term migration_. If you had used numbers as migration identifiers there, you would have to use those numbers in this migration to refer to the terms. An example of that was presented in the previous chapter. Also note that there is one record that has no terms associated. This will be considered during the field mapping. The following snippet shows how the _process_ section is configured for the _node_ migration:

```yaml
process:
  title: thoughtful_title
  field_tags:
    - plugin: skip_on_empty
      source: fruit_list
      method: process
      message: "Row does not contain fruit_list."
    - plugin: explode
      delimiter: ","
    - plugin: callback
      callable: trim
    - plugin: migration_lookup
      migration: udm_dependencies_multivalue_term
      no_stub: true
```

The _title_ of the _node_ is a verbatim copy of the `thoughtful_title` column. The _Tags_ field, mapped using its machine name `field_tags`, uses four chained process plugins. The `skip_on_empty` plugin reads the value of the `fruit_list` column and skips the processing of this field if no value is provided. This is done to accommodate the fact that some records in the _source_ do not specify tags. Note that the `method` configuration key is set to `process`. This indicates that only this field should be skipped and not the entire record. Ultimately, tags are optional in this context and _nodes_ should still be imported even if _no tag is associated_.

The [`explode`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Explode.php/class/Explode) plugin allows you to break a string into an _array_, using a `delimiter` to determine where to make the cut. Later, the `callback` plugin will use the `trim` PHP function to remove any space from the start or end of the exploded taxonomy term names. Finally, this _array_ is passed to the `migration_lookup` plugin specifying the term migration as the one to use for the look up operation. Again, the taxonomy term names are used here because they are the _unique identifiers_ of the _term migration_. The `no_stub` configuration should be set to `true` to prevent terms from being created if they are not found by the plugin. This would not occur in the example because we make sure a match is found. If we did not set this configuration and did not include the trim step, some new terms would be created with spaces at the beginning. Note that none of the plugins after `skip_on_empty` has a `source` configuration. This is because, when process plugins are chained, the result of one plugin is sent as the source to be transformed by the next one in line. The end result is an _array_ of **taxonomy term ids** that will be assigned to `field_tags`. The `migration_lookup` plugin is able to process _single values_ and _arrays_.

The last part of the migration is specifying the _destination_ section and any _dependencies_. The following snippet shows how both are configured for the node migration:

```yaml
destination:
  plugin: "entity:node"
  default_bundle: article
migration_dependencies:
  required:
    - udm_dependencies_multivalue_term
  optional: []
```

## More syntactic sugar

One way to set multivalue fields in Drupal migrations is to assign an _array_ as their value. Another option is to set each value manually using **field deltas**. _Deltas_ are integer numbers starting with zero (**0**) and incrementing by one (**1**) for each element of a multivalue field. Although you could set any delta in the Migrate API, consider the field definition in Drupal. It is possible that limits have been set on the number of values a field can hold. You can specify _deltas_ and _subfields_ at the same time. The full syntax is `field_name/field_delta/subfield`. The following example shows the syntax for a multivalue image field:

```yaml
process:
  field_photos/0/target_id: source_fid_first
  field_photos/0/alt: source_alt_first
  field_photos/1/target_id: source_fid_second
  field_photos/1/alt: source_alt_second
  field_photos/2/target_id: source_fid_third
  field_photos/2/alt: source_alt_third
```

Manually setting a multivalue field is less flexible and more error-prone. In today’s example, we showed how to accommodate the list of terms not being provided. Imagine having to do that for each _delta_ and _subfield_ combination, but the functionality is there in case you need it. In the end, Drupal offers more _syntactic sugar_ so you can write shorter field mappings. Additionally, there are various process plugins that can handle _arrays_ for setting multivalue fields.

_Note_: There are other ways to migrate multivalue fields. For example, when using the [`entity_generate`](https://git.drupalcode.org/project/migrate_plus/blob/HEAD/src/Plugin/migrate/process/EntityGenerate.php) plugin provided by Migrate Plus, there is no need to create a separate taxonomy term migration. This plugin is able to create the terms on the fly while running the import process. The caveat is that terms created this way are not deleted upon rollback.
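
A rough sketch of that alternative, using the same `fruit_list` column and assuming the Migrate Plus module is installed, might look like the following; the configuration keys come from the `entity_generate`/`entity_lookup` plugin documentation and should be verified against your version:

```yaml
field_tags:
  - plugin: explode
    source: fruit_list
    delimiter: ","
  - plugin: callback
    callable: trim
  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
```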

# Migrating users into Drupal - Part 1

It is time we learn how to migrate users into Drupal. In this case, the explanation will be divided into two chapters. In this one, we cover the migration of email, timezone, username, password, and status. In the next one, we will cover creation date, roles, and profile pictures. Several techniques will be implemented to ensure that the migrated data is valid. For example, making sure that usernames are not duplicated.

## Getting the code

You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD users` whose machine name is `ud_migrations_users`. The two migrations to execute are `udm_user_pictures` and `udm_users`. Notice that both migrations belong to the same module.

The example assumes Drupal was installed using the `standard` installation profile. Particularly, we depend on a Picture (`user_picture`) _image_ field attached to the user entity. The word in parentheses represents the _machine name_ of the image field.

The explanation below is only for the user migration. It depends on a file migration to get the profile pictures. One motivation to have two migrations is for the images to be deleted if the file migration is rolled back. Note that other techniques exist for migrating images without having to create a separate migration. We have covered two of them in the chapters about `subfields` and `constants and pseudofields`.

## Understanding the source

It is very important to understand the format of your _source_ data. This will guide the transformation process required to produce the expected destination format. For this example, it is assumed that the legacy system from which users are being imported did not have unique usernames. Emails were used to uniquely identify users, but that is not desired in the new Drupal site. Instead, a username will be created from a `public_name` source column. Special measures will be taken to prevent duplication, as Drupal usernames must be unique. There are two more things to consider. First, source passwords are provided in _plain_ text (never do this!). Second, some elements might be missing in the source, like roles and profile pictures. The following snippet shows a sample record for the _source_ section:

```yaml
source:
  plugin: embedded_data
  data_rows:
    - legacy_id: 101
      public_name: "Michele"
      user_email: "micky@example.com"
      timezone: "America/New_York"
      user_password: "totally insecure password 1"
      user_status: "active"
      member_since: "January 1, 2011"
      user_roles: "forum moderator, forum admin"
      user_photo: "P01"
  ids:
    legacy_id:
      type: integer
```

## Configuring the destination and dependencies

The _destination_ section specifies that _user_ is the target entity. When that is the case, you can set an optional `md5_passwords` configuration. If it is set to `true`, the system will take an MD5-hashed password and convert it to the hashing algorithm that Drupal uses.[^10-passmig] To migrate the profile pictures, a separate migration is created. The dependency of the user migration on the file migration is added explicitly. The following code snippet shows how the destination and dependencies are set:

[^10-passmig]: For more information on password migrations, refer to these articles covering basic use cases: <https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users> and advanced use cases: <https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users-advanced-password>

```yaml
destination:
  plugin: "entity:user"
  md5_passwords: true
migration_dependencies:
  required:
    - udm_user_pictures
  optional: []
```

## Processing the fields

The interesting part of a _user_ migration is the field mapping. The specific transformation will depend on your _source_, but some arguably complex cases will be addressed in the example. Let’s start with the basics: verbatim copies from source to destination. The following snippet shows three mappings:

```yaml
mail: user_email
init: user_email
timezone: user_timezone
```

The `mail`, `init`, and `timezone` entity properties are copied directly from the source. Both `mail` and `init` are _email addresses_. The difference is that `mail` stores the current email, while `init` stores the one used when the account was first created. The former might change if the user updates their profile, while the latter will never change. The `timezone` needs to be a string taken from a specific set of values.[^10-timezone]

[^10-timezone]: Refer to this page for a list of supported timezones: <https://www.php.net/manual/en/timezones.php>

```yaml
name:
  - plugin: machine_name
    source: public_name
  - plugin: make_unique_entity_field
    entity_type: user
    field: name
    postfix: _
```

The `name` _entity property_ stores the _username_. This has to be unique in the system. If the _source_ data contained a unique value for each record, it could be used to set the username. None of the unique source columns (e.g., `legacy_id`) is suitable to be used as a username. Therefore, extra processing is needed. The [`machine_name`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21MachineName.php/class/MachineName) plugin converts the `public_name` _source_ column into a transliterated string with some restrictions: any character that is not a number or letter will be converted to an underscore. The transformed value is sent to the `make_unique_entity_field` plugin. This plugin makes sure its input value is not repeated in the whole system for a particular entity field. In this example, the username will be unique. The plugin is configured indicating which _entity type_ and _field_ (property) you want to check. If an equal value already exists, a new one is created by appending what you define as `postfix` plus a number. In this example, there are two records with `public_name` set to `Benjamin`. Eventually, the usernames produced by running the process plugin chain will be: `benjamin` and `benjamin_1`.

```yaml
process:
  pass:
    plugin: callback
    callable: md5
    source: user_password
destination:
  plugin: "entity:user"
  md5_passwords: true
```

The `pass` entity property stores the user’s password. In this example, the source provides the passwords in plain text. Needless to say, that is a terrible idea. But let’s work with it for now. Drupal uses portable PHP password hashes implemented by PhpassHashedPassword. Understanding the details of how Drupal converts one algorithm to another will be left as an exercise for the curious reader. In this example, we are going to take advantage of a feature provided by the Migrate API to automatically convert MD5 hashes to the algorithm used by Drupal. The `callback` plugin is configured to use the `md5` PHP function to convert the plain text password into a hashed version. The last part of the puzzle is to set, in the _destination_ section, the `md5_passwords` configuration to `true`. This will take care of converting the already MD5-hashed password to the value expected by Drupal. The Migrate API documentation provides more examples for migrating already [MD5 hashed passwords](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users) and [other hashing algorithms](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users-advanced-password).

_Note_: MD5-hashed passwords are insecure. In the example, the password is hashed with MD5 as an **intermediate step only**. Drupal uses other algorithms to store passwords securely.

```yaml
status:
  plugin: static_map
  source: user_status
  map:
    inactive: 0
    active: 1
```

The `status` _entity property_ stores whether a user is active or blocked from the system. The source `user_status` values are strings, but Drupal stores this data as a boolean. A value of zero (**0**) indicates that the user is _blocked_, while a value of one (**1**) indicates that it is _active_. The [`static_map`](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21StaticMap.php/class/StaticMap) plugin is used to manually map the values from source to destination. This plugin expects a `map` configuration containing an _array of key-value mappings_. The value from the source is on the left. The value expected by Drupal is on the right.

_Technical note_: Booleans are `true` or `false` values. Even though Drupal treats the `status` property as a boolean, it is internally stored as a `tiny int` in the database. That is why the numbers zero or one are used in the example. For this particular case, using a number or a boolean value on the right side of the mapping produces the same result.