31dom/03.md
Mauricio Dinarte 2569fbaeea Update articles
2023-08-05 05:25:40 -06:00


# Using process plugins for data transformation in Drupal migrations
In the previous chapter, we wrote our first Drupal migration. In that example, we copied verbatim values from the source to the destination. More often than not, the data needs to be transformed in some way or another to match the format expected by the destination system or to meet business requirements. Now we will learn more about process plugins and how they work as part of the Drupal migration pipeline.
## Syntactic sugar
The Migrate API offers a lot of syntactic sugar to make it easier to write migration plugins. Field mappings in the process section are an example of this. Each of them requires a process plugin to be defined. If none is manually set, then the `get` plugin is assumed. The following two code snippets are equivalent in functionality.
```yaml
process:
  title: creative_title
```
```yaml
process:
  title:
    plugin: get
    source: creative_title
```
The `get` process plugin copies a value from the source to the destination without making any changes. Because this is a common operation, `get` is considered the default. There are many process plugins provided by Drupal core and contributed modules. Their configuration can be generalized as follows:
```yaml
process:
  destination_field:
    plugin: plugin_name
    config_1: value_1
    config_2: value_2
    config_3: value_3
```
The process plugin is configured within an extra level of indentation under the destination field. The `plugin` key is required and determines which plugin to use. Then, a list of configuration options follows. Refer to the documentation of each plugin to know what options are available. Some configuration options will be required while others will be optional. For example, the `concat` plugin requires a `source`, but the `delimiter` is optional. An example of its use appears later in this chapter.
## Providing default values
Sometimes the destination requires a property or field to be set, but that information is not present in the source. Imagine you are migrating nodes. It is recommended to write one migration per content type. If you have a migration to nodes of type `Basic page`, it would be redundant to have a column in the source with the same value for every row. The data might not be needed. Or it might not exist. In any case, the `default_value` plugin can be used to provide a value when the data is not available in the source.
```yaml
source: ...
process:
  type:
    plugin: default_value
    default_value: page
destination:
  plugin: 'entity:node'
```
The above example sets the `type` property for all nodes in this migration to `page`, which is the machine name of the `Basic page` content type. Do not confuse the name of the plugin with the name of its configuration property; they happen to be the same: `default_value`. Also note that because `type` is manually set in the process section, the `default_bundle` key in the destination section is no longer required. You can see the latter being used in the example in chapter 2. If `type` is defined in the `process` section and `default_bundle` in the `destination` section, the former takes precedence.
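For comparison, here is a minimal sketch of the alternative described above: `type` is left out of the `process` section and the bundle is set with `default_bundle` in the destination. The `creative_title` source field is reused from the earlier snippet purely for illustration.

```yaml
source: ...
process:
  # No `type` mapping here; the bundle comes from the destination section.
  title: creative_title
destination:
  plugin: 'entity:node'
  # Applies to every node created by this migration.
  default_bundle: page
```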
## Concatenating values
Consider the following migration request: you have a source listing people with first and last name in separate columns. Both are in uppercase. The two values need to be put together (concatenated) and used as the title of nodes of type `Basic page`. The character casing needs to be changed so that only the first letter of each word is capitalized. If there is a need to display them in all caps, CSS can be used for presentation. For example: `FELIX DELATTRE` would be transformed to `Felix Delattre`.
_Tip_: Question business requirements when they might produce undesired results. For instance, if you were to implement this feature as requested, `DAMIEN MCKENNA` would be transformed to `Damien Mckenna`. That is not the correct capitalization for the last name `McKenna`. If automatic transformation is not possible or feasible for all variations of the source data, take note and perform manual updates after the initial migration. Evaluate as many use cases as possible and bring them to the client's attention.
To implement this feature, let's create a new module called `process_example`. Inside its `migrations` folder, write a migration plugin called `process_example.yml`. You can download the sample module from <https://www.drupal.org/project/migrate_examples>. For this example, we assume a Drupal installation using the `standard` installation profile, which comes with the `Basic page` content type. Let's see how to handle the concatenation of first and last name.
```yaml
id: process_example
label: Process Plugins Example
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: FELIX
      last_name: DELATTRE
    - unique_id: 2
      first_name: BENJAMIN
      last_name: MELANÇON
    - unique_id: 3
      first_name: STEFAN
      last_name: FREUDENBERG
  ids:
    unique_id:
      type: integer
process:
  type:
    plugin: default_value
    default_value: page
  title:
    plugin: concat
    source:
      - first_name
      - last_name
    delimiter: ' '
destination:
  plugin: entity:node
```
The [concat](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Concat.php/class/Concat) plugin can be used to glue together an arbitrary number of strings. Its `source` property contains an array of all the values that you want to put together. The `delimiter` is an optional parameter that defines a string to add between the elements as they are concatenated. If not set, there will be no separation between the elements in the concatenated result. This plugin has an **important limitation**: you cannot use string literals as part of what you want to concatenate. For example, you cannot directly join the string `Hello` with the value of the `first_name` column. All the values to concatenate need to be provided by the source plugin or by fields already available in the process pipeline. It is possible to leverage a feature called source constants to concatenate string literals. We will talk about this in chapter !!!.
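As a brief preview of the source constants approach, the sketch below joins the literal `Hello` with the `first_name` column. The constant name `greeting` is hypothetical and chosen only for illustration. Constants are declared under the `constants` key of the source section and referenced with the `constants/` prefix.

```yaml
source:
  plugin: embedded_data
  data_rows: ...
  ids: ...
  # Hypothetical constant; the name `greeting` is an assumption.
  constants:
    greeting: Hello
process:
  title:
    plugin: concat
    source:
      # A constant is referenced like any other source value.
      - constants/greeting
      - first_name
    delimiter: ' '
```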
To execute the above migration, you need to enable the `process_example` module. Open a terminal, switch to your Drupal installation's webroot, and execute the following command: `drush migrate:import process_example`. If the migration fails, refer to the end of chapter 2 for debugging information. If it works, you will see three basic pages whose titles contain the names of some of my Drupal mentors. #DrupalThanks
## Chaining process plugins
Good progress so far, but the feature has not been fully implemented. You still need to change the capitalization so that only the first letter of each word in the resulting title is uppercase. Thankfully, the Migrate API allows [**chaining of process plugins**](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/migrate-process-overview#full-pipeline). This works similarly to Unix pipelines in that the output of one process plugin becomes the input of the next one in the chain. When the last plugin in the chain completes its transformation, the return value is assigned to the destination field. Let's see this in action:
```yaml
id: process_example
label: Process Plugins Example
source: ...
process:
  type: ...
  title:
    - plugin: concat
      source:
        - first_name
        - last_name
      delimiter: ' '
    - plugin: callback
      callable: mb_strtolower
    - plugin: callback
      callable: ucwords
destination: ...
```
The [callback](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Callback.php/class/Callback) process plugin passes a value to a PHP function and returns its result. The function to call is specified in the `callable` configuration option. The `source` option holds a value provided by the source plugin or one from the process pipeline. That value is sent as the first argument to the function. Because we are using the `callback` plugin as part of a chain, the source is assumed to be the output of the previous plugin. Hence, there is no need to define a `source`. The example concatenates the first and last names, makes them all lowercase, and then capitalizes each word.
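For contrast, here is a minimal sketch of the `callback` plugin used on its own, outside a chain. In that case the `source` has to be set explicitly. The destination field name `field_first_name_lower` is hypothetical and used only for illustration.

```yaml
process:
  # Hypothetical destination field, not part of the example module.
  field_first_name_lower:
    plugin: callback
    callable: mb_strtolower
    # Not chained, so the value to pass to the function is named explicitly.
    source: first_name
```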
Relying on direct PHP function calls should be a last resort. Better alternatives include writing your own process plugins which encapsulate your business logic. That said, the `callback` plugin is versatile. It can also call public methods in a PHP class. If the `source` is an array, it can be expanded so that each element is passed as a separate argument to the callable. Refer to the [plugin documentation](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Callback.php/class/Callback) for various examples.
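The following sketch illustrates both variations under stated assumptions. It assumes your Drupal version ships the `\Drupal\Component\Utility\Unicode::ucwords` method and that its `callback` plugin supports the `unpack_source` option; check the plugin documentation for your core version. The destination field names and the `title_case` constant are hypothetical.

```yaml
source:
  plugin: embedded_data
  data_rows: ...
  ids: ...
  constants:
    # Hypothetical constant: 2 is the integer value of PHP's MB_CASE_TITLE.
    title_case: 2
process:
  # Callable given as [class, method]: calls Unicode::ucwords($first_name).
  field_example_a:
    plugin: callback
    callable:
      - '\Drupal\Component\Utility\Unicode'
      - ucwords
    source: first_name
  # With unpack_source, each array element becomes a separate argument:
  # mb_convert_case($first_name, 2).
  field_example_b:
    plugin: callback
    callable: mb_convert_case
    unpack_source: true
    source:
      - first_name
      - constants/title_case
```

If logic like this grows beyond a couple of lines, a custom process plugin is usually easier to test and reuse than a chain of callbacks.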
_Tip_: You should have a good understanding of your source and destination formats. In this example, one of the values to transform is `MELANÇON`. Because of the cedilla (**ç**), using `strtolower` is not adequate in this case. It would leave that character uppercase (`melanÇon`). [Multibyte string functions](https://www.php.net/manual/en/ref.mbstring.php) (mb\_\*) are required for proper transformation. `ucwords` is not one of them and would present similar issues if the first letter of a word is a special character. Also, attention should be given to the character encoding of the tables in your destination database.
_Technical note_: `mb_strtolower` is a function provided by the [mbstring](https://www.php.net/manual/en/mbstring.installation.php) PHP extension. The extension might not be enabled by default, or it might not be installed at all. In those cases, the function would not be available when Drupal tries to call it. The following error is produced when trying to call a function that is not available: `The "callable" must be a valid function or method`. For this example, the error would never be triggered even if the extension is missing. That is because Drupal core depends on some Symfony packages which in turn depend on the `symfony/polyfill-mbstring` package. The latter provides a polyfill for mb\_\* functions that has been leveraged since version 8.6.x of Drupal.