Update articles

Mauricio Dinarte 2023-08-05 08:00:35 -06:00
parent 2569fbaeea
commit 0a46f584fb
2 changed files with 52 additions and 51 deletions

12
04.md

@ -2,8 +2,6 @@
In the previous chapter, we learned how to use process plugins to transform data between source and destination. Some Drupal fields have multiple components. For example, formatted text fields store the text to display and the text format to apply. Image fields store a reference to the file, alternative and title text, width, and height. The Migrate API refers to a field's component as a **subfield**. In this chapter we will learn how to migrate into them and how to find out which subfields are available.
## Getting the example code
Today's example will consist of migrating data into the `Body` and `Image` fields of the `Article` content type, which is provided by the `standard` installation profile. As in previous examples, we will create a new module and write a migration plugin. The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://www.drupal.org/project/migrate_examples>. The module name is `Migration Subfields Example` and its machine name is `subfields_example`. This example uses the [Migrate Files](https://www.drupal.org/project/migrate_file) module (explained later). Make sure to download and enable it. Otherwise, you will get an error like: `In DiscoveryTrait.php line 53: The "file_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are: ...`. Let's see part of the *source* definition:
```yaml
@ -73,12 +71,12 @@ When migrating any field, you have to use their *machine name* in the mapping se
* `height` stores an integer number which represents the height in pixels.
For the `target_id`, the plugin `file_import` is used. This plugin requires a `source` configuration value with a URL to the file. In this case, the `photo_url` field from the *source* section is used. The `file_exists` configuration dictates what to do in case a file with the same name already exists. Valid options are `replace` to replace the existing file, `use existing` to reuse the file, and `rename` to append `_N` to the file name (where `N` is an incrementing number) until the filename is unique. When working on migrations, it is common to run them over and over until you get the expected results. Using the `use existing` option will avoid downloading multiple copies of the image file. The `id_only` flag is set so that the plugin only returns the file identifier used by Drupal instead of an entity reference array. This is done because each subfield is being set manually. For the rest of the subfields (`alt`, `title`, `width`, and `height`) the value is a verbatim copy from the *source*.
!!! use existing vs replace
*Note*: The Migrate Files module offers another plugin named `image_import`. That one allows you to set all the subfields as part of the plugin configuration. An example of its use will be shown in the chapter !!!. This example uses the `file_import` plugin to emphasize the configuration of the image subfields.
## Which subfields are available?
Some fields have many subfields. [Address fields](https://www.drupal.org/project/address), for example, have 14 subfields. How can you know which ones are available? You can look for an !!!online reference or look for the info yourself by looking at Drupal's source code. The subfields are defined in the class that provides the field type. Once you find the class, look for the `schema` method. The subfields are contained in the `columns` array of the value returned by that method. Let's see some examples:
Some fields have many subfields. [Address fields](https://www.drupal.org/project/address), for example, have 14 subfields. How can you know which ones are available? You can look for an !!!online reference or search for the information yourself by reviewing Drupal's source code. The subfields are defined in the class that provides the field type. Once you find the class, look for the `schema` method. The subfields are contained in the `columns` array of the value returned by that method. Let's see some examples:
* The `Text (plain)` field is provided by the StringItem class.
* The `Number (integer)` field is provided by the IntegerItem class.
@ -105,7 +103,7 @@ Another option is to connect to the database and check the table structures. For
Looking at the source code or the database schema is arguably not straightforward. This information is included for reference for those who want to explore the Migrate API in more detail. You can also look for migration examples to see what subfields are available.
*Tip*: You can use [Drupal Console](https://drupalconsole.com/) for code introspection and analysis of database table structure. Also, many plugins are defined by classes that end with the string `Item`. You can use your IDEs search feature to find the class using the name of the field as hint.
*Tip*: Many plugins are defined by classes whose name ends with the string `Item`. You can use your IDE's search feature to find the class using the name of the field as a hint. Those classes would live in the `src/Plugin/Field/FieldType` folder of the module.
## Default subfields
@ -123,6 +121,6 @@ process:
field_integer: source_value_integer
```
In examples from previous days, no subfield has been manually set, but Drupal knows what to do. As we have mentioned, the Migrate API offers syntactic sugar to write shorter migration definition files. This is another example. You can safely skip the default subfield and manually set the others as needed. For `File` and `Image` fields, the default subfield is `target_id`. How does the Migrate API know what subfield is the default? You need to check the code again.
In previous chapters no subfield has been manually set, but Drupal knows what to do. The Migrate API offers syntactic sugar to write shorter migration plugins. This is another example. You can safely skip the default subfield and manually set the others as needed. For `File` and `Image` fields, the default subfield is `target_id`. How does the Migrate API know what subfield is the default? You need to check the code again.
The default subfield is determined by the return value of `mainPropertyName` method of the class providing the field type. Again, object oriented practices might require looking at the parent classes to find this method. In the case of the `Image` field, it is provided by ImageItem which extends FileItem which extends EntityReferenceItem. It is the latter that contains the `mainPropertyName` returning the string `target_id`.
The default subfield is determined by the return value of the `mainPropertyName` method of the class providing the field type. Again, object-oriented practices might require looking at parent classes to find this method. The `Image` field is provided by `ImageItem`, which extends `FileItem`, which itself extends `EntityReferenceItem`. It is the latter that contains the `mainPropertyName` method returning the string `target_id`.
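As a minimal sketch of the shorthand described above, the two mappings below are intended to be equivalent for an `Image` field, because `target_id` is its default subfield. The source field name `source_image_fid` is hypothetical and is assumed to already contain a valid file ID.

```yaml
process:
  # Shorthand: no subfield is named, so the value goes to the default
  # subfield, which is target_id for File and Image fields.
  field_image: source_image_fid

  # Explicit alternative (use one form or the other, not both):
  # field_image/target_id: source_image_fid
```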

91
05.md

@ -1,20 +1,20 @@
# Using constants and pseudofields as data placeholders in the Drupal migration process pipeline
So far we have learned how to write basic Drupal migrations and use process plugins to transform data to meet the format expected by the destination. In the previous chapter we learned one of many approaches to migrating images. Now we will change it a bit to introduce two new migration concepts: **constants** and **pseudofields**. Both can be used as data placeholders in the migration timeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the **migrate process pipeline**.
So far we have learned how to write basic Drupal migrations and use process plugins to transform data into the format expected by the destination. In the previous chapter we learned one of many approaches to migrating images. Now we will change it a bit to introduce two new migration concepts: **constants** and **pseudofields**. Both can be used as data placeholders in the migration timeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the **migrate process pipeline**.
## Setting and using source constants
In the Migrate API, **constant** are _arbitrary values that can be used later in the process pipeline_. They are set as direct children of the source section. You write a `constants` key whose value is a list of name-value pairs. Even though they are defined in the _source_ section, they are independent of the particular source plugin in use. The following code snippet shows a generalization for settings and using _constants_:
In the Migrate API, **source constants** are _arbitrary values that can be used later in the process pipeline_. They are set as direct children of the source section. You write a `constants` key whose value is a list of name-value pairs. Even though they are defined in the _source_ section, they are independent of the source plugin in use. The following code snippet shows a generalization for setting and using _constants_:
```yaml
source:
constants:
MY_STRING: "http://understanddrupal.com"
MY_STRING: 'https://understanddrupal.com'
MY_INTEGER: 31
MY_DECIMAL: 3.1415927
MY_ARRAY:
- "dinarcon"
- "dinartecc"
- 'dinarcon'
- 'dinartecc'
plugin: source_plugin_name
source_plugin_config_1: source_config_value_1
source_plugin_config_2: source_config_value_2
@ -23,17 +23,17 @@ process:
process_destination_2:
plugin: concat
source: constants/MY_ARRAY
delimiter: " "
delimiter: ' '
```
You can set as many constants as you need. Although not required by the API, it is a common convention to write the constant names in all uppercase and using _underscores_ (**\_**) to separate words. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section you type its name, just like any other column provided by the _source_ plugin. Note that you use the constant you need to name the full hierarchy under the source section. That is, the word `constants` and the name itself separated by a _slash_ (**/**) symbol. They can be used to copy their value directly to the destination or as part of any process plugin configuration.
You can set as many constants as you need. Although not required by the API, writing the constants' names in all uppercase and using _underscores_ (**\_**) to separate words makes it easy to identify them. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section you type its name, just like any other field provided by the _source_ plugin. Note that to use the constant you need to name the full hierarchy under the source section. That is, the word `constants` plus the name itself, separated by a _slash_ (**/**) symbol. Their value can be used verbatim or transformed via process plugins.
_Technical note_: The word `constants` for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of your particular source plugin. A reason to use a different name is that your source actually contains a column named `constants`. In that case you could use `defaults` or something else. The one restriction is that whatever value you use, you have to use it in the process section to refer to any constant. For example:
_Technical note_: The word `constants` for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of the source plugin in use. A reason to use a different name is if your source actually contains a field named `constants`. In that case you could use `defaults` or something else. The one restriction is that whatever value you use, you have to use it in the process section to refer to any constant. For example:
```yaml
source:
defaults:
MY_VALUE: "http://understanddrupal.com"
MY_VALUE: 'http://understanddrupal.com'
plugin: source_plugin_name
source_plugin_config: source_config_value
process:
@ -42,95 +42,98 @@ process:
## Setting and using pseudofields
Similar to constants, **pseudofields** store _arbitrary values for use later in the process pipeline_. There are some key differences. Pseudofields are set in the _process_ section. The name can be arbitrary as long as it does not conflict with a property name or field name in the destination. The value can be set to a verbatim copy from the _source_ (a column or a constant) or they can use process plugins for data transformations. The following code snippet shows a generalization for settings and using _pseudofields_:
Similar to source constants, **pseudofields** store _arbitrary values for use later in the process pipeline_. There are some key differences. Pseudofields are set in the _process_ section. The name can be arbitrary as long as it does not conflict with a property name or field name in the destination. The value can be set to a verbatim copy from the _source_ (a field or a constant) or they can use process plugins for data transformations. The following code snippet shows a generalization for setting and using _pseudofields_:
```yaml
source:
constants:
MY_BASE_URL: "http://understanddrupal.com"
MY_BASE_URL: 'https://understanddrupal.com'
plugin: source_plugin_name
source_plugin_config_1: source_config_value_1
source_plugin_config_2: source_config_value_2
process:
title: source_column_title
my_pseudofield_1:
_pseudo_field_1:
plugin: concat
source:
- constants/MY_BASE_URL
- source_column_relative_url
delimiter: "/"
my_pseudofield_2:
delimiter: '/'
_pseudo_field_2:
plugin: urlencode
source: "@my_pseudofield_1"
field_link/uri: "@my_pseudofield_2"
field_link/title: "@title"
source: '@_pseudo_field_1'
field_link/uri: '@_pseudo_field_2'
field_link/title: '@title'
```
In the above example, `my_pseudofield_1` is set to the result of a `concat` process transformation that joins a constant and a column from the source section. The result value is later used as part of a `urlencode` process transformation. Note that to use the value from `my_pseudofield_1` you have to enclose it in _quotes_ (**'**) and prepend an _at sign_ (**@**) to the name. The `pseudo_` prefix in the name is not required. In this case it is used to make it easier to distinguish among pseudofields and regular property or field names. The new value obtained from URL encode operation is stored in `my_pseudofield_2`. This last pseudofield is used to set the value of the `uri` subfield for `field_link`. The example could be simplified, for example, by using a single pseudofield and chaining process plugins. It is presented that way to demonstrate that a pseudofield could be used as direct assignments or as part of process plugin configuration values.
In the above example, `_pseudo_field_1` is set to the result of a `concat` process transformation that joins a constant and a field from the source section. The resulting value is later used as part of a `urlencode` process transformation. Note that to use the value from `_pseudo_field_1` you have to enclose it in _quotes_ (**'**) and prepend an _at sign_ (**@**) to the name. The `_pseudo_` prefix in the name is not required. It is used to make it easier to distinguish between pseudofields and regular property or field names. The new value obtained from the URL encode operation is stored in `_pseudo_field_2`. This last pseudofield is used to set the value of the `uri` subfield for `field_link`. The example could be simplified by using a single pseudofield and chaining multiple process plugins. It is presented that way to demonstrate that a pseudofield can be used in direct assignments or as part of process plugin configuration values.
!!! REVIEW!!!
_Technical note_: If the name of the subfield can be arbitrary, how can you prevent name clashes with destination property names and field names? You might have to look at the source for the entity and the configuration of the bundle. In the case of a node migration, look at the `baseFieldDefinitions` method of the `Node` class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the "Manage fields" section of the content type you are migrating into. The [Field API](https://api.drupal.org/api/drupal/core!modules!field!field.module/group/field/8.8.x) prefixes any field created via the administration interface with the string `field_`. This reduces the likelihood of name clashes. Other than these two name restrictions, _anything else can be used_. In this case, the Migrate API will eventually perform an entity save operation which will discard the pseudofields.
_Technical note_: If the name of the pseudofield can be arbitrary, how can you prevent name clashes with destination property names and field names? You can look for an !!!online reference or review the class defining the entity and fields attached to it. In the case of a node migration, look at the `baseFieldDefinitions` method of the `Node` class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the `Manage fields` section of the content type you are migrating into. The [Field API](https://api.drupal.org/api/drupal/core!modules!field!field.module/group/field/8.8.x) prefixes any field created via the administration interface with the string `field_`. This reduces the likelihood of name clashes. Other than these two name restrictions, _anything else can be used_. In this case, the Migrate API will eventually perform an entity save operation which will discard the pseudofields.
## Understanding Drupal Migrate API process pipeline
The migrate process pipeline is a mechanism by which the value of any **destination property**, **field**, or **pseudofield** that has been set **can be used by anything defined later in the process section**. The fact that using a pseudofield requires enclosing its name in quotes and prepending an at sign is actually a requirement of the process pipeline. Lets see some examples using a node migration:
The migrate process pipeline is a mechanism by which the value of any **destination property**, **field**, or **pseudofield** that has been set **can be used by anything defined later in the process section**. The fact that using a pseudofield requires enclosing its name in _quotes_ and prepending an _at sign_ is actually a requirement of the process pipeline. Let's see some examples using a node migration:
- To use the `title` property of the node entity, you would write `@title`
- To use the `field_body` field of the `Basic page` content type, you would write `@field_body`
- To use the `my_temp_value` pseudofield, you would write `@my_temp_value`
- To use the `field_image` field of the `Article` content type, you would write `@field_image`
- To use the `_pseudo_temp_value` pseudofield, you would write `@_pseudo_temp_value`
In the process pipeline, these values can be used just like constants and columns from the source. The only restriction is that they need to be set before being used. For those familiar with the "_rewrite results_" feature of Views, it follows the same idea. You have access to everything defined previously. Anytime you use enclose a name in _quotes_ and prepend it with an _at sign_, you are telling the migrate API to look for that element in the process section instead of the source section.
In the process pipeline, these values can be used just like constants and fields from the source. The only restriction is that they need to be set before being used. For those familiar with the _rewrite results_ feature of Views, it follows the same idea. You have access to everything defined previously. Anytime you enclose a name in _quotes_ and prepend it with an _at sign_, you are telling the Migrate API to look for that element in the process section instead of the source section.
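The lookup rule can be sketched as follows. This is an illustrative fragment rather than part of the example module: the constant `GREETING`, the source field `source_title`, the pseudofield `_pseudo_greeting`, and the destination field `field_greeting` are all hypothetical names.

```yaml
source:
  constants:
    GREETING: 'Hello from'
  plugin: source_plugin_name
process:
  # No at sign: the name is looked up in the source section.
  title: source_title
  _pseudo_greeting:
    plugin: concat
    source:
      # Source constant: full hierarchy, no at sign.
      - constants/GREETING
      # Destination property set earlier in the process section.
      - '@title'
    delimiter: ' '
  # Pseudofield set earlier in the process section.
  field_greeting: '@_pseudo_greeting'
```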
## Migrating images using the image_import plugin
Lets practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous entry. The Migrate Files module provides another process plugin named `image_import` that allows you to directly set all the subfield values in the plugin configuration itself.
Let's practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous chapter. The Migrate Files(!!!) module provides another process plugin named `image_import`. It allows you to directly set all the subfield values in the plugin configuration itself.
As in previous examples, we will create a new module and write a migration definition file to perform the migration. It is assumed that Drupal was installed using the `standard` installation profile. The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://github.com/dinarcon/ud_migrations> The module name is `UD Migration constants and pseudofields` and its machine name is `ud_migrations_constants_pseudofields`. The `id` of the example migration is `udm_constants_pseudofields`. Make sure to download and enable the Migrate Files module. Otherwise, you will get an error like: "In DiscoveryTrait.php line 53: The "image_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:...".
The code snippets will be compact to focus on particular elements of the migration. The full code is available at <https://www.drupal.org/project/migrate_examples>. The module name is `Constants and Pseudofields Example` and its machine name is `constants_pseudofields_example`. This example uses the [Migrate Files](https://www.drupal.org/project/migrate_file) module. Make sure to download and enable it.
Lets see part of the _source_ definition:
Let's see part of the _source_ definition:
```yaml
source:
constants:
BASE_URL: "https://udrupal.com"
PHOTO_DESCRIPTION_PREFIX: "Photo of"
BASE_URL: 'https://udrupal.com'
PHOTO_DESCRIPTION_PREFIX: 'Photo of'
plugin: embedded_data
data_rows:
- unique_id: 1
name: "Michele Metts"
photo_url: "photos/freescholar.jpg"
photo_width: "587"
photo_height: "657"
name: Michele Metts
photo_url: photos/freescholar.jpg
photo_width: 587
photo_height: 657
```
Only one record is presented to keep snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image. Note that this time, the `photo_url` does not provide an absolute URL. Instead, it is a relative path from the domain hosting the images. In this example, the domain is `https://udrupal.com` so that value is stored in the BASE_URL constant which is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant stores the prefix to add to the name to create a photo description.
Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name and details about the image. Note that this time, the `photo_url` does not provide an absolute URL. Instead, it is a relative path from the domain hosting the images. In this example, the domain is `https://udrupal.com` so that value is stored in the BASE_URL constant. This is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant will be used to assemble a description.
Now, lets see the _process_ definition:
Now, let's see the _process_ definition:
```yaml
process:
title: name
pseudo_image_url:
_pseudo_image_url:
plugin: concat
source:
- constants/BASE_URL
- photo_url
delimiter: "/"
pseudo_image_description:
delimiter: '/'
_pseudo_image_description:
plugin: concat
source:
- constants/PHOTO_DESCRIPTION_PREFIX
- name
delimiter: " "
delimiter: ' '
field_image:
plugin: image_import
source: "@pseudo_image_url"
reuse: TRUE
alt: "@pseudo_image_description"
title: "@title"
source: '@_pseudo_image_url'
file_exists: 'use existing'
alt: '@_pseudo_image_description'
title: '@title'
width: photo_width
height: photo_height
```
The `title` node property is set directly to the value of the `name` column from the source. Then, two pseudofields. `pseudo_image_url` stores a valid absolute URL to the image using the BASE_URL constant and the `photo_url` _column_ from the _source_. `pseudo_image_description` uses the PHOTO_DESCRIPTION_PREFIX constant and the `name` _column_ from the _source_ to store a description for the image.
The `title` node property is set directly to the value of the `name` field from the source. `_pseudo_image_url` stores a valid absolute URL to the image using the BASE_URL constant and the `photo_url` _field_ from the _source_. `_pseudo_image_description` uses the PHOTO_DESCRIPTION_PREFIX constant and the `name` _field_ from the _source_ to store a description for the image.
For the `field_image` field, the `image_import` process plugin is used. This time, the subfields are not set manually like in the previous chapter. The absence of the `id_only` configuration key, allows you to assign values to subfields simply by configuring the `image_import` plugin. The URL to the image is set in the `source` key and uses the `pseudo_image_url` pseudofield. The `alt` key allows you to set the alternative attribute for the image and in this case the `pseudo_image_description` pseudofield is used. For the `title` subfield sets the text of a subfield with the same name and in this case it is assigned the value of the `title` node property which was set at the beginning of the process pipeline. Remember that not only psedufields are available. Finally, the `width` and `height` configuration uses the columns from the source to set the values of the corresponding subfields.
For the `field_image` field, the `image_import` process plugin is used. This time, the subfields are not set manually like in the previous chapter. The absence of the `id_only` configuration key allows you to assign values to subfields via the `image_import` plugin directly. The URL to the image is set in the `source` key and uses the `_pseudo_image_url` pseudofield. The `alt` key allows you to set the alternative attribute for the image using the `_pseudo_image_description` pseudofield. The `title` key expects the text to use for the image's title. We are reusing the `title` node property which was set at the beginning of the process pipeline. Remember that destination properties, fields, and pseudofields are available as long as they were previously defined in the pipeline. Finally, the `width` and `height` keys use fields from the source.
**Important**: By default, the Migrate API will only expand the value of the `source` configuration. That is, it replaces its value with that of a source field, source constant, or pseudofield. Any other configuration is normally not expanded and its specified value is passed verbatim to the process plugin. In the case of `image_import`, the plugin itself provides a mechanism to expand the values for the `alt`, `title`, `width`, and `height` configuration options. Most plugins do not do this and will use the configured value literally.
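To illustrate the difference, here is a hedged sketch using the `concat` plugin from earlier in this chapter. The names `SEPARATOR`, `first_name`, `last_name`, and `_pseudo_full_name` are hypothetical. The entries under `source` are expanded, but the `delimiter` value is used exactly as written.

```yaml
source:
  constants:
    SEPARATOR: ' - '
  plugin: source_plugin_name
process:
  _pseudo_full_name:
    plugin: concat
    source:
      - first_name
      - last_name
    # Not expanded: the literal text "constants/SEPARATOR" would be used
    # as the delimiter, not the value ' - ' defined above.
    delimiter: constants/SEPARATOR
```

If the separator has to come from the source, one option is to list it as an element of `source` (for example between `first_name` and `last_name`), where expansion does apply.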