31dom/05.md
Mauricio Dinarte 0a46f584fb Update articles
2023-08-05 08:00:35 -06:00


Using constants and pseudofields as data placeholders in the Drupal migration process pipeline

So far we have learned how to write basic Drupal migrations and use process plugins to transform data to meet the format expected by the destination. In the previous chapter we learned one of many approaches to migrating images. Now we will change it a bit to introduce two new migration concepts: constants and pseudofields. Both can be used as data placeholders in the migration process pipeline. Along with other process plugins, they allow you to build dynamic values that can be used as part of the migrate process pipeline.

Setting and using source constants

In the Migrate API, source constants are arbitrary values that can be used later in the process pipeline. They are set as direct children of the source section. You write a constants key whose value is a list of name-value pairs. Even though they are defined in the source section, they are independent of the source plugin in use. The following code snippet shows a generalization for setting and using constants:

source:
  constants:
    MY_STRING: 'https://understanddrupal.com'
    MY_INTEGER: 31
    MY_DECIMAL: 3.1415927
    MY_ARRAY:
      - 'dinarcon'
      - 'dinartecc'
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  process_destination_1: constants/MY_INTEGER
  process_destination_2:
    plugin: concat
    source: constants/MY_ARRAY
    delimiter: ' '

You can set as many constants as you need. Although not required by the API, writing the constants' names in all uppercase and using underscores (_) to separate words makes it easy to identify them. The value can be set to anything you need to use later. In the example above, there are strings, integers, decimals, and arrays. To use a constant in the process section you type its name, just like any other field provided by the source plugin. Note that to use the constant you need to name the full hierarchy under the source section. That is, the word constants plus the name itself, separated by a slash (/) symbol. Their values can be used verbatim or transformed via process plugins.

Technical note: The word constants for storing the values in the source section is not special. You can use any word you want as long as it does not collide with another configuration key of the source plugin in use. A reason to use a different name is if your source actually contains a field named constants. In that case you could use defaults or something else. The one restriction is that whatever key name you choose, you must use that same name in the process section to refer to any constant. For example:

source:
  defaults:
    MY_VALUE: 'http://understanddrupal.com'
  plugin: source_plugin_name
  source_plugin_config: source_config_value
process:
  process_destination: defaults/MY_VALUE

Setting and using pseudofields

Similar to source constants, pseudofields store arbitrary values for use later in the process pipeline. There are some key differences. Pseudofields are set in the process section. The name can be arbitrary as long as it does not conflict with a property name or field name in the destination. The value can be a verbatim copy from the source (a field or a constant) or the result of process plugins that transform data. The following code snippet shows a generalization for setting and using pseudofields:

source:
  constants:
    MY_BASE_URL: 'https://understanddrupal.com'
  plugin: source_plugin_name
  source_plugin_config_1: source_config_value_1
  source_plugin_config_2: source_config_value_2
process:
  title: source_column_title
  _pseudo_field_1:
    plugin: concat
    source:
      - constants/MY_BASE_URL
      - source_column_relative_url
    delimiter: '/'
  _pseudo_field_2:
    plugin: urlencode
    source: '@_pseudo_field_1'
  field_link/uri: '@_pseudo_field_2'
  field_link/title: '@title'

In the above example, _pseudo_field_1 is set to the result of a concat process transformation that joins a constant and a field from the source section. The resulting value is later used as part of a urlencode process transformation. Note that to use the value from _pseudo_field_1 you have to enclose it in quotes (') and prepend an at sign (@) to the name. The _pseudo_ prefix in the name is not required. It is used to make it easier to distinguish between pseudofields and regular property or field names. The new value obtained from the URL encode operation is stored in _pseudo_field_2. This last pseudofield is used to set the value of the uri subfield for field_link. The example could be simplified by using a single pseudofield and chaining multiple process plugins. It is presented this way to demonstrate that a pseudofield can be used in direct assignments or as part of process plugin configuration values.
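The simplification mentioned above, a single pseudofield with chained process plugins, might look like the following sketch. The pseudofield name _pseudo_field_url is made up for illustration. The other names come from the previous snippet. When plugins are listed under one key, each plugin receives the output of the one before it, so urlencode needs no source of its own.

```yaml
process:
  title: source_column_title
  # One pseudofield, two chained plugins (the name is hypothetical).
  _pseudo_field_url:
    - plugin: concat
      source:
        - constants/MY_BASE_URL
        - source_column_relative_url
      delimiter: '/'
    # Receives the concatenated string produced by the previous plugin.
    - plugin: urlencode
  field_link/uri: '@_pseudo_field_url'
  field_link/title: '@title'
```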

Technical note: If the name of the pseudofield can be arbitrary, how can you prevent name clashes with destination property names and field names? You can look for an online reference or review the class defining the entity and the fields attached to it. In the case of a node migration, look at the baseFieldDefinitions method of the Node class for a list of property names. Be mindful of class inheritance and method overriding. For a list of fields and their machine names, look at the Manage fields section of the content type you are migrating into. The Field API prefixes any field created via the administration interface with the string field_. This reduces the likelihood of name clashes. Other than these two name restrictions, anything else can be used. When the migration runs, the Migrate API eventually performs an entity save operation, which discards the pseudofields.

Understanding Drupal Migrate API process pipeline

The migrate process pipeline is a mechanism by which the value of any destination property, field, or pseudofield that has been set can be used by anything defined later in the process section. The fact that using a pseudofield requires enclosing its name in quotes and prepending an at sign is actually a requirement of the process pipeline. Let's see some examples using a node migration:

  • To use the title property of the node entity, you would write @title
  • To use the field_image field of the Article content type, you would write @field_image
  • To use the _pseudo_temp_value pseudofield, you would write @_pseudo_temp_value

In the process pipeline, these values can be used just like constants and fields from the source. The only restriction is that they need to be set before being used. For those familiar with the rewrite results feature of Views, it follows the same idea. You have access to everything defined previously. Anytime you enclose a name in quotes and prepend it with an at sign, you are telling the Migrate API to look for that element in the process section instead of the source section.
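The lookup rules described above can be sketched in a short fragment. The TITLE_PREFIX constant, the source_column_title column, and the field_subtitle destination field are hypothetical names chosen for illustration. The quotes are needed because the @ character cannot start an unquoted YAML value.

```yaml
source:
  constants:
    TITLE_PREFIX: 'Review:'  # hypothetical constant
  plugin: source_plugin_name
process:
  # No @ prefix: the name is looked up in the source section.
  title: source_column_title
  _pseudo_temp_value:
    plugin: concat
    source:
      - constants/TITLE_PREFIX  # source constant
      - '@title'                # destination property set earlier in this section
    delimiter: ' '
  # field_subtitle is a hypothetical destination field.
  field_subtitle: '@_pseudo_temp_value'
```

Moving the title assignment below _pseudo_temp_value would break the example, because '@title' would be referenced before being set.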

Migrating images using the image_import plugin

Let's practice the concepts of constants, pseudofields, and the migrate process pipeline by modifying the example of the previous chapter. The Migrate Files module provides another process plugin named image_import. It allows you to directly set all the subfield values in the plugin configuration itself.

The code snippets will be compact to focus on particular elements of the migration. The full code is available at https://www.drupal.org/project/migrate_examples. The module name is Constants and Pseudofields Example and its machine name is constants_pseudofields_example. This example uses the Migrate Files module. Make sure to download and enable it.

Let's see part of the source definition:

source:
  constants:
    BASE_URL: 'https://udrupal.com'
    PHOTO_DESCRIPTION_PREFIX: 'Photo of'
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: Michele Metts
      photo_url: photos/freescholar.jpg
      photo_width: 587
      photo_height: 657

Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name and details about the image. Note that this time, the photo_url does not provide an absolute URL. Instead, it is a relative path from the domain hosting the images. In this example, the domain is https://udrupal.com so that value is stored in the BASE_URL constant. This is later used to assemble a valid absolute URL to the image. Also, there is no photo description, but one can be created by concatenating some strings. The PHOTO_DESCRIPTION_PREFIX constant will be used to assemble a description.

Now, let's see the process definition:

process:
  title: name
  _pseudo_image_url:
    plugin: concat
    source:
      - constants/BASE_URL
      - photo_url
    delimiter: '/'
  _pseudo_image_description:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    delimiter: ' '
  field_image:
    plugin: image_import
    source: '@_pseudo_image_url'
    file_exists: 'use existing'
    alt: '@_pseudo_image_description'
    title: '@title'
    width: photo_width
    height: photo_height

The title node property is set directly to the value of the name field from the source. _pseudo_image_url stores a valid absolute URL to the image using the BASE_URL constant and the photo_url field from the source. _pseudo_image_description uses the PHOTO_DESCRIPTION_PREFIX constant and the name field from the source to store a description for the image.
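Tracing the single record shown in the source snippet through these three assignments gives the values below. This is a hand-worked illustration of the intermediate values, written as YAML for readability. It is not output produced by Drupal.

```yaml
# Record with unique_id 1, after the first three process assignments.
title: 'Michele Metts'
# BASE_URL + '/' + photo_url
_pseudo_image_url: 'https://udrupal.com/photos/freescholar.jpg'
# PHOTO_DESCRIPTION_PREFIX + ' ' + name
_pseudo_image_description: 'Photo of Michele Metts'
```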

For the field_image field, the image_import process plugin is used. This time, the subfields are not set manually like in the previous chapter. The absence of the id_only configuration key allows you to assign values to subfields via the image_import plugin directly. The URL to the image is set in the source key and uses the _pseudo_image_url pseudofield. The alt key allows you to set the alternative attribute for the image using the _pseudo_image_description pseudofield. The title key expects the text to use for the image's title. We are reusing the title node property which was set at the beginning of the process pipeline. Remember that destination properties, fields, and pseudofields are available as long as they were previously defined in the pipeline. Finally, the width and height keys use fields from the source.

Important: By default, the Migrate API will only expand the value of the source configuration. That is, replace its value with a source field, source constant, or pseudofield. Any other configuration is normally not expanded, and its specified value is passed verbatim to the process plugin. In the case of image_import, the plugin itself provides a mechanism to expand the values for the alt, title, width, and height configuration options. Most plugins do not do this and will use the configured value literally.
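To illustrate the difference, here is a sketch using the concat plugin, whose delimiter option is not expanded. The SEPARATOR constant and both pseudofield names are hypothetical. The second pseudofield shows one possible workaround: pass the constant through source, which is expanded.

```yaml
source:
  constants:
    PHOTO_DESCRIPTION_PREFIX: 'Photo of'
    SEPARATOR: ' - '  # hypothetical constant
  plugin: embedded_data
process:
  _pseudo_not_expanded:
    plugin: concat
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - name
    # Not expanded: the literal text 'constants/SEPARATOR' is used as the delimiter.
    delimiter: constants/SEPARATOR
  _pseudo_expanded:
    plugin: concat
    # Expanded: each item in source is replaced by its value before joining.
    source:
      - constants/PHOTO_DESCRIPTION_PREFIX
      - constants/SEPARATOR
      - name
```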