31dom/04.md
Mauricio Dinarte ba253739b1 Rename files
2023-08-04 10:43:39 -06:00

Migrating data into Drupal subfields

In the previous chapter, we learned how to use process plugins to transform data between source and destination. Some Drupal fields have multiple components. For example, formatted text fields store the text to display and the text format to apply. Image fields store a reference to the file, alternative and title text, width, and height. The Migrate API refers to each component of a field as a subfield. In this chapter we will learn how to migrate into subfields and how to find out which ones are available.

Getting the example code

Today's example migrates data into the Body and Image fields of the Article content type, both of which are available out of the box. This assumes that Drupal was installed using the standard installation profile. As in previous examples, we will create a new module and write a migration definition file to perform the migration. The code snippets will be compact to focus on particular elements of the migration. The full code is available at https://github.com/dinarcon/ud_migrations. The module name is UD Migration Subfields and its machine name is ud_migrations_subfields. The id of the example migration is udm_subfields.

This example uses the Migrate Files module (explained later). Make sure to download and enable it. Otherwise, you will get an error like: In DiscoveryTrait.php line 53: The "file_import" plugin does not exist. Valid plugin IDs for Drupal\migrate\Plugin\MigratePluginManager are:.... Let's see part of the source definition:

source:
  plugin: embedded_data
  data_rows:
    -
      unique_id: 1
      name: 'Michele Metts'
      profile: '<a href="https://www.drupal.org/u/freescholar" title="Michele on Drupal.org">freescholar</a> on Drupal.org'
      photo_url: 'https://udrupal.com/photos/freescholar.jpg'
      photo_description: 'Photo of Michele Metts'
      photo_width: '587'
      photo_height: '657'

Only one record is presented to keep the snippet short, but more exist. In addition to having a unique identifier, each record includes a name, a short profile, and details about the image.

Migrating formatted text

The Body field is of type Text (formatted, long, with summary). This type of field has three components: the full text (value) to present, a summary text, and a text format. The Migrate API allows you to write to each component separately by defining subfield targets. The next code snippet shows how to do it:

process:
  field_text_with_summary/value: source_value
  field_text_with_summary/summary: source_summary
  field_text_with_summary/format: source_format

The syntax to migrate into subfields is the machine name of the field and the subfield name separated by a slash (/). Then, a colon (:), a space, and the value. You can set the value to a source column name for a verbatim copy or use any combination of process plugins. It is not required to migrate into all subfields. Each field determines what components are required so it is possible that not all subfields are set. In this example, only the value and text format will be set.

process:
  body/value: profile
  body/format:
    plugin: default_value
    default_value: restricted_html

The value subfield is set to the profile source column. As you can see in the first snippet, it contains HTML markup. An a tag to be precise. Because we want the tag to be rendered as a link, we need to specify a text format that allows that tag. There is no information about text formats in the source, but Drupal comes with a couple we can choose from. In this case, we use the Restricted HTML text format. Note that the default_value plugin is used and set to restricted_html. When setting text formats, it is necessary to use their machine name. You can find it in the configuration page for each text format. For Restricted HTML that is /admin/config/content/formats/manage/restricted_html.
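The example leaves the summary subfield empty, but it could be filled in the same way using a chain of process plugins. The following is only a sketch, not part of the example module: it assumes you want a plain text summary derived from the profile column, and the length of 100 characters is an arbitrary choice. It relies on the callback and substr process plugins provided by Drupal core.

```yaml
process:
  body/value: profile
  # Sketch only: derive a summary from the same source column.
  body/summary:
    # Remove HTML markup using PHP's strip_tags function.
    - plugin: callback
      callable: strip_tags
      source: profile
    # Keep at most the first 100 characters (arbitrary, assumed limit).
    - plugin: substr
      start: 0
      length: 100
  body/format:
    plugin: default_value
    default_value: restricted_html
```

Note that substr cuts at a fixed position, so a long summary might end in the middle of a word.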

Note: Text formats are a whole different subject that even has security implications. To keep the discussion on topic, we will only give some recommendations. When you need to migrate HTML markup, you need to know which tags appear in your source and which ones you want to allow in Drupal. Then, select a text format that accepts the tags you have allowed and filters out dangerous ones like script. As a general rule, you should avoid setting the format subfield to use the Full HTML text format.
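If your source did include information about text formats, you could translate it into Drupal machine names instead of hardcoding a single one. The sketch below uses the static_map plugin from Drupal core. The legacy_format column and its values (plain, limited, rich) are hypothetical and do not exist in the example source. The target formats are the ones shipped with the standard installation profile.

```yaml
process:
  body/value: profile
  body/format:
    plugin: static_map
    # Hypothetical source column; not part of the example data.
    source: legacy_format
    map:
      plain: plain_text
      limited: restricted_html
      rich: basic_html
    # Fall back to a restrictive format for unknown source values.
    default_value: restricted_html
```

Falling back to a restrictive format keeps unexpected source values from ending up with a more permissive format than intended.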

Migrating images

There are different approaches to migrating images. Today, we are going to use the Migrate Files module. It is important to note that Drupal treats images as files with extra properties and behavior. Any approach used to migrate files can be adapted to migrate images.

process:
  field_image/target_id:
    plugin: file_import
    source: photo_url
    reuse: TRUE
    id_only: TRUE
  field_image/alt: photo_description
  field_image/title: photo_description
  field_image/width: photo_width
  field_image/height: photo_height

When migrating any field, you have to use its machine name in the mapping section. For the Image field, the machine name is field_image. Knowing that, you set each of its subfields:

  • target_id stores an integer number which Drupal uses as a reference to the file.
  • alt stores a string that represents the alternative text. Always set one for better accessibility.
  • title stores a string that represents the title attribute.
  • width stores an integer number which represents the width in pixels.
  • height stores an integer number which represents the height in pixels.

For the target_id, the file_import plugin is used. This plugin requires a source configuration value with a URL to the file. In this case, the photo_url column from the source section is used. The reuse flag indicates that if a file with the same location and name exists, it should be used instead of downloading a new copy. When working on migrations, it is common to run them over and over until you get the expected results. Using the reuse flag will avoid creating multiple references or copies of the image file, depending on the plugin configuration. The id_only flag is set so that the plugin only returns the file identifier used by Drupal instead of an entity reference array. This is done because each subfield is being set manually. For the rest of the subfields (alt, title, width, and height) the value is a verbatim copy from the source.

Note: The Migrate Files module offers another plugin named image_import. That one allows you to set all the subfields as part of the plugin configuration. An example of its use will be shown in the next article. This example uses the file_import plugin to emphasize the configuration of the image subfields.
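To see how the pieces fit together, here is a sketch of what a complete migration definition could look like when the source, body, and image snippets are combined. The label, the ids key, the title mapping, and the destination section are assumptions based on the usual pattern for embedded_data migrations into nodes. Refer to the repository for the actual file.

```yaml
id: udm_subfields
# Assumed label; any human readable string works.
label: 'UD Migration Subfields'
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: 'Michele Metts'
      profile: '<a href="https://www.drupal.org/u/freescholar" title="Michele on Drupal.org">freescholar</a> on Drupal.org'
      photo_url: 'https://udrupal.com/photos/freescholar.jpg'
      photo_description: 'Photo of Michele Metts'
      photo_width: '587'
      photo_height: '657'
  # embedded_data needs to know which column identifies each record.
  ids:
    unique_id:
      type: integer
process:
  # Assumption: use the person's name as the node title.
  title: name
  body/value: profile
  body/format:
    plugin: default_value
    default_value: restricted_html
  field_image/target_id:
    plugin: file_import
    source: photo_url
    reuse: TRUE
    id_only: TRUE
  field_image/alt: photo_description
  field_image/title: photo_description
  field_image/width: photo_width
  field_image/height: photo_height
destination:
  plugin: 'entity:node'
  default_bundle: article
```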

Which subfields are available?

Some fields have many subfields. Address fields, for example, have 13 subfields. How can you know which ones are available? The answer is found in the class that provides the field type. Once you find the class, look for the schema method. The subfields are contained in the columns array of the value returned by that method. Let's see some examples:

  • The Text (plain) field is provided by the StringItem class.
  • The Number (integer) field is provided by the IntegerItem class.
  • The Text (formatted, long, with summary) field is provided by the TextWithSummaryItem class.
  • The Image field is provided by the ImageItem class.

The schema method defines the database columns used by the field to store its data. When migrating into subfields, you are actually migrating into those particular database columns. Any restriction set by the database schema needs to be respected. That is why you do not use units when migrating width and height for images. The database only expects an integer number representing the corresponding values in pixels. Because of object-oriented practices, sometimes you need to look at the parent class to know all the subfields that are available.

Technical note: The Migrate API bypasses Form API validations. For example, it is possible to migrate images without setting the alt subfield even if that is set as required in the field's configuration. If you try to edit a node that was created this way, you will get a field error indicating that the alternative text is required. Similarly, it is possible to write the title subfield even when the field is not expecting it, just like in today's example. If you were to enable the title text later, the information would be there already. Remember that when using the Migrate API you are writing directly to the database.

Another option is to connect to the database and check the table structures. For example, the Image field stores its data in the node__field_image table. Among others, this table has five columns named after the field's machine name and the subfield:

  • field_image_target_id
  • field_image_alt
  • field_image_title
  • field_image_width
  • field_image_height

Looking at the source code or the database schema is arguably not straightforward. This information is included for reference to those who want to explore the Migrate API in more detail. You can also look for migration examples to see what subfields are available.

Tip: You can use Drupal Console for code introspection and analysis of database table structure. Also, many plugins are defined by classes that end with the string Item. You can use your IDE's search feature to find the class using the name of the field as a hint.

Default subfields

Every Drupal field has at least one subfield. For example, Text (plain) and Number (integer) define only the value subfield. The following code snippets are equivalent:

process:
  field_string/value: source_value_string
  field_integer/value: source_value_integer

process:
  field_string: source_value_string
  field_integer: source_value_integer

In examples from previous days, no subfield has been manually set, but Drupal knows what to do. As we have mentioned, the Migrate API offers syntactic sugar to write shorter migration definition files. This is another example. You can safely skip the default subfield and manually set the others as needed. For File and Image fields, the default subfield is target_id. How does the Migrate API know what subfield is the default? You need to check the code again.

The default subfield is determined by the return value of the mainPropertyName method of the class providing the field type. Again, object-oriented practices might require looking at the parent classes to find this method. In the case of the Image field, it is provided by ImageItem which extends FileItem which extends EntityReferenceItem. It is the latter that contains the mainPropertyName method, which returns the string target_id.
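Applied to the Image field, this means the subfield name can be left out when only the file reference is being set. The snippet below is a sketch under that assumption: it reuses the file_import configuration from the earlier example and sets no other subfield.

```yaml
process:
  # Shorthand for field_image/target_id, because target_id is the
  # main property of the Image field.
  field_image:
    plugin: file_import
    source: photo_url
    reuse: TRUE
    id_only: TRUE
```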