31dom/14.md
Mauricio Dinarte ba253739b1 Rename files
2023-08-04 10:43:39 -06:00

16 KiB

Introduction to paragraphs migrations in Drupal

Today we will present an introduction to paragraphs migrations in Drupal. The example consists of migrating paragraphs of one type, then connecting the migrated paragraphs to nodes. A separate image migration is included to demonstrate how they are different. At the end, we will talk about behavior that deletes paragraphs when the host entity is deleted. Let's get started.

Getting the code

You can get the full code example at https://github.com/dinarcon/ud_migrations The module to enable is UD paragraphs migration introduction whose machine name is ud_migrations_paragraph_intro. It comes with three migrations: ud_migrations_paragraph_intro_paragraph, ud_migrations_paragraph_intro_image, and ud_migrations_paragraph_intro_node. One content type, one paragraph type, and four fields will be created when the module is installed.

The ud_migrations_paragraph_intro only defines the migrations. But the destination content type, fields, and paragraphs type need to be created as well. That is the job of the ud_migrations_paragraph_config module which is a dependency of ud_migrations_paragraph_intro. That reason to create the configuration in a separate module is because later articles in the series will make use of the same configuration. So, any example that depends on those content type, fields, and paragraph type can set a dependency on the ud_migrations_paragraph_config to make them available.

Note: Configuration placed in a module's config/install directory will be copied to Drupal's active configuration. And if those files have a dependencies/enforced/module key, the configuration will be removed when the listed modules are uninstalled. That is how the content type, the paragraph type, and the fields are automatically created and deleted.

You can get the Paragraph module using composer: composer require drupal/paragraphs. This will also download its dependency: the Entity Reference Revisions module. If your Drupal site is not composer-based, you can get the code for both modules manually.

Understanding the example set up

The example code creates one paragraph type named UD book paragraph (ud_book_paragraph). It has two "Text (plain)" fields: Title (field_ud_book_paragraph_title) and Author (field_ud_book_paragraph_author). A new UD Paragraphs (ud_paragraphs) content type is also created. This has two fields: Image (field_ud_image) and Favorite book (field_ud_favorite_book) containing references to images and book paragraphs imported in separate migrations. The words in parenthesis represent the machine names of the different elements.

The paragraph migration

Migrating into a paragraph type is very similar to migrating into a content type. You specify the source, process the fields making any required transformation, and set the destination entity and bundle. The following code snippet shows the source, process, and destination sections:

source:
  plugin: embedded_data
  data_rows:
    - book_id: 'B10'
      book_title: 'The definite guide to Drupal 7'
      book_author: 'Benjamin Melançon et al.'
    - book_id: 'B20'
      book_title: 'Understanding Drupal Views'
      book_author: 'Carlos Dinarte'
    - book_id: 'B30'
      book_title: 'Understanding Drupal Migrations'
      book_author: 'Mauricio Dinarte'
  ids:
    book_id:
      type: string
process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: book_author
destination:
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: ud_book_paragraph

The most important part of a paragraph migration is setting the destination plugin to entity_reference_revisions:paragraph. This plugin is actually provided by the Entity Reference Revisions module. It is very important to note that paragraphs entities are revisioned. This means that when you want to create a reference to them, you need to provide two IDs: target_id and target_revision_id. Regular entity reference fields like files, images, and taxonomy terms only require the target_id. This will be further explained with the node migration.

The other configuration that you can optionally set in the destination section is default_bundle. The value will be the machine name of the paragraph type you are migrating into. You can do this when all the paragraphs for a particular migration definition file will be of the same type. If that is not the case, you can leave out the default_bundle configuration and add a mapping for the type entity property in the process section.

You can execute the paragraph migration with this command: drush migrate:import ud_migrations_paragraph_intro_paragraph. After running the migration, there is not much you can do to verify that it worked. Contrary to other entities, there is no user interface, available out of the box, that lists all paragraphs in the system. One way to verify if the migration worked is to manually create a View that shows paragraphs. Another way is to query the database directly. You can inspect the tables that store the paragraph fields' data. In this example, the tables would be:

  • paragraph__field_ud_book_paragraph_author for the current author.
  • paragraph__field_ud_book_paragraph_title for the current title.
  • paragraph_r__8c3a9563ac for all the author revisions.
  • paragraph_r__3fa7e9863a for all the title revisions.

Each of those tables contains information about the bundle (paragraph type), the entity id, the revision id, and the migrated field value. Table names are derived from the machine names of the fields. If they are too long, the field name will be hashed to produce a shorter table name. Having to query the database is not ideal. Unfortunately, the options available to check if a paragraph migration worked are limited at the moment.

The node migration

The node migration will serve as the host for both referenced entities: images and paragraphs. The image migration is very similar to the one explained in a previous article. This time, the focus will be the paragraph migration. Both of them are set as dependencies of the node migration, so they need to be executed in advance. The following snippet shows how the source, destinations, and dependencies are set:

source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: 'Michele Metts'
      photo_file: 'P01'
      book_ref: 'B10'
    - unique_id: 2
      name: 'Benjamin Melançon'
      photo_file: 'P02'
      book_ref: 'B20'
    - unique_id: 3
      name: 'Stefan Freudenberg'
      photo_file: 'P03'
      book_ref: 'B30'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - ud_migrations_paragraph_intro_image
    - ud_migrations_paragraph_intro_paragraph
  optional: []

Note that photo_file and book_ref both contain the unique identifier of records in the image and paragraph migrations, respectively. These can be used with the migration_lookup plugin to map the reference fields in the nodes to be migrated. ud_paragraphs is the machine name of the target content type.

The mapping of the image reference field follows the same pattern than the one explained in the article on migration dependencies. Using the migration_lookup plugin, you indicate which is the migration that should be searched for the images. You also specify which source column contains the unique identifiers that match those in the image migration. This operation will return a single value: the file ID (fid) of the image. This value can be assigned to the target_id subfield of field_ud_image to establish the relationship. The following code snippet shows how to do it:

field_ud_image/target_id:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_image
  source: photo_file

Paragraph field mappings

Before diving into the paragraph field mapping, let's think about what needs to be done. Paragraphs are revisioned entities. To make a reference to them, you need two IDs: their entity id and their entity revision id. These two values need to be assigned to two subfields of the paragraph reference field: target_id and target_revision_id respectively. You have to come up with a process pipeline that complies with this requirement. There are many ways to do it, and the specifics will depend on your field configuration. In this example, the paragraph reference field allows an unlimited number of paragraphs to be associated, but only of one type: ud_book_paragraph. Another thing to note is that even though the field allows you to add as many paragraphs as you want, the example migrates exactly one paragraph.

With those considerations in mind, the mapping of the paragraph field will be a two step process. First, use the migration_lookup plugin to get a reference to the paragraph. Second, use the fetched values to set the paragraph reference subfields. The following code snippet shows how to do it:

pseudo_mbe_book_paragraph:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_paragraph
  source: book_ref
field_ud_favorite_book:
  plugin: sub_process
  source:
    - '@pseudo_mbe_book_paragraph'
  process:
    target_id: '0'
    target_revision_id: '1'

The first step is a normal migration_lookup procedure. The important difference is that instead of getting a single value, like with images, the paragraph lookup operation will return an array of two values. The format is like [3, 7] where the 3 represents the entity id and the 7 represents the entity revision id of the paragraph. Note that the array keys are not named. To access those values, you would use the index of the elements starting with zero (0). This will be important later. The returned array is stored in the pseudo_mbe_book_paragraph pseudofield.

The second step is to set the target_id and target_revision_id subfields. In this example, field_ud_favorite_book is the machine name paragraph reference field. Remember that it is configured to accept an arbitrary number of paragraphs, and each will require passing an array of two elements. This means you need to process an array of arrays. To do that, you use the sub_process plugin to iterate over an array of paragraph references. In this example, the structure to iterate over would be like this:

[
  [3, 7]
]

Let's dissect how to do the mapping of the paragraph reference field. The source configuration of the sub_process plugin contains an array of paragraph references. In the example, that array has a single element: the '@pseudo_mbe_book_paragraph' pseudofield. The quotes (') and at sign (@) are required to reuse an element that appears before in the process pipeline. Then, in the process configuration, you set the subfields for the paragraph reference field. It is worth noting that at this point you are iterating over a list of paragraph references, even if that list contains only one element. If you had more than one paragraph to migrate, whatever you defined in process will apply to all of them.

The process configuration is an array of subfield mappings. The left side of the assignment is the name of the subfield you want to set. The right side of the assignment is an array index of the paragraph reference being processed. Remember that this array does not have named-keys so, you use their numerical index to refer to them. The example sets the target_id subfield to the element in the 0 index and the target_revision_id subfield to the element in the one 1 index. Using the example data, this would be target_id: 3 and target_revision_id: 7. The quotes around the numerical indexes are important. If not used, the migration will not find the indexes and the paragraphs will not be associated. The end result of this operation will be something like this:

'field_ud_favorite_book' => array (1) [
  array (2) [
    'target_id' => string (1) "3"
    'target_revision_id' => string (1) "7"
  ]
]

There are three ways to run the migrations: manually, executing dependencies, and using tags. The following code snippet shows the three options:

# 1) Manually.
$ drush migrate:import ud_migrations_paragraph_intro_image
$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node

# 2) Executing depenpencies.
$ drush migrate:import ud_migrations_paragraph_intro_node --execute-dependencies

# 3) Using tags.
$ drush migrate:import --tag='UD Paragraphs Intro'

And that is one way to map paragraph reference fields. In the end, all you have to do is set the target_id and target_revision_id subfields. The process pipeline that gets you to that point can vary depending on how your paragraphs are configured. The following is a non-exhaustive list of things to consider when migrating paragraphs:

  • How many paragraphs types can be referenced?
  • How many paragraphs instances are being migrated? Is this a multivalue field?
  • Do paragraphs have translations?
  • Do paragraphs have revisions?

Do migrated paragraphs disappear upon node rollback?

Paragraphs migrations are affected by a particular behavior of revisioned entities. If the host entity is deleted, and the paragraphs do not have translations, the whole paragraph gets deleted. That means that deleting a node will make the referenced paragraphs' data to be removed. How does this affect your migration workflow? If the migration of the host entity is rollback, then the paragraphs will be removed, the migrate API will not know about it. In this example, if you run a migrate status command after rolling back the node migration, you will see that the paragraph migration indicated that there are no pending elements to process. The file migration for the images will report the same, but in that case, the images will remain on the system.

In any migration project, it is common that you do rollback operations to test new field mappings or fix errors. Thus, chances are very high that you will stumble upon this behavior. Thanks to Damien McKenna for helping me understand this behavior and tracking it to the rollback method of the EntityReferenceRevisions destination plugin. So, what do you do to recover the deleted paragraphs? You have to rollback both migrations: node and paragraph. And then, you have to import the two again. The following snippet shows how to do it:

# 1) Rollback both migrations.
$ drush migrate:rollback ud_migrations_paragraph_intro_node
$ drush migrate:rollback ud_migrations_paragraph_intro_paragraph

# 2) Import both migrations againg.

$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node

What did you learn in today's blog post? Have you migrated paragraphs before? If so, what challenges have you found? Did you know paragraph reference fields require two subfields to be set? Did you that deleting the host entity also deletes referenced paragraphs? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.