31dom/28.md
Mauricio Dinarte ba253739b1 Rename files
2023-08-04 10:43:39 -06:00

20 KiB

How to debug Drupal migrations? - Part 2

In the previous article we began talking about debugging Drupal migrations. We gave some recommendations of things to do before diving deep into debugging. We also introduced the log process plugin. Today, we are going to show how to use the Migrate Devel module and the debug process plugin. Then we will give some guidelines on using a real debugger like XDebug. Next, we will share tips so you get used to migration errors. Finally, we are going to briefly talk about the migrate:fields-source Drush command. Let's get started.

The migrate_devel module

The Migrate Devel module is very helpful for debugging migrations. It allows you to visualize the data as it is received from the source, the result of field transformation in the process pipeline, and values that are stored in the destination. It works by adding extra options to Drush commands. When these options are used, you will see more output in the terminal with details on how rows are being processed.

As of this writing, you will need to apply a patch to use this module. Migrate Devel was originally written for Drush 8 which is still supported, but no longer recommended. Instead, you should use at least version 9 of Drush. Between 8 and 9 there were major changes in Drush internals.  Commands need to be updated to work with the new version. Unfortunately, the Migrate Devel module is not fully compatible with Drush 9 yet. Most of the benefits listed in the project page have not been ported. For instance, automatically reverting the migrations and applying the changes to the migration files is not yet available. The partial support is still useful and to get it you need to apply the patch from this issue. If you are using the Drush commands provided by Migrate Plus, you will also want to apply this patch. If you are using the Drupal composer template, you can add this to your composer.json to apply both patches:

"extra": {
  "patches": {
    "drupal/migrate_devel": {
      "drush 9 support": "https://www.drupal.org/files/issues/2018-10-08/migrate_devel-drush9-2938677-6.patch"
    },
    "drupal/migrate_tools": {
      "--limit option": "https://www.drupal.org/files/issues/2019-08-19/3024399-55.patch"
    }
  }
}

With the patchs applied and the modules installed, you will get two new command line options for the migrate:import command: --migrate-debug and --migrate-debug-pre. The major difference between them is that the latter runs before the destination is saved. Therefore, --migrate-debug-pre does not provide debug information of the destination.

Using any of the flags will produce a lot of debug information for each row being processed. Many time sanalyzing a subset of the records is enough to stop potential issues. The patch to Migrate Tools will allow you to use the --limit and --idlist options with the migrate:import command to limit the number of elements to process.

To demonstrate the output generated by the module, let's use the image migration from the CSV source example. You can get the code at https://github.com/dinarcon/ud_migrations. The following snippets how to execute the import command with the extra debugging options and the resulting output:

# Import only one element.
$ drush migrate:import udm_csv_source_image --migrate-debug --limit=1

# Use the row's unique identifier to limit which element to import.
$ drush migrate:import udm_csv_source_image --migrate-debug --idlist="P01"
$ drush migrate:import udm_csv_source_image --migrate-debug --limit=1
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   $Source                                    │
└──────────────────────────────────────────────────────────────────────────────┘
array (10) [
    'photo_id' => string (3) "P01"
    'photo_url' => string (74) "https://udrupal.com/photos/freescholar.jpg"
    'path' => string (76) "modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv"
    'ids' => array (1) [
        string (8) "photo_id"
    ]
    'header_offset' => NULL
    'fields' => array (2) [
        array (2) [
            'name' => string (8) "photo_id"
            'label' => string (8) "Photo ID"
        ]
        array (2) [
            'name' => string (9) "photo_url"
            'label' => string (9) "Photo URL"
        ]
    ]
    'delimiter' => string (1) ","
    'enclosure' => string (1) """
    'escape' => string (1) "\"
    'plugin' => string (3) "csv"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 $Destination                                 │
└──────────────────────────────────────────────────────────────────────────────┘
array (4) [
    'psf_destination_filename' => string (25) "picture-15-1421176712.jpg"
    'psf_destination_full_path' => string (25) "picture-15-1421176712.jpg"
    'psf_source_image_path' => string (74) "https://udrupal.com/photos/freescholar.jpg"
    'uri' => string (29) "./picture-15-1421176712_6.jpg"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│                             $DestinationIDValues                             │
└──────────────────────────────────────────────────────────────────────────────┘
array (1) [
    string (1) "3"
]
════════════════════════════════════════════════════════════════════════════════
Called from +56 /var/www/drupalvm/drupal/web/modules/contrib/migrate_devel/src/EventSubscriber/MigrationEventSubscriber.php
 [notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_csv_source_image'

In the terminal you can see the data as it is passed along in the Migrate API. In the $Source, you can see how the source plugin was configured and the different columns for the row being processed. In the $Destination, you can see all the fields that were mapped in the process section and their values after executing all the process plugin transformation. In $DestinationIDValues, you can see the unique identifier of the destination entity that was created. This migration created an image so the destination array has only one element: the file ID (fid). For paragraphs, which are revisioned entities, you will get two values: the id and the revision_id. The following snippet shows the $Destination and  $DestinationIDValues sections for the paragraph migration in the same example module:

$ drush migrate:import udm_csv_source_paragraph --migrate-debug --limit=1
┌──────────────────────────────────────────────────────────────────────────────┐
│                                   $Source                                    │
└──────────────────────────────────────────────────────────────────────────────┘
Omitted.
┌──────────────────────────────────────────────────────────────────────────────┐
│                                 $Destination                                 │
└──────────────────────────────────────────────────────────────────────────────┘
array (3) [
    'field_ud_book_paragraph_title' => string (32) "The definitive guide to Drupal 7"
    'field_ud_book_paragraph_author' => string UTF-8 (24) "Benjamin Melançon et al."
    'type' => string (17) "ud_book_paragraph"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│                             $DestinationIDValues                             │
└──────────────────────────────────────────────────────────────────────────────┘
array (2) [
    'id' => string (1) "3"
    'revision_id' => string (1) "7"
]
════════════════════════════════════════════════════════════════════════════════
Called from +56 /var/www/drupalvm/drupal/web/modules/contrib/migrate_devel/src/EventSubscriber/MigrationEventSubscriber.php
 [notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_csv_source_paragraph'

The debug process plugin

The Migrate Devel module also provides a new process plugin called debug. The plugin works by printing the value it receives to the terminal. As Benji Fisher explains in this issue, the debug plugin offers the following advantages over the log plugin provided by the core Migrate API:

  • The use of print_r handles both arrays and scalar values gracefully.
  • It is easy to differentiate debugging code that should be removed from logging plugin configuration that should stay.
  • It saves time as there is no need to run the migrate:messages command to read the logged values.

In short, you can use the debug plugin in place of log. There is a particular case where using debug is really useful. If used in between of a process plugin chain, you can see how elements are being transformed in each step. The following snippet shows an example of this setup and the output it produces:

field_tags:
  - plugin: skip_on_empty
    source: src_fruit_list
    method: process
    message: 'No fruit_list listed.'
  - plugin: debug
    label: 'Step 1: Value received from the source plugin: '
  - plugin: explode
    delimiter: ','
  - plugin: debug
    label: 'Step 2: Exploded taxonomy term names '
    multiple: true
  - plugin: callback
    callable: trim
  - plugin: debug
    label: 'Step 3: Trimmed taxonomy term names '
  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
  - plugin: debug
    label: 'Step 4: Generated taxonomy term IDs '
$ drush migrate:import udm_config_entity_lookup_entity_generate_node --limit=1
Step 1: Value received from the source plugin: Apple, Pear, Banana
Step 2: Exploded taxonomy term names Array
(
    [0] => Apple
    [1] =>  Pear
    [2] =>  Banana
)
Step 3: Trimmed taxonomy term names Array
(
    [0] => Apple
    [1] => Pear
    [2] => Banana
)
Step 4: Generated taxonomy term IDs Array
(
    [0] => 2
    [1] => 3
    [2] => 7
)
 [notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_config_entity_lookup_entity_generate_node'

The process pipeline is part of the node migration from the entity_generate plugin example. In the code snippet, a debug step is added after each plugin in the chain. That way, you can verify that the transformations are happening as expected. In the last step you get an array of the taxonomy term IDs (tid) that will be associated to the field_tags field. Note that this plugin accepts two optional parameters:

  • label is a string to print before the debug output. It can be used to give context of what is being printed.
  • multiple is a boolean that when set to true signals the next plugin in the pipeline to process each element of an array individually. The functionality is similar to the multiple_values plugin provided by Migrate Plus.

Using the right tool for the job: a debugger

Many migration issues can be solved by following the recommendations from the previous article and the tools provided by Migrate Devel. But there are problems so complex that you need a full blown debugger. The many layers of abstraction in Drupal, and the fact that multiple modules might be involved a single migration, makes the use of debuggers very appealing. With them, you can step through each line of code across multiple files and see how each variables changes over time.

In the next article we will explain how to configure XDebug to work with PHPStorm and DrupalVM. For now, let's consider where are good places to add breakpoints. In this article, Lucas Hedding recommends adding them in:

  • The import method of the MigrateExecutable class.
  • The processRow method of the MigrateExecutable class.
  • The process plugin if you know which one might be causing an issue. The transform method is a good place to set the breakpoint.

The use of a debugger is no guarantee that you will find the solution to your issue. It will depend on many factors including your familiarity with the system and how deep lies the problem. Previous debugging experience, even if not directly related to migrations, will help a lot. Do not get discouraged if it takes you too much time to discover what is causing the problem or if you cannot find it at all. Each time you will get a better understanding of the system.

Adam Globus-Hoenich, a migrate maintainer, once told me that the Migrate API "is impossible to understand for people that are not migrate maintainers." That was after spending about an hour together trying to debug an issue and failing to make it work. I mention this not with the intention to discourage you. But to illustrate that no single person knows everything about the Migrate API and even their maintainers can have a hard time debugging issues. Personally, I have spent countless hours in the debugger tracking how the data flows from the source to the destination entities. It is mind blowing and I barely understand what is going on. The community has come together to produce a fantastic piece of software. Anyone who uses the Migrate API is standing on the shoulders of giants.

If it is not broken, break it on purpose

One of the best ways to reduce the time you spend debugging an issue is having experience with a similar problem. A great way to learn to learn is finding a working example and breaking it on purpose. This will let you get familiar with the requirements and assumptions made by the system and the errors it produces.

Throughout the series, we have created many examples. We have made our best effort to explain how each example work. But we were not able to document every detail in the articles. In part to keep them within a reasonable length. But also, because we do not fully comprehend the system. In any case, we highly encourage you to take the examples and break them in every imaginable way. Making one change at a time, see how the migration behaves and what errors are produced. These are some things to try:

  • Do not leave a space after a colon (:) when setting a configuration option. Example: id:this_is_going_to_be_fun.
  • Change the indentation of plugin definitions.
  • Try to use a plugin provided by a contributed module that is not enabled.
  • Do not set a required plugin configuration option.
  • Leave out a full section like source, process, or destination.
  • Mix the upper and lowercase letters in configuration options, variables, pseudofields, etc.
  • Try to convert a migration managed as code to configuration; and vice versa.

The migrate:fields-source Drush command

Before wrapping up the discussion on debugging migrations, let's quicky cover the migrate:fields-source Drush command. It lists all the fields available in the source that can be used later in the process section. Many source plugins require that you manually set the list of fields to fetch from the source. Because of this, the information provided by this command is redundant most of the time. However, it is particularly useful with CSV source migrations. The CSV plugin automatically includes all the columns in the file. Executing this command will let you know which columns are available. For example, running drush migrate:fields-source udm_csv_source_node produces the following output in the terminal:

$ drush migrate:fields-source udm_csv_source_node
 -------------- -------------
  Machine Name   Description
 -------------- -------------
  unique_id      unique_id
  name           name
  photo_file     photo_file
  book_ref       book_ref
 -------------- -------------

The migration is part of the CSV source example. By running the command you can see that the file contains four columns. The values under "Machine Name" are the ones you are going to use for field mappings in the process section. The Drush command has a --format option that lets you change the format of the output. Execute drush migrate:fields-source --help to get a list of valid formats.

What did you learn in today's blog post? Have you ever used the migrate devel module for debugging purposes? What is your strategy when using a debugger like XDebug? Any debugging tips that have been useful to you? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.