31dom/27.md
Mauricio Dinarte ba253739b1 Rename files
2023-08-04 10:43:39 -06:00

15 KiB

How to debug Drupal migrations? - Part 1

Throughout the series we have showed many examples. I do not recall any of them working on the first try. When working on Drupal migrations, it is often the case that things do not work right away. Today's article is the first of a two part series on debugging Drupal migrations. We start giving some recommendations of things to do before diving deep into debugging. Then, we are going to talk about migrate messages and presented the log process plugin. Let's get started.

Minimizing the surface for errors

The Migrate API is a very powerful ETL framework that interacts with many systems provided by Drupal core and contributed modules. This adds layers of abstraction that can make the debugging process more complicated compared to other systems. For instance, if something fails with a remote JSON migration, the error might be produced in the Migrate API, the Entity API, the Migrate Plus module, the Migrate Tools module, or even the Guzzle HTTP Client library that fetches the file. For a more concrete example, while working on a recent article, I stumbled upon an issue that involved three modules. The problem was that when trying to rollback a CSV migration from the user interface an exception will be thrown making the operation fail. This is related to an issue in the core Migrate API that manifests itself when rollback operations are initiated from the interface provided by Migrate Plus. Then, the issue causes a condition in the Migrate Source CSV plugin that fails and the exception is thrown.

In general, you should aim to minimize the surface for errors. One way to do this by starting the migration with the minimum possible set up. For example, if you are going to migrate nodes, start by configuring the source plugin, one field (the title), and the destination. When that works, keep migrating one field at a time. If the field has multiple subfields, you can even migrate one subfield at a time. Commit every progress to version control so you can go back to a working state if things go wrong. Read this article for more recommendations on writing migrations.

What to check first?

Debugging is a process that might involve many steps. There are a few things that you should check before diving too deep into trying to find the root of the problem. Let's begin with making sure that changes to your migrations are properly detected by the system. One common question I see people ask is where to place the migration definition files. Should they go in the migrations or config/install directory of your custom module? The answer to this is whether you want to manage your migrations as code or configuration. Your choice will determine the workflow to follow for changes in the migration files to take effect. Migrations managed in code go in the migrations directory and require rebuilding caches for changes to take effect. On the other hand, migrations managed in configuration are placed in the config/install directory and require configuration synchronization for changes to take effect. So, make sure to follow the right workflow.

After verifying that your changes are being applied, the next thing to do is verify that the modules that provide your plugins are enabled and the plugins themselves are properly configured. Look for typos in the configuration options. Always refer to the official documentation to know which options are available and find the proper spelling of them. Other places to look at is the code for the plugin definition or articles like the ones in this series documenting how to use them. Things to keep in mind include proper indentation of the configuration options. An extra whitespace or a wrong indentation level can break the migration. You can either get a fatal error or the migration can fail silently without producing the expected results. Something else to be mindful is the version of the modules you are using because the configuration options might change per version. For example, the newly released 8.x-3.x branch of Migrate Source CSV changed various configuration options as described in this change record. And the 8.x-5.x branch of Migrate Plus changed some configurations for plugin related with DOM manipulation as described in this change record. Keeping an eye on the issue queue and change records for the different modules you use is always a good idea.

If the problem persists, look for reports of similar problems in the issue queue. Make sure to include closed issues as well in case your problem has been fixed or documented already. Remember that a problem in a module can affect a different module. Keeping an eye on the issue queue and change records for all the modules you use is always a good idea. Another place ask questions is the #migrate channel in Drupal slack. The support that is offered there is fantastic.

Migration messages

If nothing else has worked, it is time to investigate what is going wrong. In case the migration outputs an error or a stacktrace to the terminal, you can use that to search in the code base where the problem might originate. But if there is no output or if the output is not useful, the next thing to do is check the migration messages.

The Migrate API allows plugins to log messages to the database in case an error occurs. Not every plugin leverages this functionality, but it is always worth checking if a plugin in your migration wrote messages that could give you a hint of what went wrong. Some plugins like skip_on_empty and skip_row_if_not_set even expose a configuration option to specify messages to log. To check the migration messages use the following Drush command: drush migrate:messages [migration_id]. If you are managing migrations as configuration, the interface provided by Migrate Plus also exposes them.

Messages are logged separately per migration, even if you run multiple migrations at once. This could happen if you execute dependencies or use groups or tags. In those cases, errors might be produced in more than one migration. You will have to look at the messages for each of them individually.

Let's consider the following example. In the source there is a field called src_decimal_number with values like 3.1415, 2.7182, and 1.4142. It is needed to separate the number into two components: the integer part (3) and the decimal part (1415). For this, we are going to use the extract process plugin. Errors will be purposely introduced to demonstrate the workflow to check messages and update migrations. The following example shows the process plugin configuration and the output produced by trying to import the migration:

# Source values: 3.1415, 2.7182, and 1.4142

psf_number_components:
  plugin: explode
  source: src_decimal_number
$ drush mim ud_migrations_debug
[notice] Processed 3 items (0 created, 0 updated, 3 failed, 0 ignored) - done with 'ud_migrations_debug'

In MigrateToolsCommands.php line 811:
ud_migrations_debug Migration - 3 failed.

The error produced in the console does not say much. Let's see if any messages were logged using: drush migrate:messages ud_migrations_debug. In the previous example, the messages will look like this:

 ------------------- ------- --------------------
  Source IDs Hash    Level   Message
 ------------------- ------- --------------------
  7ad742e...732e755   1       delimiter is empty
  2d3ec2b...5e53703   1       delimiter is empty
  12a042f...1432a5f   1       delimiter is empty
 ------------------------------------------------

In this case, the migration messages are good enough t o let us know what is wrong. The required delimiter configuration option was not set. When an error occurs, usually you need to perform at least three steps:

  • Rollback the migration. This will also clear the messages.
  • Make changes to definition file and make they are applied. This will depend on whether you are managing the migrations as code or configuration.
  • Import the migration again.

Let's say we performed these steps, but we got an error again. The following snippet shows the updated plugin configuration and the messages that were logged:

psf_number_components:
  plugin: explode
  source: src_decimal_number
  delimiter: '.'
 ------------------- ------- ------------------------------------
  Source IDs Hash    Level   Message
 ------------------- ------- ------------------------------------
  7ad742e...732e755   1       3.1415000000000002 is not a string
  2d3ec2b...5e53703   1       2.7181999999999999 is not a string
  12a042f...1432a5f   1       1.4141999999999999 is not a string
 ----------------------------------------------------------------

The new error occurs because the explode operation works on strings, but we are providing numbers. One way to fix this is to update the source to add quotes around the number so it is treated as a string. This is of course not ideal and many times not even possible. A better way to make it work is setting the strict option to false in the plugin configuration. This will make sure to cast the input value to a string before applying the explode operation. This demonstrates the importance of reading the plugin documentation to know which options are at your disposal. Of course, you can also have a look at the plugin code to see how it works.

Note: Sometimes the error produces an non-recoverable condition. The migration can be left in a status of "Importing" or "Reverting". Refer to this article to learn how to fix this condition.

The log process plugin

In the example, adding the extra configuration option will make the import operation finish without errors. But, how can you be sure the expected values are being produced? Not getting an error does not necessarily mean that the migration works as expected. It is possible that the transformations being applied do not yield the values we think or the format that Drupal expects. This is particularly true if you have complex process plugin chains. As a reminder, we want to separate a decimal number from the source like 3.1415 into its components: 3 and 1415.

The log process plugin can be used for checking the outcome of plugin transformations. This plugin offered by the core Migrate API does two things. First, it logs the value it receives to the messages table. Second, the value is returned unchanged so that it can be used in process chains. The following snippets show how to use the log plugin and what is stored in the messages table:

psf_number_components:
  - plugin: explode
    source: src_decimal_number
    delimiter: '.'
    strict: false
  - plugin: log
 ------------------- ------- --------
  Source IDs Hash    Level   Message
 ------------------- ------- --------
  7ad742e...732e755   1       3
  7ad742e...732e755   1       1415
  2d3ec2b...5e53703   1       2
  2d3ec2b...5e53703   1       7182
  12a042f...1432a5f   1       1
  2d3ec2b...5e53703   1       4142
 ------------------------------------

Because the explode plugin produces an array, each of the elements is logged individually. And sure enough, in the output you can see the numbers being separated as expected.

The log plugin can be used to verify that source values are being read properly and process plugin chains produce the expected results. Use it as part of your debugging strategy, but make sure to remove it when done with the verifications. It makes the migration to run slower because it has to write to the database. The overhead is not needed once you verify things are working as expected.

In the next article, we are going to cover the Migrate Devel module, the debug process plugin, recommendations for using a proper debugger like XDebug, and the migrate:fields-source Drush command.

What did you learn in today's blog post? What workflow do you follow to debug a migration issue? Have you ever used the log process plugin for debugging purposes? If so, how did it help to solve the issue? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.