# Tips for writing Drupal migrations and understanding their workflow

We have presented several examples so far. They started very simple and have been increasing in complexity. Until now, we have been rather optimistic: get the sample code, install any module dependencies, enable the module that defines the migration, and execute it, assuming everything works on the first try. But Drupal migrations often involve a bit of trial and error. At the very least, writing them is an iterative process. In this chapter we are going to see what happens after **import** and **rollback** operations, how to **recover from a failed migration**, and some **tips for writing the migration plugin files**.

## Importing and rolling back migrations

When working on a migration project, it is common to write many migration plugin files. Even if you were to have only one, it is very likely that your destination will require many field mappings. Running an _import_ operation to get the data into Drupal is the first step. With so many moving parts, it is easy not to get the expected results on the first try. When that happens, you can run a _rollback_ operation. This instructs the system to revert anything that was introduced when the migration was initially imported. After rolling back, you can make changes to the migration plugin file and rebuild Drupal's cache for the system to pick up your changes. Finally, you can do another _import_ operation. Repeat this process until you get the results you expect. The following code snippet shows a basic Drupal migration workflow:

```console
# 1) Run the migration.
$ drush migrate:import first_example

# 2) Roll back the migration because the expected results were not obtained.
$ drush migrate:rollback first_example

# 3) Change the migration plugin file.

# 4) Rebuild caches for changes to take effect.
$ drush cache:rebuild

# 5) Run the migration again.
$ drush migrate:import first_example
```
In all cases, `first_example` is the `id` of the migration to run. You can find the full code in Chapter 2!!!. Anytime you modify the plugin files, you need to rebuild Drupal's caches for the changes to take effect. This is the procedure to follow when creating the YAML files using Migrate API core features and placing them under the `migrations` directory.
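
The rollback, cache rebuild, and import cycle can be wrapped in a small shell function so that each iteration is a single command. This is only a minimal sketch and not part of the book's example code: the function name `migrate_retry` is hypothetical, and it assumes `drush` is on your `PATH` and that you run it from inside the Drupal project.

```bash
# Hypothetical helper: re-run a migration after editing its plugin file.
# Assumes `drush` is available; the migration ID is passed as $1.
migrate_retry() {
  local migration_id="$1"
  if [ -z "$migration_id" ]; then
    echo "Usage: migrate_retry <migration_id>" >&2
    return 1
  fi
  # Chain with && so a failed rollback or cache rebuild stops the cycle
  # instead of importing on top of stale data.
  drush migrate:rollback "$migration_id" &&
    drush cache:rebuild &&
    drush migrate:import "$migration_id"
}

# Usage:
#   migrate_retry first_example
```

Chaining with `&&` is a deliberate choice: if the rollback fails, you probably want to inspect the migration before importing again.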

_Note_: It is also possible to define migrations as configuration entities using the Migrate Plus module. In those cases, the YAML files follow a different naming convention and are placed under the `config/install` directory. To pick up changes in this case, you need to sync the YAML definition using [configuration management](https://www.drupal.org/docs/configuration-management/managing-your-sites-configuration) workflows. This will be covered in chapter !!!.
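
For migrations defined as configuration entities, one possible way to sync the YAML definitions during development is a partial configuration import pointed at the module's `config/install` directory. The following is a rough sketch under stated assumptions: Migrate Plus is enabled, the function name `migrate_sync_config` is hypothetical, and the module path in the usage comment is a placeholder for your own module.

```bash
# Hypothetical helper: re-import configuration-entity migrations from a
# module's config/install directory. $1 is the path to that directory.
migrate_sync_config() {
  local config_dir="$1"
  if [ -z "$config_dir" ]; then
    echo "Usage: migrate_sync_config <path/to/config/install>" >&2
    return 1
  fi
  # --partial only creates or updates the configuration found in --source;
  # configuration missing from that directory is not deleted.
  drush config:import --partial --source="$config_dir" --yes
}

# Usage (placeholder module path):
#   migrate_sync_config modules/custom/my_module/config/install
```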
## Stopping and resetting migrations

Sometimes, you do not get the expected results due to an oversight in setting a value. On other occasions, fatal PHP errors can occur when running the migration, and the Migrate API might not be able to recover from them. One example is using a non-existent PHP function with the `callback` plugin. When these errors happen, the migration is left in a state where no _import_ or _rollback_ operations can be performed.

You can check the state of any migration by running the `drush migrate:status` command. Ideally, you want them in the `Idle` state. When something fails during import or rollback, the migration can be left in the `Importing` or `Rolling back` state. To get the migration back to `Idle`, you stop the migration and reset its status. The following snippet shows how to do it:

```console
# 1) Run the migration.
$ drush migrate:import first_example

# 2) Some unrecoverable error occurs. Check the status of the migration.
$ drush migrate:status first_example

# 3) Stop the migration.
$ drush migrate:stop first_example

# 4) Reset the status to idle.
$ drush migrate:reset-status first_example

# 5) Roll back the migration because the expected results were not obtained.
$ drush migrate:rollback first_example

# 6) Change the migration plugin file.

# 7) Rebuild caches for changes to take effect.
$ drush cache:rebuild

# 8) Run the migration again.
$ drush migrate:import first_example
```
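
The status check, stop, and reset steps can also be combined into one helper that only acts when the migration is not `Idle`. This is a sketch, not a definitive implementation: the function name `migrate_unstick` is hypothetical, and it assumes that `drush migrate:status` exposes a `status` field that Drush's generic `--field` option prints on its own line. Check the output in your installation before relying on it.

```bash
# Hypothetical helper: return a stuck migration to the Idle state.
# Assumes `drush migrate:status <id> --field=status` prints only the
# status label (for example "Idle" or "Importing").
migrate_unstick() {
  local migration_id="$1"
  local status
  status="$(drush migrate:status "$migration_id" --field=status)"
  if [ "$status" = "Idle" ]; then
    echo "$migration_id is already Idle; nothing to do."
    return 0
  fi
  echo "$migration_id is '$status'; stopping and resetting."
  drush migrate:stop "$migration_id"
  drush migrate:reset-status "$migration_id"
}

# Usage:
#   migrate_unstick first_example
```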

All the commands above are provided by Drush itself. The Migrate Tools module is a migration runner that offers additional commands and adds extra features to some of the ones listed above. It adds the `migrate:tree` command to show a tree of migration dependencies. For `migrate:import`, it adds flags like `--group` and `--continue-on-failure`. You can use the following commands to find out which commands are available in your installation and get details on usage:

```console
# List of all migration related commands.
$ drush list --filter=migrate
# Get information on usage and available flags using --help
$ drush [command-name] --help
```

_Tip_: You can use Drush command aliases to write shorter commands. For example, `drush mim first_example` starts an import operation. When using `drush list`, the aliases appear in parentheses next to the command name.

The errors thrown by the Migrate API might not provide enough information to determine what went wrong. An excellent way to familiarize yourself with possible errors is by intentionally breaking working migrations. In the example repository for this book there are many migrations you can modify. Try anything that comes to mind: not leaving a space after a _colon_ (**:**) in a key-value assignment; not using proper indentation; using wrong subfield names; using invalid values in property assignments; etc. You might be surprised by how the Migrate API deals with such errors. Also note that many other Drupal APIs are involved. For example, you might get a YAML file parse error or an [Entity API](https://www.drupal.org/docs/8/api/entity-api) save exception. Once you have seen an error, it is usually faster to identify its cause and fix it the next time it appears. In chapters !!! we will talk about debugging migrations.

## What happens when you rollback a Drupal migration?

In an ideal scenario, when a migration is rolled back it _cleans up after itself_. That is, it removes any entity that was created during the _import_ operation: nodes, taxonomy terms, files, etc. Unfortunately, that is not always the case. It is very important to understand this when planning and executing migrations. For example, you might not want to leave taxonomy terms or files that are no longer in use. Whether a dependent entity is removed or not depends on how the plugins or entities involved work.

For example, when using the `file_import` or `image_import` plugins provided by [Migrate File](https://www.drupal.org/project/migrate_file), the created files and images are not removed from the system upon rollback. When using the `entity_generate` plugin from Migrate Plus, the created entity also remains in the system after a _rollback_ operation.

In the next chapter we are going to start talking about migration dependencies. What happens with dependent migrations (e.g. files and paragraphs) when the migration for the host entity (e.g. node) is rolled back? In this case, the Migrate API will perform an entity delete operation on the node. When this happens, referenced files are kept in the system, but paragraphs are automatically deleted. For the curious, this behavior for paragraphs is actually determined by its module dependency: [Entity Reference Revisions](https://www.drupal.org/project/entity_reference_revisions). We will talk more about paragraph migrations in future chapters.

The moral of the story is that the resulting state of the system might be affected by other Drupal APIs. In the case of _rollback_ operations, make sure to read the documentation or test manually to find out when migrations clean up after themselves and when they do not.
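
One way to test this manually is to compare row counts in an entity's base table before and after a rollback. The sketch below is only an illustration under several assumptions: the function names are hypothetical, `drush sql:query` can reach the site's database and prints only the count, and the tables (such as `file_managed`) use no prefix.

```bash
# Hypothetical helper: print the number of rows in a database table.
# Assumes `drush sql:query` prints only the resulting count.
count_rows() {
  drush sql:query "SELECT COUNT(*) FROM $1"
}

# Hypothetical helper: roll back a migration and report the row count of
# one table before and after. $1 is the migration ID, $2 the table name.
rollback_row_counts() {
  local migration_id="$1" table="$2"
  local before after
  before="$(count_rows "$table")"
  drush migrate:rollback "$migration_id" >/dev/null
  after="$(count_rows "$table")"
  echo "$table: $before rows before rollback, $after after"
}

# Usage (run after an import):
#   rollback_row_counts first_example file_managed
```

If both numbers match for a table that the import added rows to, the rollback did not remove those entities.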
_Note_: The focus of this section was [content entity](https://www.drupal.org/docs/8/api/entity-api/content-entity) migrations. The general idea can be applied to [configuration entities](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-configuration) or any custom target of the ETL process.
## Re-import or update migrations

We just mentioned that the Migrate API triggers an entity delete action when rolling back a migration. This has another important side effect. Entity IDs (`nid`, `uid`, `tid`, `fid`, `mid`, etc.) are going to change every time you _rollback_ and _import_ again. Depending on auto-generated IDs is generally not a good idea, but keep this in mind in case your workflow relies on them. For example, if you are running migrations in a content staging environment, references to the migrated entities can break if their IDs change. Also, if you were to manually update the migrated entities to clean up edge cases, those changes would be lost if you _rollback_ and _import_ again. As described in the previous section, test data might remain in the system after a rollback, so make sure to clean things up when deploying to production environments.
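
To see the ID changes for yourself, you could snapshot the pairs of source and destination IDs that the Migrate API records for a migration, then compare them after a rollback and a fresh import. This is a hedged sketch: the function name is hypothetical, and it assumes the default map table name `migrate_map_<migration_id>`, a single-column source identifier, and no database table prefix.

```bash
# Hypothetical helper: print source ID / destination ID pairs from a
# migration's map table. Assumes the default table name and that the
# migration uses a single source ID column (sourceid1).
migrate_id_pairs() {
  drush sql:query "SELECT sourceid1, destid1 FROM migrate_map_$1 ORDER BY sourceid1"
}

# Usage:
#   migrate_id_pairs first_example > before.txt
#   drush migrate:rollback first_example && drush migrate:import first_example
#   migrate_id_pairs first_example > after.txt
#   diff before.txt after.txt
```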

An alternative is to skip the _rollback_ operation altogether. Instead, you run an _import_ operation again using the `--update` flag. This tells the system that in addition to migrating unprocessed items from the source, you also want to update items that were previously imported using their current source values. To do this, the Migrate API relies on _source identifiers_ and _map tables_. You might want to consider this option when your source changes over time, when you have a large number of records to import, or when you want to execute the same migration many times on a schedule.
_Note_: On import operations, the Migrate API issues an entity save action.
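
As an illustration of the update workflow, the following sketch re-imports one or more migrations with the `--update` flag. The function name `migrate_refresh` and the second migration ID in the usage comment are hypothetical; only `drush migrate:import` and `--update` come from the text above.

```bash
# Hypothetical helper: re-import migrations in update mode, in the order
# given. New source rows are imported and previously imported rows are
# updated, so no rollback is needed.
migrate_refresh() {
  local migration_id
  for migration_id in "$@"; do
    # Stop at the first failure so later migrations do not run against
    # incomplete data.
    drush migrate:import "$migration_id" --update || return 1
  done
}

# Usage (second_example is a placeholder ID):
#   migrate_refresh first_example second_example
```

A function like this could be called from a scheduled job when the same migrations need to run repeatedly.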
## Tips for writing Drupal migrations

When working on migration projects, you might end up with many migration plugin files. They can set dependencies on each other. Each file might contain a significant number of field mappings. There are many things you can do to make Drupal migrations more straightforward, such as practicing with different migration scenarios and studying working examples. As a reference to help you in the process of migrating into Drupal, consider these tips:

- Start from an existing migration. Look for an example online that does something close to what you need and modify it to your requirements.
- Pay close attention to the syntax of the YAML file. An extraneous space or wrong indentation level can break the whole migration.
- Read the documentation to know which source, process, and destination plugins are available. One might exist already that does exactly what you need.
- Make sure to read the documentation for the specific plugins you are using. Many times a plugin offers optional configuration. Understand the tools at your disposal and find creative ways to combine them.
- Look for [contributed modules](https://www.drupal.org/project/project_module?f%5B0%5D=&f%5B1%5D=&f%5B2%5D=im_vid_3%3A64&f%5B3%5D=drupal_core%3A7234&f%5B4%5D=sm_field_project_type%3Afull&f%5B5%5D=&f%5B6%5D=&text=&solrsort=iss_project_release_usage+desc&op=Search) that might offer more plugins or upgrade paths from previous versions of Drupal. The Migrate ecosystem is vibrant and lots of people are contributing to it.
- When writing the migration pipeline, map one field at a time. Problems are easier to isolate if there is only one thing that could break at a time.
- When mapping a field, work on one subfield at a time if possible. Some field types like images and addresses offer many subfields. Again, try to isolate errors by introducing individual changes each time.
- There is no need to do every data transformation using the Migrate API. When there are edge cases, you can manually update those after the automated migration is **completed**. That is, no more rollback operations. You can also clean up the source data in advance to make it easier to process in Drupal.
- Commit to your code repository any and every change that produces the right results. That way you can go back in time and recover a partially working migration.
- Learn about [debugging migrations](https://www.drupal.org/docs/8/api/migrate-api/debugging-migrations). We will talk about this topic in a future chapter.
- Seek help from the community. Migrate maintainers and enthusiasts are very active and responsive in the #migrate channel of Drupal Slack.
- If you feel stuck, take a break from the computer and come back to it later. Resting can do wonders in finding solutions to hard problems.