Understanding the entity_lookup and entity_generate process plugins from Migrate Plus
In recent posts we have explored the Migrate Plus and Migrate Tools modules. They extend the Migrate API to provide migrations defined as configuration entities, groups to share configuration among migrations, and a user interface to execute migrations, among other things. Another benefit of using Migrate Plus is the many process plugins it provides. Today, we are going to learn about two of them: entity_lookup and entity_generate. We are going to compare them with the migration_lookup plugin, show how to configure them, and explain their compromises and limitations. Let's get started.
What is the difference between the migration_lookup, entity_lookup, and entity_generate plugins?
In the article about migration dependencies we covered the migration_lookup plugin provided by the core Migrate API. It lets you maintain relationships among entities that are being imported. For example, a node being migrated might reference users, taxonomy terms, images, paragraphs, etc. This plugin has a very important restriction: the related entities must come from another migration. But what can you do if you need to reference entities that already exist in the system? For example, you might already have users in Drupal that you want to assign as node authors. In that case, the migration_lookup plugin cannot be used, but entity_lookup can do the job.
The entity_lookup plugin is provided by the Migrate Plus module. You can use it to query any entity in the system and get its unique identifier. This is often used to populate entity reference fields, but it can be used to set any field or property in the destination. For example, you can query existing users and assign the uid node property, which indicates who created the node. If no entity is found, the plugin returns a NULL value, which you can combine with other plugins to provide a fallback behavior. The advantage of this plugin is that it does not require another migration. You can query any entity in the entire system.
The entity_generate plugin, also provided by the Migrate Plus module, is an extension of entity_lookup. If no entity is found, this plugin will automatically create one. For example, you might have a list of taxonomy terms to associate with a node. If some of the terms do not exist, you want to create them on the fly and relate them to the node.
Note: The migration_lookup plugin offers a feature called stubbing that neither entity_lookup nor entity_generate provides. It allows you to create a placeholder entity that will be updated later in the migration process. For example, in a hierarchical taxonomy terms migration, it is possible that a term is migrated before its parent. In that case, a stub for the parent will be created and later updated with the real data.
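For comparison, here is a minimal sketch of how a parent lookup with stubbing might be configured in the process section of a hierarchical term migration. The migration name udm_fruit_terms and the source column src_parent_name are hypothetical. src_parent_name is assumed to hold the source ID of the parent term. As far as we know, migration_lookup creates stubs by default, and its no_stub option turns that behavior off.

```yaml
# Sketch only: udm_fruit_terms and src_parent_name are hypothetical names.
# This would live in the process section of udm_fruit_terms itself.
parent:
  # Terms without a parent skip the lookup entirely.
  - plugin: skip_on_empty
    method: process
    source: src_parent_name
  # Look up the parent among the terms imported by this same migration.
  # If the parent has not been imported yet, a stub term is created and
  # filled in later when the parent's own row is processed.
  - plugin: migration_lookup
    migration: udm_fruit_terms
    # Uncomment to disable stubbing.
    # no_stub: true
```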
Getting the example code
You can get the full code example at https://github.com/dinarcon/ud_migrations. The module to enable is UD Config entity_lookup and entity_generate examples, whose machine name is ud_migrations_config_entity_lookup_entity_generate. It comes with one JSON migration: udm_config_entity_lookup_entity_generate_node. Read this article for details on migrating from JSON files. The following snippet shows a sample of the file:
{
  "data": {
    "udm_nodes": [
      {
        "unique_id": 1,
        "thoughtful_title": "Amazing recipe",
        "creative_author": "udm_user",
        "fruit_list": "Apple, Pear, Banana"
      },
      {...},
      {...},
      {...}
    ]
  }
}
Additionally, the example module creates three users upon installation: 'udm_user', 'udm_usuario', and 'udm_utilisateur'. They will be used to assign the node authors, and they are deleted automatically when the module is uninstalled. The example will create nodes of type "Article" from the standard installation profile. You can execute the migration from the interface provided by Migrate Tools at /admin/structure/migrate/manage/default/migrations.
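The process snippets in this article read from src_creative_author and src_fruit_list, but the JSON file uses creative_author and fruit_list. The sketch below shows one way to declare that mapping with the url source plugin from Migrate Plus. The file path and field labels are assumptions. The example repository has the authoritative definition.

```yaml
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  # Hypothetical path; point it at wherever the JSON file actually lives.
  urls:
    - modules/custom/ud_migrations/ud_migrations_config_entity_lookup_entity_generate/sources/udm_data.json
  # Each element of data.udm_nodes becomes one source record.
  item_selector: data/udm_nodes
  # Map JSON keys (selector) to the src_ names used in the process section.
  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_thoughtful_title
      label: 'Thoughtful title'
      selector: thoughtful_title
    - name: src_creative_author
      label: 'Creative author'
      selector: creative_author
    - name: src_fruit_list
      label: 'Fruit list'
      selector: fruit_list
  ids:
    src_unique_id:
      type: integer
process:
  # Create Article nodes.
  type:
    plugin: default_value
    default_value: article
  title: src_thoughtful_title
  # uid and field_tags are discussed in the following sections.
destination:
  plugin: 'entity:node'
```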
Using the entity_lookup plugin to assign the node author
Let's start by assigning the node author. The following snippet shows how to configure the entity_lookup plugin for this:
uid:
  - plugin: entity_lookup
    entity_type: user
    value_key: name
    source: src_creative_author
  - plugin: default_value
    default_value: 1
The uid node property is used to assign the node author. It expects an integer value representing a user ID (uid). The source data contains usernames, so we need to query the database to get the corresponding user IDs. The users that will be referenced were not imported using the Migrate API; they were already in the system. Therefore, migration_lookup cannot be used, but entity_lookup can.
The plugin is configured using three keys. entity_type is set to the machine name of the entity type to query: user in this case. value_key is the name of the entity property to look up. In Drupal, usernames are stored in a property called name. Finally, source specifies which field from the source contains the lookup value for the name entity property. For example, the first record has a src_creative_author value of udm_user. So, this plugin will instruct Drupal to search among all the users in the system for one whose name (username) is udm_user. If a match is found, the plugin returns its user ID. Because the uid node property expects a user ID, the return value of this plugin can be used directly to assign its value.
What happens if the plugin does not find an entity matching the conditions? It returns a NULL value. Then it is up to you to decide what to do. If you let the NULL value pass through, Drupal will fall back to some default behavior. In the case of the uid property, if the received value is not valid, the node creation will be attributed to the anonymous user (uid: 0). Alternatively, you can detect if NULL is returned and take some action. In the example, the second record specifies the "udm_not_found" user, which does not exist. To account for this, a process pipeline is defined to manually specify a user if entity_lookup did not find one. The default_value plugin is used to return 1 in that case. The number represents a user ID, not a username. Specifically, this is the user ID of the "super user" created when Drupal was first installed. If you need to assign a different user but do not know the user ID, you can create a pseudofield and use the entity_lookup plugin again to find its user ID. Then, use that pseudofield as the default value.
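Here is a minimal sketch of that approach. As far as we know, default_value returns its configured value literally and does not resolve '@' pseudofield references. For that reason, this sketch uses the null_coalesce process plugin from Drupal core, which returns the first non-NULL value among its sources. The DEFAULT_AUTHOR constant and the psf_ pseudofield names are hypothetical. udm_usuario is one of the users created by the example module.

```yaml
source:
  # Other source settings omitted.
  constants:
    # Hypothetical constant: username of the fallback author.
    DEFAULT_AUTHOR: udm_usuario
process:
  # Pseudofield: user ID of the fallback author.
  psf_default_author:
    plugin: entity_lookup
    entity_type: user
    value_key: name
    source: constants/DEFAULT_AUTHOR
  # Pseudofield: user ID of the author named in the record, or NULL.
  psf_creative_author:
    plugin: entity_lookup
    entity_type: user
    value_key: name
    source: src_creative_author
  # Use the record's author when found, otherwise the fallback author.
  uid:
    plugin: null_coalesce
    source:
      - '@psf_creative_author'
      - '@psf_default_author'
```

The pseudofields must be defined before uid because process properties are evaluated in order.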
Important: User entities do not have bundles. Do not set the bundle_key or bundle configuration options of the entity_lookup plugin. Otherwise, you will get the following error: "The entity_lookup plugin found no bundle but destination entity requires one." Files do not have bundles either. For entity types that have bundles, like nodes and taxonomy terms, those options need to be set in the entity_lookup plugin.
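For an entity type with bundles, the two bundle options sit next to the others. Below is a sketch that looks up an existing Article node by its title. The field_related_recipe field and the src_related_title source column are hypothetical.

```yaml
# Sketch only: field_related_recipe and src_related_title are hypothetical.
field_related_recipe:
  - plugin: entity_lookup
    entity_type: node
    # Node titles are stored in the "title" property.
    value_key: title
    # For nodes, the bundle (content type) is stored in the "type" property.
    bundle_key: type
    bundle: article
    source: src_related_title
```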
Using the entity_generate plugin to assign and create taxonomy terms
Now, let's migrate a comma-separated list of taxonomy terms. An example value is Apple, Pear, Banana. The following snippet shows how to configure the entity_generate plugin to look up taxonomy terms and create them on the fly if they do not exist:
field_tags:
  - plugin: skip_on_empty
    source: src_fruit_list
    method: process
    message: 'No src_fruit_list listed.'
  - plugin: explode
    delimiter: ','
  - plugin: callback
    callable: trim
  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
The terms will be assigned to the field_tags field using a process pipeline of four plugins:
skip_on_empty will skip the processing of this field if the record has no value in the src_fruit_list column.
explode will break the string of comma-separated values into individual elements.
callback will use the trim PHP function to remove any whitespace from the start or end of each taxonomy term name.
entity_generate takes care of finding the taxonomy terms in the system and creating the ones that do not exist.
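To make the pipeline concrete, the following illustration lists the value after each step for the first record. It is not migration configuration. The term IDs are placeholders because the real IDs depend on the site.

```yaml
# Value of field_tags after each process plugin, first record.
source_value: 'Apple, Pear, Banana'
after_skip_on_empty: 'Apple, Pear, Banana'    # not empty, so unchanged
after_explode: ['Apple', ' Pear', ' Banana']  # split on ',' only
after_callback_trim: ['Apple', 'Pear', 'Banana']
after_entity_generate: [TID_APPLE, TID_PEAR, TID_BANANA]  # placeholder IDs
```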
For a detailed explanation of the skip_on_empty and explode plugins, see this article. For the callback plugin, see this article. Let's focus on the entity_generate plugin for now. The field_tags field expects an array of taxonomy term IDs (tid). The source data contains term names, so we need to query the database to get the corresponding term IDs. The taxonomy terms that will be referenced were not imported using the Migrate API, and some of them might not exist in the system yet. If that is the case, they should be created on the fly. Therefore, migration_lookup cannot be used, but entity_generate can.
The plugin is configured using four keys. entity_type is set to the machine name of the entity type to query: taxonomy_term in this case. value_key is the name of the entity property to look up. In Drupal, taxonomy term names are stored in a property called name. Usually, you would include a source that specifies which field from the source contains the lookup value for the name entity property. In this case, it is not necessary to define this configuration option because the lookup value is passed from the previous plugin in the process pipeline: the trimmed version of the taxonomy term name.
If, and only if, the entity type has bundles, you must also define two more configuration options: bundle_key and bundle. Similar to value_key and source, these extra options become another condition in the query looking for the entities. bundle_key is the name of the entity property that stores which bundle the entity belongs to. bundle contains the value of the bundle used to restrict the search. The terminology is a bit confusing, but it boils down to the following. The same value might exist in multiple bundles of the same entity type, so you must pick one bundle where the lookup operation will be performed. In the case of the taxonomy term entity, the bundles are the vocabularies. The vocabulary a term belongs to is stored in the vid entity property. In the example, that is tags. Let's consider an example term of "Apple". This plugin will instruct Drupal to search for a taxonomy term whose name (term name) is "Apple" and whose vid (vocabulary) is "tags".
What happens if the plugin does not find an entity matching the conditions? It will create one on the fly! It takes the value from the source configuration or from the process pipeline and assigns it to the value_key entity property of the newly created entity. The entity is created in the bundle specified by the bundle_key and bundle configuration options. In the example, the terms will be created in the tags vocabulary. This is why the values are trimmed to remove whitespace at the start and end of the name. Otherwise, if your source contains spaces after the commas that separate elements, you might end up with terms that seem duplicated, like "Apple" and " Apple".
More configuration options
Both entity_lookup and entity_generate share the previous configuration options. Additionally, the following options are available:
ignore_case contains a boolean value that indicates whether the query should be case-sensitive or not. It defaults to true.
access_check contains a boolean value that indicates whether the system should check if the current user has access to the entity. It defaults to true.
values and default_values apply only to the entity_generate plugin. You can use them to set additional fields on the generated entity. An example configuration is included in the code comments of the plugin.
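Below is a sketch of the final entity_generate step of the field_tags pipeline using these extra options. The option names follow the plugin's documentation comments. The description text and the field_source_title field on the tags vocabulary are illustrative assumptions. default_values sets fixed values on every generated term, while values copies row properties into it.

```yaml
# Last step of the field_tags pipeline; the earlier steps are unchanged.
- plugin: entity_generate
  entity_type: taxonomy_term
  value_key: name
  bundle_key: vid
  bundle: tags
  # Treat "apple" and "Apple" as the same term.
  ignore_case: true
  # Do not restrict the lookup by the current user's entity access.
  access_check: false
  # Fixed values assigned to every newly created term.
  default_values:
    description: 'Created by the udm_config_entity_lookup_entity_generate_node migration.'
  # Row values copied into every newly created term.
  # field_source_title is a hypothetical field on the tags vocabulary.
  values:
    field_source_title: src_thoughtful_title
```

Terms that already exist are returned as they are. These values only apply when a new term is generated.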
One interesting fact about these plugins is that none of the configuration options is required. The source can be skipped if the value comes from the process pipeline. The rest of the configuration options can be inferred by code introspection. This comes with some restrictions and assumptions. For example, if you are migrating nodes, the code introspection requires the type node property to be defined in the process section. If you do not set one because you define a default_bundle in the destination section, an error will be produced. Similarly, entity reference fields are assumed to point to one bundle only. Otherwise, the system cannot guess which bundle to look up and an error will be produced. Therefore, always set the entity_type and value_key configurations. For entity types that have bundles, bundle_key and bundle must be set as well.
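For contrast, here is a sketch of the minimal configuration that relies on introspection. It only works under the assumptions just described: the type property is set in the process section, and field_tags targets a single vocabulary. Setting the options explicitly, as in the earlier field_tags snippet, remains the safer choice.

```yaml
destination:
  plugin: 'entity:node'
process:
  # Introspection needs the bundle here; default_bundle in the
  # destination section alone is not enough.
  type:
    plugin: default_value
    default_value: article
  field_tags:
    - plugin: skip_on_empty
      source: src_fruit_list
      method: process
    - plugin: explode
      delimiter: ','
    - plugin: callback
      callable: trim
    # No entity_type, value_key, bundle_key, or bundle: they are inferred
    # from the field_tags definition, which must target one vocabulary.
    - plugin: entity_generate
```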
Note: There are various open issues proposing changes to the configuration options. See this issue and the related ones to keep up to date with any future changes.
Compromises and limitations
The entity_lookup and entity_generate plugins violate some ETL principles. For example, they query the destination system from the process section. In the case of entity_generate, it even creates entities from the process section. Ideally, each phase of the ETL process is self-contained. That being said, there are valid use cases for these plugins, and they can save you time when their functionality is needed.
An important limitation of the entity_generate plugin is that it is not able to clean up after itself. That is, if you roll back the migration that calls this plugin, any entity it created will remain in the system. This can leave data that is invalid or never used in Drupal. Those values could leak into the user interface, for example in autocomplete fields. Ideally, rolling back a migration should delete any data that was created with it.
The recommended way to maintain relationships among entities in a migration project is to have multiple migrations. Then, you use the migration_lookup plugin to relate them. Throughout the series, several examples have been presented. For example, this article shows how to do taxonomy term migrations.
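Here is a sketch of that approach for the fruit list. It assumes a hypothetical udm_fruit_terms migration that creates the terms and uses the term name as its source ID. Under that assumption, the trimmed names can be passed straight to migration_lookup.

```yaml
# In the node migration. udm_fruit_terms is a hypothetical term migration
# whose source ID is the term name.
process:
  field_tags:
    - plugin: skip_on_empty
      source: src_fruit_list
      method: process
    - plugin: explode
      delimiter: ','
    - plugin: callback
      callable: trim
    # Map each name to the term ID created by udm_fruit_terms.
    - plugin: migration_lookup
      migration: udm_fruit_terms
      # Do not create placeholder terms for names the term migration lacks.
      no_stub: true
migration_dependencies:
  required:
    - udm_fruit_terms
```

Because the terms belong to their own migration, rolling back udm_fruit_terms removes them. This avoids the cleanup problem of entity_generate.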
What did you learn in today's blog post? Did you know how to configure these plugins for entities that do not have bundles? Did you know that rolling back a migration does not delete entities created by the entity_generate plugin? Did you know you can assign fields in the generated entity? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.