Add pending articles

Mauricio Dinarte 2020-10-04 12:38:20 -06:00
parent 9014c67ca6
commit fa717d98e1
23 changed files with 2919 additions and 1 deletions

16
.vscode/settings.json vendored Normal file

@ -0,0 +1,16 @@
{
"files.autoSave": "afterDelay",
"MarkdownPaste.rules": [
{
"regex": "^(?:https?://)?(?:(?:(?:www\\.?)?youtube\\.com(?:/(?:(?:watch\\?.*?v=([^&\\s]+).*)|))?))",
"options": "g",
"replace": "[![](https://img.youtube.com/vi/$1/0.jpg)](https://www.youtube.com/watch?v=$1)"
},
{
"regex": "^(https?://.*)",
"options": "ig",
"replace": "[]($1)"
}
]
}

85
11.txt Normal file

@ -0,0 +1,85 @@
# Migrating users into Drupal - Part 2
Today we complete the user migration example. In the [previous post](https://understanddrupal.com/articles/migrating-users-drupal-part-1), we covered how to migrate email, timezone, username, password, and status. This time, we cover creation date, roles, and profile pictures. The *source*, *destination*, and *dependencies* configurations were explained already. Therefore, we are jumping straight to the *process* transformations in this entry.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD users` whose machine name is `ud_migrations_users`. The two migrations to execute are `udm_user_pictures` and `udm_users`. Notice that both migrations belong to the same module. Refer to this [article](https://understanddrupal.com/articles/writing-your-first-drupal-migration) to learn where the module should be placed.
The example assumes Drupal was installed using the `standard` installation profile. Particularly, we depend on a Picture (`user_picture`) *image* field attached to the *user* entity. The word in parentheses represents the *machine name* of the image field.
The explanation below is only for the user migration. It depends on a file migration to get the profile pictures. One motivation to have two migrations is for the images to [be deleted if the file migration is rolled back](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow). Note that other techniques exist for migrating images without having to create a separate migration. We have covered two of them in the articles about [subfields](https://understanddrupal.com/articles/migrating-data-drupal-subfields) and [constants and pseudofields](https://understanddrupal.com/articles/using-constants-and-pseudofields-data-placeholders-drupal-migration-process-pipeline).
## Migrating user creation date
Have a look at the previous post for details on the *source* values. For reference, the user creation time is provided by the `member_since` column, and one of the values is `April 4, 2014`. The following snippet shows how the various user date related properties are set:
```yaml
created:
  plugin: format_date
  source: member_since
  from_format: 'F j, Y'
  to_format: 'U'
changed: '@created'
access: '@created'
login: '@created'
```
The `created` *entity property* stores a [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) of when the user was added to Drupal. The value itself is an integer number representing the number of seconds since the [epoch](https://en.wikipedia.org/wiki/Epoch_\(computing\)). For example, `280299600` represents `Sun, 19 Nov 1978 05:00:00 GMT`. Kudos to the readers who knew this is [Drupal's default `Expires` HTTP header](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/lib/Drupal/Core/EventSubscriber/FinishResponseSubscriber.php#L291). Bonus points if you knew it was chosen in honor of [someone's birthdate](https://dri.es/about). ;-)
Back to the migration, you need to transform the provided date from `Month day, year` format to a UNIX timestamp. To do this, you use the [format_date](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21FormatDate.php/class/FormatDate) plugin. The `from_format` is set to `F j, Y` which means your source date consists of:
- The full textual representation of a month: `April`.
- Followed by a space character.
- Followed by the day of the month without leading zeros: `4`.
- Followed by a comma and another space character.
- Followed by the full numeric representation of a year using four digits: `2014`.
If the value of `from_format` does not make sense, you are not alone. It is actually assembled from format characters of the [date](https://www.php.net/manual/en/function.date.php) PHP function. When you need to specify the `from` and `to` formats, you basically need to look at the [documentation](https://www.php.net/manual/en/function.date.php#refsect1-function.date-parameters) and assemble a string that matches the desired date format. You need to pay close attention because upper and lowercase letters represent different things like `Y` and `y` for the year with four-digits versus two-digits respectively. Some date components have subtle variations like `d` and `j` for the day with or without leading zeros respectively. Also, take into account white spaces and date component separators. To finish the plugin configuration, you need to set the `to_format` configuration to something that produces a UNIX timestamp. If you look again at the documentation, you will see that `U` does the job.
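To see how the format characters combine, consider a different source format. The following is only a sketch: it assumes a hypothetical `member_since_short` column (not part of the demo module) holding values like `04/04/14`, that is, month and day with leading zeros and a two-digit year.

```yaml
# Hypothetical variation: the source value looks like '04/04/14'.
created:
  plugin: format_date
  # Assumed column name, it does not exist in the demo module.
  source: member_since_short
  # m = month with leading zeros, d = day with leading zeros, y = two-digit year.
  from_format: 'm/d/y'
  # U = seconds since the UNIX epoch.
  to_format: 'U'
```

Only the `from_format` changes. The separators (`/`) are written literally in the pattern, just like the comma and spaces in `F j, Y`.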
The `changed`, `access`, and `login` *entity properties* are also dates in UNIX timestamp format. `changed` indicates when the user account was last updated. `access` indicates when the user last accessed the site. `login` indicates when the user last logged in. For brevity, the same value assigned to `created` is also assigned to these three entity properties. The *at sign* (**@**) means copy the value of a previous mapping in the [process pipeline](https://understanddrupal.com/articles/using-constants-and-pseudofields-data-placeholders-drupal-migration-process-pipeline). If needed, each property can be set to a different value or left unassigned. None is actually required.
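As an illustration of setting the properties to different values, the sketch below assumes the source had an extra `last_login` column in the same `Month day, year` format. That column is hypothetical and is not provided by the demo module.

```yaml
created:
  plugin: format_date
  source: member_since
  from_format: 'F j, Y'
  to_format: 'U'
# Reuse the creation time for the last update.
changed: '@created'
# Hypothetical: a separate source column with the last login date.
login:
  plugin: format_date
  source: last_login
  from_format: 'F j, Y'
  to_format: 'U'
# Reuse the login time as the last access time.
access: '@login'
```

Note that `access` can only reference `@login` because `login` is mapped before it in the process section.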
## Migrating user roles
For reference, the roles are provided by the `user_roles` column, and one of the values is `forum moderator, forum admin`. It is a comma separated list of roles from the legacy system which need to be mapped to Drupal roles. It is possible that the `user_roles` column is not provided at all in the *source*. The following snippet shows how the roles are set:
```yaml
roles:
  - plugin: skip_on_empty
    method: process
    source: user_roles
  - plugin: explode
    delimiter: ','
  - plugin: callback
    callable: trim
  - plugin: static_map
    map:
      'forum admin': administrator
      'webmaster': administrator
    default_value: null
```
First, the [skip_on_empty](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21SkipOnEmpty.php/class/SkipOnEmpty) plugin is used to skip the processing of the roles if the source column is missing. Then, the [explode](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Explode.php/class/Explode) plugin is used to break the list into an *array* of strings representing the roles. Next, the [callback](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Callback.php/class/Callback) plugin invokes the [trim](https://www.php.net/manual/en/function.trim.php) PHP function to remove any leading or trailing whitespace from the role names. Finally, the [static_map](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21StaticMap.php/class/StaticMap) plugin is used to manually map values from the legacy system to Drupal roles. All of these plugins have been explained previously. Refer to other articles in the series or the plugin [documentation](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins) for details on how to use and configure them.
There are some things that are worth mentioning about migrating roles using this particular process pipeline. If the comma separated list includes spaces before or after the role name, you need to trim the value because the static map will perform an equality check. Having extraneous space characters will produce a mismatch.
Also, you do not need to map the `anonymous` or `authenticated` roles. Drupal users are assumed to be `authenticated` and cannot be `anonymous`. Any other role needs to be mapped manually to its *machine name*. You can find the machine name of any role in its edit page. In the example, only two out of four roles are mapped. Any role that is not found in the static map will be assigned the value `null` as indicated in the `default_value` configuration. After processing, the `null` value will be ignored, and no role will be assigned. But you could use this feature to assign a default role in case the static map does not produce a match.
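A default role could be assigned along these lines. This is a sketch, not part of the demo module: it assumes a role with the machine name `member` already exists on the destination site.

```yaml
roles:
  - plugin: skip_on_empty
    method: process
    source: user_roles
  - plugin: explode
    delimiter: ','
  - plugin: callback
    callable: trim
  - plugin: static_map
    map:
      'forum admin': administrator
      'webmaster': administrator
    # Hypothetical role machine name. Any legacy role without a match
    # in the map above would be replaced by this value.
    default_value: member
```

With this configuration, a source value of `forum moderator, forum admin` would produce the `member` and `administrator` roles. Users whose `user_roles` column is empty are still skipped by `skip_on_empty`, so they would not receive the default role.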
## Migrating profile pictures
For reference, the profile picture is provided by the `user_photo` column and one of the values is `P01`. This value corresponds to the unique identifier of one record in the `udm_user_pictures` *file* migration, which is part of the same demo module. It is important to note that the `user_picture` *field* is **not** a user entity *property*. The *field* is created by the `standard` installation profile and attached to the *user* entity. You can find its configuration in the "Manage fields" tab of the "Account settings" configuration page at `/admin/config/people/accounts`. The following snippet shows how profile pictures are set:
```yaml
user_picture/target_id:
  plugin: migration_lookup
  migration: udm_user_pictures
  source: user_photo
```
*Image* fields are *entity references*. Their `target_id` property needs to be an integer number containing the **file id** (`fid`) of the image. This can be obtained using the [migration_lookup](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21MigrationLookup.php/class/MigrationLookup) plugin. Details on how to configure it can be found in this [article](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal). You could simply use `user_picture` as your field mapping because `target_id` is the [default subfield](https://understanddrupal.com/articles/migrating-data-drupal-subfields) and could be omitted. Also, note that the `alt` subfield is not mapped. If present, its value will be used for the alternative text of the image. But if it is not specified, like in this example, Drupal will automatically [generate an alternative text](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/modules/user/user.module#L411) out of the username. An example value would be: `Profile picture for user michele`.
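If the source did provide a description for each picture, the `alt` subfield could be mapped next to `target_id`. The sketch below assumes a hypothetical `user_photo_description` column that the demo module does not include.

```yaml
user_picture/target_id:
  plugin: migration_lookup
  migration: udm_user_pictures
  source: user_photo
# Hypothetical source column holding the alternative text.
user_picture/alt: user_photo_description
```

When more than one subfield is set, each one needs to be written explicitly, so the shorter `user_picture` mapping is no longer enough.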
*Technical note*: The user entity contains other properties you can write to. For a list of available options, check the [baseFieldDefinitions](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/modules/user/src/Entity/User.php#L447)() method of the [User](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/modules/user/src/Entity/User.php) class defining the entity. Note that more properties can be available up in the class hierarchy.
And with that, we wrap up the *user* migration example. We covered how to migrate a user's mail, timezone, username, password, status, creation date, roles, and profile picture. Along the way, we presented various process plugins that had not been used previously in the series. We showed a couple of examples of process plugin chaining to make sure the migrated data is valid and in the format expected by Drupal.
What did you learn in today's blog post? Did you know how to process dates for user entity properties? Have you migrated user roles before? Did you know how to import profile pictures? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

106
12.txt Normal file

@ -0,0 +1,106 @@
# Migrating dates into Drupal
Today we will learn how to migrate **dates** into Drupal. Depending on your field type and configuration, there are various possible combinations. You can store a *single date* or a *date range*. You can store only the *date* component or also include the *time*. You might have *timezones* to take into account. Importing the node creation date requires a slightly different configuration. In addition to the examples, a list of things to consider when migrating dates is also presented.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD date` whose machine name is `ud_migrations_date`. The migration to execute is `udm_date`. Notice that this migration writes to a content type called `UD Date` and to three fields: `field_ud_date`, `field_ud_date_range`, and `field_ud_datetime`. This content type and fields will be created when the module is installed. They will also be removed when the module is uninstalled. The module itself depends on the following modules provided by Drupal core: `datetime`, `datetime_range`, and `migrate`.
*Note*: Configuration placed in a module's `config/install` directory will be copied to Drupal's active configuration. And if those files have a `dependencies/enforced/module` key, the configuration will be removed when the listed modules are uninstalled. That is how the content type and fields are automatically created.
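As a rough illustration of that key, a content type definition shipped in `config/install` could look like the excerpt below. The file name and the values shown are assumptions for the sake of the example. Check the demo module for the real files.

```yaml
# Excerpt of a hypothetical config/install/node.type.ud_date.yml file.
langcode: en
status: true
dependencies:
  enforced:
    module:
      # When this module is uninstalled, this configuration is removed too.
      - ud_migrations_date
name: 'UD Date'
type: ud_date
```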
## PHP date format characters
To migrate dates, you need to be familiar with the format characters of the [date](https://www.php.net/manual/en/function.date.php) PHP function. Basically, you need to find a pattern that matches the date format you need to migrate to and from. For example, `January 1, 2019` is described by the `F j, Y` pattern.
As mentioned in the [previous post](https://understanddrupal.com/articles/migrating-users-drupal-part-2), you need to pay close attention to how you create the pattern. Upper and lowercase letters represent different things like `Y` and `y` for the year with four-digits versus two-digits, respectively. Some date components have subtle variations like `d` and `j` for the day with or without leading zeros, respectively. Also, take into account white spaces and date component separators. If you need to include a literal letter like `T` it has to be escaped with `\T`. If the pattern is wrong, an error will be raised, and the migration will fail.
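To illustrate the escaping of a literal letter, the sketch below assumes a hypothetical `src_iso_datetime` column, not present in the demo module, with values like `2019-12-24T19:15:30` expressed in `UTC`.

```yaml
field_ud_datetime/value:
  plugin: format_date
  # Assumed column name holding values like '2019-12-24T19:15:30'.
  source: src_iso_datetime
  # \T matches a literal letter T. Without the backslash, T would be
  # interpreted as the timezone abbreviation format character.
  from_format: 'Y-m-d\TH:i:s'
  to_format: 'Y-m-d\TH:i:s'
  from_timezone: 'UTC'
  to_timezone: 'UTC'
```

The pattern is wrapped in single quotes so that YAML keeps the backslash as is.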
## Date format conversions
For date conversions, you use the [format_date](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21FormatDate.php/class/FormatDate) plugin. You specify a `from_format` based on your *source* and a `to_format` based on what Drupal expects. In both cases, you will use the PHP date function's format characters to assemble the required patterns. Optionally, you can define the `from_timezone` and `to_timezone` configurations if conversions are needed. Just like any other migration, you need to understand your *source* format. The following code snippet shows the *source* and *destination* sections:
```yaml
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      node_title: 'Date example 1'
      node_creation_date: 'January 1, 2019 19:15:30'
      src_date: '2019/12/1'
      src_date_end: '2019/12/31'
      src_datetime: '2019/12/24 19:15:30'
destination:
  plugin: 'entity:node'
  default_bundle: ud_date
```
## Node creation time migration
The node creation time is migrated using the `created` *entity property*. The source column that contains the data is `node_creation_date`. An example value is `January 1, 2019 19:15:30`. Drupal expects a [UNIX timestamp](https://en.wikipedia.org/wiki/Unix_time) like `1546370130`. The following snippet shows how to do the transformation:
```yaml
created:
  plugin: format_date
  source: node_creation_date
  from_format: 'F j, Y H:i:s'
  to_format: 'U'
  from_timezone: 'UTC'
  to_timezone: 'UTC'
```
Following the documentation, `F j, Y H:i:s` is the `from_format` and `U` is the `to_format`. In the example, it is assumed that the source is provided in `UTC`. UNIX timestamps are expressed in `UTC` as well. Therefore, the `from_timezone` and `to_timezone` are both set to that value. Even though they are the same, it is important to specify both configuration keys. Otherwise, the *from timezone* might be picked from your server's configuration. Refer to the [article](https://understanddrupal.com/articles/migrating-users-drupal-part-2) on user migrations for more details on how to migrate when UNIX timestamps are expected.
## Date only migration
The Date module provided by core offers two storage options. You can store the *date only*, or you can choose to store the *date and time*. First, let's consider a date only field. The *source* column that contains the data is `src_date`. An example value is `2019/12/1`. Drupal expects date only fields to store data in `Y-m-d` format like `2019-12-01`. No timezones are involved in migrating this field. The following snippet shows how to do the transformation.
```yaml
field_ud_date/value:
  plugin: format_date
  source: src_date
  from_format: 'Y/m/j'
  to_format: 'Y-m-d'
```
## Date range migration
The Date Range module provided by Drupal core allows you to have a start and an end date in a single field. The `src_date` and `src_date_end` *source* columns contain the start and end date, respectively. This migration is very similar to date only fields. The difference is that you need to import an extra [subfield](https://understanddrupal.com/articles/migrating-data-drupal-subfields) to store the end date. The following snippet shows how to do the transformation:
```yaml
field_ud_date_range/value: '@field_ud_date/value'
field_ud_date_range/end_value:
  plugin: format_date
  source: src_date_end
  from_format: 'Y/m/j'
  to_format: 'Y-m-d'
```
The `value` subfield stores the start date. The *source* column used in the example is the same used for the `field_ud_date` field. Drupal uses the same format internally for *date only* and *date range* fields. Considering these two things, it is possible to reuse the `field_ud_date` mapping to set the start date of the `field_ud_date_range` field. To do it, you type the name of the previously mapped field in quotes (') and precede it with an at sign (@). Details on this syntax can be found in the blog post about the [migrate process pipeline](https://understanddrupal.com/articles/using-constants-and-pseudofields-data-placeholders-drupal-migration-process-pipeline). One important detail is that when `field_ud_date` was mapped, the `value` subfield was specified: `field_ud_date/value`. Because of this, when reusing that mapping, you must also specify the subfield: `'@field_ud_date/value'`. The `end_value` subfield stores the end date. The mapping is similar to `field_ud_date` except that the source column is `src_date_end`.
*Note*: The Date Range module does not come enabled by default. To be able to use it in the example, it is set as a dependency of the demo migration module.
## Datetime migration
A *date and time* field stores its value in `Y-m-d\TH:i:s` format. Note it does not include a timezone. Instead, `UTC` is assumed by default. In the example, the source column that contains the data is `src_datetime`. An example value is `2019/12/24 19:15:30`. Let's assume that all dates are provided with a timezone value of `America/Managua`. The following snippet shows how to do the transformation:
```yaml
field_ud_datetime/value:
  plugin: format_date
  source: src_datetime
  from_format: 'Y/m/j H:i:s'
  to_format: 'Y-m-d\TH:i:s'
  from_timezone: 'America/Managua'
  to_timezone: 'UTC'
```
If you need the timezone to be dynamic, things get a bit harder. The `from_timezone` and `to_timezone` settings expect a literal value. It is not possible to read a *source* column to set these configurations. An alternative is that your *source* column includes timezone information like `2019/12/24 19:15:30 -07:00`. In that case, you would need to tweak the `from_format` to include the timezone component and leave out the `from_timezone` configuration.
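A possible configuration for that alternative is sketched below. It assumes a hypothetical `src_datetime_tz` column, not present in the demo module, with values like `2019/12/24 19:15:30 -07:00`. In PHP's date format characters, `P` stands for the difference to UTC with a colon between hours and minutes.

```yaml
field_ud_datetime/value:
  plugin: format_date
  # Assumed column name holding values like '2019/12/24 19:15:30 -07:00'.
  source: src_datetime_tz
  # P matches the offset, for example '-07:00'.
  from_format: 'Y/m/j H:i:s P'
  to_format: 'Y-m-d\TH:i:s'
  # from_timezone is left out because the offset comes with each value.
  to_timezone: 'UTC'
```

Test this against a few known values before running the full migration to confirm the stored dates are converted to `UTC` as expected.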
## Things to consider
Date migrations can be tricky because they can be affected by things outside of the Migrate API. Here is a non-exhaustive list of things to consider:
- For *date and time* fields, the transformation might be affected by your server's timezone if you do not manually set the `from_timezone` configuration.
- People might see the date and time according to the preferences in their user profile. That is, two users might see a different value for the same migrated field if their preferred timezones are not the same.
- For *date only* fields, the user might see a time depending on the format used to display them. A list of available formats can be found at `/admin/config/regional/date-time`.
- A field can be configured to be presented in a specific timezone always. This would override the site's timezone and the user's preferred timezone.
What did you learn in today's blog post? Did you know that entity properties and date fields expect different destination formats? Did you know how to do timezone conversions? What challenges have you found when migrating dates and times? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

133
13.txt Normal file

@ -0,0 +1,133 @@
# Migrating addresses into Drupal
Today we will learn how to migrate **addresses** into Drupal. We are going to use the field provided by the [Address module](https://www.drupal.org/project/address) which depends on the third-party library [commerceguys/addressing](https://github.com/commerceguys/addressing). When migrating addresses, you need to be careful with the data that Drupal expects. The address components can change per country. The way to store those components also varies per country. These and other important considerations will be explained. Let's get started.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD address` whose machine name is `ud_migrations_address`. The migration to execute is `udm_address`. Notice that this migration writes to a content type called `UD Address` and one field: `field_ud_address`. This content type and field will be created when the module is installed. They will also be removed when the module is uninstalled. The demo module itself depends on the following modules: `address` and `migrate`.
*Note*: Configuration placed in a module's `config/install` directory will be copied to Drupal's active configuration. And if those files have a `dependencies/enforced/module` key, the configuration will be removed when the listed modules are uninstalled. That is how the content type and fields are automatically created and deleted.
The recommended way to install the Address module is using composer: `composer require drupal/address`. This will grab the Drupal module **and** the `commerceguys/addressing` library that it depends on. If your Drupal site is not composer-based, an alternative is to use the [Ludwig module](https://www.drupal.org/project/ludwig). Read this [article](https://drupalcommerce.org/blog/49669/installing-commerce-2x-without-composer-ludwig) if you want to learn more about this option. In the example, it is assumed that the module and its dependency were obtained via composer. Also, keep an eye on the [Composer Support in Core Initiative](https://www.drupal.org/about/strategic-initiatives/composer) as they make progress.
## Source and destination sections
The example will migrate three addresses from the following countries: Nicaragua, Germany, and the United States of America (USA). This makes it possible to show how different countries expect different address data. As usual, for any migration you need to understand the source. The following code snippet shows how the *source* and *destination* sections are configured:
```yaml
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      first_name: 'Michele'
      last_name: 'Metts'
      company: 'Agaric LLC'
      city: 'Boston'
      state: 'MA'
      zip: '02111'
      country: 'US'
    - unique_id: 2
      first_name: 'Stefan'
      last_name: 'Freudenberg'
      company: 'Agaric GmbH'
      city: 'Hamburg'
      state: ''
      zip: '21073'
      country: 'DE'
    - unique_id: 3
      first_name: 'Benjamin'
      last_name: 'Melançon'
      company: 'Agaric SA'
      city: 'Managua'
      state: 'Managua'
      zip: ''
      country: 'NI'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_address
```
Note that not every address component is set for all addresses. For example, the Nicaraguan address does not contain a *ZIP code*. And the German address does not contain a *state*. Also, the Nicaraguan state is fully spelled out: `Managua`. On the contrary, the USA state is a two letter abbreviation: `MA` for `Massachusetts`. One more thing that might not be apparent is that the USA ZIP code belongs to the state of Massachusetts. All of this is important because the module does validation of addresses. The destination is the custom `ud_address` content type created by the module.
## Available subfields
The Address field has 13 [subfields](https://understanddrupal.com/articles/migrating-data-drupal-subfields) available. They can be found in the [schema](https://git.drupalcode.org/project/address/blob/8.x-1.x/src/Plugin/Field/FieldType/AddressItem.php#L36)() method of the [AddressItem](https://git.drupalcode.org/project/address/blob/8.x-1.x/src/Plugin/Field/FieldType/AddressItem.php) class. Fields are not required to have a one-to-one mapping between their schema and the form widgets used for entering content. This is particularly true for addresses because input elements, labels, and validations change dynamically based on the selected country. The following is a reference list of all subfields for addresses:
1. `langcode` for language code.
2. `country_code` for country.
3. `administrative_area` for administrative area (e.g., state or province).
4. `locality` for locality (e.g. city).
5. `dependent_locality` for dependent locality (e.g. neighbourhood).
6. `postal_code` for postal or ZIP code.
7. `sorting_code` for sorting code.
8. `address_line1` for address line 1.
9. `address_line2` for address line 2.
10. `organization` for company.
11. `given_name` for first name.
12. `additional_name` for middle name.
13. `family_name` for last name.
Properly describing an address is not trivial. For example, there are discussions to add a third address line component. Check this [issue](https://www.drupal.org/project/address/issues/2482969) if you need this functionality or would like to participate in the discussion.
## Address subfield mappings
In the example, only 9 out of the 13 subfields will be mapped. The following code snippet shows how to do the *processing* of the address field:
```yaml
field_ud_address/given_name: first_name
field_ud_address/family_name: last_name
field_ud_address/organization: company
field_ud_address/address_line1:
  plugin: default_value
  default_value: 'It is a secret ;)'
field_ud_address/address_line2:
  plugin: default_value
  default_value: 'Do not tell anyone :)'
field_ud_address/locality: city
field_ud_address/administrative_area: state
field_ud_address/postal_code: zip
field_ud_address/country_code: country
```
The mapping is relatively simple. You specify a value for each subfield. The tricky part is to know the *name of the subfield* and the *value to store in it*. The format for an address component can change among countries. The easiest way to see what components are expected for each country is to create a node for a content type that has an address field. With this example, you can go to `/node/add/ud_address` and try it yourself. For simplicity's sake, let's consider only 3 countries:
- For USA, *city*, *state*, and *ZIP code* are all required. And for state, you have a specific list from which you need to select.
- For Germany, the *company* is [moved above](https://github.com/google/libaddressinput/issues/83) *first and last name*. The ZIP code label changes to Postal code and it is required. The *city* is also required. It is not possible to set a *state*.
- For Nicaragua, the *Postal code* is optional. The *State* label changes to *Department*. It is required and offers a predefined list to choose from. The *city* is also required.
Pay very close attention. The *available subfields* will depend on the *country*. Also, the form labels change per country or language settings. They do not necessarily match the subfield names. Moreover, the values that you see on the screen might not match what is stored in the database. For example, a Nicaraguan address will store the full *department* name like `Managua`. On the other hand, a USA address will only store a two-letter code for the *state* like `MA` for `Massachusetts`.
Something else that is not apparent even from the user interface is data validation. For example, let's consider that you have a USA address and select `Massachusetts` as the *state*. Entering the *ZIP code* `55111` will produce the following error: `Zip code field is not in the right format.` At first glance, the format is correct: a five-digit code. The real problem is that the Address module is validating if that ZIP code is valid for the **selected state**. It is not valid for Massachusetts. `55111` is a ZIP code for the state of Minnesota which makes the validation fail. Unfortunately, the error message does not indicate that. Nine-digit ZIP codes are accepted as long as they belong to the state that is *selected*.
*Note*: If you are upgrading from Drupal 7, the D8 Address module offers a [process plugin to upgrade](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/contrib-process-plugin-addressfield) from the D7 [Address Field module](https://www.drupal.org/project/addressfield).
## Finding expected values
Values for the same subfield can vary per country. How can you find out which value to use? There are a few ways, but they all require varying levels of technical knowledge or access to resources:
- You can inspect the source code of the address field widget. When the *country* and *state* components are rendered as select input fields (dropdowns), you can have a look at the `value` attribute for the `option` that you want to select. This will contain the two-letter code for *countries*, the two-letter abbreviations for USA *states*, and the fully spelled string for Nicaraguan *departments*.
- You can use the [Devel module](https://www.drupal.org/project/devel). Create a node containing an address. Then use the `devel` tab of the node to inspect how the values are stored. It is not recommended to have the `devel` module in a production site. In fact, do not deploy the code even if the module is not enabled. This approach should only be used in a local development environment. Make sure no module or configuration is committed to the repo or deployed.
- You can inspect the database. Look for the records in a table named `node__field_[field_machine_name]`, if migrating nodes. First create some example nodes via the user interface and then query the table. You will see how Drupal stores the values in the database.
If you know a better way, please share it in the comments.
## The commerceguys addressing library
With version 8 came many changes in the way Drupal is developed. Now there is an intentional effort to integrate with the greater PHP ecosystem. This involves using already existing libraries and frameworks, like [Symfony](https://symfony.com/). But also, making code written for Drupal available as external libraries that could be used by other projects. `commerceguys\addressing` is one example of a library that was made available as an external library. That being said, the Address module also makes use of it.
Explaining how the library works or where its fetches its database is beyond the scope of this article. Refer to the [library documentation](https://github.com/commerceguys/addressing) for more details on the topic. We are only going to point out some things that are relevant for the migration. For example, the *ZIP code* validation happens at the [validatePostalCode](https://github.com/commerceguys/addressing/blob/50fe323ddc7faaab897759f31d4b29c1a793e5c2/src/Validator/Constraints/AddressFormatConstraintValidator.php#L140)() method of the [AddressFormatConstraintValidator](https://github.com/commerceguys/addressing/blob/50fe323ddc7faaab897759f31d4b29c1a793e5c2/src/Validator/Constraints/AddressFormatConstraintValidator.php) class. There is no need to know this for a migration project. But the key thing to remember is that the migration can be affected by third-party libraries outside of Drupal core or contributed modules. Another example, is the value for the *state* subfield. Address module expects a `subdivision` code as listed in one of the files in the `resources/subdivision` directory.
Does the validation really affect the migration? We have already [mentioned](https://understanddrupal.com/articles/migrating-data-drupal-subfields) that the Migrate API bypasses [Form API](https://api.drupal.org/api/drupal/elements/8.8.x) validations. And that is true for address fields as well. You can migrate a USA address with state `Florida` and ZIP code `55111`. Both are invalid because you need to use the two-letter *state* code `FL` and use a valid *ZIP code* within the state. Notwithstanding, the migration will not fail in this case. In fact, if you visit the migrated node you will see that Drupal happily shows the address with the data that you entered. The problems arise when you need to **use the address**. If you try to edit the node you will see that the *state* will not be preselected. And if you try to save the node after selecting `Florida` you will get the validation error for the *ZIP code*.
These validation issues might be hard to track because no error will be thrown by the migration. The recommendation is to migrate a sample combination of countries and address components. Then, manually check if editing a node shows the migrated data for all the subfields. Also check that the address passes Form API validations upon saving. This manual testing can save you a lot of time and money down the road. After all, if you have an ecommerce site, you do not want to be shipping your products to wrong or invalid addresses. ;-)
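One way to reduce these issues is to normalize the source values before they reach the address subfields. The following is a minimal sketch, not part of the example module. It assumes a hypothetical Address field named `field_ud_address` and hypothetical source columns `src_country`, `src_state`, and `src_zip`. It uses the core `static_map` plugin to turn full state names into the two-letter codes the library expects:

```yaml
process:
  # Hypothetical names: `field_ud_address` is an assumed Address field and
  # `src_country`, `src_state`, and `src_zip` are assumed source columns.
  field_ud_address/country_code: src_country
  field_ud_address/administrative_area:
    plugin: static_map
    source: src_state
    # Map full state names to two-letter subdivision codes.
    map:
      Florida: FL
    # Values not found in the map are passed through unchanged.
    bypass: true
  field_ud_address/postal_code: src_zip
```

A mapping like this only fixes the *state* format. It does not check that the *ZIP code* belongs to the state, so the manual verification described above is still needed.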
*Technical note*: The `commerceguys/addressing` library actually follows [ISO standards](https://en.wikipedia.org/wiki/International_Organization_for_Standardization). Particularly, [ISO 3166](https://en.wikipedia.org/wiki/ISO_3166) for country and state codes. It also uses [CLDR](http://cldr.unicode.org/) and [Google's address data](https://chromium-i18n.appspot.com/ssl-address). The dataset is stored as part of the library's code in [JSON format](https://github.com/commerceguys/addressing/tree/master/resources).
## Migrating country and zone fields
The Address module offers two more field types: `Country` and `Zone`. Both have only one subfield, `value`, which is selected by default. For country, you store the two-letter country code. For zone, you store a *serialized* version of a [Zone object](https://github.com/commerceguys/addressing/blob/50fe323ddc7faaab897759f31d4b29c1a793e5c2/src/Zone/Zone.php).
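As an illustration of the country case, a mapping could look like the sketch below. The field name `field_ud_country` and the source column `src_country_code` are hypothetical. The source column is assumed to already contain two-letter country codes:

```yaml
process:
  # `field_ud_country` and `src_country_code` are hypothetical names.
  # Because `value` is the default subfield, it does not need to be
  # written explicitly as `field_ud_country/value`.
  field_ud_country: src_country_code
```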
What did you learn in today's blog post? Have you migrated addresses before? Did you know the full list of subcomponents available? Did you know that data expectations change per country? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

187
14.txt Normal file
View file

@ -0,0 +1,187 @@
# Introduction to paragraphs migrations in Drupal
Today we will present an introduction to **paragraphs** migrations in Drupal. The example consists of migrating paragraphs of one type, then connecting the migrated paragraphs to nodes. A separate image migration is included to demonstrate how they are different. At the end, we will talk about behavior that deletes paragraphs when the host entity is deleted. Let's get started.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD paragraphs migration introduction` whose machine name is `ud_migrations_paragraph_intro`. It comes with three migrations: `ud_migrations_paragraph_intro_paragraph`, `ud_migrations_paragraph_intro_image`, and `ud_migrations_paragraph_intro_node`. One content type, one paragraph type, and four fields will be created when the module is installed.
The `ud_migrations_paragraph_intro` module *only defines the migrations*. But the destination content type, fields, and paragraph type need to be created as well. That is the job of the `ud_migrations_paragraph_config` module, which is a dependency of `ud_migrations_paragraph_intro`. The reason to create the configuration in a separate module is that later articles in the series will make use of the same configuration. So, any example that depends on that content type, those fields, and that paragraph type can set a dependency on `ud_migrations_paragraph_config` to make them available.
*Note*: Configuration placed in a module's `config/install` directory will be copied to Drupal's active configuration. And if those files have a `dependencies/enforced/module` key, the configuration will be removed when the listed modules are uninstalled. That is how the content type, the paragraph type, and the fields are automatically created and deleted.
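As a rough illustration of the note above, the relevant part of a configuration file in `config/install` could look like the following fragment. This is a sketch: the remaining keys of the file depend on the type of configuration and are omitted here, and the listed module is assumed to be the one that ships the file.

```yaml
# Fragment of a configuration file placed in a module's config/install
# directory. The type-specific keys of the file are omitted.
langcode: en
status: true
dependencies:
  enforced:
    module:
      # Uninstalling this module removes the configuration.
      - ud_migrations_paragraph_config
```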
You can get the [Paragraphs module](https://www.drupal.org/project/paragraphs) using composer: `composer require drupal/paragraphs`. This will also download its dependency: the [Entity Reference Revisions module](https://www.drupal.org/project/entity_reference_revisions). If your Drupal site is not composer-based, you can get the code for both modules manually.
## Understanding the example set up
The example code creates one *paragraph type* named UD book paragraph (`ud_book_paragraph`). It has two "Text (plain)" *fields*: Title (`field_ud_book_paragraph_title`) and Author (`field_ud_book_paragraph_author`). A new UD Paragraphs (`ud_paragraphs`) content type is also created. This has two *fields*: Image (`field_ud_image`) and Favorite book (`field_ud_favorite_book`) containing references to *images* and *book paragraphs* imported in separate migrations. The words in parentheses represent the *machine names* of the different elements.
## The paragraph migration
Migrating into a *paragraph type* is very similar to migrating into a *content type*. You specify the *source*, *process* the fields making any required transformation, and set the *destination* entity and bundle. The following code snippet shows the *source*, *process*, and *destination* sections:
```yaml
source:
  plugin: embedded_data
  data_rows:
    - book_id: 'B10'
      book_title: 'The definite guide to Drupal 7'
      book_author: 'Benjamin Melançon et al.'
    - book_id: 'B20'
      book_title: 'Understanding Drupal Views'
      book_author: 'Carlos Dinarte'
    - book_id: 'B30'
      book_title: 'Understanding Drupal Migrations'
      book_author: 'Mauricio Dinarte'
  ids:
    book_id:
      type: string
process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: book_author
destination:
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: ud_book_paragraph
```
The most important part of a paragraph migration is setting the *destination plugin* to `entity_reference_revisions:paragraph`. This plugin is actually provided by the Entity Reference Revisions module. It is very important to note that paragraph entities are **revisioned**. This means that when you want to create a reference to them, you need to provide **two IDs**: `target_id` and `target_revision_id`. Regular entity reference fields like files, images, and taxonomy terms only require the `target_id`. This will be further explained with the node migration.
The other configuration that you can optionally set in the *destination* section is `default_bundle`. The value will be the *machine name* of the *paragraph type* you are migrating into. You can do this when all the paragraphs for a particular migration definition file will be of the same type. If that is not the case, you can leave out the `default_bundle` configuration and add a mapping for the `type` *entity property* in the process section.
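For the second case, a sketch of the alternative could look like this. It assumes a hypothetical source column named `paragraph_type` that holds the *machine name* of the paragraph type for each record. That column is not part of the example data.

```yaml
process:
  # `paragraph_type` is a hypothetical source column containing the
  # machine name of the paragraph type, e.g. ud_book_paragraph.
  type: paragraph_type
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: book_author
destination:
  # No default_bundle here: the bundle comes from the `type` mapping.
  plugin: 'entity_reference_revisions:paragraph'
```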
You can execute the paragraph migration with this command: `drush migrate:import ud_migrations_paragraph_intro_paragraph`. After running the migration, there is not much you can do to *verify that it worked*. Contrary to other entities, there is no user interface, available out of the box, that lists all paragraphs in the system. One way to verify if the migration worked is to manually create a [View](https://understanddrupal.com/articles/what-view-drupal-how-do-they-work) that shows paragraphs. Another way is to query the database directly. You can inspect the tables that store the paragraph fields' data. In this example, the tables would be:
- `paragraph__field_ud_book_paragraph_author` for the current author.
- `paragraph__field_ud_book_paragraph_title` for the current title.
- `paragraph_r__8c3a9563ac` for all the author revisions.
- `paragraph_r__3fa7e9863a` for all the title revisions.
Each of those tables contains information about the *bundle* (paragraph type), the *entity id*, the *revision id*, and the migrated *field value*. Table names are derived from the *machine names* of the fields. If they are too long, the field name will be hashed to produce a shorter table name. Having to query the database is not ideal. Unfortunately, the options available to check if a paragraph migration worked are limited at the moment.
## The node migration
The node migration will serve as the *host* for both referenced entities: *images* and *paragraphs*. The image migration is very similar to the one explained in a [previous article](https://understanddrupal.com/articles/migrating-files-and-images-drupal). This time, the focus will be the *paragraph* migration. Both of them are set as dependencies of the node migration, so they need to be executed in advance. The following snippet shows how the *source*, *destination*, and *dependencies* are set:
```yaml
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      name: 'Michele Metts'
      photo_file: 'P01'
      book_ref: 'B10'
    - unique_id: 2
      name: 'Benjamin Melançon'
      photo_file: 'P02'
      book_ref: 'B20'
    - unique_id: 3
      name: 'Stefan Freudenberg'
      photo_file: 'P03'
      book_ref: 'B30'
  ids:
    unique_id:
      type: integer
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - ud_migrations_paragraph_intro_image
    - ud_migrations_paragraph_intro_paragraph
  optional: []
```
Note that `photo_file` and `book_ref` both contain the *unique identifier* of records in the *image* and *paragraph* migrations, respectively. These can be used with the `migration_lookup` plugin to map the reference fields in the nodes to be migrated. `ud_paragraphs` is the *machine name* of the target content type.
The mapping of the image reference field follows the same pattern as the one explained in the article on [migration dependencies](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal). Using the [migration_lookup](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21MigrationLookup.php/class/MigrationLookup) plugin, you indicate which is the migration that should be searched for the images. You also specify which source column contains the *unique identifiers* that match those in the *image* migration. This operation will return **a single value**: the *file ID* (`fid`) of the image. This value can be assigned to the `target_id` [subfield](https://understanddrupal.com/articles/migrating-data-drupal-subfields) of `field_ud_image` to establish the relationship. The following code snippet shows how to do it:
```yaml
field_ud_image/target_id:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_image
  source: photo_file
```
## Paragraph field mappings
Before diving into the paragraph field mapping, let's think about what needs to be done. Paragraphs are **revisioned entities**. To make a reference to them, you need **two IDs**: their *entity id* and their *entity revision id*. These two values need to be assigned to two [subfields](https://understanddrupal.com/articles/migrating-data-drupal-subfields) of the paragraph reference field: `target_id` and `target_revision_id` respectively. You have to come up with a process pipeline that complies with this requirement. There are many ways to do it, and the specifics will depend on your field configuration. In this example, the paragraph reference field allows an unlimited number of paragraphs to be associated, but only of one type: `ud_book_paragraph`. Another thing to note is that even though the field allows you to add as many paragraphs as you want, *the example migrates exactly one paragraph*.
With those considerations in mind, the mapping of the paragraph field will be a two-step process. First, use the `migration_lookup` plugin to get a reference to the paragraph. Second, use the fetched **values** to set the paragraph reference subfields. The following code snippet shows how to do it:
```yaml
pseudo_mbe_book_paragraph:
  plugin: migration_lookup
  migration: ud_migrations_paragraph_intro_paragraph
  source: book_ref
field_ud_favorite_book:
  plugin: sub_process
  source:
    - '@pseudo_mbe_book_paragraph'
  process:
    target_id: '0'
    target_revision_id: '1'
```
The first step is a normal `migration_lookup` procedure. The important difference is that instead of getting a single value, like with images, the paragraph lookup operation will return **an array of two values**. The format is like `[3, 7]` where the `3` represents the *entity id* and the `7` represents the entity *revision id* of the paragraph. Note that **the array keys are not named**. To access those values, you would use the **index** of the elements starting with zero (**0**). This will be important later. The returned array is stored in the `pseudo_mbe_book_paragraph` [pseudofield](https://understanddrupal.com/articles/using-constants-and-pseudofields-data-placeholders-drupal-migration-process-pipeline).
The second step is to set the `target_id` and `target_revision_id` subfields. In this example, `field_ud_favorite_book` is the *machine name* of the paragraph reference field. Remember that it is configured to accept an arbitrary number of paragraphs, and each will require passing an array of two elements. This means you need to *process an array of arrays*. To do that, you use the [sub_process](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21SubProcess.php/class/SubProcess) plugin to iterate over *an array of paragraph references*. In this example, the structure to iterate over would be like this:
```
[
  [3, 7]
]
```
Let's dissect how to do the mapping of the paragraph reference field. The `source` configuration of the `sub_process` plugin contains *an array of paragraph references*. In the example, that array has a single element: the `'@pseudo_mbe_book_paragraph'` pseudofield. The *quotes* (**'**) and *at sign* (**@**) are required to reuse an element that appears before in the process pipeline. Then, in the `process` configuration, you set the subfields for the paragraph reference field. It is worth noting that at this point you are iterating over a list of paragraph references, even if that list contains only one element. If you had more than one paragraph to migrate, whatever you defined in `process` will apply to all of them.
The `process` configuration is an array of subfield mappings. The left side of the assignment is the **name of the subfield** you want to set. The right side of the assignment is **an array index** of the paragraph reference being processed. Remember that this array *does not have named keys*, so you use their *numerical index* to refer to them. The example sets the `target_id` subfield to the element in the `0` index and the `target_revision_id` subfield to the element in the `1` index. Using the example data, this would be `target_id: 3` and `target_revision_id: 7`. *The quotes around the numerical indexes are important*. If not used, the migration will not find the indexes and the paragraphs will not be associated. The end result of this operation will be something like this:
```
'field_ud_favorite_book' => array (1) [
  array (2) [
    'target_id' => string (1) "3"
    'target_revision_id' => string (1) "7"
  ]
]
```
There are three ways to run the migrations: manually, [executing dependencies](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal), and [using tags](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal). The following code snippet shows the three options:
```console
# 1) Manually.
$ drush migrate:import ud_migrations_paragraph_intro_image
$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node
# 2) Executing dependencies.
$ drush migrate:import ud_migrations_paragraph_intro_node --execute-dependencies
# 3) Using tags.
$ drush migrate:import --tag='UD Paragraphs Intro'
```
And that is *one way* to map paragraph reference fields. In the end, all you have to do is set the `target_id` and `target_revision_id` subfields. The process pipeline that gets you to that point can vary depending on how your paragraphs are configured. The following is a non-exhaustive list of things to consider when migrating paragraphs:
- How many paragraph types can be referenced?
- How many paragraph instances are being migrated? Is this a multivalue field?
- Do paragraphs have translations?
- Do paragraphs have revisions?
## Do migrated paragraphs disappear upon node rollback?
Paragraphs migrations are affected by a particular behavior of *revisioned entities*. If the host entity is deleted, and the paragraphs do not have translations, the whole paragraph gets deleted. That means that deleting a node will cause the referenced paragraphs' data to be removed. How does this affect your [migration workflow](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow)? If the migration of the host entity is rolled back, then the paragraphs will be removed, but the Migrate API will not know about it. In this example, if you run a migrate status command after rolling back the *node* migration, you will see that the *paragraph* migration indicates that there are no pending elements to process. The *file* migration for the images will report the same, but in that case, the images will remain on the system.
In any migration project, it is common that you do rollback operations to test new field mappings or fix errors. Thus, chances are very high that you will stumble upon this behavior. Thanks to [Damien McKenna](https://www.drupal.org/u/damienmckenna) for helping me understand this behavior and tracking it to the [rollback](https://git.drupalcode.org/project/entity_reference_revisions/blob/8.x-1.x/src/Plugin/migrate/destination/EntityReferenceRevisions.php#L150)() method of the [EntityReferenceRevisions](https://git.drupalcode.org/project/entity_reference_revisions/blob/8.x-1.x/src/Plugin/migrate/destination/EntityReferenceRevisions.php) destination plugin. So, what do you do to recover the deleted paragraphs? You have to roll back *both* migrations: *node* and *paragraph*. And then, you have to import the two again. The following snippet shows how to do it:
```console
# 1) Rollback both migrations.
$ drush migrate:rollback ud_migrations_paragraph_intro_node
$ drush migrate:rollback ud_migrations_paragraph_intro_paragraph
# 2) Import both migrations again.
$ drush migrate:import ud_migrations_paragraph_intro_paragraph
$ drush migrate:import ud_migrations_paragraph_intro_node
```
What did you learn in today's blog post? Have you migrated paragraphs before? If so, what challenges have you found? Did you know paragraph reference fields require two subfields to be set? Did you know that deleting the host entity also deletes referenced paragraphs? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

163
15.txt Normal file
View file

@ -0,0 +1,163 @@
# Migrating CSV files into Drupal
Today we will learn how to migrate content from a **Comma-Separated Values (CSV) file** into Drupal. We are going to use the latest version of the [Migrate Source CSV module](https://www.drupal.org/project/migrate_source_csv) which depends on the third-party library [league/csv](https://github.com/thephpleague/csv). We will show how to configure the source plugin to read files with or without a header row. We will also talk about a new feature that allows you to use stream wrappers to set the file location. Let's get started.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD CSV source migration` whose machine name is `ud_migrations_csv_source`. It comes with three migrations: `udm_csv_source_paragraph`, `udm_csv_source_image`, and `udm_csv_source_node`.
You can get the Migrate Source CSV module using [composer](https://getcomposer.org/): `composer require drupal/migrate_source_csv`. This will also download its dependency: the `league/csv` library. The example assumes you are using the `8.x-3.x` branch of the module, which requires composer to be installed. If your Drupal site is not composer-based, you can use the `8.x-2.x` branch. Continue reading to learn the difference between the two branches.
## Understanding the example set up
This migration will reuse the same configuration from the [introduction to paragraph migrations](https://understanddrupal.com/articles/introduction-paragraphs-migrations-drupal) example. Refer to that article for details on the configuration. Today, only the parts relevant to the CSV migration will be explained. The destinations will be the same content type, paragraph type, and fields. The end result will be nodes containing an image and a paragraph with information about someone's favorite book. The major difference is that we are going to read from CSV files.
Note that you can literally swap migration sources *without changing any other part of the migration*. This is a powerful feature of [ETL frameworks](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) like Drupal's Migrate API. Although that is possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some *machine names* had to be changed to avoid conflicts with other examples in the demo repository.
## Migrating CSV files with a header row
In any migration project, understanding the source is very important. For CSV migrations, the primary thing to consider is whether or not the file contains a *row of headers*. Other things to consider are what characters to use as *delimiter*, *enclosure*, and *escape character*. For now, let's consider the following CSV file whose first row serves as column headers:
| unique_id | name | photo_file | book_ref |
|:-|:-|:-|:-|
| 1 | Michele Metts | P01 | B10 |
| 2 | Benjamin Melançon | P02 | B20 |
| 3 | Stefan Freudenberg | P03 | B30 |
This file will be used in the *node* migration. The four columns are used as follows:
- `unique_id` is the unique identifier for each record in this CSV file.
- `name` is the name of a person. This will be used as the node title.
- `photo_file` is the unique identifier of an image that was created in a separate migration.
- `book_ref` is the unique identifier of a book paragraph that was created in a separate migration.
The following snippet shows the configuration of the CSV *source* plugin for the *node* migration:
```yaml
source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_people.csv
  ids: [unique_id]
```
The name of the plugin is `csv`. Then you define the `path` pointing to the file itself. In this case, the path is *relative to the Drupal root*. Finally, you specify an `ids` array of column names that would uniquely identify each record. As already stated, the `unique_id` column serves that purpose. Note that there is no need to specify all the column names from the CSV file. The plugin will automatically make them available. That is the simplest configuration of the CSV source plugin.
The following snippet shows part of the *process*, *destination*, and *dependencies* configuration of the *node* migration:
```yaml
process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_csv_source_image
    source: photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_csv_source_image
    - udm_csv_source_paragraph
  optional: []
```
Note that the `source` for setting the image reference is `photo_file`. In the process pipeline you can directly use any *column name* that exists in the CSV file. The configuration of the migration lookup plugin and dependencies point to two CSV migrations that come with this example. One is for migrating *images* and the other for migrating *paragraphs*.
## Migrating CSV files without a header row
Now let's consider two examples of CSV files that do not have a header row. The following snippets show the example CSV file and *source* plugin configuration for the *paragraph* migration:
| | | |
|-|-|-|
| B10 | The definite guide to Drupal 7 | Benjamin Melançon et al. |
| B20 | Understanding Drupal Views | Carlos Dinarte |
| B30 | Understanding Drupal Migrations | Mauricio Dinarte |
```yaml
source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_book_paragraph.csv
  ids: [book_id]
  header_offset: null
  fields:
    - name: book_id
    - name: book_title
    - name: 'Book author'
```
When you do not have a header row, you need to specify two more configuration options. `header_offset` has to be set to `null`. `fields` has to be set to an array where each element represents a column in the CSV file. You include a `name` for each column following the order in which they appear in the file. The name itself can be arbitrary. If it contains *spaces*, you need to put *quotes* (**'**) around it. After that, you set the `ids` configuration to one or more columns using the names you defined.
In the *process* section you refer to source columns as usual. You write their `name`, adding quotes if it contains *spaces*. The following snippet shows how the *process* section is configured for the *paragraph* migration:
```yaml
process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: 'Book author'
```
The final example will show a slight variation of the previous configuration. The following two snippets show the example CSV file and *source* plugin configuration for the *image* migration:
| | |
|-|-|
| P01 | https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg |
| P02 | https://agaric.coop/sites/default/files/pictures/picture-3-1421176784.jpg |
| P03 | https://agaric.coop/sites/default/files/pictures/picture-2-1421176752.jpg |
```yaml
source:
  plugin: csv
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv
  ids: [photo_id]
  header_offset: null
  fields:
    - name: photo_id
      label: 'Photo ID'
    - name: photo_url
      label: 'Photo URL'
```
For each column defined in the `fields` configuration, you can optionally set a `label`. This is a description used when presenting details about the migration, for example, in the user interface provided by the Migrate Tools module. When defined, you **do not use** the *label* to refer to source columns. You keep using the column *name*. You can see this in the value of the `ids` configuration.
The following snippet shows part of the *process* configuration of the *image* migration:
```yaml
process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: photo_url
```
## CSV file location
When setting the `path` configuration you have three options to indicate the CSV file location:
- Use a *relative path* from the **Drupal root**. The path *should not start* with a *slash* (**/**). This is the approach used in this demo. For example, `modules/custom/my_module/csv_files/example.csv`.
- Use an *absolute path* pointing to the CSV location in the file system. The path should start with a *slash* (**/**). For example, `/var/www/drupal/modules/custom/my_module/csv_files/example.csv`.
- Use a [stream wrapper](https://api.drupal.org/api/drupal/namespace/Drupal!Core!StreamWrapper/8.8.x). This feature was introduced in the 8.x-3.x branch of the module. Previous versions cannot make use of them.
Being able to use stream wrappers gives you many options for setting the location to the CSV file. For instance:
- Files located in the [public](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21PublicStream.php/class/PublicStream/8.8.x), [private](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21PrivateStream.php/class/PrivateStream/8.8.x), and [temporary](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21TemporaryStream.php/class/TemporaryStream/8.8.x) file systems managed by Drupal. This leverages functionality already available in Drupal core. For example: `public://csv_files/example.csv`.
- Files located in profiles, modules, and themes. You can use the [System stream wrapper module](https://www.drupal.org/project/system_stream_wrapper) or [apply](https://www.drupal.org/patch/apply) this [core patch](https://www.drupal.org/project/drupal/issues/1308152) to get this functionality. For example, `module://my_module/csv_files/example.csv`.
- Files located in remote servers including RSS feeds. You can use the [Remote stream wrapper module](https://www.drupal.org/project/remote_stream_wrapper) to get this functionality. For example, `https://understanddrupal.com/csv-files/example.csv`.
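As a sketch of the stream wrapper option, the source configuration could look like the following. The file location is hypothetical and the `unique_id` column is assumed to exist in that file. This only applies to the 8.x-3.x branch of the module.

```yaml
source:
  plugin: csv
  # Stream wrapper pointing to Drupal's public file system.
  # The file location is hypothetical.
  path: 'public://csv_files/example.csv'
  ids: [unique_id]
```

The `module://` and remote variations would follow the same shape, provided the module that registers the corresponding stream wrapper is installed.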
## CSV source plugin configuration
The configuration options for the CSV source plugin are very well documented in the [source code](https://git.drupalcode.org/project/migrate_source_csv/blob/8.x-3.x/src/Plugin/migrate/source/CSV.php#L14). They are included here for quick reference:
- `path` is required. It contains the path to the CSV file. Starting with the 8.x-3.x branch, stream wrappers are supported.
- `ids` is required. It contains an array of column names that uniquely identify each record.
- `header_offset` is optional. It contains the index of the record to be used as the CSV header and thereby each record's field names. It defaults to zero (`0`) because the index is zero-based. For CSV files with no header row the value should be set to `null`.
- `fields` is optional. It contains a nested array of names and labels to use instead of a header row. If set, it will overwrite the column names obtained from `header_offset`.
- `delimiter` is optional. It contains *one character* used as column delimiter. It defaults to a *comma* (**,**). For example, if your file uses *tabs* as delimiter, you set this configuration to `\t`.
- `enclosure` is optional. It contains *one character* used to enclose the column values. Defaults to *double quotation marks* (**"**).
- `escape` is optional. It contains *one character* used for character escaping in the column values. It defaults to a *backslash* (**\**).
**Important**: The configuration options changed significantly between the 8.x-3.x and 8.x-2.x branches. Refer to this [change record](https://www.drupal.org/node/3060246) for a reference of how to configure the plugin for the 8.x-2.x branch.
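To put the options together, here is a sketch for a tab-separated file with a header row, assuming the 8.x-3.x branch. The file path and the `unique_id` column are hypothetical. Note that the tab delimiter is written in double quotes so that YAML interprets `\t` as a single tab character instead of a backslash followed by a letter.

```yaml
source:
  plugin: csv
  # Hypothetical tab-separated file, relative to the Drupal root.
  path: modules/custom/my_module/csv_files/example.tsv
  ids: [unique_id]
  # The first record (index 0) contains the column names.
  header_offset: 0
  # Double quotes make YAML parse \t as one tab character.
  delimiter: "\t"
  # The next two are the defaults, written out for illustration.
  enclosure: '"'
  escape: '\'
```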
And that is how you can use CSV files as the source of your migrations. Because this is such a common need, it was considered to move the CSV source plugin to Drupal core. The effort is currently on hold and it is unclear if it will materialize during Drupal 8's lifecycle. The maintainers of the Migrate API are focusing their efforts on other [priorities](https://www.drupal.org/core/roadmap#migrate) at the moment. You can read this [issue](https://www.drupal.org/node/2931739) to learn about the motivation and context for offering functionality in Drupal core.
*Note*: The [Migrate Spreadsheet module](https://www.drupal.org/project/migrate_spreadsheet) can also be used to migrate data from CSV files. It also supports Microsoft Office Excel and LibreOffice Calc (OpenDocument) files. The module leverages the [PhpOffice/PhpSpreadsheet](https://github.com/PHPOffice/PhpSpreadsheet) library.
What did you learn in today's blog post? Have you migrated from CSV files before? Did you know that it is now possible to read files using stream wrappers? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

291
16.txt Normal file

@ -0,0 +1,291 @@
# Migrating JSON files into Drupal
Today we will learn how to migrate content from a **JSON file** into Drupal using the [Migrate Plus module](https://www.drupal.org/project/migrate_plus). We will show how to configure the migration to read files from the *local file system* and *remote locations*. The example includes *node*, *images*, and *paragraphs* migrations. Let's get started.
*Note*: Migrate Plus has many more features. For example, it contains source plugins to import from [XML files](https://understanddrupal.com/articles/migrating-xml-files-drupal) and SOAP endpoints. It provides many useful process plugins for DOM manipulation, string replacement, transliteration, etc. The module also lets you define migration plugins as configurations and create groups to share settings. It offers a custom event to modify the source data before processing begins. In today's blog post, we are focusing on importing JSON files. Other features will be covered in future entries.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD JSON source migration` whose machine name is `ud_migrations_json_source`. It comes with four migrations: `udm_json_source_paragraph`, `udm_json_source_image`, `udm_json_source_node_local`, and `udm_json_source_node_remote`.
You can get the Migrate Plus module using [composer](https://getcomposer.org/): `composer require 'drupal/migrate_plus:^5.0'`. This will install the `8.x-5.x` branch, where new development will happen. This branch was created to introduce breaking changes in preparation for Drupal 9. As of this writing, the `8.x-4.x` branch has feature parity with the newer branch. If your Drupal site is not composer-based, you can download the module manually.
## Understanding the example set up
This migration will reuse the same configuration from the [introduction to paragraph migrations](https://understanddrupal.com/articles/introduction-paragraphs-migrations-drupal) example. Refer to that article for details on the configuration: the destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain JSON migrations. The end result will again be nodes containing an image and a paragraph with information about someone's favorite book. The major difference is that we are going to read from JSON. In fact, three of the migrations will read from the same file. The following snippet shows a reduced version of the file to get a sense of its structure:
```json
{
  "data": {
    "udm_people": [
      {
        "unique_id": 1,
        "name": "Michele Metts",
        "photo_file": "P01",
        "book_ref": "B10"
      },
      {...},
      {...}
    ],
    "udm_book_paragraph": [
      {
        "book_id": "B10",
        "book_details": {
          "title": "The definite guide to Drupal 7",
          "author": "Benjamin Melançon et al."
        }
      },
      {...},
      {...}
    ],
    "udm_photos": [
      {
        "photo_id": "P01",
        "photo_url": "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg",
        "photo_dimensions": [240, 351]
      },
      {...},
      {...}
    ]
  }
}
```
*Note*: You can literally swap migration sources *without changing any other part of the migration*. This is a powerful feature of [ETL frameworks](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) like Drupal's Migrate API. Although that is possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some *machine names* had to be changed to avoid conflicts with other examples in the demo repository.
## Migrating nodes from a JSON file
In any migration project, understanding the *source* is very important. For JSON migrations, there are two major considerations. First, where in the file hierarchy lies the data that you want to import. It can be at the root of the file or several levels deep in the hierarchy. Second, when you get to the *array of records* that you want to import, what fields are going to be made available to the migration. It is possible that each record contains more data than needed. For improved performance, it is recommended to manually include only the fields that will be required for the migration. The following code snippet shows part of the *local* JSON file relevant to the *node* migration:
```json
{
  "data": {
    "udm_people": [
      {
        "unique_id": 1,
        "name": "Michele Metts",
        "photo_file": "P01",
        "book_ref": "B10"
      },
      {...},
      {...}
    ]
  }
}
```
The array of records containing node data lies two levels deep in the hierarchy: it starts with `data` at the root and then descends one level to `udm_people`. Each element of this array is an object with four properties:
- `unique_id` is the *unique identifier* for each record **within** the `/data/udm_people` hierarchy.
- `name` is the name of a person. This will be used in the node title.
- `photo_file` is the *unique identifier* of an image that was created in a separate migration.
- `book_ref` is the *unique identifier* of a book paragraph that was created in a separate migration.
The following snippet shows the configuration to read a *local* JSON file for the *node* migration:
```yaml
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - modules/custom/ud_migrations/ud_migrations_json_source/sources/udm_data.json
  item_selector: /data/udm_people
  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_name
      label: 'Name'
      selector: name
    - name: src_photo_file
      label: 'Photo ID'
      selector: photo_file
    - name: src_book_ref
      label: 'Book paragraph ID'
      selector: book_ref
  ids:
    src_unique_id:
      type: integer
```
The name of the plugin is `url`. Because we are reading a local file, the `data_fetcher_plugin` is set to `file` and the `data_parser_plugin` to `json`. The `urls` configuration contains an array of file paths *relative to the Drupal root*. In the example we are reading from one file only, but you can read from multiple files at once. In that case, it is important that they have a homogeneous structure. The settings that follow will apply equally to all the files listed in `urls`.
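To make the multiple files case more concrete, here is a minimal sketch. The two file names are hypothetical and both files are assumed to share the structure of the demo file, so a single `item_selector` applies to each of them:

```yaml
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    # Hypothetical files with the same structure.
    - modules/custom/my_module/json_files/people_part_1.json
    - modules/custom/my_module/json_files/people_part_2.json
  # Applied to every file listed in 'urls'.
  item_selector: /data/udm_people
```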
The `item_selector` configuration indicates where in the JSON file lies the *array of records* to be migrated. Its value is an [XPath](https://en.wikipedia.org/wiki/XPath)-like string used to traverse the file hierarchy. In this case, the value is `/data/udm_people`. Note that you separate each level in the hierarchy with a *slash* (**/**).
`fields` has to be set to an *array*. Each element represents a field that will be made available to the migration. The following options can be set:
- `name` is required. This is how the field is going to be referenced in the migration. The name itself can be arbitrary. If it contains spaces, you need to put *double quotation marks* (**"**) around it when referring to it in the migration.
- `label` is optional. This is a description used when presenting details about the migration, for example, in the user interface provided by the [Migrate Tools module](https://www.drupal.org/project/migrate_tools). When defined, you **do not use** the *label* to refer to the field. Keep using the *name*.
- `selector` is required. This is another XPath-like string to find the field to import. The value must be relative to the location specified by the `item_selector` configuration. In the example, the fields are direct children of the records to migrate. Therefore, only the property name is specified (e.g., `unique_id`). If you had nested objects or arrays, you would use a *slash* (**/**) character to go deeper in the hierarchy. This will be demonstrated in the *image* and *paragraph* migrations.
Finally, you specify an `ids` *array* of field *names* that would uniquely identify each record. As already stated, the `src_unique_id` field serves that purpose. The following snippet shows part of the *process*, *destination*, and *dependencies* configuration of the node migration:
```yaml
process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_json_source_image
    source: src_photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_json_source_image
    - udm_json_source_paragraph
  optional: []
```
The `source` for setting the image reference is `src_photo_file`. Again, this is the `name` of the field, not the `label` nor `selector`. The configuration of the migration lookup plugin and dependencies point to two JSON migrations that come with this example. One is for migrating *images* and the other for migrating *paragraphs*.
## Migrating paragraphs from a JSON file
Let's consider an example where the records to migrate have many levels of nesting. The following snippets show part of the *local* JSON file and *source* plugin configuration for the *paragraph* migration:
```json
{
  "data": {
    "udm_book_paragraph": [
      {
        "book_id": "B10",
        "book_details": {
          "title": "The definite guide to Drupal 7",
          "author": "Benjamin Melançon et al."
        }
      },
      {...},
      {...}
    ]
  }
}
```
```yaml
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - modules/custom/ud_migrations/ud_migrations_json_source/sources/udm_data.json
  item_selector: /data/udm_book_paragraph
  fields:
    - name: src_book_id
      label: 'Book ID'
      selector: book_id
    - name: src_book_title
      label: 'Title'
      selector: book_details/title
    - name: src_book_author
      label: 'Author'
      selector: book_details/author
  ids:
    src_book_id:
      type: string
```
The `plugin`, `data_fetcher_plugin`, `data_parser_plugin` and `urls` configurations have the same values as in the *node* migration. The `item_selector` and `ids` configurations are slightly different to represent the path to *paragraph* records and the unique identifier field, respectively.
The interesting part is the value of the `fields` configuration. Taking `/data/udm_book_paragraph` as a starting point, the records with *paragraph* data have a *nested structure*. Particularly, `book_details` is an object with two properties: `title` and `author`. To refer to them, the selectors are `book_details/title` and `book_details/author`, respectively. Note that you can go as many levels deep in the hierarchy as needed to find the value that should be assigned to the field. Every level in the hierarchy is separated by a *slash* (**/**).
In this example, the target is a single paragraph type. But a similar technique can be used to migrate multiple types. One way to configure the JSON file is having two properties. `paragraph_id` would contain the *unique identifier* for the record. `paragraph_data` would be an object with a property to set the paragraph type. This would also have an arbitrary number of extra properties with the data to be migrated. In the *process* section, you would iterate over the records to map the paragraph fields.
The following snippet shows part of the *process* configuration of the *paragraph* migration:
```yaml
process:
  field_ud_book_paragraph_title: src_book_title
  field_ud_book_paragraph_author: src_book_author
```
## Migrating images from a JSON file
Let's consider an example where the records to migrate have *more data than needed*. The following snippets show part of the *local* JSON file and *source* plugin configuration for the *image* migration:
```json
{
  "data": {
    "udm_photos": [
      {
        "photo_id": "P01",
        "photo_url": "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg",
        "photo_dimensions": [240, 351]
      },
      {...},
      {...}
    ]
  }
}
```
```yaml
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    - modules/custom/ud_migrations/ud_migrations_json_source/sources/udm_data.json
  item_selector: /data/udm_photos
  fields:
    - name: src_photo_id
      label: 'Photo ID'
      selector: photo_id
    - name: src_photo_url
      label: 'Photo URL'
      selector: photo_url
  ids:
    src_photo_id:
      type: string
```
The `plugin`, `data_fetcher_plugin`, `data_parser_plugin` and `urls` configurations have the same values as in the *node* migration. The `item_selector` and `ids` configurations are slightly different to represent the path to *image* records and the unique identifier field, respectively.
The interesting part is the value of the `fields` configuration. Taking `/data/udm_photos` as a starting point, the records with image data have extra properties that are not used in the migration. Particularly, the `photo_dimensions` property contains an array with two values representing the width and height of the image, respectively. To ignore this property, you simply omit it from the `fields` configuration. In case you wanted to use it, the selectors would be `photo_dimensions/0` for the width and `photo_dimensions/1` for the height. Note that you use a *zero-based numerical index* to get the values out of arrays. Like with objects, a *slash* (**/**) is used to separate each level in the hierarchy. You can go as far as necessary in the hierarchy.
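If you did want the image dimensions, the extra entries in the `fields` configuration could look like the sketch below. The field names `src_photo_width` and `src_photo_height` and their labels are made up for illustration; they are not part of the demo module:

```yaml
fields:
  # Hypothetical field names. The selectors use a zero-based
  # index into the photo_dimensions array.
  - name: src_photo_width
    label: 'Photo width'
    selector: photo_dimensions/0
  - name: src_photo_height
    label: 'Photo height'
    selector: photo_dimensions/1
```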
The following snippet shows part of the *process* configuration of the *image* migration:
```yaml
process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: src_photo_url
```
## JSON file location
**Important**: What is described in this section **only applies** when you use the `file` data fetcher plugin.
When using the `file` data fetcher plugin, you have four options to indicate the location of the JSON files in the `urls` configuration:
- Use a *relative path* from the **Drupal root**. The path *should not start* with a *slash* (**/**). This is the approach used in this demo. For example, `modules/custom/my_module/json_files/example.json`.
- Use an *absolute path* pointing to the JSON location in the file system. The path *should start* with a *slash* (**/**). For example, `/var/www/drupal/modules/custom/my_module/json_files/example.json`.
- Use a *fully-qualified URL* to any [built-in wrapper](https://www.php.net/manual/en/wrappers.php) like `http`, `https`, `ftp`, `ftps`, etc. For example, `https://understanddrupal.com/json-files/example.json`.
- Use a [custom stream wrapper](https://api.drupal.org/api/drupal/namespace/Drupal!Core!StreamWrapper/8.8.x).
Being able to use stream wrappers gives you many more options. For instance:
- Files located in the [public](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21PublicStream.php/class/PublicStream/8.8.x), [private](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21PrivateStream.php/class/PrivateStream/8.8.x), and [temporary](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21TemporaryStream.php/class/TemporaryStream/8.8.x) file systems managed by Drupal. This leverages functionality already available in Drupal core. For example: `public://json_files/example.json`.
- Files located in profiles, modules, and themes. You can use the [System stream wrapper module](https://www.drupal.org/project/system_stream_wrapper) or [apply](https://www.drupal.org/patch/apply) this [core patch](https://www.drupal.org/project/drupal/issues/1308152) to get this functionality. For example, `module://my_module/json_files/example.json`.
- Files located in [Amazon S3](https://aws.amazon.com/s3/). You can use the [S3 File System module](https://www.drupal.org/project/s3fs) along with the [S3FS File Proxy to S3 module](https://www.drupal.org/project/s3fs_file_proxy_to_s3) to get this functionality.
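The location options described in this section can be summarized in one sketch. Only one entry is active; the alternatives are left as comments and reuse the example paths mentioned above:

```yaml
source:
  plugin: url
  data_fetcher_plugin: file
  data_parser_plugin: json
  urls:
    # Relative path from the Drupal root (no leading slash).
    - modules/custom/my_module/json_files/example.json
    # Absolute path in the file system (leading slash).
    # - /var/www/drupal/modules/custom/my_module/json_files/example.json
    # Fully-qualified URL using a built-in wrapper.
    # - https://understanddrupal.com/json-files/example.json
    # Stream wrapper provided by Drupal core.
    # - public://json_files/example.json
```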
## Migrating remote JSON files
**Important**: What is described in this section **only applies** when you use the `http` data fetcher plugin.
Migrate Plus provides another data fetcher plugin named `http`. Under the hood, it uses the [Guzzle HTTP Client](https://github.com/guzzle/guzzle) library. You can use it to fetch files using any [protocol supported](https://curl.haxx.se/libcurl/c/CURLOPT_PROTOCOLS.html) by [curl](https://curl.haxx.se/libcurl/) like `http`, `https`, `ftp`, `ftps`, `sftp`, etc. In a future blog post we will explain this data fetcher in more detail. For now, the `udm_json_source_node_remote` migration demonstrates a basic setup for this plugin. Note that only the `data_fetcher_plugin` and `urls` configurations are different from the local file example. The following snippet shows part of the configuration to read a *remote* JSON file for the *node* migration:
```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls:
    - https://api.myjson.com/bins/110rcr
  item_selector: /data/udm_people
  fields: ...
  ids: ...
```
And that is how you can use JSON files as the *source* of your migrations. Many more configurations are possible. For example, you can provide authentication information to get access to protected resources. You can also set custom HTTP headers. Examples will be presented in a future entry.
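As a preview, a configuration with a custom header and basic authentication might look roughly like the following sketch. It assumes the `headers` and `authentication` options of the `http` data fetcher as found in the `8.x-5.x` branch, so verify them against the version you have installed. The URL and credentials are placeholders:

```yaml
source:
  plugin: url
  data_fetcher_plugin: http
  data_parser_plugin: json
  urls:
    # Placeholder URL for a protected resource.
    - https://example.com/api/people.json
  # Custom HTTP headers sent with the request.
  headers:
    Accept: 'application/json'
  # Assumed 'basic' authentication plugin; placeholder credentials.
  authentication:
    plugin: basic
    username: example_user
    password: example_password
  item_selector: /data/udm_people
  # The fields and ids configurations are the same as in the local example.
```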
What did you learn in today's blog post? Have you migrated from JSON files before? If so, what challenges have you found? Did you know that you can read local and remote files? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

328
17.txt Normal file

@ -0,0 +1,328 @@
# Migrating XML files into Drupal
Today we will learn how to migrate content from an **XML file** into Drupal using the [Migrate Plus module](https://www.drupal.org/project/migrate_plus). We will show how to configure the migration to read files from the *local file system* and *remote locations*. We will also talk about the difference between two data parsers provided by the module. The example includes *node*, *images*, and *paragraphs* migrations. Let's get started.
*Note*: Migrate Plus has many more features. For example, it contains source plugins to import from [JSON files](https://understanddrupal.com/articles/migrating-json-files-drupal) and SOAP endpoints. It provides many useful process plugins for DOM manipulation, string replacement, transliteration, etc. The module also lets you define migration plugins as configurations and create groups to share settings. It offers a custom event to modify the source data before processing begins. In today's blog post, we are focusing on importing XML files. Other features will be covered in future entries.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD XML source migration` whose machine name is `ud_migrations_xml_source`. It comes with four migrations: `udm_xml_source_paragraph`, `udm_xml_source_image`, `udm_xml_source_node_local`, and `udm_xml_source_node_remote`.
You can get the Migrate Plus module using [composer](https://getcomposer.org/): `composer require 'drupal/migrate_plus:^5.0'`. This will install the `8.x-5.x` branch, where new development will happen. This branch was created to introduce breaking changes in preparation for Drupal 9. As of this writing, the `8.x-4.x` branch has feature parity with the newer branch. If your Drupal site is not composer-based, you can download the module manually.
## Understanding the example set up
This migration will reuse the same configuration from the [introduction to paragraph migrations](https://understanddrupal.com/articles/introduction-paragraphs-migrations-drupal) example. Refer to that article for details on the configuration: the destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain XML migrations. The end result will again be nodes containing an image and a paragraph with information about someone's favorite book. The major difference is that we are going to read from XML. In fact, three of the migrations will read from the same file. The following snippet shows a reduced version of the file to get a sense of its structure:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<data>
  <udm_people>
    <unique_id>1</unique_id>
    <name>Michele Metts</name>
    <photo_file>P01</photo_file>
    <book_ref>B10</book_ref>
  </udm_people>
  <udm_people>
    ...
  </udm_people>
  <udm_people>
    ...
  </udm_people>
  <udm_book_paragraph>
    <book_id>B10</book_id>
    <book_details>
      <title>The definite guide to Drupal 7</title>
      <author>Benjamin Melançon et al.</author>
    </book_details>
  </udm_book_paragraph>
  <udm_book_paragraph>
    ...
  </udm_book_paragraph>
  <udm_book_paragraph>
    ...
  </udm_book_paragraph>
  <udm_photos>
    <photo_id>P01</photo_id>
    <photo_url>https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg</photo_url>
    <photo_dimensions>
      <width>240</width>
      <height>351</height>
    </photo_dimensions>
  </udm_photos>
  <udm_photos>
    ...
  </udm_photos>
  <udm_photos>
    ...
  </udm_photos>
</data>
```
*Note*: You can literally swap migration sources *without changing any other part of the migration*. This is a powerful feature of [ETL frameworks](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) like Drupal's Migrate API. Although that is possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some *machine names* had to be changed to avoid conflicts with other examples in the demo repository.
## Migrating nodes from an XML file
In any migration project, understanding the source is very important. For XML migrations, there are two major considerations. First, where in the *XML tree* hierarchy lies the data that you want to import. It can be at the root of the file or several levels deep in the hierarchy. You use an [XPath](https://en.wikipedia.org/wiki/XPath) expression to select a *set of nodes* from the *XML document*. In this article, we use the term `element` when referring to an *XML document node* to distinguish it from a [Drupal node](https://understanddrupal.com/articles/what-difference-between-node-and-content-type-drupal). Second, when you get to the *set of elements* that you want to import, what child elements are going to be made available to the migration. It is possible that each element contains more data than needed. In XML imports, you have to manually include the child elements that will be required for the migration. The following code snippet shows part of the *local* XML file relevant to the *node* migration:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<data>
  <udm_people>
    <unique_id>1</unique_id>
    <name>Michele Metts</name>
    <photo_file>P01</photo_file>
    <book_ref>B10</book_ref>
  </udm_people>
  <udm_people>
    ...
  </udm_people>
  <udm_people>
    ...
  </udm_people>
</data>
```
The *set of elements* containing node data lies two levels deep in the hierarchy: it starts with `data` at the root and then descends one level to `udm_people`. Each `udm_people` element has four children:
- `unique_id` is the *unique identifier* for each element **returned by** the `/data/udm_people` expression.
- `name` is the name of a person. This will be used in the node title.
- `photo_file` is the *unique identifier* of an image that was created in a separate migration.
- `book_ref` is the *unique identifier* of a book paragraph that was created in a separate migration.
The following snippet shows the configuration to read a *local* XML file for the *node* migration:
```yaml
source:
  plugin: url
  # This configuration is ignored by the 'xml' data parser plugin.
  # It only has effect when using the 'simple_xml' data parser plugin.
  data_fetcher_plugin: file
  # Set to 'xml' to use XMLReader https://www.php.net/manual/en/book.xmlreader.php
  # Set to 'simple_xml' to use SimpleXML https://www.php.net/manual/en/ref.simplexml.php
  data_parser_plugin: xml
  urls:
    - modules/custom/ud_migrations/ud_migrations_xml_source/sources/udm_data.xml
  # XPath expression. It is common that it starts with a slash (/).
  item_selector: /data/udm_people
  fields:
    - name: src_unique_id
      label: 'Unique ID'
      selector: unique_id
    - name: src_name
      label: 'Name'
      selector: name
    - name: src_photo_file
      label: 'Photo ID'
      selector: photo_file
    - name: src_book_ref
      label: 'Book paragraph ID'
      selector: book_ref
  ids:
    src_unique_id:
      type: integer
```
The name of the plugin is `url`. Because we are reading a local file, the `data_fetcher_plugin` is set to `file` and the `data_parser_plugin` to `xml`. The `urls` configuration contains an array of file paths *relative to the Drupal root*. In the example we are reading from one file only, but you can read from multiple files at once. In that case, it is important that they have a homogeneous structure. The settings that follow will apply equally to all the files listed in `urls`.
*Technical note*: Migrate Plus provides two data parser plugins for XML files. [xml](https://git.drupalcode.org/project/migrate_plus/blob/8.x-5.x/src/Plugin/migrate_plus/data_parser/Xml.php) uses [XMLReader](https://www.php.net/manual/en/book.xmlreader.php) while [simple_xml](https://git.drupalcode.org/project/migrate_plus/blob/8.x-5.x/src/Plugin/migrate_plus/data_parser/SimpleXml.php) uses [SimpleXML](https://www.php.net/manual/en/ref.simplexml.php). The parser to use is configured in the `data_parser_plugin` configuration. Also note that when you use the `xml` parser, the `data_fetcher_plugin` setting is ignored. More details below.
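For reference, switching the demo to the SimpleXML-based parser would be a matter of changing one setting, as in the sketch below. It assumes that the remaining `fields` and `ids` configurations do not need to change for this particular example:

```yaml
source:
  plugin: url
  # With 'simple_xml', the data fetcher setting is taken into account.
  data_fetcher_plugin: file
  data_parser_plugin: simple_xml
  urls:
    - modules/custom/ud_migrations/ud_migrations_xml_source/sources/udm_data.xml
  item_selector: /data/udm_people
  # The fields and ids configurations are assumed to stay as in the 'xml' example.
```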
The `item_selector` configuration indicates where in the XML file lies the *set of elements* to be migrated. Its value is an XPath expression used to traverse the file hierarchy. In this case, the value is `/data/udm_people`. Verify that your expression is valid and selects the elements you intend to import. It is common that it starts with a *slash* (**/**).
`fields` has to be set to an *array*. Each element represents a field that will be made available to the migration. The following options can be set:
- `name` is required. This is how the field is going to be referenced in the migration. The name itself can be arbitrary. If it contains spaces, you need to put *double quotation marks* (**"**) around it when referring to it in the migration.
- `label` is optional. This is a description used when presenting details about the migration, for example, in the user interface provided by the [Migrate Tools module](https://www.drupal.org/project/migrate_tools). When defined, you **do not use** the *label* to refer to the field. Keep using the *name*.
- `selector` is required. This is another XPath-like string to find the field to import. The value must be relative to the subtree specified by the `item_selector` configuration. In the example, the fields are direct children of the elements to migrate. Therefore, the XPath expression only includes the element name (e.g., `unique_id`). If you had nested elements, you could use a *slash* (**/**) character to go deeper in the hierarchy. This will be demonstrated in the *image* and *paragraph* migrations.
Finally, you specify an `ids` *array* of field *names* that would uniquely identify each record. As already stated, the `src_unique_id` field serves that purpose. The following snippet shows part of the *process*, *destination*, and *dependencies* configuration of the node migration:
```yaml
process:
  field_ud_image/target_id:
    plugin: migration_lookup
    migration: udm_xml_source_image
    source: src_photo_file
destination:
  plugin: 'entity:node'
  default_bundle: ud_paragraphs
migration_dependencies:
  required:
    - udm_xml_source_image
    - udm_xml_source_paragraph
  optional: []
```
The `source` for setting the image reference is `src_photo_file`. Again, this is the `name` of the field, not the `label` nor `selector`. The configuration of the migration lookup plugin and dependencies point to two XML migrations that come with this example. One is for migrating *images* and the other for migrating *paragraphs*.
## Migrating paragraphs from an XML file
Let's consider an example where the elements to migrate have many levels of nesting. The following snippets show part of the *local* XML file and *source* plugin configuration for the *paragraph* migration:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<data>
  <udm_book_paragraph>
    <book_id>B10</book_id>
    <book_details>
      <title>The definite guide to Drupal 7</title>
      <author>Benjamin Melançon et al.</author>
    </book_details>
  </udm_book_paragraph>
  <udm_book_paragraph>
    ...
  </udm_book_paragraph>
  <udm_book_paragraph>
    ...
  </udm_book_paragraph>
</data>
```
```yaml
source:
  plugin: url
  # This configuration is ignored by the 'xml' data parser plugin.
  # It only has effect when using the 'simple_xml' data parser plugin.
  data_fetcher_plugin: file
  # Set to 'xml' to use XMLReader https://www.php.net/manual/en/book.xmlreader.php
  # Set to 'simple_xml' to use SimpleXML https://www.php.net/manual/en/ref.simplexml.php
  data_parser_plugin: xml
  urls:
    - modules/custom/ud_migrations/ud_migrations_xml_source/sources/udm_data.xml
  # XPath expression. It is common that it starts with a slash (/).
  item_selector: /data/udm_book_paragraph
  fields:
    - name: src_book_id
      label: 'Book ID'
      selector: book_id
    - name: src_book_title
      label: 'Title'
      selector: book_details/title
    - name: src_book_author
      label: 'Author'
      selector: book_details/author
  ids:
    src_book_id:
      type: string
```
The `plugin`, `data_fetcher_plugin`, `data_parser_plugin` and `urls` configurations have the same values as in the *node* migration. The `item_selector` and `ids` configurations are slightly different to represent the path to *paragraph* elements and the unique identifier field, respectively.
The interesting part is the value of the `fields` configuration. Taking `/data/udm_book_paragraph` as a starting point, the records with *paragraph* data have a *nested structure*. Particularly, the `book_details` element has two children: `title` and `author`. To refer to them, the selectors are `book_details/title` and `book_details/author`, respectively. Note that you can go as many levels deep in the hierarchy as needed to find the value that should be assigned to the field. Every level in the hierarchy is separated by a *slash* (**/**).
In this example, the target is a single paragraph type. But a similar technique can be used to migrate multiple types. One way to configure the XML file is having two children. `paragraph_id` would contain the *unique identifier* for the record. `paragraph_data` would contain a child element to specify the paragraph type. It would also have an arbitrary number of extra child elements with the data to be migrated. In the *process* section, you would iterate over the children to map the paragraph fields.
The following snippet shows part of the *process* configuration of the *paragraph* migration:
```yaml
process:
  field_ud_book_paragraph_title: src_book_title
  field_ud_book_paragraph_author: src_book_author
```
## Migrating images from an XML file
Let's consider an example where the elements to migrate have *more data than needed*. The following snippets show part of the *local* XML file and *source* plugin configuration for the *image* migration:
```xml
<?xml version="1.0" encoding="UTF-8" ?>
<data>
  <udm_photos>
    <photo_id>P01</photo_id>
    <photo_url>https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg</photo_url>
    <photo_dimensions>
      <width>240</width>
      <height>351</height>
    </photo_dimensions>
  </udm_photos>
  <udm_photos>
    ...
  </udm_photos>
  <udm_photos>
    ...
  </udm_photos>
</data>
```
```yaml
source:
  plugin: url
  # This configuration is ignored by the 'xml' data parser plugin.
  # It only has effect when using the 'simple_xml' data parser plugin.
  data_fetcher_plugin: file
  # Set to 'xml' to use XMLReader https://www.php.net/manual/en/book.xmlreader.php
  # Set to 'simple_xml' to use SimpleXML https://www.php.net/manual/en/ref.simplexml.php
  data_parser_plugin: xml
  urls:
    - modules/custom/ud_migrations/ud_migrations_xml_source/sources/udm_data.xml
  # XPath expression. It is common that it starts with a slash (/).
  item_selector: /data/udm_photos
  fields:
    - name: src_photo_id
      label: 'Photo ID'
      selector: photo_id
    - name: src_photo_url
      label: 'Photo URL'
      selector: photo_url
  ids:
    src_photo_id:
      type: string
```
The `plugin`, `data_fetcher_plugin`, `data_parser_plugin` and `urls` configurations have the same values as in the *node* migration. The `item_selector` and `ids` configurations are slightly different to represent the path to *image* elements and the unique identifier field, respectively.
The interesting part is the value of the `fields` configuration. Taking `data/udm_photos` as a starting point, the elements with *image* data have extra children that are not used in the migration. Particularly, the `photo_dimensions` element has two children representing the width and height of the image. To ignore this subtree, you simply omit it from the `fields` configuration. In case you wanted to use it, the selectors would be `photo_dimensions/width` and `photo_dimensions/height`, respectively.
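If you did want the image dimensions, the extra entries in the `fields` configuration could look like the sketch below. The field names `src_photo_width` and `src_photo_height` and their labels are made up for illustration; they are not part of the demo module:

```yaml
fields:
  # Hypothetical field names. The selectors are relative to
  # each element matched by the item_selector expression.
  - name: src_photo_width
    label: 'Photo width'
    selector: photo_dimensions/width
  - name: src_photo_height
    label: 'Photo height'
    selector: photo_dimensions/height
```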
The following snippet shows part of the *process* configuration of the *image* migration:
```yaml
process:
psf_destination_filename:
plugin: callback
callable: basename
source: src_photo_url
```
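The `callback` plugin hands the source value to PHP's `basename` function, which keeps only the last segment of the photo URL. As a rough illustration of the resulting value, the following Python sketch does the equivalent with the standard library. The `destination_filename` name is made up for this example, and unlike PHP's `basename` it parses the URL first and works on its path component.

```python
import posixpath
from urllib.parse import urlparse


def destination_filename(photo_url):
    """Keep only the last path segment of a URL."""
    return posixpath.basename(urlparse(photo_url).path)


filename = destination_filename(
    "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg"
)
```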
## XML file location
**Important**: What is described in this section **only applies** when you use either (1) the `xml` data parser or (2) the `simple_xml` parser with the `file` data fetcher.
When using the `file` data fetcher plugin, you have four options to indicate the location of the XML files in the `urls` configuration:
- Use a *relative path* from the **Drupal root**. The path *should not start* with a *slash* (**/**). This is the approach used in this demo. For example, `modules/custom/my_module/xml_files/example.xml`.
- Use an *absolute path* pointing to the XML location in the file system. The path *should start* with a *slash* (**/**). For example, `/var/www/drupal/modules/custom/my_module/xml_files/example.xml`.
- Use a *fully-qualified URL* to any [built-in wrapper](https://www.php.net/manual/en/wrappers.php) like `http`, `https`, `ftp`, `ftps`, etc. For example, `https://understanddrupal.com/xml-files/example.xml`.
- Use a [custom stream wrapper](https://api.drupal.org/api/drupal/namespace/Drupal!Core!StreamWrapper/8.8.x).
Being able to use stream wrappers gives you many more options. For instance:
- Files located in the [public](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21PublicStream.php/class/PublicStream/8.8.x), [private](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21PrivateStream.php/class/PrivateStream/8.8.x), and [temporary](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21StreamWrapper%21TemporaryStream.php/class/TemporaryStream/8.8.x) file systems managed by Drupal. This leverages functionality already available in Drupal core. For example: `public://xml_files/example.xml`.
- Files located in profiles, modules, and themes. You can use the [System stream wrapper module](https://www.drupal.org/project/system_stream_wrapper) or [apply](https://www.drupal.org/patch/apply) this [core patch](https://www.drupal.org/project/drupal/issues/1308152) to get this functionality. For example, `module://my_module/xml_files/example.xml`.
- Files located in [AWS Amazon S3](https://aws.amazon.com/s3/). You can use the [S3 File System module](https://www.drupal.org/project/s3fs) along with the [S3FS File Proxy to S3 module](https://www.drupal.org/project/s3fs_file_proxy_to_s3) to get this functionality.
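The options above differ only in the shape of the string placed in `urls`. The following Python sketch tells the shapes apart. It is only an illustration of the rules described in this section, not code from Migrate Plus, and `classify_location` is a hypothetical helper.

```python
import re

# A scheme followed by "://" marks a URL or a stream wrapper (public://, https://, ...).
SCHEME = re.compile(r"^([a-zA-Z][a-zA-Z0-9+.-]*)://")


def classify_location(location):
    """Describe how a `urls` entry would be interpreted by the `file` fetcher."""
    match = SCHEME.match(location)
    if match:
        return "wrapper:" + match.group(1).lower()
    if location.startswith("/"):
        return "absolute path"
    return "path relative to the Drupal root"


examples = {
    location: classify_location(location)
    for location in [
        "modules/custom/my_module/xml_files/example.xml",
        "/var/www/drupal/modules/custom/my_module/xml_files/example.xml",
        "https://understanddrupal.com/xml-files/example.xml",
        "public://xml_files/example.xml",
    ]
}
```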
## Migrating remote XML files
**Important**: What is described in this section **only applies** when you use the `http` data fetcher plugin.
Migrate Plus provides another data fetcher plugin named `http`. Under the hood, it uses the [Guzzle HTTP Client](https://github.com/guzzle/guzzle) library. You can use it to fetch files using any [protocol supported](https://curl.haxx.se/libcurl/c/CURLOPT_PROTOCOLS.html) by [curl](https://curl.haxx.se/libcurl/) like `http`, `https`, `ftp`, `ftps`, `sftp`, etc. In a future blog post we will explain this data fetcher in more detail. For now, the `udm_xml_source_node_remote` migration demonstrates a basic setup for this plugin. Note that only the `data_fetcher_plugin`, `data_parser_plugin`, and `urls` configurations are different from the local file example. The following snippet shows part of the configuration to read a *remote* XML file for the *node* migration:
```yaml
source:
plugin: url
data_fetcher_plugin: http
# 'simple_xml' is configured to be able to use the 'http' fetcher.
data_parser_plugin: simple_xml
urls:
- https://sendeyo.com/up/d/478f835718
item_selector: /data/udm_people
fields: ...
ids: ...
```
And that is how you can use XML files as the *source* of your migrations. Many more configurations are possible when you use the `simple_xml` parser with the `http` fetcher. For example, you can provide authentication information to get access to protected resources. You can also set custom HTTP headers. Examples will be presented in a future entry.
## XMLReader vs SimpleXML in Drupal migrations
As noted in the module's [README file](https://git.drupalcode.org/project/migrate_plus/blob/8.x-5.x/README.txt#L48), the `xml` parser plugin uses the [XMLReader](https://www.php.net/manual/en/book.xmlreader.php) interface to incrementally parse XML files. The reader acts as a cursor going forward on the document stream and stopping at each node on the way. This should be used for XML sources which are potentially very large. On the other hand, the `simple_xml` parser plugin uses the [SimpleXML](https://www.php.net/manual/en/ref.simplexml.php) interface to fully parse XML files. This should be used for XML sources where you need to be able to use complex XPath expressions for your item selectors, or have to access elements outside of the current item element via XPath.
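The trade-off between a forward-only cursor and a fully loaded document is not specific to PHP. As an analogy only, the following Python sketch contrasts the two approaches with the standard library: `iterparse` walks the document incrementally, while `parse` loads everything before any query is made. The sample data and the `name` child element are made up for the illustration.

```python
import io
import xml.etree.ElementTree as ET

# Made-up sample; the real source file has more elements per person.
SAMPLE = (
    b"<data>"
    b"<udm_people><name>A</name></udm_people>"
    b"<udm_people><name>B</name></udm_people>"
    b"</data>"
)


def stream_items(source, tag):
    """Visit items one at a time, in the spirit of a forward-only cursor."""
    for _event, element in ET.iterparse(source, events=("end",)):
        if element.tag == tag:
            yield element.findtext("name")
            # Drop the item's children once it has been processed.
            element.clear()


def load_all_items(source, path):
    """Parse the whole document first, then query the in-memory tree."""
    root = ET.parse(source).getroot()
    return [item.findtext("name") for item in root.findall(path)]


streamed = list(stream_items(io.BytesIO(SAMPLE), "udm_people"))
loaded = load_all_items(io.BytesIO(SAMPLE), "udm_people")
```

Both functions return the same values here. The difference shows up with very large files, where the streaming version avoids holding every item at once, and with queries that need to look outside the current item, which are only practical on the fully loaded tree.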
What did you learn in today's blog post? Have you migrated from XML files before? If so, what challenges have you found? Did you know that you can read local and remote files? Did you know that the `data_fetcher_plugin` configuration is ignored when using the `xml` data parser? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

195
18.txt Normal file
View file

@ -0,0 +1,195 @@
# Adding HTTP request headers and authentication to remote JSON and XML in Drupal migrations
In the previous two blog posts we learned to migrate data from [JSON](https://understanddrupal.com/articles/migrating-json-files-drupal) and [XML](https://understanddrupal.com/articles/migrating-xml-files-drupal) files. We presented how to configure the migrations to fetch remote files. In today's blog post, we will learn how to add [**HTTP request headers**](https://developer.mozilla.org/en-US/docs/Glossary/Request_header) and **authentication** to the request. For HTTP authentication you need to choose among three options: [Basic](https://en.wikipedia.org/wiki/Basic_access_authentication), [Digest](https://en.wikipedia.org/wiki/Digest_access_authentication), and [OAuth2](https://oauth.net/2/). To provide this functionality the Migrate API leverages the [Guzzle HTTP Client](https://github.com/guzzle/guzzle) library. Usage requirements and limitations will be presented. Let's begin.
## Migrate Plus architecture for remote data fetching
The [Migrate Plus module](https://www.drupal.org/project/migrate_plus) provides an extensible architecture for importing remote files. It makes use of different plugin types to fetch files, add HTTP authentication to the request, and parse the response. The following is an overview of the different plugins and how they work together to allow code and configuration reuse.
### Source plugin
The `url` source plugin is at the core of the implementation. Its purpose is to retrieve data from a list of URLs. Ingrained in the system is the goal to separate the file fetching from the file parsing. The `url` plugin will delegate both tasks to other plugin types provided by Migrate Plus.
### Data fetcher plugins
For file *fetching*, you have two options. The first is a general-purpose `file` fetcher for getting files from the local file system or via stream wrappers. This plugin has been explained in detail in the posts about JSON and XML migrations. Because it supports stream wrappers, this plugin is very useful to fetch files from different locations and over different protocols. But it has two major downsides. First, it does not allow setting custom HTTP headers or authentication parameters. Second, this fetcher is completely ignored if used with the `xml` or `soap` data parser (see below).
The second fetcher plugin is `http`. Under the hood, it uses the [Guzzle HTTP Client](https://github.com/guzzle/guzzle) library. This plugin allows you to define a `headers` configuration. You can set it to a list of HTTP headers to send along with the request. It also allows you to use authentication plugins (see below). The downside is that you cannot use stream wrappers. Only [protocols supported](https://curl.haxx.se/libcurl/c/CURLOPT_PROTOCOLS.html) by [curl](https://curl.haxx.se/libcurl/) can be used: `http`, `https`, `ftp`, `ftps`, `sftp`, etc.
### Data parser plugins
*Data parsers* are responsible for processing the files considering their type: JSON, XML, or SOAP. These plugins let you select a subtree *within* the file hierarchy that contains the elements to be imported. Each record might contain more data than what you need for the migration. So, you make a second selection to manually indicate which elements will be made available to the migration. Migrate Plus provides four data parsers, but only two use the data fetcher plugins. Here is a summary:
- `json` can use any of the data fetchers. Offers an extra configuration option called `include_raw_data`. When set to true, in addition to all the `fields` manually defined, a new one is attached to the source with the name `raw`. This contains a copy of the full object currently being processed.
- `simple_xml` can use any data fetcher. It uses the [SimpleXML](https://www.php.net/manual/en/book.simplexml.php) class.
- `xml` does not use any of the data fetchers. It uses the [XMLReader](https://www.php.net/manual/en/class.xmlreader.php) class to directly fetch the file. Therefore, it is not possible to set HTTP headers or authentication.
- `soap` does not use any data fetcher. It uses the [SoapClient](https://www.php.net/manual/en/class.soapclient.php) class to directly fetch the file. Therefore, it is not possible to set HTTP headers or authentication.
The differences between `xml` and `simple_xml` were presented in the [previous article](https://understanddrupal.com/articles/migrating-xml-files-drupal).
### Authentication plugins
These plugins add *authentication* headers to the request. If the credentials are correct, you can fetch data from protected resources. They work exclusively with the `http` data fetcher. Therefore, you can use them only with `json` and `simple_xml` data parsers. To do that, you set an `authentication` configuration whose value can be one of the following:
- `basic` for HTTP Basic authentication.
- `digest` for HTTP Digest authentication.
- `oauth2` for OAuth2 authentication over HTTP.
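The compatibility rules described so far can be summarized as a small check. The following Python sketch encodes them for a *source* configuration represented as a dictionary. It is a reading aid built only from the rules stated in this article. `check_source_config` is a hypothetical helper and not part of Migrate Plus.

```python
# Parsers that, according to the summary above, delegate fetching to a data fetcher.
FETCHER_AWARE_PARSERS = {"json", "simple_xml"}


def check_source_config(source):
    """Return a list of problems found in a `source` configuration dict."""
    problems = []
    parser = source.get("data_parser_plugin")
    fetcher = source.get("data_fetcher_plugin")
    uses_http_options = "headers" in source or "authentication" in source

    if uses_http_options and parser not in FETCHER_AWARE_PARSERS:
        problems.append(
            f"the '{parser}' parser does not use data fetchers, "
            "so headers and authentication have no effect"
        )
    elif uses_http_options and fetcher != "http":
        problems.append("headers and authentication require the 'http' data fetcher")
    return problems


ok = check_source_config(
    {
        "data_fetcher_plugin": "http",
        "data_parser_plugin": "json",
        "authentication": {"plugin": "basic"},
    }
)
ignored = check_source_config(
    {
        "data_fetcher_plugin": "http",
        "data_parser_plugin": "xml",
        "headers": {"Custom-Key": "understand"},
    }
)
```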
Below are examples for JSON and XML imports with HTTP headers and authentication configured. The code snippets do not contain real migrations. You can also find them in the `ud_migrations_http_headers_authentication` directory of the demo repository <https://github.com/dinarcon/ud_migrations>.
**Important**: The examples are shown for reference only. **Do not** store any sensitive data in plain text or commit it to the repository.
## JSON and XML Drupal migrations with HTTP request headers and Basic authentication
```yaml
source:
plugin: url
data_fetcher_plugin: http
# Choose one data parser.
data_parser_plugin: json|simple_xml
urls:
- https://understanddrupal.com/files/data.json
item_selector: /data/udm_root
# This configuration is provided by the `http` data fetcher plugin.
# Do not disclose any sensitive information in the headers.
headers:
Accept-Encoding: 'gzip, deflate, br'
Accept-Language: 'en-US,en;q=0.5'
Custom-Key: 'understand'
Arbitrary-Header: 'drupal'
# This configuration is provided by the `basic` authentication plugin.
# Credentials should never be saved in plain text nor committed to the repo.
authentication:
plugin: basic
username: totally
password: insecure
fields:
- name: src_unique_id
label: 'Unique ID'
selector: unique_id
- name: src_title
label: 'Title'
selector: title
ids:
src_unique_id:
type: integer
process:
title: src_title
destination:
plugin: 'entity:node'
default_bundle: page
```
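To see why the warning about plain text credentials matters, it helps to know what Basic authentication sends over the wire. Per the HTTP Basic scheme, the username and password are joined with a colon and Base64-encoded, which is an encoding and not encryption. The following Python sketch builds and then reverses such a header using the dummy credentials from the example. It illustrates the scheme itself, not the code of the `basic` plugin.

```python
import base64


def basic_auth_header(username, password):
    """Build the value of an `Authorization` header for HTTP Basic authentication."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")


header = basic_auth_header("totally", "insecure")

# Anyone who sees the header can recover the credentials by decoding it.
recovered = base64.b64decode(header.split(" ", 1)[1]).decode("utf-8")
```

Because the header can be decoded by anyone who intercepts it, Basic authentication should only be used over `https`.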
## JSON and XML Drupal migrations with HTTP request headers and Digest authentication
```yaml
source:
plugin: url
data_fetcher_plugin: http
# Choose one data parser.
data_parser_plugin: json|simple_xml
urls:
- https://understanddrupal.com/files/data.json
item_selector: /data/udm_root
# This configuration is provided by the `http` data fetcher plugin.
# Do not disclose any sensitive information in the headers.
headers:
Accept: 'application/json; charset=utf-8'
Accept-Encoding: 'gzip, deflate, br'
Accept-Language: 'en-US,en;q=0.5'
Custom-Key: 'understand'
Arbitrary-Header: 'drupal'
# This configuration is provided by the `digest` authentication plugin.
# Credentials should never be saved in plain text nor committed to the repo.
authentication:
plugin: digest
username: totally
password: insecure
fields:
- name: src_unique_id
label: 'Unique ID'
selector: unique_id
- name: src_title
label: 'Title'
selector: title
ids:
src_unique_id:
type: integer
process:
title: src_title
destination:
plugin: 'entity:node'
default_bundle: page
```
## JSON and XML Drupal migrations with HTTP request headers and OAuth2 authentication
```yaml
source:
plugin: url
data_fetcher_plugin: http
# Choose one data parser.
data_parser_plugin: json|simple_xml
urls:
- https://understanddrupal.com/files/data.json
item_selector: /data/udm_root
# This configuration is provided by the `http` data fetcher plugin.
# Do not disclose any sensitive information in the headers.
headers:
Accept: 'application/json; charset=utf-8'
Accept-Encoding: 'gzip, deflate, br'
Accept-Language: 'en-US,en;q=0.5'
Custom-Key: 'understand'
Arbitrary-Header: 'drupal'
# This configuration is provided by the `oauth2` authentication plugin.
# Credentials should never be saved in plain text nor committed to the repo.
authentication:
plugin: oauth2
grant_type: client_credentials
base_uri: https://understanddrupal.com
token_url: /oauth2/token
client_id: some_client_id
client_secret: totally_insecure_secret
fields:
- name: src_unique_id
label: 'Unique ID'
selector: unique_id
- name: src_title
label: 'Title'
selector: title
ids:
src_unique_id:
type: integer
process:
title: src_title
destination:
plugin: 'entity:node'
default_bundle: page
```
To use OAuth2 authentication, you need to install the [sainsburys/guzzle-oauth2-plugin](https://github.com/Sainsburys/guzzle-oauth2-plugin) package as suggested in Migrate Plus' `composer.json` file. You can do it via Composer issuing the following command: `composer require sainsburys/guzzle-oauth2-plugin`. Otherwise, you would get an error similar to the following:
```
[error] Error: Class 'Sainsburys\Guzzle\Oauth2\GrantType\ClientCredentials'
not found in Drupal\migrate_plus\Plugin\migrate_plus\authentication\OAuth2->getAuthenticationOptions()
(line 46 of /var/www/drupalvm/drupal/web/modules/contrib/migrate_plus/src/Plugin/migrate_plus/authentication/OAuth2.php)
#0 /var/www/drupalvm/drupal/web/modules/contrib/migrate_plus/src/Plugin/migrate_plus/data_fetcher/Http.php(100):
Drupal\migrate_plus\Plugin\migrate_plus\authentication\OAuth2->getAuthenticationOptions()
```
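For context on the `oauth2` configuration keys, the following Python sketch shows what a token request for the *client credentials* grant generally looks like according to the OAuth 2.0 specification: a form-encoded body sent to the token endpoint. This is an assumption-laden illustration, not the code of the `oauth2` plugin or of the Sainsburys library, which may send the client credentials differently (for example, in an `Authorization` header). `build_token_request` is a hypothetical helper and no request is actually sent.

```python
from urllib.parse import urlencode, urljoin


def build_token_request(authentication):
    """Return the token endpoint URL and the form-encoded request body."""
    url = urljoin(authentication["base_uri"], authentication["token_url"])
    body = urlencode(
        {
            "grant_type": authentication["grant_type"],
            "client_id": authentication["client_id"],
            "client_secret": authentication["client_secret"],
        }
    )
    return url, body


# Same dummy values as the migration example above.
token_url, token_body = build_token_request(
    {
        "grant_type": "client_credentials",
        "base_uri": "https://understanddrupal.com",
        "token_url": "/oauth2/token",
        "client_id": "some_client_id",
        "client_secret": "totally_insecure_secret",
    }
)
```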
What did you learn in today's blog post? Did you know the configuration names for adding HTTP request headers and authentication to your JSON and XML requests? Did you know that this was limited to the parsers that make use of the `http` fetcher? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

127
19.txt Normal file
View file

@ -0,0 +1,127 @@
# Migrating Google Sheets into Drupal
Today we will learn how to migrate content from **Google Sheets** into Drupal using the [Migrate Google Sheets module](https://www.drupal.org/project/migrate_google_sheets). We will give instructions on how to publish them in JSON format to be consumed by the migration. Then, we will talk about some assumptions made by the module to allow easier plugin configurations. Finally, we will present the source plugin configuration for Google Sheets migrations. Let's get started.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD Google Sheets, Microsoft Excel, and LibreOffice Calc source migration` whose machine name is `ud_migrations_sheets_sources`. It comes with four migrations: `udm_google_sheets_source_node.yml`, `udm_libreoffice_calc_source_paragraph.yml`, `udm_microsoft_excel_source_image.yml`, and `udm_backup_csv_source_node.yml`. The last one is a backup in case the Google Sheet is not available. To execute it you would need the [Migrate Source CSV module](https://www.drupal.org/project/migrate_source_csv).
You can get the Migrate Google Sheets module and its dependency using [composer](https://getcomposer.org/): `composer require 'drupal/migrate_google_sheets:^1.0'`. It depends on [Migrate Plus](https://www.drupal.org/project/migrate_plus). Installing via composer will get you both modules. If your Drupal site is not composer-based, you can download them manually.
## Understanding the example set up
This migration will reuse the same configuration from the [introduction to paragraph migrations](https://understanddrupal.com/articles/introduction-paragraphs-migrations-drupal) example. Refer to that article for details on the configuration. The destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain Google Sheets migrations. The end result will again be nodes containing an image and a paragraph with information about someone's favorite book. The major difference is that we are going to read from different sources. In the next article, two of the migrations will be explained. They read from Microsoft Excel and LibreOffice Calc files.
*Note*: You can literally swap migration sources *without changing any other part of the migration*. This is a powerful feature of [ETL frameworks](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) like Drupal's Migrate API. Although that is possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some *machine names* had to be changed to avoid conflicts with other examples in the demo repository.
## Migrating nodes from Google Sheets
In any migration project, understanding the *source* is very important. For Google Sheets there are many details that need your attention. First, the module works on top of Migrate Plus and extends its JSON data parser. In fact, you have to *publish* your Google Sheet and consume it in JSON format. Second, you need to make the JSON export publicly available. Third, you must understand the JSON format provided by Google Sheets and the assumptions made by the module to configure your fields properly. Specific instructions for Google Sheets migrations will be provided. That being said, everything explained in the JSON migration example is applicable in this case too.
## Publishing a Google Sheet in JSON format
Before starting the migration you need the *source* from where you will extract the data. For this, create a Google Sheet document. The example will use this one:
<https://docs.google.com/spreadsheets/d/1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk/edit#gid=0>
The `1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk` value is the *worksheet ID* which will be used later. Once you are done creating the document, you need to **publish** it so it can be consumed by the Migrate API. To do this, go to the `File` menu and then click on `Publish to the web`. A modal window will appear where you can configure the export. Note that it is possible to publish the `Entire document` or only some of the *worksheets* (tabs). The example document has two: `UD Example Sheet` and `Do not peek in here`. Make sure that all the worksheets that you need are published or export the entire document. Unless multiple `urls` are configured, a migration can *only import from one worksheet at a time*. If you fetch from multiple `urls` they need to have homogeneous structures. When you click the `Publish` button, a new URL will be presented. In the example it is:
<https://docs.google.com/spreadsheets/d/e/2PACX-1vTy2-CGzsoTBkmvYbolFh0UDWenwd9OCdel55j9Qa37g_earT1vA6y-6phC31Xkj8sTWF0o6mZTM90H/pubhtml>
The previous URL **will not be used**. Publishing a document is a required step, but the URL that you get should be ignored. Note that you **do not have to share** the document. It is fine that the document is *private* to you as long as it is *published*. It is up to you if you want to make it available to `Anyone with the link` or `Public on the web` and potentially grant edit or comment access. The `Share` setting does not affect the migration. The final step is getting the JSON representation of the document. You need to assemble a URL with the following pattern:
`http://spreadsheets.google.com/feeds/list/[workbook-id]/[worksheet-index]/public/values?alt=json`
Replace `[workbook-id]` with the *worksheet ID* mentioned at the beginning of this section: the one that is part of the *regular* document URL, not the *published* URL. The `[worksheet-index]` is an integer, starting at `1`, that represents the order in which worksheets appear in the document. Use `1` for the first, `2` for the second, and so on. This means that **changing the order** of the worksheets *will affect your migration*. At the very least, you will have to update the path to reflect the new index. In the example migration, the `UD Example Sheet` worksheet will be used. It appears first in the document so the worksheet index is `1`. Therefore, the exported JSON will be available at the following URL:
<http://spreadsheets.google.com/feeds/list/1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk/1/public/values?alt=json>
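Assembling the URL is plain string substitution. As a small illustration, the following Python sketch builds it from the two values described above. The `google_sheets_json_url` function is a hypothetical helper written for this article.

```python
def google_sheets_json_url(workbook_id, worksheet_index):
    """Build the JSON feed URL for one worksheet of a published Google Sheet."""
    if worksheet_index < 1:
        raise ValueError("worksheet_index starts at 1 for the first worksheet")
    return (
        "http://spreadsheets.google.com/feeds/list/"
        f"{workbook_id}/{worksheet_index}/public/values?alt=json"
    )


# ID taken from the regular document URL; index 1 is the first worksheet.
url = google_sheets_json_url("1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk", 1)
```

Keeping the ID and the index as separate values makes it obvious what needs to change if the worksheets are reordered.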
## Understanding the published Google Sheet JSON export
Take a moment to read the JSON export and try to understand its structure. It contains much more data than what you need. The records to be imported can be retrieved using this [XPath](https://en.wikipedia.org/wiki/XPath) expression: `/feed/entry`. You would normally have to assign this value to the `item_selector` configuration of the Migrate Plus' JSON data parser. But, because the value is the same for all Google Sheets, the module takes care of this automatically. You do not have to set that configuration in the *source* section. As for the data cells, have a look at the following code snippet to see how they appear on the export:
```json
{
"feed": {
"entry": [
{
"gsx$uniqueid": {
"$t": "1"
},
"gsx$name": {
"$t": "One Uno Un"
},
"gsx$photo-file": {
"$t": "P01"
},
"gsx$bookref": {
"$t": "B10"
}
}
]
}
}
```
*Tip*: Firefox includes a built-in JSON document viewer which helps a lot in understanding the structure of the document. If your browser does not include a similar tool out of the box, look for one in its extensions repository. You can also use a file formatter to pretty print the JSON output.
The following is a list of headers as they appear in the Google Sheet compared to how they appear in the JSON export:
- `unique_id` appears like `gsx$uniqueid`.
- `name` appears like `gsx$name`.
- `photo-file` appears like `gsx$photo-file`.
- `Book Ref` appears like `gsx$bookref`.
So, the header names from the Google Sheet get transformed in the JSON export. They get a prefix of `gsx$`, and each name is converted to lowercase with spaces and most special characters removed. Note that the underscore in `unique_id` is dropped while the hyphen in `photo-file` is kept. On top of this, the actual cell value, the one you will eventually import, is in a `$t` property one level under the header name. Normally, you would create a list of fields to migrate using XPath expressions as selectors. For example, for the `Book Ref` header, the selector would be `gsx$bookref/$t`. But that is not the way to configure the Google Sheets data parser. The module makes some assumptions to make the selectors clearer: the `gsx$` prefix and the `/$t` hierarchy are assumed. For the selector, you only need to use the *transformed name*. In this case: `uniqueid`, `name`, `photo-file`, and `bookref`.
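The transformation can be approximated in a few lines of Python. The exact rule used by Google is not documented in this article, so the sketch below assumes that only lowercase letters, digits, and hyphens survive. That assumption matches the four headers of the example and nothing more. Both function names are hypothetical.

```python
import re


def transformed_header(header):
    """Approximate the header name used in the JSON export.

    Assumption: after lowercasing, everything except letters, digits,
    and hyphens is removed. This matches the example headers only.
    """
    return re.sub(r"[^a-z0-9-]", "", header.lower())


def cell_value(entry, selector):
    """Read a cell from one `/feed/entry` item, adding the assumed prefix and `$t` level."""
    return entry["gsx$" + selector]["$t"]


# One record, copied from the JSON export shown above.
entry = {
    "gsx$uniqueid": {"$t": "1"},
    "gsx$name": {"$t": "One Uno Un"},
    "gsx$photo-file": {"$t": "P01"},
    "gsx$bookref": {"$t": "B10"},
}

selectors = [transformed_header(h) for h in ["unique_id", "name", "photo-file", "Book Ref"]]
values = [cell_value(entry, selector) for selector in selectors]
```

When in doubt about a header, check the JSON export directly instead of relying on a rule like this one.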
## Configuring the Migrate Google Sheets source plugin
With the JSON export of the Google Sheet and the list of transformed header names, you can proceed to configure the plugin. It will be very similar to configuring a remote JSON migration. The following code snippet shows *source* configuration for the *node* migration:
```yaml
source:
plugin: url
data_fetcher_plugin: http
data_parser_plugin: google_sheets
urls: 'http://spreadsheets.google.com/feeds/list/1YVJt9isPNjkUNHf3YgoTx38r04TwqRYnp1LFrik3TAk/1/public/values?alt=json'
fields:
- name: src_unique_id
label: 'Unique ID'
selector: uniqueid
- name: src_name
label: 'Name'
selector: name
- name: src_photo_file
label: 'Photo ID'
selector: photo-file
- name: src_book_ref
label: 'Book paragraph ID'
selector: bookref
ids:
src_unique_id:
type: integer
```
You use the `url` plugin, the `http` fetcher, and the `google_sheets` parser. The latter is provided by the module. The `urls` configuration is set to the exported JSON link. The `item_selector` is not configured because the `/feed/entry` value is assumed. The fields are configured as in the JSON migration with the caveat of using the *transformed header values* for the `selector`. Finally, you need to set the `ids` key to a combination of fields that uniquely identify each record.
The rest of the migration is almost identical to the [JSON example](https://understanddrupal.com/articles/migrating-json-files-drupal). Small changes were made to prevent machine name conflicts with other examples in the demo repository. For reference, the following snippet shows part of the *process*, *destination*, and *dependencies* sections for the Google Sheets migration.
```yaml
process:
field_ud_image/target_id:
plugin: migration_lookup
migration: udm_microsoft_excel_source_image
source: src_photo_file
destination:
plugin: 'entity:node'
default_bundle: ud_paragraphs
migration_dependencies:
required:
- udm_microsoft_excel_source_image
- udm_libreoffice_calc_source_paragraph
optional: []
```
Note that the node migration depends on an *image* and a *paragraph* migration. They are already available in the example. One uses a Microsoft Excel file as the source while the other uses a LibreOffice Calc document. Both of these migrations will be explained in the next article. Refer to [this entry](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal) to know how to run migrations that depend on others. For example, you can run: `drush migrate:import --tag='UD Sheets Source'`.
What did you learn in today's blog post? Have you migrated from Google Sheets before? If so, what challenges have you found? Did you know the procedure to export a sheet in JSON format? Did you know that the Migrate Google Sheets module is an extension of Migrate Plus? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

124
20.txt Normal file
View file

@ -0,0 +1,124 @@
# Migrating Microsoft Excel and LibreOffice Calc files into Drupal
Today we will learn how to migrate content from **Microsoft Excel** and **LibreOffice Calc files** into Drupal using the [Migrate Spreadsheet module](https://www.drupal.org/project/migrate_spreadsheet). We will give instructions on getting the module and its dependencies. Then, we will present how to configure the module for spreadsheets with or without a header row. There are two example migrations: images and paragraphs. Let's get started.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD Google Sheets, Microsoft Excel, and LibreOffice Calc source migration` whose machine name is `ud_migrations_sheets_sources`. It comes with four migrations: `udm_google_sheets_source_node.yml`, `udm_libreoffice_calc_source_paragraph.yml`, `udm_microsoft_excel_source_image.yml`, and `udm_backup_csv_source_node.yml`. The image migration uses a Microsoft Excel file as source. The *paragraph* migration uses a LibreOffice Calc file as source. The CSV migration is a backup in case the Google Sheet is not available. To execute the last one you would need the [Migrate Source CSV module](https://www.drupal.org/project/migrate_source_csv).
You can get the Migrate Spreadsheet module using [composer](https://getcomposer.org/): `composer require 'drupal/migrate_spreadsheet:^1.0'`. This module depends on the [PHPOffice/PhpSpreadsheet](https://github.com/PHPOffice/PhpSpreadsheet) library **and** many PHP extensions *including* `ext-zip`. Check [this page](https://github.com/PHPOffice/PhpSpreadsheet/blob/master/composer.json#L41) for a full list of dependencies. If any required extension is missing the installation will fail. If your Drupal site is not composer-based, you will not be able to use Migrate Spreadsheet, unless you jump through a lot of hoops.
## Understanding the example set up
This migration will reuse the same configuration from the [introduction to paragraph migrations](https://understanddrupal.com/articles/introduction-paragraphs-migrations-drupal) example. Refer to that article for details on the configuration. The destinations will be the same content type, paragraph type, and fields. The source will be changed in today's example, as we use it to explain Microsoft Excel and LibreOffice Calc migrations. The end result will again be nodes containing an image and a paragraph with information about someone's favorite book. The major difference is that we are going to read from different sources.
*Note*: You can literally swap migration sources *without changing any other part of the migration*. This is a powerful feature of [ETL frameworks](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) like Drupal's Migrate API. Although that is possible, the example includes slight changes to demonstrate various plugin configuration options. Also, some *machine names* had to be changed to avoid conflicts with other examples in the demo repository.
## Understanding the source document and plugin configuration
In any migration project, understanding the source is very important. For Microsoft Excel and LibreOffice Calc migrations, the primary thing to consider is whether or not the file contains a *row of headers*. Also, a *workbook* (file) might contain several *worksheets* (tabs). You can only migrate from *one worksheet at a time*. The example documents have two worksheets: `UD Example Sheet` and `Do not peek in here`. We are going to be working with the first one.
The `spreadsheet` source plugin exposes seven configuration options. The values to use might change depending on the presence of a header row, but all of them apply for both types of document. Here is a summary of the available configurations:
- `file` is required. It stores the path to the document to process. You can use a relative path from the Drupal root, an absolute path, or [stream wrappers](https://understanddrupal.com/articles/migrating-csv-files-drupal#file-location).
- `worksheet` is required. It contains the name of the one worksheet to process.
- `header_row` is optional. This number indicates which row contains the headers. Contrary to [CSV migrations](https://understanddrupal.com/articles/migrating-csv-files-drupal), the row number is not zero-based. So, set this value to `1` if headers are on the first row, `2` if they are on the second, and so on.
- `origin` is optional and defaults to `A2`. It indicates which non-header cell contains the first value you want to import. It assumes a grid layout and you only need to indicate the position of the top-left cell value.
- `columns` is optional. It is the list of columns you want to make available for the migration. In case of files with a header row, use those header values in this list. Otherwise, use the default title for columns: `A`, `B`, `C`, etc. If this setting is missing, the plugin will return all columns. This is not ideal, especially for very large files containing more columns than needed for the migration.
- `row_index_column` is optional. This is a special column that contains the row number for each record. This can be used as a *unique identifier* for the records in case your dataset does not provide a suitable value. Exposing this special column in the migration is up to you. If so, you can come up with any name as long as it does not conflict with header row names set in the `columns` configuration. Important: this is an autogenerated column, not any of the columns that come with your dataset.
- `keys` is optional and, if not set, it defaults to the value of `row_index_column`. It contains an array of column names that *uniquely identify* each record. For files with a header row, you can use the values set in the `columns` configuration. Otherwise, use default column titles like `A`, `B`, `C`, etc. In both cases, you can use the `row_index_column` column if it was set. Each value in the array will contain database storage details for the column.
Note that nowhere in the plugin configuration do you specify the file type. The same setup applies to both Microsoft Excel and LibreOffice Calc files. The library will take care of detecting and validating the proper type.
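As a minimal sketch of how the optional settings fall back to their defaults, the configuration below relies on the autogenerated row number as the unique identifier. The file path, worksheet name, header names, and the `Row number` label are hypothetical. They are not part of the demo repository.

```yaml
source:
  plugin: spreadsheet
  # Hypothetical path, relative to the Drupal root.
  file: modules/custom/my_module/sources/example.xlsx
  # Hypothetical worksheet name.
  worksheet: 'Sheet1'
  # Headers are on the first row, so the data starts at A2 (the default origin).
  header_row: 1
  # Hypothetical header names present in the file.
  columns:
    - title
    - author
  # Autogenerated column holding the row number of each record.
  row_index_column: 'Row number'
  # `keys` is omitted on purpose: it falls back to the row_index_column.
```

This setup is only appropriate when no other migration needs to look up these records by a value stored in the dataset itself.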
## Migrating spreadsheet files with a header row
This example is for the *paragraph* migration and uses a LibreOffice Calc file. The following snippets show the `UD Example Sheet` worksheet and the configuration of the *source* plugin:
```
book_id, book_title, Book author
B10, The definite guide to Drupal 7, Benjamin Melançon et al.
B20, Understanding Drupal Views, Carlos Dinarte
B30, Understanding Drupal Migrations, Mauricio Dinarte
```
```yaml
source:
  plugin: spreadsheet
  file: modules/custom/ud_migrations/ud_migrations_sheets_sources/sources/udm_book_paragraph.ods
  worksheet: 'UD Example Sheet'
  header_row: 1
  origin: A2
  columns:
    - book_id
    - book_title
    - 'Book author'
  row_index_column: 'Document Row Index'
  keys:
    book_id:
      type: string
```
The name of the plugin is `spreadsheet`. Then you use the `file` configuration to indicate the path to the file. In this case, it is *relative to the Drupal root*. The `UD Example Sheet` is set as the `worksheet` to process. Because the first row of the file contains the headers, `header_row` is set to `1` and `origin` to `A2`.
Then specify which `columns` to make available to the migration. In this case, we listed all of them so this setting could have been left unassigned. It is better to get into the habit of being explicit about what you import. If the file were to change and more columns were added, you would not have to update the migration to prevent unneeded data from being fetched. The `row_index_column` is not actually used in the migration, but it is set to show all the configuration options in the example. The values will be `1`, `2`, `3`, etc. Finally, `keys` is set to the column that serves as the *unique identifier* for the records.
The rest of the migration is almost identical to the [CSV example](https://understanddrupal.com/articles/migrating-csv-files-drupal). Small changes were made to prevent machine name conflicts with other examples in the demo repository. For reference, the following snippet shows the *process* and *destination* sections for the LibreOffice Calc *paragraph* migration.
```yaml
process:
  field_ud_book_paragraph_title: book_title
  field_ud_book_paragraph_author: 'Book author'
destination:
  plugin: 'entity_reference_revisions:paragraph'
  default_bundle: ud_book_paragraph
```
## Migrating spreadsheet files without a header row
Now let's consider an example of a spreadsheet file that does not have a header row. This example is for the *image* migration and uses a Microsoft Excel file. The following snippets show the `UD Example Sheet` worksheet and the configuration of the *source* plugin:
```
P01, https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg
P02, https://agaric.coop/sites/default/files/pictures/picture-3-1421176784.jpg
P03, https://agaric.coop/sites/default/files/pictures/picture-2-1421176752.jpg
```
```yaml
source:
  plugin: spreadsheet
  file: modules/custom/ud_migrations/ud_migrations_sheets_sources/sources/udm_photos.xlsx
  worksheet: 'UD Example Sheet'
  # The file does not have a header row.
  header_row: null
  origin: A1
  # If no header row is available, you use the spreadsheet's column names: A, B, C, etc.
  # If you do not manually add a list of columns, all columns that contain data in the worksheet would be returned.
  # The same names need to be used in the process section.
  columns:
    - A # This column contains the photo ID. Example: 'P01'.
    - B # This column contains the photo URL.
  row_index_column: null
  keys:
    A:
      type: string
```
The `plugin`, `file`, and `worksheet` configurations follow the same pattern as the *paragraph* migration. The difference for files with no header row is reflected in the other parameters. `header_row` is set to `null` to indicate the lack of headers and `origin` is set to `A1`. Because there are no column names to use, you have to use the ones provided by the spreadsheet. In this case, we want to use the first two columns: `A` and `B`. Contrary to [CSV migrations](https://understanddrupal.com/articles/migrating-csv-files-drupal), the `spreadsheet` plugin does not allow you to define aliases for unnamed columns. That means that you have to use `A` and `B` in the process section to refer to these columns.
`row_index_column` is set to `null` because it will not be used. And finally, in the `keys` section, we use the `A` column as the primary key. This might seem like an odd choice. Why use that value if you could use the `row_index_column` as the unique identifier for each row? If this were an isolated migration, that would be a valid option. But this migration is referenced from the *node* migration explained in the previous example. The lookup is made based on the values stored in the `A` column. If we used the index of the row as the unique identifier, we would have to update the other migration or the lookup would fail. In many cases, that is neither feasible nor desirable.
Except for the name of the columns, the rest of the migration is almost identical to the CSV example. Small changes were made to prevent machine name conflicts with other examples in the demo repository. For reference, the following snippet shows part of the *process* and *destination* sections for the Microsoft Excel *image* migration.
```yaml
process:
  psf_destination_filename:
    plugin: callback
    callable: basename
    source: B # This is the photo URL column.
destination:
  plugin: 'entity:file'
```
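Going back to the choice of column `A` as the key, the following sketch shows how a dependent migration might look up the imported files. The field name `field_ud_image`, the source column `photo_id`, and the migration ID `udm_sheets_image` are assumptions made for illustration only. Check the demo repository for the real values.

```yaml
process:
  # Hypothetical image field on the node.
  field_ud_image/target_id:
    plugin: migration_lookup
    # Hypothetical ID of the image migration described above.
    migration: udm_sheets_image
    # Hypothetical column in the node's source holding values like 'P01'.
    source: photo_id
```

The lookup compares the value of `photo_id` against the source identifiers recorded by the image migration, which come from its `keys` configuration. If the row index had been used as the key, values like `P01` or `P02` would not match any record.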
Refer to this entry to learn how to run migrations that depend on others. In this case, you can execute them all by running: `drush migrate:import --tag='UD Sheets Source'`. And that is how you can use Microsoft Excel and LibreOffice Calc files as the *source* of your migrations. This example is very interesting because each of the migrations uses a different source type. The node migration explained in the [previous post](https://understanddrupal.com/articles/migrating-google-sheets-drupal) uses a Google Sheet. This is a great example of how powerful and flexible the Migrate API is.
What did you learn in today's blog post? Have you migrated from Microsoft Excel and LibreOffice Calc files before? If so, what challenges have you found? Did you know the source plugin configuration is not dependent on the file type? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

89
21.txt Normal file

@ -0,0 +1,89 @@
# Defining Drupal migrations as configuration entities with the Migrate Plus module
Today, we are going to talk about how to manage migrations as **configuration entities**. This functionality is provided by the [Migrate Plus module](https://www.drupal.org/project/migrate_plus). First, we will explain the difference between managing migrations as code or configuration. Then, we will show how to convert existing migrations. Finally, we will talk about some important options to include in migration configuration entities. Let's get started.
## Drupal migrations: code or configuration?
So far, we have been managing migrations as **code**. This is functionality provided out of the box. You write the migration definition file in [YAML](https://en.wikipedia.org/wiki/YAML) format. Then, you place it in the `migrations` directory of your module. If you need to update the migration, you make the modifications to the files and then **rebuild caches**. More details on the workflow for migrations managed in code can be found in [this article](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow).
Migrate Plus offers an alternative to this approach. It allows you to manage migrations as **configuration entities**. You still use YAML files to write the migration definition files, but their location and workflow are different. They need to be placed in a `config/install` directory. If you need to update the migration, you make the modifications to the files and then **sync the configuration** again. More details on this workflow can be found in [this article](https://understanddrupal.com/articles/workflows-and-benefits-managing-drupal-migrations-configuration-entities).
There is one thing worth emphasizing. When managing migrations as code you need access to the *file system* to update and deploy the changes to the file. This is usually done by developers.  When managing migrations as configuration, you can make updates via the *user interface* as long as you have permissions to sync the site's configuration. This is usually done by site administrators. You might still have to modify files depending on how you manage your configuration. But the point is that file system access to update migrations is optional. Although not recommended, you can write, modify, and execute the migrations entirely via the user interface.
## Transitioning to configuration entities
To demonstrate how to transition from code to configuration entities, we are going to convert the JSON migration example. You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD config JSON source migration` whose machine name is `ud_migrations_config_json_source`. It comes with four migrations: `udm_config_json_source_paragraph`, `udm_config_json_source_image`, `udm_config_json_source_node_local`, and `udm_config_json_source_node_remote`.
The transition to configuration entities is a two-step process. First, *move* the migration definition files from the `migrations` folder to a `config/install` folder. Second, *rename* the files so that they follow this pattern: `migrate_plus.migration.[migration_id].yml`. For example: `migrate_plus.migration.udm_config_json_source_node_local.yml`. And that's it! Files placed in that directory following that pattern will be synced into Drupal's active configuration when the module is **installed for the first time (only)**. Note that changes to the files require a new synchronization operation to take effect. Changing the files and rebuilding caches does not update the configuration, as was the case with migrations managed in code.
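As a sketch of the result, the snippet below assumes the original file was named after the migration ID. That original file name is an assumption, as it is not shown in this article. Only the file name and location change, while the content of the definition stays the same:

```yaml
# Before (managed as code), hypothetical original location:
#   migrations/udm_config_json_source_node_local.yml
# After (managed as configuration):
#   config/install/migrate_plus.migration.udm_config_json_source_node_local.yml
id: udm_config_json_source_node_local
label: 'UD migrations configuration example'
source: ...
process: ...
destination: ...
```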
If you have the Migrate Plus module enabled, it will detect the migrations and you will be able to execute them. You can continue using the [Drush](https://www.drush.org/) commands provided by the [Migrate Run](https://www.drupal.org/project/migrate_run) module. Alternatively, you can install the [Migrate Tools](https://www.drupal.org/project/migrate_tools) module which provides Drush commands for running both types of migrations: code and configuration. Migrate Tools also offers a user interface for executing migrations. This user interface is only for migrations defined as configuration though. It is available at `/admin/structure/migrate`. For now, you can run the migrations using the following Drush command: `drush migrate:import udm_config_json_source_node_local --execute-dependencies`.
*Note*: For executing migrations in the command line, choose between Migrate Run or Migrate Tools. You pick one or the other, but not both, as the commands provided by the two modules have the same name. Another thing to note is that the example uses Drush 9. There were major refactorings between versions 8 and 9 which included changes to the name of the commands.
## UUIDs for migration configuration entities
When managing migrations as configuration, you can set extra options. Some are exposed by Migrate Plus while others come from [Drupal's configuration management system](https://www.drupal.org/docs/8/configuration-management). Let's see some examples.
The most important new option is defining a [UUID](https://en.wikipedia.org/wiki/Universally_unique_identifier) for the migration definition file. This is optional, but adding one will greatly simplify the workflow to update migrations. The UUID is used to keep track of every piece of configuration in the system. When you add new configuration, Drupal will read the UUID value if provided and update that particular piece of configuration. Otherwise, it will create a UUID on the fly, attach it to the configuration definition, and then import it. That is why you want to set a UUID value manually. If changes need to be made, you want to update the same configuration, not create a new one. If no UUID was originally set, you can get the automatically created value by exporting the migration definition. The workflow for this is a bit complicated and error prone so always include a UUID with your migrations. The following snippet shows an example UUID:
```yaml
uuid: b744190e-3a48-45c7-97a4-093099ba0547
id: udm_config_json_source_node_local
label: 'UD migrations configuration example'
```
The UUID is a string of 32 hexadecimal digits displayed in 5 groups separated by hyphens, following this pattern: 8-4-4-4-12. In Drupal, two or more pieces of configuration cannot share the same value. Drupal will check the UUID and the type of configuration in sync operations. In this case the type is signaled by the `migrate_plus.migration.` prefix in the name of the migration definition file.
When using configuration entities, a single migration is identified by two different options. The `uuid` is used by Drupal's configuration system and the `id` is used by the Migrate API. Always make sure that this combination is kept the same when updating the files and syncing the configuration. Otherwise you might get hard-to-debug errors. Also, make sure you are importing the proper configuration type. The latter should not be something to worry about unless you utilize the user interface to export or import single configuration items.
If you do not have a UUID in advance for your migration, you can try one of these commands to generate it:
```bash
# Use Drupal's UUID service.
$ drush php:eval "echo \Drupal::service('uuid')->generate() . PHP_EOL;"
# Use a Drush command provided by the Devel module, if enabled.
$ drush devel:uuid
# Use a tool provided by your operating system, if available.
$ uuidgen
```
Alternatively, you can search online for UUID v4 generators. There are many available.
*Technical note*: Drupal uses [UUID v4 (RFC 4122 section 4.4)](http://www.rfc-editor.org/rfc/rfc4122.txt) values which are [generated](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Uuid%21Php.php/function/Php%3A%3Agenerate/9.0.x) by the [uuid](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Uuid%21Php.php/class/Php/9.0.x) service. There is a [separate class for validation](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Uuid%21Uuid.php/class/Uuid/9.0.x) purposes. Drupal might [override the UUID service](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Core%21CoreServiceProvider.php/function/CoreServiceProvider%3A%3Aalter/9.0.x) to use the most efficient generation method available. This could be using a [PECL extension](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Uuid%21Pecl.php/class/Pecl/9.0.x) or a [COM implementation for Windows](https://api.drupal.org/api/drupal/core%21lib%21Drupal%21Component%21Uuid%21Com.php/class/Com/9.0.x).
## Automatically deleting migration configuration entities
By default, configuration remains in the system even if the module that added it gets uninstalled. This can cause problems if your migration depends on custom migration plugins provided by your module. It is possible to enforce that migration entities get removed when your custom module is uninstalled. To do this, you leverage the `dependencies` option provided by Drupal's configuration management system. The following snippet shows how to do it:
```yaml
uuid: b744190e-3a48-45c7-97a4-093099ba0547
id: udm_config_json_source_node_local
label: 'UD migrations configuration example'
dependencies:
  enforced:
    module:
      - ud_migrations_config_json_source
```
You add the machine name of your module to the `dependencies > enforced > module` array. This adds an enforced dependency on your own module. The effect is that the migration will be removed from Drupal's active configuration when your custom module is uninstalled. Note that the top level `dependencies` array can have other keys in addition to `enforced`. For example: `config` and `module`. Learning more about them is left as an exercise for the curious reader.
It is important not to confuse the `dependencies` and `migration_dependencies` options. The former is provided by Drupal's configuration management system and was just explained. The latter is provided by the Migrate API and is used to declare migrations that need to be imported in advance. Read [this article](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal) to know more about this feature. The following snippet shows an example:
```yaml
uuid: b744190e-3a48-45c7-97a4-093099ba0547
id: udm_config_json_source_node_local
label: 'UD migrations configuration example'
dependencies:
  enforced:
    module:
      - ud_migrations_config_json_source
migration_dependencies:
  required:
    - udm_config_json_source_image
    - udm_config_json_source_paragraph
  optional: []
```
What did you learn in today's blog post? Did you know that you can manage migrations in two ways: code or configuration? Did you know that file name and location as well as workflows need to be adjusted depending on which approach you follow? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

64
22.txt Normal file

@ -0,0 +1,64 @@
# Workflows and benefits of managing Drupal migrations as configuration entities
In the [last blog post](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module) we were introduced to managing migrations as configuration entities using [Migrate Plus](https://www.drupal.org/project/migrate_plus). Today, we will present some benefits and potential drawbacks of this approach. We will also show a recommended workflow for working with migrations as configuration. Let's get started.
## What are the benefits of managing migrations as configuration?
At first sight, there does not seem to be a big difference between defining [migrations as code or configuration](https://understanddrupal.com/articles/defining-migrations-configuration-entities-migrate-plus-module). You can certainly do a lot without using Migrate Plus' configuration entities. The [series](https://understanddrupal.com/migrations) so far contains many examples of managing migrations as code. So, what are the benefits of adopting configuration entities?
The [configuration management system](https://www.drupal.org/docs/8/configuration-management) is one of the major features that was introduced in Drupal 8. It provides the ability to export all your site's configuration to files. These files can be added to version control and deployed to different environments. The system has evolved a lot in the last few years and many workflows and best practices have been established to manage configuration. On top of Drupal core's incremental improvements, a big ecosystem of contributed modules has sprung up. When you manage migrations via configuration, you can leverage those tools and workflows.
Here are a few use cases of what is possible:
- When migrations are managed in code, you need file system access to make any changes. Using configuration entities allows site administrators to customize or change the migration via the user interface. This is not about rewriting all the migrations. That should happen during development and never on production environments. But it is possible to tweak certain options. For example, administrators could change the location of the file that is going to be migrated, be it a local file or one on a remote server.
- When writing migrations, it is very likely that you will work on a subset of the data that will eventually be used to get content into the production environment. Having migrations as configuration allows you to override part of the migration definition per environment. You could use the [Configuration Split module](https://www.drupal.org/project/config_split) to configure different source files or paths per environment. For example, you could link to a small sample of the data in *development*, a larger sample in *staging*, and the complete dataset in *production*.
- It would be possible to provide extra configuration options via the user interface. In the article about [adding HTTP authentication to fetch remote JSON and XML files](https://understanddrupal.com/articles/adding-http-request-headers-and-authentication-remote-json-and-xml-drupal-migrations), the credentials were hardcoded in the migration definition file. That is less than ideal and exposes sensitive information. An alternative would be to provide a configuration form in the administration interface for the credentials to be added. Then, the submitted values could be injected into the configuration for the migration. Again, you could make use of contrib modules like Configuration Split to make sure those credentials are never exported with the rest of your site's configuration.
- You could provide a user interface to upload migration source files. In fact, the [Migrate source UI module](https://www.drupal.org/project/migrate_source_ui) does exactly this. It exposes an administration interface where you have a file field to upload a CSV file. In the same interface, you get a list of supported migrations in the system. This allows a site administrator to manually upload a file to run the migration against. *Note*: The module is supposed to work with JSON and XML migrations. It did not work during my tests. I opened [this issue](https://www.drupal.org/project/migrate_source_ui/issues/3076725) to follow up on this.
These are some examples, but many more possibilities are available. The point is that you have the whole configuration management ecosystem at your disposal. Do you have another example? Please share it in the comments.
## Are there any drawbacks?
Managing migrations as configuration adds an extra layer of abstraction in the migration process. This adds a bit of complexity. For example:
- Now you have to keep the `uuid` and `id` keys in sync. This might not seem like a big issue, but it is something to pay attention to.
- When you work with migration groups (explained in the next article), your migration definition could live in more than one file.
- The configuration management system has its own restrictions and workflows that you need to follow, particularly for updates.
- You need to be extra careful with your YAML syntax, especially if syncing configuration via the user interface. It is possible to import invalid configuration without getting an error. It is not until the migration fails that you realize something is wrong.
Using configuration entities to define migrations certainly offers lots of benefits. But it requires being extra careful managing them.
## Workflow for managing migrations as configuration entities
The configuration synchronization system has specific workflows to make changes to configuration entities. This imposes some restrictions on the way you make updates to the migration definitions. Explaining how to manage configuration could use another 31 days blog post series. ;-) For now, only a general overview will be presented. The general approach is similar to [managing migrations as code](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow). The main difference is what needs to be done for changes to the migration files to take effect.
You could use the "Configuration synchronization" administration interface at `/admin/config/development/configuration`. In it you have the option to export or import a "full archive" containing all your site's settings or a "single item" like a specific migration. This is one way to manage migrations as configuration entities, and it lets you find their UUIDs if they were not set initially. This approach can be followed by site administrators without requiring file system access. Nevertheless, it is less than ideal and error prone. This is **not** the recommended way to manage migration configuration entities.
Another option is to use [Drush](https://www.drush.org/) or [Drupal Console](https://drupalconsole.com/) to synchronize your site's configuration via the command line. Similarly to the user interface approach, you can export and import your full site configuration or only single elements. The recommendation is to do **partial configuration imports** so that only the migrations you are actively working on are updated.
Ideally, your site's architecture is completed before the migration starts. In practice, you often work on the migration while other parts of the site are being built. If you were to export and import the entire site's configuration as you work on the migrations, you might inadvertently override unrelated pieces of configuration. For instance, this can lead to missing content types, changed field settings, and lots of frustration. That is why *doing partial or single configuration imports is recommended*. The following code snippet shows a basic Drupal workflow for managing migrations as configuration:
```console
# 1) Run the migration.
$ drush migrate:import udm_config_json_source_node_local
# 2) Rollback migration because the expected results were not obtained.
$ drush migrate:rollback udm_config_json_source_node_local
# 3) Change the migration definition file in the "config/install" directory.
# 4a) Sync configuration by folder using Drush.
$ drush config:import --partial --source="modules/custom/ud_migrations/ud_migrations_config_json_source/config/install"
# 4b) Sync configuration by file using Drupal Console.
$ drupal config:import:single --file="modules/custom/ud_migrations/ud_migrations_config_json_source/config/install/migrate_plus.migration.udm_config_json_source_node_local.yml"
# 5) Run the migration again.
$ drush migrate:import udm_config_json_source_node_local
```
Note the use of the `--partial` and `--source` flags in the configuration import command. Also note that the path is relative to the current working directory from where the command is being issued. In this snippet, the value of the *source flag* is the directory holding your migrations. Be mindful if there are other non-migration related configurations in the same folder. If you need to be more granular, Drupal Console offers a command to import individual configuration files as shown in the previous snippet.
*Note*: Uninstalling and installing the module again will also apply any changes to your configuration. This might produce errors if the migration configuration entities are not removed automatically when the module is uninstalled. Read [this article](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module) for details on how to do that.
What did you learn in today's blog post? Did you know the benefits and trade-offs of managing migrations as configuration? Did you know what to do for changes in migration configuration entities to take effect? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

121
23.txt Normal file

@ -0,0 +1,121 @@
# Using migration groups to share configuration among Drupal migrations
In the previous posts we talked about the option to manage [migrations as configuration entities](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module) and some of [the benefits this brings](https://understanddrupal.com/articles/workflows-and-benefits-managing-drupal-migrations-configuration-entities). Today, we are going to learn another useful feature provided by the [Migrate Plus module](https://www.drupal.org/project/migrate_plus): **migration groups**. We are going to see how they can be used to execute migrations together and share configuration among them. Let's get started.
## Understanding migration groups
The Migrate Plus module defines a new configuration entity called **migration group**. When the module is enabled, each migration can define one group they belong to. This serves two purposes:
1. It is possible to **execute operations per group**. For example, you can import or rollback all migrations in the same group with one Drush command provided by the [Migrate Tools module](https://www.drupal.org/project/migrate_tools).
2. It is possible to declare **shared configuration** for all the migrations within a group. For example, if they use the same source file, the value can be set in a single place: the migration group.
To demonstrate how to leverage migration groups, we will convert the [CSV source example](https://understanddrupal.com/articles/migrating-csv-files-drupal) to use migrations defined as configuration and groups. You can get the full code example at <https://github.com/dinarcon/ud_migrations> The module to enable is `UD configuration group migration (CSV source)` whose machine name is `ud_migrations_config_group_csv_source`. It comes with three migrations: `udm_config_group_csv_source_paragraph`, `udm_config_group_csv_source_image`, and  `udm_config_group_csv_source_node`. Additionally, the demo module provides the `udm_config_group_csv_source` group.
*Note*: The Migrate Tools module provides a user interface for managing migrations defined as configuration. It is available under "Structure > Migrations" at `/admin/structure/migrate`. This user interface will be explained in a future entry. For today's example, it is assumed that migrations are executed using the Drush commands provided by Migrate Tools. In the past we have used the [Migrate Run module](https://www.drupal.org/project/migrate_run) to execute migrations, but this module does not offer the ability to import or rollback migrations per group.
## Creating a migration group
The migration groups are defined in YAML files using the following naming convention: `migrate_plus.migration_group.[migration_group_id].yml`. Because they are configuration entities, they need to be placed in the `config/install` directory of your module. Files placed in that directory following that pattern will be synced into Drupal's active configuration when the module is **installed for the first time (only)**. If you need to update the migration groups, you make the modifications to the files and then sync the configuration again. More details on this workflow can be found in [this article](https://understanddrupal.com/articles/workflows-and-benefits-managing-drupal-migrations-configuration-entities). The following snippet shows an example migration group:
```yaml
uuid: e88e28cc-94e4-4039-ae37-c1e3217fc0c4
id: udm_config_group_csv_source
label: 'UD Config Group (CSV source)'
description: 'A container for migrations about individuals and their favorite books. Learn more at https://understanddrupal.com/migrations.'
source_type: 'CSV resource'
shared_configuration: null
```
The `uuid` key is optional. If not set, the configuration management system will create one automatically and assign it to the migration group. Setting one simplifies the workflow for updating configuration entities as explained in [this article](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module). The `id` key is required. Its value is used to associate individual migrations to this particular group.
The `label`, `description`, and `source_type` keys are used to give details about the migration group. Their values appear in the user interface provided by Migrate Tools. `label` is required and serves as the name of the group. `description` is optional and provides more information about the group. `source_type` is optional and gives context about the type of source you are migrating from. For example, "Drupal 7", "WordPress", "CSV file", etc.
To associate a migration to a group, set the `migration_group` key in the migration definition file. For example:
```yaml
uuid: 97179435-ca90-434b-abe0-57188a73a0bf
id: udm_config_group_csv_source_node
label: 'UD configuration host node migration for migration group example (CSV source)'
migration_group: udm_config_group_csv_source
source: ...
process: ...
destination: ...
migration_dependencies: ...
```
Note that if you omit the `migration_group` key, it will default to a `null` value meaning the migration is not associated with any group. You will still be able to execute the migration from the command line, but it will not appear in the user interface provided by Migrate Tools. If you want the migration to be available in the user interface without creating a new group, you can set the `migration_group` key to `default`. This group is automatically created by Migrate Plus and can be used as a generic container for migrations.
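As a minimal sketch of that last option, a migration could opt into the generic group as shown below. The `id` and `label` values are hypothetical and only serve as illustration:

```yaml
# Hypothetical migration used only for illustration.
id: udm_example_default_group
label: 'Example migration assigned to the default group'
# `default` is the generic group created by Migrate Plus.
migration_group: default
source: ...
process: ...
destination: ...
```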
## Organizing and executing migrations
Migration groups are used to organize migrations. Migration projects usually involve several types of elements to import, for example book reports, events, subscriptions, user accounts, etc. Each of them might require multiple migrations to be completed. Let's consider a book report migration. The "book report" content type has many entity reference fields: book cover (image), support documents (file), tags (taxonomy term), author (user), citations (paragraphs). In this case, you will have one primary node migration that depends on five migrations of multiple types. You can put all of them in the same group and execute them together.
It is very important not to confuse *migration groups* with [migration dependencies](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal). In the previous example, the primary book report node migration should still list all its dependencies in the `migration_dependencies` section of its definition file. Otherwise, there is no guarantee that the five migrations it depends on will be executed in advance. This could cause problems if the primary node migration is executed before images, files, taxonomy terms, users, or paragraphs have been imported into the system.
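To make this concrete, the following sketch shows how the primary book report migration might declare its dependencies while also belonging to a group. All migration and group IDs are hypothetical. The `required` and `optional` keys are the ones the core Migrate API reads under `migration_dependencies`:

```yaml
# Hypothetical IDs used only for illustration.
id: udm_book_report_node
label: 'Book report nodes (hypothetical example)'
# The group organizes the migrations; it does not order them.
migration_group: udm_book_reports
source: ...
process: ...
destination: ...
# Dependencies must still be listed explicitly.
migration_dependencies:
  required:
    - udm_book_report_cover_image
    - udm_book_report_support_document
    - udm_book_report_tag
    - udm_book_report_author
    - udm_book_report_citation_paragraph
  optional: []
```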
It is possible to execute all migrations in a group by issuing a single Drush command with the `--group` flag. This is supported by the import and rollback commands exposed by Migrate Tools. For example, `drush migrate:import --group='udm_config_group_csv_source'`. Note that as of this writing, there is no way to run all migrations in a group in a single operation from the user interface. You could import the main migration and the system will make sure that any explicitly declared dependencies are run in advance. If the group contains more migrations than the ones listed as dependencies, those will not be imported. Moreover, migration dependencies are only executed automatically for import operations. Dependent migrations will not be rolled back automatically if the main migration is rolled back individually.
*Note*: This example assumes you are using Drush to execute the migrations. At the time of this writing, it is not possible to roll back a CSV migration from the user interface. See [this issue](https://www.drupal.org/project/migrate_source_csv/issues/3068017) in the Migrate Source CSV module for more context.
## Sharing configuration with migration groups
Arguably, the major benefit of migration groups is the ability to share configuration among migrations. In the example, there are three migrations all reading from CSV files. Some configurations like the source `plugin` and `header_offset` can be shared. The following snippet shows an example of declaring shared configuration in the migration group for the CSV example:
```yaml
uuid: e88e28cc-94e4-4039-ae37-c1e3217fc0c4
id: udm_config_group_csv_source
label: 'UD Config Group (CSV source)'
description: 'A container for migrations about individuals and their favorite books. Learn more at https://understanddrupal.com/migrations.'
source_type: 'CSV resource'
shared_configuration:
  dependencies:
    enforced:
      module:
        - ud_migrations_config_group_csv_source
  migration_tags:
    - UD Config Group (CSV Source)
    - UD Example
  source:
    plugin: csv
    # It is assumed that CSV files do not contain a headers row. This can be
    # overridden for migrations where that is not the case.
    header_offset: null
```
Any configuration that can be set in a regular migration definition file can be set under the `shared_configuration` key. When the migrate system loads the migration, it will read the migration group it belongs to, and pull any shared configuration that is defined. If both the migration and the group provide a value for the same key, the one defined in the migration definition file will override the one set in the migration group. If a key only exists in the group, it will be added to the migration when the definition file is loaded.
In the example, `dependencies`, `migration_tags`, and `source` options are being set. They will apply to all migrations that belong to the `udm_config_group_csv_source` group. If you updated any of these values, the changes would propagate to every migration in the group. Remember that you would need to sync the migration group configuration for the update to take effect. You can do that with partial configuration imports as explained in this article.
Any configuration set in the group can be overridden in specific migrations. In the example, the `header_offset` is set to `null` which means the CSV files do not contain a header row. The node migration uses a CSV file that contains a header row so that configuration needs to be redeclared. The following snippet shows how to do it:
```yaml
uuid: 97179435-ca90-434b-abe0-57188a73a0bf
id: udm_config_group_csv_source_node
label: 'UD configuration host node migration for migration group example (CSV source)'
# Any configuration defined in the migration group can be overridden here
# by re-declaring the configuration and assigning a value.
# `dependencies` inherited from migration group.
# `migration_tags` inherited from migration group.
migration_group: udm_config_group_csv_source
source:
  # `plugin` inherited from migration group.
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_people.csv
  ids: [unique_id]
  # This overrides the `header_offset` defined in the group. The referenced CSV
  # file includes headers in the first row. Thus, a value of `0` is used.
  header_offset: 0
process: ...
destination: ...
migration_dependencies: ...
```
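Assuming the merge behaves as described above, the definition that the Migrate API ends up working with for this migration would look roughly like the following. This is an illustration of the combined result, not a file you need to create:

```yaml
# Illustration only: group and migration values combined.
id: udm_config_group_csv_source_node
label: 'UD configuration host node migration for migration group example (CSV source)'
migration_group: udm_config_group_csv_source
# Inherited from the group.
dependencies:
  enforced:
    module:
      - ud_migrations_config_group_csv_source
# Inherited from the group.
migration_tags:
  - UD Config Group (CSV Source)
  - UD Example
source:
  # Inherited from the group.
  plugin: csv
  # Declared in the migration.
  path: modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_people.csv
  ids: [unique_id]
  # The migration's value wins over the group's `null`.
  header_offset: 0
process: ...
destination: ...
migration_dependencies: ...
```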
Another example would be multiple migrations reading from a remote JSON file. Let's say that instead of fetching a remote file, you want to read a local file. The only file you would have to update is the migration group. Change the `data_fetcher_plugin` key to `file` and the `urls` array to the path to the local file. You can try this with the `ud_migrations_config_group_json_source` module from the [demo repository](https://github.com/dinarcon/ud_migrations).
What did you learn in today's blog post? Did you know that migration groups can be used to share configuration among different migrations? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.
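As a rough sketch of that change, the group's shared source configuration might look like the following after switching to a local file. The layout of the `source` block is an assumption about how such a group could be organized, and the file path is hypothetical. `plugin: url`, `data_parser_plugin`, `data_fetcher_plugin`, and `urls` are options of the URL source plugin from Migrate Plus:

```yaml
id: udm_config_group_json_source
label: 'UD Config Group (JSON source)'
source_type: 'JSON resource'
shared_configuration:
  source:
    plugin: url
    data_parser_plugin: json
    # Previously `http` when the file was fetched remotely.
    data_fetcher_plugin: file
    urls:
      # Hypothetical local path; adjust it to where your file lives.
      - modules/custom/my_module/sources/example.json
```

Because the fetcher and the location live in the group, none of the individual migration files would need to change.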

89
24.txt Normal file
View file

@ -0,0 +1,89 @@
# What is the difference between migration tags and migration groups in Drupal?
In the [previous post](https://understanddrupal.com/articles/using-migration-groups-share-configuration-among-drupal-migrations) we talked about **migration groups** provided by the [Migrate Plus module](https://www.drupal.org/project/migrate_plus). Today, we are going to compare them to **migration tags**. Along the way, we are going to dive deeper into how they work and provide tips to avoid problems when working with them. Let's get started.
## What is the difference between migration tags and migration groups?
In the article on [declaring migration dependencies](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal) we briefly touched on the topic of tags. Here is a summary of migration tags:
- They are a feature provided by the core Migrate API.
- Multiple tags can be assigned to a single migration.
- They are defined in the migration definition file alone and do not require creating a separate file.
- Both Migrate Tools and Migrate Run provide a flag to execute operations by tag.
- They do not allow you to share configuration among migrations tagged with the same value.
Here is a summary of migration groups:
- You need to install the Migrate Plus module to take advantage of them.
- Only one group can be assigned to any migration.
- You need to create a separate file to declare a group. This affects the readability of migrations as their configuration will be spread over two files.
- Only the Migrate Tools module provides a flag to execute operations by group.
- They offer the possibility to share configuration among members of the same group.
- Any shared configuration could be overridden in the individual migration definition files.
## What do migration tags and groups have in common?
Both offer the ability to put together multiple migrations under a single name. This name can be used to import or roll back all the associated migrations in one operation. This is true for the `migrate:import` and `migrate:rollback` Drush commands provided by Migrate Tools. To do so, use the `--tag` or `--group` flag. The following snippet shows an example of importing and rolling back the migrations by tag and by group:
```console
$ drush migrate:import --tag='UD Config Group (JSON Source)'
$ drush migrate:rollback --tag='UD Config Group (JSON Source)'
$ drush migrate:import --group='udm_config_group_json_source'
$ drush migrate:rollback --group='udm_config_group_json_source'
```
*Note*: You might get errors indicating that the "--tag" or "--group" options do not accept a value. See [this issue](https://www.drupal.org/project/migrate_tools/issues/3024399) if you experience that problem.
Neither migration tags nor groups replace migration dependencies. If there are explicit [migration dependencies](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal) among the members of a tag or group, the Migrate API will determine the order in which the migrations need to be executed. But if no dependencies are explicitly set, there is no guarantee the migrations will be executed in any particular order. Keep this in mind if you have separate migrations for entities that you later use to populate entity reference fields. Also note that migration dependencies are only executed automatically for import operations. Dependent migrations will not be rolled back automatically if the main migration is rolled back individually.
## Can groups only be used for migrations defined as configuration entities?
Technically speaking, no. It is possible to use groups for migrations defined as code. Notwithstanding, migration groups can only be created as configuration entities. You would have to rebuild caches and sync configuration for changes in the migrations and groups to take effect, respectively. This is error-prone and can lead to hard-to-debug issues.
Also, things might get confusing when executing migrations. The user interface provided by Migrate Tools works exclusively with migrations defined as configuration entities. The Drush commands provided by the same module work for both types of migrations: code and configuration. The `default` and `null` values for the `migration_group` key are handled a bit differently between the user interface and the Drush commands. Moreover, the ability to execute operations per group using Drush commands is provided only by the Migrate Tools module. The Migrate Run module lacks this functionality.
Managing migrations as code or configuration should be a decision to take at the start of the project. If you want to use [migration groups](https://understanddrupal.com/articles/using-migration-groups-share-configuration-among-drupal-migrations), or some of the other [benefits provided by migrations defined as configuration](https://understanddrupal.com/articles/workflows-and-benefits-managing-drupal-migrations-configuration-entities), stick to them from the very beginning. It is possible to change at any point and the transition is straightforward. But it should be avoided if possible. In any case, try not to mix both workflows in a single project.
*Tip*: It is recommended to read [this article](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module) to learn more about the difference between managing migrations as code or configuration.
## Setting migration tags inside migration groups
As seen in this article, it is possible to set migration tags as part of the shared configuration of a group. If you do this, it is **not recommended** to override the `migration_tags` key in individual migrations. The end result might not be what you expect. Consider the following snippets as an example:
```yaml
# Migration group configuration entity definition.
# File: migrate_plus.migration_group.udm_config_group_json_source.yml
uuid: 78925705-a799-4749-99c9-a1725fb54def
id: udm_config_group_json_source
label: 'UD Config Group (JSON source)'
description: 'A container for migrations about individuals and their favorite books. Learn more at https://understanddrupal.com/migrations.'
source_type: 'JSON resource'
shared_configuration:
  migration_tags:
    - UD Config Group (JSON Source)
    - UD Example
```
```yaml
# Migration configuration entity definition.
# File: migrate_plus.migration.udm_config_group_json_source_node.yml
uuid: def749e5-3ad7-480f-ba4d-9c7e17e3d789
id: udm_config_group_json_source_node
label: 'UD configuration host node migration for migration group example (JSON source)'
migration_tags:
  - UD Lorem Ipsum
migration_group: udm_config_group_json_source
source: ...
process: ...
destination: ...
migration_dependencies: ...
```
The group configuration declares two tags: `UD Config Group (JSON Source)` and `UD Example`. The migration configuration overrides the tags with a single value: `UD Lorem Ipsum`. What would you expect the final value for the `migration_tags` key to be? Is it a combination of the three tags? Is it only the one tag defined in the migration configuration?
The answer in this case is not very intuitive. The final migration will have two tags: `UD Lorem Ipsum` and `UD Example`. This has to do with how Migrate Plus merges the configuration from the group into the individual migrations. It uses the `array_replace_recursive()` PHP function which performs the merge operation based on array keys. In this example, `UD Config Group (JSON Source)` and `UD Lorem Ipsum` have the same index in the `migration_tags` array. Therefore, only one value is preserved in the final result.
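The index-based replacement described above can be pictured as follows. This is only an illustration of the merged result, with comments noting where each value comes from:

```yaml
# Group:     index 0 = UD Config Group (JSON Source), index 1 = UD Example
# Migration: index 0 = UD Lorem Ipsum
# Merged result for the migration:
migration_tags:
  # Index 0: the migration's value replaces the group's value.
  - UD Lorem Ipsum
  # Index 1: only the group defines it, so it is kept.
  - UD Example
```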
The example uses the `migration_tags` key as it is the subject of this article, but the same applies to any configuration that has a nested structure. Some configurations are more critical to a migration than a tag or group, and debugging a problem like this can be tricky. If the end result might be ambiguous, it is preferred to avoid the situation in the first place. In general, nested structures should only be set in either the group or the migration definition file, but not both. Additionally, all the recommendations for writing migrations presented in [this article](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow) also apply here.
What did you learn in today's blog post? Did you know the difference between migration tags and groups? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

70
25.txt Normal file
View file

@ -0,0 +1,70 @@
# Executing Drupal migrations from the user interface with Migrate Tools
In previous posts we introduced the concept of defining [migrations as configuration entities](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module). This type of migrations can be executed from a user interface provided by the [Migrate Tools module](https://www.drupal.org/project/migrate_tools). In today's article, we will present the workflow to import configuration entities and execute migrations from the user interface. Let's get started.
**Important**: User interfaces tend to change. In fact, Migrate Tools [recently launched a redesign of the user interface](https://www.drupal.org/node/3064965). Screenshots and referenced on-screen text might change over time. Some exposed settings might work, but give [no feedback indicating a successful outcome](https://www.drupal.org/node/3012731). Some forms might [only work with specific versions of modules](https://www.drupal.org/node/3077558). The goal is to demonstrate what is possible and what the limitations of doing so are. In general, executing migrations from the command line offers a better experience.
## Getting the example code
You can get the full code for today's example at <https://github.com/dinarcon/ud_migrations>. The module to use is `UD configuration group migration (JSON source)` whose machine name is `ud_migrations_config_group_json_source`. It comes with three migrations: `udm_config_group_json_source_paragraph`, `udm_config_group_json_source_image`, and `udm_config_group_json_source_node`. Additionally, the demo module provides the `udm_config_group_json_source` group.
You could install the module as we have done with every other example in [the series](https://understanddrupal.com/migrations). Instructions can be found in [this article](https://understanddrupal.com/articles/writing-your-first-drupal-migration). When the module is installed, you could execute the migrations using Drush commands as explained in [this article](https://understanddrupal.com/articles/workflows-and-benefits-managing-drupal-migrations-configuration-entities). But, we are going to use the user interface provided by Migrate Tools instead.
## Importing configuration entities from the user interface
The first step is getting the configuration entities into Drupal's active configuration. One way to do this is by using the "Single item" import from the "Configuration synchronization" interface. It can be found at `/admin/config/development/configuration/single/import`. Four configuration items will be imported in the following order:
1. The [udm_config_group_json_source](https://github.com/dinarcon/ud_migrations/blob/master/ud_migrations_config_group_json_source/config/install/migrate_plus.migration_group.udm_config_group_json_source.yml) using "Migration Group" as the configuration type.
2. The [udm_config_group_json_source_image](https://github.com/dinarcon/ud_migrations/blob/master/ud_migrations_config_group_json_source/config/install/migrate_plus.migration.udm_config_group_json_source_image.yml) using "Migration" as the configuration type.
3. The [udm_config_group_json_source_paragraph](https://github.com/dinarcon/ud_migrations/blob/master/ud_migrations_config_group_json_source/config/install/migrate_plus.migration.udm_config_group_json_source_paragraph.yml) using "Migration" as the configuration type.
4. The [udm_config_group_json_source_node](https://github.com/dinarcon/ud_migrations/blob/master/ud_migrations_config_group_json_source/config/install/migrate_plus.migration.udm_config_group_json_source_node.yml) using "Migration" as the configuration type.
When importing configuration this way, you need to select the proper "Configuration type" from the dropdown. Otherwise, the system will make its best effort to produce a valid configuration entity from the YAML code you paste in the box. This can happen without a visual indication that you imported the wrong configuration type, which can lead to hard-to-debug errors.
![Single item configuration import interface](https://understanddrupal.com/sites/default/files/inline-images/config_import.png)
*Note*: If you do not see the "Migration" and "Migration group" configuration types, you need to enable the Migrate Plus module.
Another thing to pay close attention to is changes in formatting when pasting code from a different source. Be it GitHub, your IDE, or an online tutorial, verify that your YAML syntax is valid and whitespace is preserved. A single extraneous whitespace character can break the whole migration. For invalid syntax, the best case scenario is the system rejecting the YAML definition. Otherwise, you might encounter hard-to-debug errors.
In today's example, we are going to copy the YAML definitions from GitHub. You can find the source files to copy [here](https://github.com/dinarcon/ud_migrations/tree/master/ud_migrations_config_group_json_source/config/install). When viewing a file, it is recommended to click the "Raw" button to get the content of the file in plain text. Then, copy and paste the whole file in the designated box of the "Single item" import interface. Make sure to import the four configuration entities in the order listed above using the proper type.
![GitHub interface showing configuration definition file](https://understanddrupal.com/sites/default/files/inline-images/raw_button_github.png)
To verify that this worked, you can use the "Single item" export interface located at `/admin/config/development/configuration/single/export`. Select the "Configuration type" and the "Configuration name" for each element that was imported. You might see some extra keys in the export. As long as the ones you manually imported are properly set, you should be good to continue.
Note that we included `uuid` keys in all the example migrations. As explained in this article, that is not required. Drupal would automatically create a `uuid` when the configuration is imported. But defining one simplifies the update process. If you needed to update any of those configurations, you can directly visit the "Single item" import interface and paste the new configuration. Otherwise, you would have to export it first, copy the `uuid` value, and add it to the code to import.
## Executing configuration entities from the user interface
With the migration-related configuration entities in Drupal's active configuration, you should be able to execute them from the user interface provided by Migrate Tools. It is available in "Manage > Structure > Migration" at `/admin/structure/migrate`. You should see the `UD Config Group (JSON source)` that was imported in the previous step.
![Interface listing migration groups](https://understanddrupal.com/sites/default/files/inline-images/list_migration_groups.png)
*Note*: If you do not see the "Migration" link in "Manage > Structure" interface, you need to enable the Migrate Tools module.
This user interface will list all the migration groups in the system. From there, you can get to the individual migrations. You can even inspect the migration definition from this interface. It is important to note that *only migrations defined as configuration entities will appear in this interface*. Migrations defined as code will **not** be listed.
![Interface listing migration](https://understanddrupal.com/sites/default/files/inline-images/list_migrations.png)
For the `udm_config_group_json_source` group, click the "List migrations" button to display all the migrations in the group. Next, click the "Execute" button on the `udm_config_group_json_source_image` migration. Then, make sure "Import" is selected as the operation and click the "Execute" button. Drupal will perform the import operation for the image migration. A success status message will appear if things work as expected. You can also verify that images were imported by visiting the files listing page at `/admin/content/files`.
Repeat the same process for the `udm_config_group_json_source_paragraph` and `udm_config_group_json_source_node` migrations. The final result will be similar to the one from the [JSON source example](https://understanddrupal.com/articles/migrating-json-files-drupal). For import operations, the system will check for migration dependencies and execute them in advance if defined. That means that in this example, you could run the import operation directly on the node migration. Drupal will automatically execute the images and paragraphs migrations. Note that migration dependencies are only executed automatically for import operations. Dependent migrations will not be rolled back automatically if the main migration is rolled back individually.
This example includes a paragraph migration. As explained in [this article](https://understanddrupal.com/articles/introduction-paragraphs-migrations-drupal), if you roll back the node migration, any paragraph associated with the nodes will be deleted. The Migrate API will not be aware of this. To fix the issue and recover the paragraphs, you need to roll back the paragraph migration as well. Then you import the paragraph and node migrations again. Doing this from the user interface involves more manual steps compared to running a single Drush command in the terminal.
## Limitations of the user interface
Although you can import and execute migrations from the user interface, this workflow comes with many limitations. The following is a list of some of the things that you should consider:
- Updating configuration in production environments is not recommended. This can be enforced using the [Configuration Read-only mode module](https://www.drupal.org/project/config_readonly).
- If the imported configuration entity did not contain a `uuid`, you need to export that configuration to get the auto generated value. This should be used in subsequent configuration import operations if updates to the YAML definition files are needed. Otherwise, you will get an error like "An entity with this machine name already exists but the import did not specify a UUID."
- The following operations do not provide any user interface feedback on whether they succeeded: "Rollback", "Stop", and "Reset". See [this issue](https://www.drupal.org/node/3012731) for more context.
- As of this writing, most operations for CSV sources fail if using the 8.x-3.x branch of the [Migrate Source CSV module](https://www.drupal.org/project/migrate_source_csv). See [this issue](https://www.drupal.org/node/3068017) for more context.
- As of this writing, the user interface for renaming columns in CSV sources produces a fatal error if using the 8.x-3.x branch of the Migrate Source CSV module. See [this issue](https://www.drupal.org/node/3077558) for more context.
- As of this writing, it is not possible to execute all migrations in a group from the user interface in one operation. See [this issue](https://www.drupal.org/node/2996610) for more context.
- The [Migrate source UI module](https://www.drupal.org/project/migrate_source_ui) can be used to upload a file to be used as source in CSV migrations. As of this writing, a similar feature for JSON and XML files is not working. See [this issue](https://www.drupal.org/node/3076725) for more context.
To reiterate, it is not recommended to use the user interface to add configuration related to migrations and execute them. The extra layer of abstraction can make it harder to debug problems with migrations if they arise. If possible, execute your migrations using the commands provided by Migrate Tools. Finally, it is recommended to read [this article](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module) to learn more about the difference between managing migrations as code or configuration.
What did you learn in today's blog post? Did you know you can import migration related configuration entities from the user interface? Did you know that Migrate Tools provides a user interface for running migrations? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

116
26.txt Normal file
View file

@ -0,0 +1,116 @@
# Understanding the entity_lookup and entity_generate process plugins from Migrate Plus
In recent posts we have explored the [Migrate Plus](https://www.drupal.org/project/migrate_plus) and [Migrate Tools](https://www.drupal.org/project/migrate_tools) modules. They extend the Migrate API to provide [migrations defined as configuration entities](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module), [groups to share configuration among migrations](https://understanddrupal.com/articles/using-migration-groups-share-configuration-among-drupal-migrations), [a user interface to execute migrations](https://understanddrupal.com/articles/executing-drupal-migrations-user-interface-migrate-tools), among [other things](https://understanddrupal.com/migrations). Yet another benefit of using Migrate Plus is the option to leverage the many process plugins it provides. Today, we are going to learn about two of them: `entity_lookup` and `entity_generate`. We are going to compare them with the `migration_lookup` plugin, show how to configure them, and explain their compromises and limitations. Let's get started.
## What is the difference among the migration_lookup, entity_lookup, and entity_generate plugins?
In the article about [migration dependencies](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal) we covered the `migration_lookup` plugin provided by the core Migrate API. It lets you maintain relationships among *entities that are being imported*, for example when you are migrating a node that has associated users, taxonomy terms, images, paragraphs, etc. This plugin has a very important restriction: the related entities must come from another migration. But what can you do if you need to reference entities that already exist in the system? You might already have users in Drupal that you want to assign as node authors. In that case, the `migration_lookup` plugin cannot be used, but `entity_lookup` can do the job.
The `entity_lookup` plugin is provided by the Migrate Plus module. You can use it to query any entity in the system and get its unique identifier. This is often used to populate entity reference fields, but it can be used to set any field or property in the destination. For example, you can query existing users and assign the `uid` *node property* which indicates who created the node. If no entity is found, the plugin returns a `NULL` value which you can use in combination with other plugins to provide a fallback behavior. The advantage of this plugin is that it does not require another migration. You can query any entity in the entire system.
The `entity_generate` plugin, also provided by the Migrate Plus module, is an extension of `entity_lookup`. If no entity is found, this plugin will automatically create one. For example, you might have a list of taxonomy terms to associate with a node. If some of the terms do not exist, you would like to create and relate them to the node.
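As a minimal sketch of the taxonomy term case, the plugin could be configured as shown below. It assumes a `field_tags` entity reference field pointing to the `tags` vocabulary, as provided by the standard installation profile, and a hypothetical source field named `src_term_name` that holds a single term name. Treat it as an illustration under those assumptions, not as the definitive configuration:

```yaml
field_tags:
  plugin: entity_generate
  # Entity type to look up, and to create when nothing is found.
  entity_type: taxonomy_term
  # Property that stores the bundle, and the bundle to use.
  bundle_key: vid
  bundle: tags
  # Property compared against the source value.
  value_key: name
  # Hypothetical source field holding a single term name.
  source: src_term_name
```

If a term with that name already exists in the vocabulary, its ID is returned. Otherwise, a new term is created and its ID is used instead.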
*Note*: The `migration_lookup` plugin offers a feature called [stubbing](https://www.drupal.org/docs/8/api/migrate-api/migrate-api-overview#stubs) that neither `entity_lookup` nor `entity_generate` provides. It allows you to create a placeholder entity that will be updated later in the migration process. For example, in a hierarchical taxonomy terms migration, it is possible that a term is migrated before its parent. In that case, a stub for the parent will be created and later updated with the real data.
## Getting the example code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD Config entity_lookup and entity_generate examples` whose machine name is `ud_migrations_config_entity_lookup_entity_generate`. It comes with one JSON migration: `udm_config_entity_lookup_entity_generate_node`. Read this article for details on [migrating from JSON files](https://understanddrupal.com/articles/migrating-json-files-drupal). The following snippet shows a sample of the file:
```json
{
  "data": {
    "udm_nodes": [
      {
        "unique_id": 1,
        "thoughtful_title": "Amazing recipe",
        "creative_author": "udm_user",
        "fruit_list": "Apple, Pear, Banana"
      },
      {...},
      {...},
      {...}
    ]
  }
}
```
Additionally, the example module creates three users upon installation: 'udm_user', 'udm_usuario', and 'udm_utilisateur'. They are deleted automatically when the module is uninstalled. They will be used to assign the node authors. The example will create nodes of type "Article" from the standard installation profile. You can execute the migration from the interface provided by Migrate Tools at `/admin/structure/migrate/manage/default/migrations`.
## Using the entity_lookup to assign the node author
Let's start by assigning the node author. The following snippet shows how to configure the `entity_lookup` plugin to assign the node author:
```yaml
uid:
  - plugin: entity_lookup
    entity_type: user
    value_key: name
    source: src_creative_author
  - plugin: default_value
    default_value: 1
```
The `uid` node property is used to assign the node author. It expects an integer value representing a user ID (`uid`). The *source* data contains usernames so we need to query the database to get the corresponding user IDs. The users that will be referenced were not imported using the Migrate API. They were already in the system. Therefore, `migration_lookup` cannot be used, but `entity_lookup` can.
The plugin is configured using three keys. `entity_type` is set to the machine name of the entity to query: `user` in this case. `value_key` is the name of the *entity property* to look up. In Drupal, the usernames are stored in a property called `name`. Finally, `source` specifies which field from the *source* contains the lookup value for the `name` entity property. For example, the first record has a `src_creative_author` value of `udm_user`. So, this plugin will instruct Drupal to search among all the users in the system for one whose `name` (username) is `udm_user`. If a value is found, the plugin will return the user ID. Because the `uid` *node property* expects a user ID, the return value of this plugin can be used directly to assign its value.
What happens if the plugin does not find an entity matching the conditions? It returns a `NULL` value. Then it is up to you to decide what to do. If you let the `NULL` value pass through, Drupal will take some default behavior. In the case of the `uid` property, if the received value is not valid, the node creation will be attributed to the anonymous user (uid: 0). Alternatively, you can detect if `NULL` is returned and take some action. In the example, the second record specifies the "udm_not_found" user which does not exist. To accommodate for this, a [process pipeline](https://understanddrupal.com/articles/using-constants-and-pseudofields-data-placeholders-drupal-migration-process-pipeline) is defined to manually specify a user if `entity_lookup` did not find one. The `default_value` plugin is used to return `1` in that case. The number represents a user ID, not a username. Particularly, this is the user ID of the "*super user*" created when Drupal was first installed. If you need to assign a different user, but the user ID is unknown, you can create a pseudofield and use the `entity_lookup` plugin again to find its user ID. Then, use that pseudofield as the default value.
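As a rough sketch of the pseudofield idea, the fallback user could be looked up by name and then combined with the main lookup. This is not part of the example module. It assumes a hypothetical `FALLBACK_USERNAME` constant in the *source* section (set here to `udm_usuario`, one of the users the example module creates) and hypothetical `psf_` pseudofield names. It also swaps `default_value` for the core `null_coalesce` plugin, which returns the first non-null value among its sources:

```yaml
source:
  # Hypothetical constant; the rest of the source configuration is omitted.
  constants:
    FALLBACK_USERNAME: udm_usuario
process:
  # Pseudofield: user ID of the fallback user, looked up by username.
  psf_fallback_uid:
    plugin: entity_lookup
    entity_type: user
    value_key: name
    source: constants/FALLBACK_USERNAME
  # Pseudofield: user ID of the author named in the source, or NULL.
  psf_author_uid:
    plugin: entity_lookup
    entity_type: user
    value_key: name
    source: src_creative_author
  # Use the author if found; otherwise fall back to the other user.
  uid:
    plugin: null_coalesce
    source:
      - '@psf_author_uid'
      - '@psf_fallback_uid'
```

If the fallback username does not exist either, `uid` would receive `NULL` and the default behavior described above would apply.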
**Important**: User entities **do not** have bundles. Do not set the `bundle_key` nor `bundle` configuration options of the `entity_lookup`. Otherwise, you will get the following error: "The entity_lookup plugin found no bundle but destination entity requires one." Files do not have bundles either. For entities that have bundles like nodes and taxonomy terms, those options need to be set in the `entity_lookup` plugin.
## Using the entity_generate to assign and create taxonomy terms
Now, let's migrate a comma separated list of taxonomy terms. An example value is `Apple, Pear, Banana`.  The following snippet shows how to configure the `entity_generate` plugin to look up taxonomy terms and create them on the fly if they do not exist:
```yaml
field_tags:
  - plugin: skip_on_empty
    source: src_fruit_list
    method: process
    message: 'No src_fruit_list listed.'
  - plugin: explode
    delimiter: ','
  - plugin: callback
    callable: trim
  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
```
The terms will be assigned to the `field_tags` field using a process pipeline of four plugins:
- `skip_on_empty` will skip the processing of this field if the record does not have a `src_fruit_list` column.
- `explode` will break the string of comma separated values into individual elements.
- `callback` will use the `trim` PHP function to remove any whitespace from the start or end of the taxonomy term name.
- `entity_generate` takes care of finding the taxonomy terms in the system and creating the ones that do not exist.
For a detailed explanation of the `skip_on_empty` and `explode` plugins see [this article](https://understanddrupal.com/articles/migrating-taxonomy-terms-and-multivalue-fields-drupal). For the `callback` plugin see [this article](https://understanddrupal.com/articles/using-process-plugins-data-transformation-drupal-migrations). Let's focus on the `entity_generate` plugin for now. The `field_tags` field expects an array of taxonomy term IDs (`tid`). The *source* data contains term names so we need to query the database to get the corresponding term IDs. The taxonomy terms that will be referenced were not imported using the Migrate API. And they might not exist in the system yet. If that is the case, they should be created on the fly. Therefore, `migration_lookup` cannot be used, but `entity_generate` can.
The plugin is configured using five keys. `entity_type` is set to the machine name of the entity to query: `taxonomy_term` in this case. `value_key` is the name of the *entity property* to look up. In Drupal, the taxonomy term names are stored in a property called `name`. Usually, you would include a `source` that specifies which field from the *source* contains the lookup value for the `name` entity property. In this case it is not necessary to define this configuration option. The lookup value will be passed from the previous plugin in the process pipeline: the trimmed version of the taxonomy term name.
If, and **only if**, the entity type has bundles, you also **must** define two more configuration options: `bundle_key` and `bundle`. Similar to `value_key` and `source`, these extra options will become another *condition* in the query looking for the entities. `bundle_key` is the *name* of the entity property that stores which bundle the entity belongs to. `bundle` contains the *value* of the bundle used to restrict the search. The terminology is a bit confusing, but it boils down to the following. It is possible that the same value exists in multiple bundles of the same entity. So, you must pick one bundle where the lookup operation will be performed. In the case of the taxonomy term entity, the bundles are the *vocabularies*. Which vocabulary a term belongs to is associated in the `vid` entity property. In the example, that is `tags`. Let's consider an example term of "Apple". So, this plugin will instruct Drupal to search for a taxonomy term whose `name` (term name) is "Apple" that belongs to the "tags" `vid` (vocabulary).
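The same pair of options applies to other entity types that have bundles. As a minimal sketch, assuming a hypothetical `field_related_article` entity reference field and a hypothetical `src_related_title` source column, a lookup restricted to nodes of type `article` could look like this. For nodes, the bundle is stored in the `type` property:

```yaml
# Hypothetical field and source column; not part of the example module.
field_related_article:
  plugin: entity_lookup
  source: src_related_title
  entity_type: node
  # Match against the node title.
  value_key: title
  # Restrict the search to nodes whose `type` (bundle) is `article`.
  bundle_key: type
  bundle: article
```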
What happens if the plugin does not find an entity matching the conditions? It will create one on the fly! It will use the value from the *source* configuration or from the *process* pipeline. This value will be used to assign the `value_key` *entity property* for the newly created entity. The entity will be created in the proper bundle as specified by the `bundle_key` and `bundle` configuration options. In the example, the terms will be created in the `tags` vocabulary. It is important to note that values are trimmed to remove whitespace at the start and end of the name. Otherwise, if your source contains spaces after the commas that separate elements, you might end up with terms that seem duplicated like "Apple" and " Apple".
## More configuration options
Both `entity_lookup` and `entity_generate` share the previous configuration options. Additionally, the following options are available:
`ignore_case` contains a boolean value to indicate if the query should be case sensitive or not. It defaults to true.
`access_check` contains a boolean value to indicate if the system should check whether the user has access to the entity. It defaults to true.
`values` and `default_values` apply only to the `entity_generate` plugin. You can use them to set fields that could exist in the destination entity. An [example configuration](https://git.drupalcode.org/project/migrate_plus/blob/HEAD/src/Plugin/migrate/process/EntityGenerate.php) is included in the code for the plugin.
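A hedged sketch of how these two options might be used, based on my reading of the plugin's documentation: `default_values` takes static values, while `values` maps fields of the generated entity to source or process properties. The `field_fruit_color` field and the `src_single_fruit` and `src_fruit_color` source columns are hypothetical; check the linked plugin code for the options supported by your version of Migrate Plus.

```yaml
field_tags:
  plugin: entity_generate
  source: src_single_fruit
  entity_type: taxonomy_term
  value_key: name
  bundle_key: vid
  bundle: tags
  # Static values applied to every generated term.
  default_values:
    description: 'Term created on the fly by the migration.'
  # Hypothetical term field filled from a hypothetical source column.
  values:
    field_fruit_color: src_fruit_color
```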
One interesting fact about these plugins is that none of the configuration options is required. The `source` can be skipped if the value comes from the process pipeline. The rest of the configuration options can be inferred by code introspection. This has some restrictions and assumptions. For example, if you are migrating nodes, the code introspection requires the `type` *node property* defined in the *process* section. If you do not set one because you define a `default_bundle` in the *destination* section, an error will be produced. Similarly, for entity reference fields it is assumed they point to one bundle only. Otherwise, the system cannot guess which bundle to look up and an error will be produced. Therefore, always set the `entity_type` and `value_key` configurations. And for entity types that have bundles, `bundle_key` and `bundle` must be set as well.
*Note*: There are various open issues contemplating changes to the configuration options. See [this issue](https://www.drupal.org/node/2787219) and the related ones to keep up to date with any future change.
## Compromises and limitations
The `entity_lookup` and `entity_generate` plugins violate some ETL principles. For example, they query the *destination* system from the *process* section. And in the case of `entity_generate` it even creates entities from the process section. Ideally, each phase of the [ETL process](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) is self contained. That being said, there are valid use cases for these plugins and they can save you time when their functionality is needed.
An important limitation of the `entity_generate` plugin is that it is not able to clean up after itself. That is, if you roll back the migration that calls this plugin, any created entity will remain in the system. This would leave data that is potentially invalid or otherwise never used in Drupal. Those values could leak into the user interface like in autocomplete fields. Ideally, rolling back a migration should delete any data that was created with it.
The recommended way to maintain relationships among entities in a migration project is to have [multiple migrations](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal). Then, you use the `migration_lookup` plugin to relate them. Throughout [the series](https://understanddrupal.com/migrations), several examples have been presented. For example, [this article](https://understanddrupal.com/articles/migrating-taxonomy-terms-and-multivalue-fields-drupal) shows how to do taxonomy term migrations.
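For comparison, here is a minimal sketch of the `migration_lookup` approach. It assumes a separate term migration with the hypothetical ID `udm_hypothetical_tags` and a hypothetical `src_tag_id` source column holding the identifier used by that migration:

```yaml
# Fragment of the node migration; names are illustrative only.
process:
  field_tags:
    plugin: migration_lookup
    migration: udm_hypothetical_tags
    source: src_tag_id
migration_dependencies:
  required:
    - udm_hypothetical_tags
  optional: []
```

Because the terms are created by their own migration, rolling back that migration deletes them, which avoids the leftover data described above.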
What did you learn in today's blog post? Did you know how to configure these plugins for entities that do not have bundles? Did you know that reverting a migration does not delete entities created by the `entity_generate` plugin? Did you know you can assign fields in the generated entity? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

120
27.txt Normal file
View file

@ -0,0 +1,120 @@
# How to debug Drupal migrations? - Part 1
Throughout [the series](https://understanddrupal.com/migrations) we have shown [many examples](https://github.com/dinarcon/ud_migrations). I do not recall any of them working on the first try. When working on Drupal migrations, it is often the case that things do not work right away. Today's article is the first of a two part series on [debugging Drupal migrations](https://www.drupal.org/docs/8/api/migrate-api/debugging-migrations). We start by giving some recommendations of things to do before diving deep into debugging. Then, we are going to talk about migrate messages and present the [log](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Log.php/class/Log) process plugin. Let's get started.
## Minimizing the surface for errors
The Migrate API is a very powerful [ETL framework](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) that interacts with many systems provided by Drupal core and contributed modules. This adds layers of abstraction that can make the debugging process more complicated compared to other systems. For instance, if something fails with a [remote JSON migration](https://understanddrupal.com/articles/adding-http-request-headers-and-authentication-remote-json-and-xml-drupal-migrations), the error might be produced in the [Migrate API](https://www.drupal.org/docs/8/api/migrate-api), the [Entity API](https://www.drupal.org/docs/8/api/entity-api), the [Migrate Plus module](https://www.drupal.org/project/migrate_plus), the [Migrate Tools module](https://www.drupal.org/project/migrate_tools), or even the [Guzzle HTTP Client library](https://github.com/guzzle/guzzle) that fetches the file. For a more concrete example, while working on a [recent article](https://understanddrupal.com/articles/executing-drupal-migrations-user-interface-migrate-tools), I stumbled upon an [issue](https://www.drupal.org/project/migrate_source_csv/issues/3068017#comment-13237532) that involved three modules. The problem was that when trying to roll back a [CSV migration](https://understanddrupal.com/articles/migrating-csv-files-drupal) from the user interface, an exception would be thrown, making the operation fail. This is related to [an issue in the core Migrate API](https://www.drupal.org/project/drupal/issues/3012001) that manifests itself when rollback operations are initiated from [the interface provided by Migrate Tools](https://understanddrupal.com/articles/executing-drupal-migrations-user-interface-migrate-tools). That issue, in turn, makes [a condition](https://git.drupalcode.org/project/migrate_source_csv/blob/8.x-3.x/src/Plugin/migrate/source/CSV.php#L108) in the Migrate Source CSV plugin fail, and the exception is thrown.
In general, you should aim to minimize the surface for errors. One way to do this is by starting the migration with the minimum possible setup. For example, if you are going to migrate nodes, start by configuring the source plugin, one field (the title), and the destination. When that works, keep migrating one field at a time. If the field has multiple [subfields](https://understanddrupal.com/articles/migrating-data-drupal-subfields), you can even migrate one subfield at a time. Commit each bit of progress to version control so you can go back to a working state if things go wrong. Read [this article](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow) for more recommendations on writing migrations.
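A starting point along those lines could be as small as the following sketch. The migration ID, the source column names, and the inline data are made up for illustration. The core `embedded_data` source plugin is used only to keep the example self-contained, and the `article` bundle assumes the standard installation profile:

```yaml
# Hypothetical minimal migration: source, one field (title), destination.
id: udm_hypothetical_minimal
label: 'Minimal node migration sketch'
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      src_title: 'First node'
    - unique_id: 2
      src_title: 'Second node'
  ids:
    unique_id:
      type: integer
process:
  title: src_title
destination:
  plugin: 'entity:node'
  default_bundle: article
```

Once this imports and rolls back cleanly, further fields can be added to the *process* section one at a time.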
## What to check first?
Debugging is a process that might involve many steps. There are a few things that you should check before diving too deep into trying to find the root of the problem. Let's begin with making sure that changes to your migrations are properly detected by the system. One common question I see people ask is where to place the migration definition files. Should they go in the `migrations` or `config/install` directory of your custom module? The answer depends on whether you want to manage your migrations as [code](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow) or [configuration](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module). Your choice will determine the workflow to follow for changes in the migration files to take effect. Migrations managed in *code* go in the `migrations` directory and require rebuilding caches for changes to take effect. On the other hand, migrations managed in *configuration* are placed in the `config/install` directory and require configuration synchronization for changes to take effect. So, make sure to follow the right workflow.
After verifying that your changes are being applied, the next thing to do is verify that *the modules that provide your plugins are enabled* and the plugins themselves are *properly configured*. Look for *typos* in the configuration options. Always refer to the [official documentation](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/list-of-core-migrate-process-plugins) to know which options are available and find the proper spelling of them. Other places to look are the code for the plugin definition or articles like the ones in [this series](https://understanddrupal.com/migrations) documenting how to use them. Things to keep in mind include proper indentation of the configuration options. An extra whitespace or a wrong indentation level can break the migration. You can either get a fatal error or the migration can fail silently without producing the expected results. Something else to be mindful of is the version of the modules you are using because the configuration options might change per version. For example, the newly released `8.x-3.x` branch of [Migrate Source CSV](https://www.drupal.org/list-changes/migrate_source_csv) changed various configuration options as described in [this change record](https://www.drupal.org/node/3060246). And the `8.x-5.x` branch of [Migrate Plus](https://www.drupal.org/list-changes/migrate_plus) changed some configurations for plugins related to DOM manipulation as described in [this change record](https://www.drupal.org/node/3062058). Keeping an eye on the issue queue and change records for the different modules you use is always a good idea.
If the problem persists, look for reports of similar problems in the issue queue. Make sure to include closed issues as well in case your problem has been fixed or documented already. Remember that a problem in a module can affect a different module. Another place to ask questions is the [#migrate](https://drupal.slack.com/messages/C226VLXBP/) channel in [Drupal Slack](https://www.drupal.org/slack). The support that is offered there is fantastic.
## Migration messages
If nothing else has worked, it is time to investigate what is going wrong. In case the migration outputs an error or a stacktrace to the terminal, you can use that to search in the code base where the problem might originate. But if there is no output or if the output is not useful, the next thing to do is check the **migration messages**.
The Migrate API allows plugins to log messages to the database in case an error occurs. Not every plugin leverages this functionality, but it is always worth checking if a plugin in your migration wrote messages that could give you a hint of what went wrong. Some plugins like [skip_on_empty](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21SkipOnEmpty.php/class/SkipOnEmpty) and [skip_row_if_not_set](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21SkipRowIfNotSet.php/class/SkipRowIfNotSet) even expose a configuration option to specify messages to log. To check the *migration messages* use the following Drush command: `drush migrate:messages [migration_id]`. If you are managing migrations as *configuration*, the interface provided by Migrate Tools also exposes them.
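As an illustration of a plugin-specific message, the following sketch uses `skip_on_empty` with its `message` option. The pseudofield name `psf_number_checked` and the message text are made up for this example, and `src_decimal_number` is assumed to be a source column. When the value is empty, the row is skipped and the message is stored so it shows up in the migration messages:

```yaml
# Hypothetical pseudofield; skips the whole row when the value is empty.
psf_number_checked:
  plugin: skip_on_empty
  source: src_decimal_number
  method: row
  message: 'Row skipped: src_decimal_number is empty.'
```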
Messages are logged separately per migration, even if you run multiple migrations at once. This could happen if you [execute dependencies](https://understanddrupal.com/articles/introduction-migration-dependencies-drupal) or use [groups](https://understanddrupal.com/articles/using-migration-groups-share-configuration-among-drupal-migrations) or [tags](https://understanddrupal.com/articles/what-difference-between-migration-tags-and-migration-groups-drupal). In those cases, errors might be produced in more than one migration. You will have to look at the messages for each of them individually.
Let's consider the following example. In the source there is a field called `src_decimal_number` with values like `3.1415`, `2.7182`, and `1.4142`. We need to separate the number into two components: the integer part (`3`) and the decimal part (`1415`). For this, we are going to use the [explode](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Explode.php/class/Explode) process plugin. Errors will be purposely introduced to demonstrate the workflow to check messages and update migrations. The following example shows the process plugin configuration and the output produced by trying to import the migration:
```yaml
# Source values: 3.1415, 2.7182, and 1.4142
psf_number_components:
  plugin: explode
  source: src_decimal_number
```
```console
$ drush mim ud_migrations_debug
[notice] Processed 3 items (0 created, 0 updated, 3 failed, 0 ignored) - done with 'ud_migrations_debug'
In MigrateToolsCommands.php line 811:
ud_migrations_debug Migration - 3 failed.
```
The error produced in the console does not say much. Let's see if any messages were logged using: `drush migrate:messages ud_migrations_debug`. In the previous example, the messages will look like this:
```
------------------- ------- --------------------
Source IDs Hash     Level   Message
------------------- ------- --------------------
7ad742e...732e755   1       delimiter is empty
2d3ec2b...5e53703   1       delimiter is empty
12a042f...1432a5f   1       delimiter is empty
------------------------------------------------
```
In this case, the migration messages are good enough to let us know what is wrong. The required `delimiter` configuration option was not set. When an error occurs, usually you need to perform at least three steps:
- Roll back the migration. This will also clear the messages.
- Make changes to the definition file and make sure they are applied. This will depend on whether you are managing the migrations as *code* or *configuration*.
- Import the migration again.
Let's say we performed these steps, but we got an error again. The following snippet shows the updated plugin configuration and the messages that were logged:
```yaml
psf_number_components:
  plugin: explode
  source: src_decimal_number
  delimiter: '.'
```
```
------------------- ------- ------------------------------------
Source IDs Hash     Level   Message
------------------- ------- ------------------------------------
7ad742e...732e755   1       3.1415000000000002 is not a string
2d3ec2b...5e53703   1       2.7181999999999999 is not a string
12a042f...1432a5f   1       1.4141999999999999 is not a string
----------------------------------------------------------------
```
The new error occurs because the explode operation works on strings, but we are providing numbers. One way to fix this is to update the *source* to add quotes around the number so it is treated as a string. This is of course not ideal and many times not even possible. A better way to make it work is setting the `strict` option to `false` in the plugin configuration. This will make sure to cast the input value to a string before applying the *explode* operation. This demonstrates the importance of reading the [plugin documentation](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Explode.php/class/Explode) to know which options are at your disposal. Of course, you can also have a look at the plugin code to see how it works.
*Note*: Sometimes the error produces a non-recoverable condition. The migration can be left in a status of "Importing" or "Reverting". Refer to [this article](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow) to learn how to fix this condition.
## The log process plugin
In the example, adding the extra configuration option will make the import operation finish without errors. But, how can you be sure the expected values are being produced? *Not getting an error does not necessarily mean that the migration works as expected*. It is possible that the transformations being applied do not yield the values we think or the format that Drupal expects. This is particularly true if you have complex [process plugin chains](https://understanddrupal.com/articles/using-process-plugins-data-transformation-drupal-migrations). As a reminder, we want to separate a decimal number from the source like `3.1415` into its components: `3` and `1415`.
The [log](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Log.php/class/Log) process plugin can be used for checking the outcome of plugin transformations. This plugin offered by the core Migrate API does two things. First, it logs the value it receives to the messages table. Second, the value is returned unchanged so that it can be used in process chains. The following snippets show how to use the `log` plugin and what is stored in the *messages* table:
```yaml
psf_number_components:
  - plugin: explode
    source: src_decimal_number
    delimiter: '.'
    strict: false
  - plugin: log
```
```
------------------- ------- --------
Source IDs Hash     Level   Message
------------------- ------- --------
7ad742e...732e755   1       3
7ad742e...732e755   1       1415
2d3ec2b...5e53703   1       2
2d3ec2b...5e53703   1       7182
12a042f...1432a5f   1       1
12a042f...1432a5f   1       4142
------------------------------------
```
Because the explode plugin produces an array, each of the elements is logged individually. And sure enough, in the output you can see the numbers being separated as expected.
The `log` plugin can be used to verify that source values are being read properly and process plugin chains produce the expected results. Use it as part of your debugging strategy, but make sure to remove it when done with the verifications. It makes the migration run slower because it has to write to the database. The overhead is not needed once you verify things are working as expected.
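Since `log` returns its input unchanged, it can sit at more than one point in a chain. The following sketch, a variation on the example above and not taken from the example module, logs the raw source value before the transformation as well as the result after it:

```yaml
psf_number_components:
  # Log the value exactly as read from the source.
  - plugin: log
    source: src_decimal_number
  - plugin: explode
    delimiter: '.'
    strict: false
  # Log each element produced by explode.
  - plugin: log
```

Comparing the two sets of messages helps tell apart problems in reading the source from problems in the transformations.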
In the next article, we are going to cover the Migrate Devel module, the `debug` process plugin, recommendations for using a proper debugger like XDebug, and the `migrate:fields-source` Drush command.
What did you learn in today's blog post? What workflow do you follow to debug a migration issue? Have you ever used the `log` process plugin for debugging purposes? If so, how did it help to solve the issue? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

225
28.txt Normal file
View file

@ -0,0 +1,225 @@
# How to debug Drupal migrations? - Part 2
In the [previous article](https://understanddrupal.com/articles/how-debug-drupal-migrations-part-1) we began talking about debugging Drupal migrations. We gave some recommendations of things to do before diving deep into debugging. We also introduced the [log](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Log.php/class/Log) process plugin. Today, we are going to show how to use the [Migrate Devel module](https://www.drupal.org/project/migrate_devel) and the [debug](https://git.drupalcode.org/project/migrate_devel/blob/8.x-1.x/src/Plugin/migrate/process/Debug.php) process plugin. Then we will give some guidelines on using a real debugger like [XDebug](https://xdebug.org/). Next, we will share tips so you get used to migration errors. Finally, we are going to briefly talk about the `migrate:fields-source` Drush command. Let's get started.
## The migrate_devel module
The Migrate Devel module is very helpful for debugging migrations. It allows you to visualize the data as it is received from the *source*, the result of field transformation in the [process pipeline](https://understanddrupal.com/articles/using-constants-and-pseudofields-data-placeholders-drupal-migration-process-pipeline), and values that are stored in the *destination*. It works by adding extra options to Drush commands. When these options are used, you will see more output in the terminal with details on how rows are being processed.
As of this writing, you will need to [apply a patch](https://www.drupal.org/patch/apply) to use this module. Migrate Devel was originally written for Drush 8 which is [still supported, but no longer recommended](https://www.drush.org/install/#drupal-compatibility). Instead, you should use at least version 9 of Drush. Between 8 and 9 there were major changes in Drush internals.  Commands need to be updated to work with the new version. Unfortunately, the Migrate Devel module is not fully compatible with Drush 9 yet. Most of the benefits listed in the project page have not been ported. For instance, automatically reverting the migrations and applying the changes to the migration files is not yet available. The partial support is still useful and to get it you need to apply the patch from [this issue](https://www.drupal.org/node/2938677). If you are using the Drush commands provided by Migrate Plus, you will also want to apply [this patch](https://www.drupal.org/node/3024399). If you are using the [Drupal composer template](https://github.com/drupal-composer/drupal-project), you can add this to your composer.json to apply both patches:
```json
"extra": {
  "patches": {
    "drupal/migrate_devel": {
      "drush 9 support": "https://www.drupal.org/files/issues/2018-10-08/migrate_devel-drush9-2938677-6.patch"
    },
    "drupal/migrate_tools": {
      "--limit option": "https://www.drupal.org/files/issues/2019-08-19/3024399-55.patch"
    }
  }
}
```
With the patches applied and the modules installed, you will get two new command line options for the `migrate:import` command: `--migrate-debug` and `--migrate-debug-pre`. The major difference between them is that the latter runs before the destination is saved. Therefore, `--migrate-debug-pre` does not provide debug information of the *destination*.
Using any of the flags will produce a lot of debug information for *each row being processed*. Many times, analyzing a subset of the records is enough to spot potential issues. The patch to Migrate Tools will allow you to use the `--limit` and `--idlist` options with the `migrate:import` command to limit the number of elements to process.
To demonstrate the output generated by the module, let's use the image migration from the [CSV source example](https://understanddrupal.com/articles/migrating-csv-files-drupal). You can get the code at <https://github.com/dinarcon/ud_migrations>. The following snippets show how to execute the import command with the extra debugging options and the resulting output:
```console
# Import only one element.
$ drush migrate:import udm_csv_source_image --migrate-debug --limit=1
# Use the row's unique identifier to limit which element to import.
$ drush migrate:import udm_csv_source_image --migrate-debug --idlist="P01"
```
```console
$ drush migrate:import udm_csv_source_image --migrate-debug --limit=1
┌──────────────────────────────────────────────────────────────────────────────┐
│ $Source │
└──────────────────────────────────────────────────────────────────────────────┘
array (10) [
    'photo_id' => string (3) "P01"
    'photo_url' => string (74) "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg"
    'path' => string (76) "modules/custom/ud_migrations/ud_migrations_csv_source/sources/udm_photos.csv"
    'ids' => array (1) [
        string (8) "photo_id"
    ]
    'header_offset' => NULL
    'fields' => array (2) [
        array (2) [
            'name' => string (8) "photo_id"
            'label' => string (8) "Photo ID"
        ]
        array (2) [
            'name' => string (9) "photo_url"
            'label' => string (9) "Photo URL"
        ]
    ]
    'delimiter' => string (1) ","
    'enclosure' => string (1) """
    'escape' => string (1) "\"
    'plugin' => string (3) "csv"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│ $Destination │
└──────────────────────────────────────────────────────────────────────────────┘
array (4) [
    'psf_destination_filename' => string (25) "picture-15-1421176712.jpg"
    'psf_destination_full_path' => string (25) "picture-15-1421176712.jpg"
    'psf_source_image_path' => string (74) "https://agaric.coop/sites/default/files/pictures/picture-15-1421176712.jpg"
    'uri' => string (29) "./picture-15-1421176712_6.jpg"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│ $DestinationIDValues │
└──────────────────────────────────────────────────────────────────────────────┘
array (1) [
    string (1) "3"
]
════════════════════════════════════════════════════════════════════════════════
Called from +56 /var/www/drupalvm/drupal/web/modules/contrib/migrate_devel/src/EventSubscriber/MigrationEventSubscriber.php
[notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_csv_source_image'
```
In the terminal you can see the data as it is passed along in the Migrate API. In the `$Source`, you can see how the *source* plugin was configured and the different columns for the row being processed. In the `$Destination`, you can see all the fields that were mapped in the *process* section and their values after executing all the process plugin transformations. In `$DestinationIDValues`, you can see the *unique identifier* of the *destination* entity that was created. This migration created an *image* so the destination array has only one element: the file ID (`fid`). For paragraphs, which are revisioned entities, you will get two values: the `id` and the `revision_id`. The following snippet shows the `$Destination` and `$DestinationIDValues` sections for the *paragraph* migration in the same example module:
```console
$ drush migrate:import udm_csv_source_paragraph --migrate-debug --limit=1
┌──────────────────────────────────────────────────────────────────────────────┐
│ $Source │
└──────────────────────────────────────────────────────────────────────────────┘
Omitted.
┌──────────────────────────────────────────────────────────────────────────────┐
│ $Destination │
└──────────────────────────────────────────────────────────────────────────────┘
array (3) [
    'field_ud_book_paragraph_title' => string (32) "The definitive guide to Drupal 7"
    'field_ud_book_paragraph_author' => string UTF-8 (24) "Benjamin Melançon et al."
    'type' => string (17) "ud_book_paragraph"
]
┌──────────────────────────────────────────────────────────────────────────────┐
│ $DestinationIDValues │
└──────────────────────────────────────────────────────────────────────────────┘
array (2) [
    'id' => string (1) "3"
    'revision_id' => string (1) "7"
]
════════════════════════════════════════════════════════════════════════════════
Called from +56 /var/www/drupalvm/drupal/web/modules/contrib/migrate_devel/src/EventSubscriber/MigrationEventSubscriber.php
[notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_csv_source_paragraph'
```
## The debug process plugin
The Migrate Devel module also provides a new process plugin called [debug](https://git.drupalcode.org/project/migrate_devel/blob/8.x-1.x/src/Plugin/migrate/process/Debug.php). The plugin works by printing the value it receives to the terminal. As [Benji Fisher](https://www.drupal.org/u/benjifisher) explains in [this issue](https://www.drupal.org/node/3021648), the `debug` plugin offers the following advantages over the `log` plugin provided by the core Migrate API:
- The use of [print_r](https://www.php.net/manual/en/function.print-r.php)() handles both arrays and scalar values gracefully.
- It is easy to differentiate debugging code that should be removed from logging plugin configuration that should stay.
- It saves time as there is no need to run the `migrate:messages` command to read the logged values.
In short, you can use the `debug` plugin in place of `log`. There is a particular case where using `debug` is really useful. If used between the steps of a process plugin chain, you can see how elements are being transformed in each step. The following snippet shows an example of this setup and the output it produces:
```yaml
field_tags:
  - plugin: skip_on_empty
    source: src_fruit_list
    method: process
    message: 'No fruit_list listed.'
  - plugin: debug
    label: 'Step 1: Value received from the source plugin: '
  - plugin: explode
    delimiter: ','
  - plugin: debug
    label: 'Step 2: Exploded taxonomy term names '
    multiple: true
  - plugin: callback
    callable: trim
  - plugin: debug
    label: 'Step 3: Trimmed taxonomy term names '
  - plugin: entity_generate
    entity_type: taxonomy_term
    value_key: name
    bundle_key: vid
    bundle: tags
  - plugin: debug
    label: 'Step 4: Generated taxonomy term IDs '
```
```console
$ drush migrate:import udm_config_entity_lookup_entity_generate_node --limit=1
Step 1: Value received from the source plugin: Apple, Pear, Banana
Step 2: Exploded taxonomy term names Array
(
    [0] => Apple
    [1] => Pear
    [2] => Banana
)
Step 3: Trimmed taxonomy term names Array
(
    [0] => Apple
    [1] => Pear
    [2] => Banana
)
Step 4: Generated taxonomy term IDs Array
(
    [0] => 2
    [1] => 3
    [2] => 7
)
[notice] Processed 1 item (1 created, 0 updated, 0 failed, 0 ignored) - done with 'udm_config_entity_lookup_entity_generate_node'
```
The process pipeline is part of the *node* migration from the [entity_generate plugin example](https://understanddrupal.com/articles/understanding-entitylookup-and-entitygenerate-process-plugins-migrate-tools). In the code snippet, a `debug` step is added after each plugin in the chain. That way, you can verify that the transformations are happening as expected. In the last step you get an array of the taxonomy term IDs (`tid`) that will be associated with the `field_tags` field. Note that this plugin accepts two optional parameters:
- `label` is a string to print before the debug output. It can be used to give context of what is being printed.
- `multiple` is a boolean that when set to `true` signals the next plugin in the pipeline to process each element of an array individually. The functionality is similar to the [multiple_values](https://cgit.drupalcode.org/migrate_plus/tree/src/Plugin/migrate/process/MultipleValues.php) plugin provided by Migrate Plus.
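To build intuition for what the four `debug` steps above report, the following Python sketch mimics the data flow of the pipeline outside of Drupal. It is only an illustration: the `debug` and `run_pipeline` helpers are hypothetical, the in-memory `TERM_IDS` lookup is a stand-in for the `entity_generate` plugin, and the term IDs are copied from the sample output rather than discovered by the code.

```python
def debug(value, label=""):
    """Print a labeled value and pass it along unchanged, like a debug step."""
    print(f"{label}{value}")
    return value


def run_pipeline(raw, term_ids):
    """Mimic skip_on_empty -> explode -> trim -> term lookup for one source value."""
    if not raw:
        # Rough equivalent of skip_on_empty with "method: process".
        return None
    value = debug(raw, "Step 1: Value received from the source plugin: ")
    names = debug(value.split(","), "Step 2: Exploded taxonomy term names ")
    # Each element is handled individually, which is what "multiple: true" asks for.
    trimmed = debug(
        [name.strip() for name in names], "Step 3: Trimmed taxonomy term names "
    )
    return debug(
        [term_ids[name] for name in trimmed], "Step 4: Generated taxonomy term IDs "
    )


# Hypothetical lookup table standing in for entity_generate.
TERM_IDS = {"Apple": 2, "Pear": 3, "Banana": 7}
result = run_pipeline("Apple, Pear, Banana", TERM_IDS)
```

Because every step returns the value it printed, a `debug` call can be dropped between any two transformations without changing the final result, which is the same property that makes the plugin safe to sprinkle through a process chain.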
## Using the right tool for the job: a debugger
Many migration issues can be solved by following the recommendations from the previous article and using the tools provided by Migrate Devel. But there are problems so complex that you need a full-blown debugger. The many layers of abstraction in Drupal, and the fact that multiple modules might be involved in a single migration, make the use of debuggers very appealing. With them, you can step through each line of code across multiple files and see how each variable changes over time.
In the next article we will explain how to configure XDebug to work with PHPStorm and [DrupalVM](https://www.drupalvm.com/). For now, let's consider good places to add breakpoints. In [this article](https://www.mtech-llc.com/blog/lucas-hedding/troubleshooting-drupal-8-migration), [Lucas Hedding](https://www.drupal.org/u/heddn) recommends adding them in:
- The `import` method of the [MigrateExecutable](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/modules/migrate/src/MigrateExecutable.php) class.
- The `processRow` method of the MigrateExecutable class.
- The process plugin if you know which one might be causing an issue. The `transform` method is a good place to set the breakpoint.
The use of a debugger is no guarantee that you will find the solution to your issue. It will depend on many factors, including your familiarity with the system and how deep the problem lies. Previous debugging experience, even if not directly related to migrations, will help a lot. Do not get discouraged if it takes you a long time to discover what is causing the problem or if you cannot find it at all. Each time you will get a better understanding of the system.
[Adam Globus-Hoenich](https://www.drupal.org/u/phenaproxima), a migrate maintainer, once told me that the Migrate API "is impossible to understand for people that are not migrate maintainers." That was after spending about an hour together trying to debug an issue and failing to make it work. I mention this not to discourage you, but to illustrate that no single person knows everything about the Migrate API and even its maintainers can have a hard time debugging issues. Personally, I have spent countless hours in the debugger tracking how the data flows from the source to the destination entities. It is mind-blowing and I barely understand what is going on. The community has come together to produce a fantastic piece of software. Anyone who uses the Migrate API is standing on the shoulders of giants.
## If it is not broken, break it on purpose
One of the best ways to reduce the time you spend debugging an issue is having experience with a similar problem. A great way to learn is finding a working example and breaking it on purpose. This will let you get familiar with the requirements and assumptions made by the system and the errors it produces.
Throughout [the series](https://understanddrupal.com/migrations), we have created [many examples](https://github.com/dinarcon/ud_migrations). We have made our best effort to explain how each example works. But we were not able to document every detail in the articles, in part to keep them within a reasonable length, but also because we do not fully comprehend the system. In any case, we highly encourage you to take the examples and break them in every imaginable way. Making one change at a time, see how the migration behaves and what errors are produced. These are some things to try:
- Do not leave a space after a colon (**:**) when setting a configuration option. Example: `id:this_is_going_to_be_fun`.
- Change the indentation of plugin definitions.
- Try to use a plugin provided by a contributed module that is not enabled.
- Do not set a required plugin configuration option.
- Leave out a full section like source, process, or destination.
- Mix the upper and lowercase letters in configuration options, variables, pseudofields, etc.
- Try to convert a migration managed as code to configuration; and vice versa.
## The migrate:fields-source Drush command
Before wrapping up the discussion on debugging migrations, let's quickly cover the `migrate:fields-source` Drush command. It lists all the fields available in the *source* that can be used later in the *process* section. Many source plugins require that you manually set the list of fields to fetch from the source. Because of this, the information provided by this command is redundant most of the time. However, it is particularly useful with CSV source migrations. The CSV plugin automatically includes all the columns in the file. Executing this command will let you know which columns are available. For example, running `drush migrate:fields-source udm_csv_source_node` produces the following output in the terminal:
```console
$ drush migrate:fields-source udm_csv_source_node
 -------------- -------------
  Machine Name   Description
 -------------- -------------
  unique_id      unique_id
  name           name
  photo_file     photo_file
  book_ref       book_ref
 -------------- -------------
```
The migration is part of the [CSV source example](https://understanddrupal.com/articles/migrating-csv-files-drupal). By running the command you can see that the file contains four columns. The values under "Machine Name" are the ones you are going to use for field mappings in the *process* section. The Drush command has a `--format` option that lets you change the format of the output. Execute `drush migrate:fields-source --help` to get a list of valid formats.
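For CSV sources, the column names reported by the command ultimately come from the file itself. As a rough illustration of that idea, and not of how the Drush command is implemented, this Python sketch reads the header row of a CSV file. The `list_source_columns` helper and the sample data row are made up; only the column names are taken from the output above.

```python
import csv
import io


def list_source_columns(csv_file):
    """Return the column names found in the first row of an open CSV file."""
    reader = csv.reader(csv_file)
    # An empty file has no header row, so fall back to an empty list.
    return next(reader, [])


# Made-up sample content; a real migration reads the file configured in the source plugin.
sample = io.StringIO(
    "unique_id,name,photo_file,book_ref\n"
    "1,Example name,P01,B10\n"
)
columns = list_source_columns(sample)
print(columns)
```

The names in `columns` play the same role as the "Machine Name" values: they are the identifiers you would reference as sources in the *process* section.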
What did you learn in today's blog post? Have you ever used the Migrate Devel module for debugging purposes? What is your strategy when using a debugger like XDebug? Any debugging tips that have been useful to you? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

91
29.txt Normal file

@ -0,0 +1,91 @@
# How to configure XDebug, PHPStorm, and DrupalVM to debug Drupal migrations via Drush commands and the browser?
In recent articles, we have presented some recommendations and tools to debug Drupal migrations. Using a proper debugger is definitely the best way to debug Drupal, be it migrations or other subsystems. In today's article we are going to learn how to configure [XDebug](https://xdebug.org/) inside [DrupalVM](https://www.drupalvm.com/) to connect to PHPStorm: first via the command line using Drush commands, and then via the user interface using a browser. Let's get started.
**Important**: User interfaces tend to change. Screenshots and referenced on-screen text might differ in new versions of the different tools. They can also vary per operating system. This article uses menu items from Linux. Refer to the [official DrupalVM documentation](http://docs.drupalvm.com/en/latest/) for detailed [installation](http://docs.drupalvm.com/en/latest/getting-started/installation-linux/) and [configuration](http://docs.drupalvm.com/en/latest/getting-started/configure-drupalvm/) instructions. For this article, it is assumed that [VirtualBox](https://www.virtualbox.org/), [Vagrant](https://www.vagrantup.com/), and [Ansible](https://docs.ansible.com/ansible/latest/index.html) are already installed. If you need help with those, refer to DrupalVM's installation guide.
## Getting DrupalVM
First get a copy of DrupalVM by cloning the repository or downloading a ZIP or TAR.GZ file from the [available releases](https://github.com/geerlingguy/drupal-vm/releases). If you downloaded a compressed file, expand it to have access to the configuration files. Before creating the virtual machine, make a copy of `default.config.yml` into a new file named `config.yml`. The latter will be used by DrupalVM to configure the virtual machine (VM). In this file make the following changes:
```yaml
# config.yml file
# Based off default.config.yml
vagrant_hostname: migratedebug.test
vagrant_machine_name: migratedebug
# For dynamic IP assignment the 'vagrant-auto_network' plugin is required.
# Otherwise, use an IP address that has not been used by any other virtual machine.
vagrant_ip: 0.0.0.0
# All the other extra packages can remain enabled.
# Make sure the following three get installed by uncommenting them.
installed_extras:
  - drupalconsole
  - drush
  - xdebug
php_xdebug_default_enable: 1
php_xdebug_cli_disable: no
```
The `vagrant_hostname` is the URL you will enter in your browser's address bar to access the Drupal installation. Set `vagrant_ip` to an IP that has not been taken by another virtual machine. If you are unsure, you can set the value to `0.0.0.0` and install the `vagrant-auto_network` plugin. The plugin will make sure that an available IP is assigned to the VM. In the `installed_extras` section, uncomment `xdebug` and `drupalconsole`. [Drupal Console](https://drupalconsole.com/) is not necessary for getting XDebug to work, but it offers many code introspection tools that are very useful for Drupal debugging in general. Finally, set `php_xdebug_default_enable` to `1` and `php_xdebug_cli_disable` to `no`. These last two settings are very important for being able to debug Drush commands.
Then, open a terminal and change directory to where the DrupalVM files are located. Keep the terminal open, as we are going to execute various commands from there. Start the virtual machine by executing `vagrant up`. If you had already created the VM, you can still make changes to the `config.yml` file and then reprovision. If the virtual machine is running, execute the command: `vagrant provision`. Otherwise, you can start and reprovision the VM in a single command: `vagrant up --provision`. Finally, SSH into the VM by executing `vagrant ssh`.
By default, DrupalVM will use the [Drupal composer template project](https://github.com/drupal-composer/drupal-project) to get a copy of Drupal. That means that you will be managing your module and theme dependencies using [composer](https://getcomposer.org/). When you SSH into the virtual machine, you will be in the `/var/www/drupalvm/drupal/web` directory. That is Drupal's docroot. The composer file that manages the installation is actually one directory up. Normally, if you run a composer command from a directory that does not have a composer.json file, composer will try to find one up in the directory hierarchy. Feel free to manually go one directory up or rely on composer's default behaviour to locate the file.
For good measure, let's install some contributed modules. Inside the virtual machine, in Drupal's docroot, execute the following command: `composer require drupal/migrate_plus drupal/migrate_tools`. You can also create a directory at `/var/www/drupalvm/drupal/web/modules/custom` and place the custom module we have been working on throughout [the series](https://understanddrupal.com/migrations). You can get it at <https://github.com/dinarcon/ud_migrations>.
To make sure things are working, let's enable one of the example modules by executing: `drush pm-enable ud_migrations_config_entity_lookup_entity_generate`. This module comes with one migration: `udm_config_entity_lookup_entity_generate_node`. If you execute `drush migrate:status`, the example migration should be listed.
## Configuring PHPStorm
With Drupal already installed and *the virtual machine running*, let's configure PHPStorm. Start a new project pointing to the DrupalVM files. Feel free to follow your preferred approach to project creation. For reference, one way to do it is by going to "File > Create New Project from Existing Files". In the dialog, select "Source files are in a local directory, no web server is configured yet." and click next. Look for the DrupalVM directory, click on it, click on "Project Root", and then "Finish". PHPStorm will begin indexing the files and detect that it is a Drupal project. It will prompt you to enable the Drupal coding standards, indicate which directory contains the installation path, and choose whether to set PHP include paths. All of that is optional but recommended, especially if you want to use this VM for long term development.
Now the important part. Go to "File > Settings > Languages & Frameworks > PHP". In the panel, there is a text box labeled "CLI Interpreter". At its right end, there is a button with three dots like an ellipsis (**...**). Click it to open the "CLI Interpreters" dialog. The next step requires that the *virtual machine is running* because PHPStorm will try to connect to it. After verifying that it is the case, click the plus (**+**) button at the top left corner to add a CLI Interpreter. From the list that appears, select "From Docker, Vagrant, VM, Remote...". In the "Configure Remote PHP Interpreter" dialog select "Vagrant". PHPStorm will detect the SSH connection to connect to the virtual machine. Click "OK" to close the multiple dialog boxes. When you go back to the "Languages & Frameworks" dialog, you can set the "PHP language level" to match the version of the Remote CLI Interpreter.
![Remote PHP interpreter](https://understanddrupal.com/sites/default/files/inline-images/remote_php_interpreter.png)
![CLI interpreters](https://understanddrupal.com/sites/default/files/inline-images/cli_interpreters.png)
You are almost ready to start debugging. There are a few things pending to do. First, let's create a breakpoint in the `import` method of the [MigrateExecutable](https://git.drupalcode.org/project/drupal/blob/8.8.x/core/modules/migrate/src/MigrateExecutable.php) class. You can go to "Navigate > Class" to search the project by class name. Or click around in the Project structure until you find the class. It is located at `./drupal/web/core/modules/migrate/src/MigrateExecutable.php` in the VM directory. You can add a breakpoint by clicking on the bar to the left of the code area. A red circle will appear indicating that the breakpoint has been added.
Then, you need to instruct PHPStorm to listen for debugging connections. For this, click on "Run > Start Listening for PHP Debugging Connections". Finally, you have to set some server mappings. For this you will need the IP address of the virtual machine. If you configured the VM to assign the IP dynamically, you can skip this step momentarily. PHPStorm will detect the incoming connection, create a server with the proper IP, and then you can set the path mappings.
## Triggering the breakpoint
Let's switch back to the terminal. If you are not inside the virtual machine, you can SSH into the VM executing `vagrant ssh`. Then, execute the following command (everything in one line):
`XDEBUG_CONFIG="idekey=PHPSTORM" /var/www/drupalvm/drupal/vendor/bin/drush migrate:import udm_config_entity_lookup_entity_generate_node`
For the breakpoint to be triggered, the following needs to happen:
- *You must execute Drush from the `vendor` directory*. DrupalVM has a globally available Drush binary located at `/usr/local/bin/drush`. That is **not** the one to use. For debugging purposes, **always** execute Drush from the `vendor` directory.
- The command needs to have the `XDEBUG_CONFIG` environment variable set to "idekey=PHPSTORM". There are many ways to accomplish this, but prepending the variable as shown in the example is a valid way to do it.
- Because the breakpoint was set in the import method, we need to execute an import command to stop at the breakpoint. The migration in the example Drush command is part of the example module that was enabled earlier.
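If you prefer to script the debugging run, the requirements above translate into an environment variable plus the full path to the project's Drush binary. The Python sketch below only assembles the command and its environment. The `build_debug_run` helper is hypothetical, and the sketch assumes Python is available inside the virtual machine, which this article does not cover.

```python
import os

# Drush binary from the project's vendor directory, not the global one.
DRUSH = "/var/www/drupalvm/drupal/vendor/bin/drush"


def build_debug_run(migration_id, ide_key="PHPSTORM", base_env=None):
    """Return (command, environment) for importing a migration with XDebug settings."""
    env = dict(os.environ if base_env is None else base_env)
    # XDebug picks up settings such as the IDE key from this variable.
    env["XDEBUG_CONFIG"] = f"idekey={ide_key}"
    command = [DRUSH, "migrate:import", migration_id]
    return command, env


command, env = build_debug_run("udm_config_entity_lookup_entity_generate_node")
print(env["XDEBUG_CONFIG"], " ".join(command))
```

Passing the result to `subprocess.run(command, env=env)` from inside the VM should have the same effect as prepending the variable on the command line.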
When the command is executed, a dialog will appear in PHPStorm. In it, you will be asked to select a project or a file to debug. Accept what is selected by default for now. By accepting the prompt, a new server will be configured using the proper IP of the virtual machine. After doing so, go to "File > Settings > Languages & Frameworks > PHP > Servers". You should see one already created. Make sure the "Use path mappings" option is selected. Then, look for the direct child of "Project files". It should be the directory in your host computer where the VM files are located. In that row, set the "Absolute path on the server" column to `/var/www/drupalvm`. You can delete any other path mapping. There should only be one from the previous prompt. Now, click "OK" in the dialog to accept the changes.
![Incoming drush connection](https://understanddrupal.com/sites/default/files/inline-images/incoming_drush_connection.png)
![Path mappings](https://understanddrupal.com/sites/default/files/inline-images/path_mappings.png)
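Conceptually, a path mapping is a prefix substitution between the file paths reported from the VM and the files opened in the IDE. The Python sketch below illustrates the idea only. It is not how PHPStorm implements it, the `to_local_path` helper is hypothetical, and `/home/user/drupalvm` is an assumed location for the DrupalVM files on the host.

```python
from pathlib import PurePosixPath

SERVER_ROOT = PurePosixPath("/var/www/drupalvm")
LOCAL_ROOT = PurePosixPath("/home/user/drupalvm")  # hypothetical host directory


def to_local_path(server_path):
    """Translate a path inside the VM into the matching path on the host."""
    # relative_to() raises ValueError when the path is outside the mapped root.
    relative = PurePosixPath(server_path).relative_to(SERVER_ROOT)
    return str(LOCAL_ROOT / relative)


local = to_local_path(
    "/var/www/drupalvm/drupal/web/core/modules/migrate/src/MigrateExecutable.php"
)
print(local)
```

With a single mapping at the project root, every file under `/var/www/drupalvm` resolves to its counterpart on the host, which is why one row in the "Servers" dialog is enough.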
Finally, run the Drush command from inside the virtual machine once more. This time the program execution should stop at the breakpoint. You can use the Debug panel to step over each line of code and see how the variables change over time. Feel free to add more breakpoints as needed. In the [previous article](https://understanddrupal.com/articles/how-debug-drupal-migrations-part-2) there are some suggestions about that. When you are done, let PHPStorm know that it should no longer listen for connections. For that, click on "Run > Stop Listening for PHP Debugging Connections". And that is how you can debug Drush commands for Drupal migrations.
![Triggered breakpoint](https://understanddrupal.com/sites/default/files/inline-images/breakpoint.png)
## Debugging from the user interface
If you also want to be able to debug from the user interface, go to this URL and generate the bookmarklets for XDebug: <https://www.jetbrains.com/phpstorm/marklets/>. The `IDE Key` should be `PHPSTORM`. When the bookmarklets are created, you can drag and drop them into your browser's bookmarks toolbar. Then, you can click on them to start and stop a debugging session. The IDE needs to be listening for incoming debugging connections, as was the case for Drush commands.
![PHPStorm bookmarklets generator](https://understanddrupal.com/sites/default/files/inline-images/bookmarlets.png)
*Note*: There are browser extensions that let you start and stop debugging sessions. Check the extensions repository of your browser to see which options are available.
Finally, set breakpoints as needed and go to a page that would trigger them. If you are following along with the example, you can go to <http://migratedebug.test/admin/structure/migrate/manage/default/migrations/udm_config_entity_lookup_entity_generate_node/execute> Once there, select the "Import" operation and click the "Execute" button. This should open a prompt in PHPStorm to select a project or a file to debug. Select the `index.php` located in Drupal's docroot. After accepting the connection a new server should be configured with the proper path mappings. At this point, you should hit the breakpoint again.
![Incoming web connection](https://understanddrupal.com/sites/default/files/inline-images/incoming_web_connection.png)
Happy debugging! :-)
What did you learn in today's blog post? Did you know how to debug Drush commands? Did you know how to trigger a debugging session from the browser? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

87
30.txt Normal file

@ -0,0 +1,87 @@
# List of migration related Drupal modules
When one starts working with migrations, it is easy to be overwhelmed by so many modules providing migration functionality. Throughout [the series](https://understanddrupal.com/migrations), we presented many of them, trying to cover one module at a time. The intention was to help the reader understand when a particular module is truly needed and why. But we only scratched the surface. Today's article presents a list of migration related Drupal modules for quick reference. Let's get started.
## Core modules
At the time of this writing, Drupal core ships with four migration modules:
- **Migrate**: provides the base API for migrating data.
- **Migrate Drupal**: offers functionality to migrate from other Drupal installations. It serves as the foundation for upgrades from Drupal 6 and 7. It also supports reading configuration entities from Drupal 8 sites.
- **Drupal Migrate UI**: provides a [user interface for upgrading](https://www.drupal.org/docs/8/upgrade/upgrade-using-web-browser) a Drupal 6 or 7 site to Drupal 8.
- **Migrate Drupal Multilingual**: is an experimental module required by multilingual migrations. When they become stable, the module will be removed from Drupal core. See [this article](https://www.drupal.org/node/2959712) for more information.
## Migration runners
Once the migration definition files have been created, there are many options to execute them:
- [Migrate Tools](https://www.drupal.org/project/migrate_tools): provides [Drush](https://www.drush.org/) commands to [run migrations from the command line](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow). It also exposes a [user interface to run migrations](https://understanddrupal.com/articles/executing-drupal-migrations-user-interface-migrate-tools) created as [configuration entities](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module). It offers support for [migration groups](https://understanddrupal.com/articles/using-migration-groups-share-configuration-among-drupal-migrations) and [tags](https://understanddrupal.com/articles/what-difference-between-migration-tags-and-migration-groups-drupal). The module *depends* on [Migrate Plus](https://www.drupal.org/project/migrate_plus).
- [Migrate Run](https://www.drupal.org/project/migrate_run): provides Drush commands to run migrations from the command line. It *does not* offer support for migration groups, but tags are supported. The module *does not* depend on Migrate Plus.
- [Migrate Upgrade](https://www.drupal.org/project/migrate_upgrade): provides [Drush support for upgrading](https://www.drupal.org/docs/8/upgrade/upgrade-using-drush) a Drupal 6 or 7 site to Drupal 8.
- [Migrate Manifest](https://www.drupal.org/project/migrate_manifest): provides a Drush command for running migrations using a manifest file. See [this article](https://www.drupal.org/node/2350651#s--running-specific-migrations-using-migrate-manifest) for an example of using this module for Drupal upgrades.
- [Migrate Scheduler](https://www.drupal.org/project/migrate_scheduler): integrates with Drupal Core's [Cron API](https://api.drupal.org/api/drupal/core!core.api.php/function/hook_cron/8.8.x) to execute migrations on predefined schedules.
- [Migrate Cron](https://www.drupal.org/project/migrate_cron): exposes a user interface to execute migrations when Cron is triggered. At the time of this writing, the module does not execute dependent migrations. Follow [this issue](https://www.drupal.org/project/migrate_cron/issues/3051619) for updates.
- [Migrate source UI](https://www.drupal.org/project/migrate_source_ui): provides a form for uploading files to use as source for already defined [CSV](https://understanddrupal.com/articles/migrating-csv-files-drupal), [JSON](https://understanddrupal.com/articles/migrating-json-files-drupal), and [XML](https://understanddrupal.com/articles/migrating-xml-files-drupal) migrations. At the time of this writing, it seems that JSON and XML migrations are not being detected. Follow [this issue](https://www.drupal.org/node/3076725) for updates.
## Source plugins
The Migrate API offers many options for fetching data:
- **Migrate** (core module): provides the `SqlBase` abstract class to help with fetching data from a database connection. See [this article](https://www.drupal.org/docs/8/api/migrate-api/migrate-source-plugins/migrating-data-from-a-sql-source) for an example. It also exposes the `embedded_data` plugin which allows the source data to be defined inside the migration definition file. It was used extensively in the [example migrations](https://github.com/dinarcon/ud_migrations) of [this series](https://understanddrupal.com/migrations). It also offers the `empty` plugin which returns a row based on provided constants. It is used in multilingual migrations for entity references.
- [Migrate Plus](https://www.drupal.org/project/migrate_plus): combining various plugins, it allows fetching data in [JSON](https://understanddrupal.com/articles/migrating-json-files-drupal), [XML](https://understanddrupal.com/articles/migrating-xml-files-drupal), and SOAP formats. It also provides various plugins for parsing HTML. See [this article](https://isovera.com/blog/handling-html-with-drupals-migrate-api/) by [Benji Fisher](https://www.drupal.org/u/benjifisher) for an example. There is also a [patch to add support for PDF parsing](https://www.drupal.org/project/migrate_plus/issues/3019758).
- [Migrate Source CSV](https://www.drupal.org/project/migrate_source_csv): allows fetching data from [CSV](https://understanddrupal.com/articles/migrating-csv-files-drupal) files.
- [Migrate Google Sheets](https://www.drupal.org/project/migrate_google_sheets): leverages Migrate Plus functionality to allow fetching data from [Google Sheets](https://understanddrupal.com/articles/migrating-google-sheets-drupal).
- [Migrate Spreadsheet](https://www.drupal.org/project/migrate_spreadsheet): allows fetching data from [Microsoft Excel and LibreOffice Calc](https://understanddrupal.com/articles/migrating-microsoft-excel-and-libreoffice-calc-files-drupal) files.
- [Migrate Source YAML](https://www.drupal.org/project/migrate_source_yaml): allows fetching data from YAML files.
- [WP Migrate](https://www.drupal.org/project/wp_migrate): allows fetching data from a WordPress database.
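
To make the `embedded_data` plugin mentioned above more concrete, here is a minimal sketch of a source section that defines its rows inline. The column names (`unique_id`, `creative_title`) and the values are assumptions made up for illustration. They are not taken from a specific migration.

```yaml
# Sketch of a source section using the core embedded_data plugin.
# Column names and values are hypothetical.
source:
  plugin: embedded_data
  data_rows:
    - unique_id: 1
      creative_title: 'The versatility of Drupal fields'
    - unique_id: 2
      creative_title: 'What is a view in Drupal?'
  # Every source needs to declare which column(s) uniquely identify a row.
  ids:
    unique_id:
      type: integer
```

Because the data lives inside the migration definition file, this plugin is convenient for small examples and quick experiments where no external file or database is needed.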
## Destination plugins
The Migrate API is mostly used to move data into Drupal, but it is possible to [write to other destinations](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process):
- **Migrate** (core): provides classes for creating content and configuration entities. It also offers the `null` plugin which in itself does not write to anything. It is used in multilingual migrations for entity references.
- [Migrate Plus](https://www.drupal.org/project/migrate_plus): provides the `table` plugin for migrating into tables not registered with Drupal Schema API.
- [CSV file](https://github.com/jonathanfranks/d8migrate/tree/master/web/modules/custom/migrate_destination_csv): example destination plugin implementation to write CSV files. The module was created by [J Franks](https://www.drupal.org/u/franksj) for a [DrupalCamp presentation](https://2018.tcdrupal.org/session/drupal-8-migrate-its-not-rocket-science). Check out the [repository](https://github.com/jonathanfranks/d8migrate/tree/master/web/modules/custom/migrate_destination_csv) and [video recording](https://2018.tcdrupal.org/session/drupal-8-migrate-its-not-rocket-science).
## Development related
These modules can help with [writing Drupal migrations](https://understanddrupal.com/articles/writing-your-first-drupal-migration):
- **Migrate** (core): provides the [log](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21Log.php/class/Log) process plugin. See [this article](https://understanddrupal.com/articles/how-debug-drupal-migrations-part-1) for an example of its use.
- [Migrate Devel](https://www.drupal.org/project/migrate_devel): offers Drush options for printing debug information when executing migrations. It also provides the [debug](https://git.drupalcode.org/project/migrate_devel/blob/8.x-1.x/src/Plugin/migrate/process/Debug.php) process plugin. See [this article](https://understanddrupal.com/articles/how-debug-drupal-migrations-part-2) for an example of its use.
- [Migrate Process Vardump](https://www.drupal.org/project/migrate_process_vardump): provides the [vardump](https://git.drupalcode.org/project/migrate_process_vardump/blob/8.x-1.x/src/Plugin/migrate/process/Vardump.php) plugin. It works like the `debug` plugin.
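
As a rough sketch of how the core `log` plugin fits into a process pipeline, the snippet below places it as the first step for a destination property. The source column `src_title` is a hypothetical name, and the second step is only there to show that the pipeline continues after the value is logged.

```yaml
# Sketch only: src_title is an assumed source column.
process:
  title:
    # Records the incoming value as a migration message and passes it on.
    - plugin: log
      source: src_title
    # The pipeline continues with the same value.
    - plugin: callback
      callable: trim
```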
## Field and module related
- [Migrate Media Handler](https://www.drupal.org/project/migrate_media_handler): provides migration process plugins to facilitate the migration into Drupal 8 media entities. The source can be Drupal 7 file or image fields. It also supports inline file embeds in rich text. It leverages the DOM parsing plugins provided by Migrate Plus.
- [Media Migration](https://www.drupal.org/project/media_migration): provides an upgrade path from Drupal 7 to Drupal 8 media entities. The source can be image fields and fields attached to media and file entities.
- [Migrate File Entities to Media Entities](https://www.drupal.org/project/migrate_file_to_media): migrates Drupal 8.0 file entities to Drupal 8.5 media entities.
- [Migrate Files](https://www.drupal.org/project/migrate_file): provides process plugins for [migrating files and images](https://understanddrupal.com/articles/migrating-files-and-images-drupal).
- [Webform Migrate](https://www.drupal.org/project/webform_migrate): provides plugins to help migrate from the Drupal 6 and 7 versions of the Webform module.
- [Migrate HTML to Paragraphs](https://www.drupal.org/project/migrate_html_to_paragraphs): turns HTML markup into paragraph entities.
- [Commerce Migrate](https://www.drupal.org/project/commerce_migrate): offers a general purpose migration framework for bringing store information into Drupal Commerce.
- [Address](https://www.drupal.org/project/address): offers a process plugin to migrate data into fields of type address. It also provides an upgrade path from Drupal 7's [Address Field module](https://www.drupal.org/project/addressfield). See [this article](https://understanddrupal.com/articles/migrating-addresses-drupal) for an example.
- [Geofield](https://www.drupal.org/project/geofield): offers a process plugin to migrate data into fields of type geofield. See [this article](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/contrib-process-plugin-geofield_latlon) for an example.
- [Office Hours](https://www.drupal.org/project/office_hours): offers a process plugin to migrate data into fields of type office hours.
- [Workbench Moderation to Content Moderation](https://www.drupal.org/project/wbm2cm): migrates configuration from one module to the other.
## Modules created by Tess Flynn (socketwench)
While doing the research for this article, we found many useful modules created by [Tess Flynn (socketwench)](https://www.drupal.org/u/socketwench). She is a [fantastic presenter](http://drupal.tv/all-videos?search_api_fulltext=socketwench) who also has written about Drupal migrations, testing, and [much more](https://deninet.com/topic/drupal). Here are some of her modules:
- [Migrate Directory](https://www.drupal.org/project/migrate_directory): imports files from a directory into Drupal as managed files.
- [Migrate Process S3](https://www.drupal.org/project/migrate_process_s3): downloads objects from an S3 bucket into Drupal.
- [Migrate Process URL](https://www.drupal.org/project/migrate_process_url): provides a process plugin to make it easier to migrate into link fields.
- [Migrate Process Vardump](https://www.drupal.org/project/migrate_process_vardump): helps with [debugging migrations](https://understanddrupal.com/articles/how-configure-xdebug-phpstorm-drupalvm-debug-drupal-migrations-drush-commands-browser).
- Many process plugins that wrap PHP functions. For example: [Migrate Process Array](https://www.drupal.org/project/migrate_process_array), [Migrate Process Trim](https://www.drupal.org/project/migrate_process_trim), [Migrate Process Regex](https://www.drupal.org/project/migrate_process_regex), [Migrate Process Skip](https://www.drupal.org/project/migrate_process_skip), and [Migrate Process XML](https://www.drupal.org/project/migrate_process_xml).
## Miscellaneous
- [Feeds Migrate](https://www.drupal.org/project/feeds_migrate): it aims to provide a user interface similar to the one from Drupal 7's [Feeds module](https://www.drupal.org/project/feeds), but working on top of Drupal 8's Migrate API.
- [Migrate Override](https://www.drupal.org/project/migrate_override): allows flagging fields in a content entity so they can be manually changed by site editors without being overridden in a subsequent migration.
- [Migrate Status](https://www.drupal.org/project/migrate_status): checks if migrations are currently running.
- [Migrate QA](https://www.drupal.org/project/migrate_qa): provides tools for validating content migrations. See [this presentation](https://events.drupal.org/seattle2019/sessions/introducing-the-migrate-qa-module) for more details.
And [many, many more modules](https://www.drupal.org/project/project_module?f[3]=drupal_core%3A7234&f[4]=sm_field_project_type%3Afull&text=migrate&solrsort=score+desc)!
What did you learn in today's blog post? Did you find a new module that could be useful in current or future projects? Did we miss a module that has been very useful to you? Share your answers in the comments. Also, I would be grateful if you shared this blog post with others.

70
31.txt Normal file
View file

@ -0,0 +1,70 @@
# Introduction to Drupal 8 upgrades
Throughout the series, we explored many migration topics. We started with an [overview of the ETL process](https://understanddrupal.com/articles/drupal-migrations-understanding-etl-process) and [workflows for managing migrations](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow). Then, we presented example migrations for different entities: [node](https://understanddrupal.com/articles/writing-your-first-drupal-migration)s, [files, images](https://understanddrupal.com/articles/migrating-files-and-images-drupal), [taxonomy terms](https://understanddrupal.com/articles/migrating-taxonomy-terms-and-multivalue-fields-drupal), [users](https://understanddrupal.com/articles/migrating-users-drupal-part-1), and [paragraphs](https://understanddrupal.com/articles/introduction-paragraphs-migrations-drupal). Next, we shifted focus to migrations from different sources: [CSV](https://understanddrupal.com/articles/migrating-csv-files-drupal), [JSON](https://understanddrupal.com/articles/migrating-json-files-drupal), [XML](https://understanddrupal.com/articles/migrating-xml-files-drupal), [Google Sheet](https://understanddrupal.com/articles/migrating-google-sheets-drupal), [Microsoft Excel, and LibreOffice Calc files](https://understanddrupal.com/articles/migrating-microsoft-excel-and-libreoffice-calc-files-drupal). Later, we explored how to [manage migrations as configuration](https://understanddrupal.com/articles/defining-drupal-migrations-configuration-entities-migrate-plus-module), [use groups to share configuration](https://understanddrupal.com/articles/using-migration-groups-share-configuration-among-drupal-migrations), and [execute migrations from the user interface](https://understanddrupal.com/articles/executing-drupal-migrations-user-interface-migrate-tools). 
Finally, we [gave recommendations](https://understanddrupal.com/articles/how-debug-drupal-migrations-part-1) and [provided tools](https://understanddrupal.com/articles/how-debug-drupal-migrations-part-2) for [debugging migrations from the command line and the user interface](https://understanddrupal.com/articles/how-configure-xdebug-phpstorm-drupalvm-debug-drupal-migrations-drush-commands-browser). Although we covered a lot of ground, we only scratched the surface. The Migrate API is so flexible that its use cases are virtually endless. To wrap up the series, we present an introduction to a very popular topic: **Drupal upgrades**. Let's get started.
*Note*: In this article, when we talk about Drupal 7, the same applies for Drupal 6.
## What is a Drupal upgrade?
The information we presented in the series is generic enough that it applies to many types of Drupal migrations. There is one particular use case that stands out from the rest: **Drupal upgrades**. An *upgrade* is the process of taking your existing Drupal site and copying its *configuration* and *content* over to a new major version of Drupal. For example, going from Drupal 6 or 7 to Drupal 8. The following is an oversimplification of the workflow to perform the upgrade process:
- Install a fresh Drupal 8 site.
- Add credentials so that the new site can connect to Drupal 7's database.
- Use the Migrate API to generate migration definition files. They will copy over Drupal 7's configuration and content. This step is only about generating the YAML files.
- Execute those migrations to bring the configuration and content over to Drupal 8.
## Preparing your migration
Any migration project requires a good plan of action, but this is particularly important for Drupal upgrades. You need to have a general sense of how the upgrade process works, what assumptions are made by the system, and what limitations exist. Read [this article](https://www.drupal.org/docs/8/upgrade/preparing-a-site-for-upgrade-to-drupal-8) for more details on how to prepare a site for upgrading it to Drupal 8. Some highlights include:
- Both sites need to be on the latest stable version of their corresponding branch. That is, the latest release of Drupal 7 and 8 at the time of performing the upgrade process. This also applies to any contributed module.
- Do not do any configuration of the Drupal 8 site until the upgrade process is completed. Any configuration you make will be overridden and there is no need for it anyway. Part of the process includes recreating the old site's configuration: content types, fields, taxonomy vocabularies, etc.
- Do not create content on the Drupal 8 site until the upgrade process is completed. The upgrade process will keep the unique identifiers from the source site: `nid`, `uid`, `tid`, `fid`, etc. If you were to create content, the references among entities could be broken when the upgrade process overrides the unique identifiers. To prevent data loss, wait until the old site's content has been migrated to start adding content to the new site.
- For the system to detect a module's configuration to be upgraded automatically, it has to be enabled on both sites. This applies to contributed modules in Drupal 7 (e.g., [link](https://www.drupal.org/project/link)) that were moved to core in Drupal 8. It also applies to Drupal 7 modules (e.g., [address field](https://www.drupal.org/project/addressfield)) that were superseded by a different one in Drupal 8 (e.g., [address](https://www.drupal.org/project/address)). In any of those cases, as long as the modules are enabled on both ends, their configuration and content will be migrated. This assumes that the Drupal 8 counterpart offers an automatic upgrade path.
- Some modules do not offer automatic upgrade paths. The primary example is the [Views module](https://www.drupal.org/project/views). This means that any view created in Drupal 7 needs to be manually recreated in Drupal 8.
- The upgrade procedure is all about moving data, not logic in custom code. If you have custom modules, the custom code needs to be ported separately. If those modules store data in Drupal's database, you can use the Migrate API to move it over to the new site.
- Similarly, you will have to recreate the theme from scratch. Drupal 8 introduced [Twig](https://www.drupal.org/docs/8/theming/twig) which is significantly different to the [PHPTemplate engine](https://api.drupal.org/api/drupal/themes%21engines%21phptemplate%21phptemplate.engine/7.x) used by Drupal 7.
## Customizing your migration
Note that the *creation* and *execution* of the migration files are separate steps. Upgrading to a major version of Drupal is often a good opportunity to introduce changes to the website. For example, you might want to change the content modeling, navigation, user permissions, etc. To accomplish that, you can modify the generated migration files to account for any scenario where the new site's configuration diverges from the old one. Only when you are done with the customizations do you *execute* the migrations. Examples of things that could change include:
- Combining or breaking apart content types.
- Moving data about people from node entities to user entities, or vice versa.
- Renaming content types, fields, taxonomy vocabularies and terms, etc.
- Changing field types. For example, going from [Address Field module](https://www.drupal.org/project/addressfield) in Drupal 7 to [Address module](https://www.drupal.org/project/address) in Drupal 8.
- Merging multiple taxonomy vocabularies into one.
- Changing how your content is structured. For example, going from a monolithic body field to paragraph entities.
- Changing how your multimedia files are stored. For example, going from image fields to media entities.
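
As an illustration of the first and third items in the list, the following is a minimal sketch of how a generated node migration's process section could be edited to rename or combine content types. It uses the core `static_map` plugin. The machine names `blog_post`, `story`, `article`, and `page` are assumptions for the example, and the rest of the generated file is omitted.

```yaml
# Sketch only: content type machine names are hypothetical.
process:
  type:
    plugin: static_map
    source: type
    map:
      # Two Drupal 7 content types are combined into one Drupal 8 type.
      blog_post: article
      story: article
    # Any other source type falls back to this value.
    default_value: page
```

A change like this only makes sense if the destination content type exists on the new site with the fields the migration writes to, so the related field migrations would likely need matching adjustments.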
## Performing the upgrade
There are two options to perform the upgrade. In both cases, the process is initiated from the Drupal 8 site. One way is using the [Migrate Drupal UI core module](https://understanddrupal.com/articles/list-migration-related-drupal-modules) to perform the upgrade from the [browser's user interface](https://www.drupal.org/docs/8/upgrade/upgrade-using-web-browser). When the module is enabled, go to `/upgrade` and provide the database credentials of the Drupal 7 site. Based on the installed modules on both sites, the system will give you a report of what can be automatically upgraded. Consider the limitations explained above. While the upgrade process is running, you will see a stream of messages about the operation. These messages are logged to the database so you can read them after the upgrade is completed. If your dataset is big or there are many expensive operations like password encryption, the process can take too long to complete or fail altogether.
The other way to perform the upgrade procedure is from the [command line using Drush](https://www.drupal.org/docs/8/upgrade/upgrade-using-drush). This requires the [Migrate Upgrade contributed module](https://www.drupal.org/project/migrate_upgrade). When enabled, it adds Drush commands to import and roll back a full upgrade operation. You can provide database connection details of the old site via command line options. One benefit of using this approach is that you can *create the migration files without running them*. This lets you do customizations as explained above. When you are done, you can run the migrations following the same [workflow](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow) of [manually created ones](https://understanddrupal.com/articles/writing-your-first-drupal-migration).
## Known issues and limitations
Depending on whether you are upgrading from Drupal 6 or 7, there is a list of known issues you need to be aware of. Read [this article](https://www.drupal.org/docs/8/upgrade/known-issues-when-upgrading-from-drupal-6-or-7-to-drupal-8) for more information. One area that can be tricky is multilingual support. As of this writing, the upgrade path for multilingual sites is not complete. Limited support is available via the [Migrate Drupal Multilingual core module](https://understanddrupal.com/articles/list-migration-related-drupal-modules). There are many things to consider when working with multilingual migrations. For example, are you using node or field translations? Do entities have revisions? Read [this article](https://www.drupal.org/docs/8/upgrade/upgrading-multilingual-drupal-6-to-drupal-8) for more information.
## Upgrade paths for contributed modules
The automatic upgrade procedure only supports Drupal core modules. This includes modules that were added to core in Drupal 8. For any other contributed module, it is the maintainers' decision to include an automatic upgrade path or not. For example, the [Geofield module](https://www.drupal.org/project/geofield) provides an [upgrade path](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/contrib-process-plugin-geofield_latlon). It is also possible that a module in Drupal 8 offers an upgrade path from a different module in Drupal 7. For example, the [Address module](https://www.drupal.org/project/address) provides an [upgrade path](https://www.drupal.org/docs/8/api/migrate-api/migrate-process-plugins/contrib-process-plugin-addressfield) from the [Address Field module](https://www.drupal.org/project/addressfield). [Drupal Commerce](https://www.drupal.org/project/commerce) also provides some support via the [Commerce Migrate module](https://www.drupal.org/project/commerce_migrate).
Not every module offers an automated upgrade path. In such cases, you can write custom plugins which ideally are contributed back to [Drupal.org](https://www.drupal.org/) ;-) Or you can use the techniques learned in the series to [transform your source data](https://understanddrupal.com/articles/using-process-plugins-data-transformation-drupal-migrations) into the [structures expected by Drupal 8](https://understanddrupal.com/articles/migrating-addresses-drupal). In both cases, having a [broad understanding of the Migrate API](https://understanddrupal.com/migrations) will be very useful.
## Upgrade strategies
There are multiple migration strategies. You might even consider manually recreating the content if there is only a handful of data to move. Or you might decide to use the Migrate API to upgrade part of the site automatically and do a manual copy of a different portion of it. You might want to execute a fully automated upgrade procedure and manually clean up edge cases afterwards. Or you might want to customize the migrations to account for those edge cases already. [Michael Anello](https://www.drupal.org/u/ultimike) created an insightful [presentation on different migration strategies](https://www.youtube.com/watch?v=HsPcnZS_qL4). Our [tips for writing migrations](https://understanddrupal.com/articles/tips-writing-drupal-migrations-and-understanding-their-workflow) apply as well.
Drupal upgrades tend to be fun, challenging projects. The more you know about the Migrate API, the easier it will be to complete the project. We enjoyed writing this [overview of the Drupal Migrate API](https://understanddrupal.com/migrations). **We would love to work on a follow-up series focused on Drupal upgrades. If you or your organization could sponsor such an endeavor, please reach out to us** via the [site's contact form](https://understanddrupal.com/contact/feedback).
## What about upgrading to Drupal 9?
In March 2017, project lead [Dries Buytaert](https://www.drupal.org/u/dries) announced a plan to make [Drupal upgrades easier forever](https://dri.es/making-drupal-upgrades-easy-forever). This was reinforced during his [keynote at DrupalCon Seattle 2019](https://dri.es/state-of-drupal-presentation-april-2019). You can watch the video recording in [this link](https://www.youtube.com/watch?v=Nf_aD3dTloY). In short, [Drupal 9.0](https://www.drupal.org/docs/9) will be the latest point release of Drupal 8 minus deprecated APIs. This has very important implications:
- When Drupal 9 is released, the Migrate API should be mostly the same as in Drupal 8. Therefore, anything that you learn today will be useful for Drupal 9 as well.
- As long as your code does not use deprecated APIs, [upgrading](https://www.drupal.org/docs/8/upgrade) from Drupal 8 to Drupal 9 will be as easy as [updating](https://www.drupal.org/docs/8/update) from Drupal 8.7 to 8.8.
- Because of this, there is no need to wait for Drupal 9 to upgrade your Drupal 6 or 7 site. You can upgrade to Drupal 8 today.
What did you learn in today's blog post? Did you know the upgrade process is able to copy content and configuration? Did you know that you can execute the upgrade procedure either from the user interface or the command line? Share your answers in the comments. Also, we would be grateful if you shared this blog post with others.

View file

@ -7,4 +7,25 @@
07.txt 07.txt
08.txt 08.txt
09.txt 09.txt
10.txt 10.txt
11.txt
12.txt
13.txt
14.txt
15.txt
16.txt
17.txt
18.txt
19.txt
20.txt
21.txt
22.txt
23.txt
24.txt
25.txt
26.txt
27.txt
28.txt
29.txt
30.txt
31.txt