Mauricio Dinarte ba253739b1 Rename files
2023-08-04 10:43:39 -06:00

# Migrating users into Drupal - Part 1
It is time to learn how to migrate users into Drupal. The explanation is divided into two chapters. In this one, we cover the migration of email, timezone, username, password, and status. In the next one, we will cover creation date, roles, and profile pictures. Several techniques will be implemented to ensure that the migrated data is valid, for example, making sure that usernames are not duplicated.
## Getting the code
You can get the full code example at <https://github.com/dinarcon/ud_migrations>. The module to enable is `UD users`, whose machine name is `ud_migrations_users`. The two migrations to execute are `udm_user_pictures` and `udm_users`. Notice that both migrations belong to the same module.
The example assumes Drupal was installed using the `standard` installation profile. Particularly, we depend on a Picture (`user_picture`) _image_ field attached to the user entity. The word in parenthesis represents the _machine name_ of the image field.
The explanation below is only for the user migration. It depends on a file migration to get the profile pictures. One motivation to have two migrations is for the images to be deleted if the file migration is rolled back. Note that other techniques exist for migrating images without having to create a separate migration. We have covered two of them in the chapters about `subfields` and `constants and pseudofields`.
## Understanding the source
It is very important to understand the format of your _source_ data. This will guide the transformation process required to produce the expected destination format. For this example, it is assumed that the legacy system from which users are being imported did not have unique usernames. Emails were used to uniquely identify users, but that is not desired in the new Drupal site. Instead, a username will be created from a `public_name` source column. Special measures will be taken to prevent duplication as Drupal usernames must be unique. Two more things to consider. First, source passwords are provided in _plain_ text (never do this!). Second, some elements might be missing in the source like roles and profile picture. The following snippet shows a sample record for the _source_ section:
```yaml
source:
  plugin: embedded_data
  data_rows:
    - legacy_id: 101
      public_name: "Michele"
      user_email: "micky@example.com"
      timezone: "America/New_York"
      user_password: "totally insecure password 1"
      user_status: "active"
      member_since: "January 1, 2011"
      user_roles: "forum moderator, forum admin"
      user_photo: "P01"
  ids:
    legacy_id:
      type: integer
```
## Configuring the destination and dependencies
The _destination_ section specifies that _user_ is the target entity. When that is the case, you can set an optional `md5_passwords` configuration. If it is set to `true`, the system will take an MD5-hashed password and convert it to the hashing algorithm that Drupal uses.[^10-passmig] To migrate the profile pictures, a separate migration is created. The dependency of the user migration on the file migration is added explicitly. The following code snippet shows how the destination and dependencies are set:
[^10-passmig]: For more information on password migrations, refer to these articles for basic <https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users> and advanced <https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users-advanced-password> use cases.
```yaml
destination:
  plugin: "entity:user"
  md5_passwords: true
migration_dependencies:
  required:
    - udm_user_pictures
  optional: []
```
## Processing the fields
The interesting part of a _user_ migration is the field mapping. The specific transformation will depend on your _source_, but some arguably complex cases will be addressed in the example. Let's start with the basics: verbatim copies from source to destination. The following snippet shows three mappings:
```yaml
mail: user_email
init: user_email
timezone: timezone
```
The `mail`, `init`, and `timezone` entity properties are copied directly from the source. Both `mail` and `init` are _email addresses_. The difference is that `mail` stores the current email, while `init` stores the one used when the account was first created. The former might change if the user updates their profile, while the latter will never change. The `timezone` needs to be a string taken from a specific set of values.[^10-timezone]
[^10-timezone]: Refer to this page for a list of supported timezones: <https://www.php.net/manual/en/timezones.php>
```yaml
name:
  - plugin: machine_name
    source: public_name
  - plugin: make_unique_entity_field
    entity_type: user
    field: name
    postfix: _
```
The `name` _entity property_ stores the _username_. This has to be unique in the system. If the _source_ data contained a unique value for each record that was suitable as a username, it could be used directly. However, none of the unique source columns (e.g., `legacy_id`) is suitable to be used as a username. Therefore, extra processing is needed. The [machine_name](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21MachineName.php/class/MachineName) plugin converts the `public_name` _source_ column into a transliterated, lowercase string with some restrictions: any character that is not a number or letter will be converted to an underscore. The transformed value is sent to the `make_unique_entity_field` plugin. This plugin makes sure its input value is not repeated in the whole system for a particular entity field. In this example, the username will be unique. The plugin is configured by indicating which _entity type_ and _field_ (property) you want to check. If an equal value already exists, a new one is created by appending what you define as `postfix` plus a number. In the full example data, there are two records with `public_name` set to `Benjamin`. The usernames produced by running the process plugin chain will be `benjamin` and `benjamin_1`.
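As a hedged variation on the chain above, the `make_unique_entity_field` plugin also accepts optional `start` and `length` settings that trim the incoming value before the postfix and counter are appended. The sketch below assumes you want to keep generated usernames within Drupal's 60-character limit. The value `55` is an arbitrary choice for illustration and is not taken from the example module.

```yaml
name:
  - plugin: machine_name
    source: public_name
  - plugin: make_unique_entity_field
    entity_type: user
    field: name
    postfix: _
    # Assumption: keep at most 55 characters of the base value so that
    # the postfix and counter (e.g. "_1") still fit in a 60-character username.
    length: 55
```

With this configuration, a long `public_name` would be cut to 55 characters first, leaving room for suffixes such as `_1`.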
```yaml
process:
  pass:
    plugin: callback
    callable: md5
    source: user_password
destination:
  plugin: "entity:user"
  md5_passwords: true
```
The `pass` entity property stores the user's password. In this example, the source provides the passwords in plain text. Needless to say, that is a terrible idea. But let's work with it for now. Drupal uses portable PHP password hashes implemented by `PhpassHashedPassword`. Understanding the details of how Drupal converts one algorithm to another will be left as an exercise for the curious reader. In this example, we are going to take advantage of a feature provided by the migrate API to automatically convert MD5 hashes to the algorithm used by Drupal. The `callback` plugin is configured to use the `md5` PHP function to convert the plain text password into a hashed version. The last piece of the puzzle is to set, in the _destination_ section, the `md5_passwords` configuration to `true`. This will take care of converting the already MD5-hashed password to the value expected by Drupal. The migrate API documentation provides more examples for migrating already [MD5 hashed passwords](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users) and [other hashing algorithms](https://www.drupal.org/docs/8/api/migrate-api/migrate-destination-plugins-examples/migrating-users-advanced-password).
_Note_: MD5-hashed passwords are insecure. In the example, the password is hashed with MD5 as an **intermediate step only**. Drupal uses other algorithms to store passwords securely.
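If the legacy system had already stored MD5 hashes instead of plain text, the `callback` step would not be needed. The following is a minimal sketch of that case, assuming a hypothetical `user_password_md5` source column that is not part of the example module:

```yaml
process:
  # Hypothetical source column that already holds an MD5 hash.
  # It is copied verbatim, with no callback plugin involved.
  pass: user_password_md5
destination:
  plugin: "entity:user"
  # Signals that the incoming value is an MD5 hash to be converted.
  md5_passwords: true
```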
```yaml
status:
  plugin: static_map
  source: user_status
  map:
    inactive: 0
    active: 1
```
The `status` _entity property_ stores whether a user is active or blocked from the system. The source `user_status` values are strings, but Drupal stores this data as a boolean. A value of zero (**0**) indicates that the user is _blocked_, while a value of one (**1**) indicates that it is _active_. The [static_map](https://api.drupal.org/api/drupal/core%21modules%21migrate%21src%21Plugin%21migrate%21process%21StaticMap.php/class/StaticMap) plugin is used to manually map the values from source to destination. This plugin expects a `map` configuration containing an _array of key-value mappings_. The value from the source is on the left. The value expected by Drupal is on the right.
_Technical note_: Booleans are `true` or `false` values. Even though Drupal treats the `status` property as a boolean, it is internally stored as a `tiny int` in the database. That is why the numbers zero or one are used in the example. For this particular case, using a number or a boolean value on the right side of the mapping produces the same result.
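The mapping for `status` only covers the two values known to exist in the source. As a sketch of how unexpected values could be handled, the `static_map` plugin accepts an optional `default_value` setting that is used when the source value is not found in the `map`. Treating unknown statuses as blocked is an assumption made for this illustration, not a choice taken from the example module:

```yaml
status:
  plugin: static_map
  source: user_status
  map:
    inactive: 0
    active: 1
  # Assumption: any status not listed in the map results in a blocked account.
  default_value: 0
```

Without `default_value` (or the related `bypass` setting), a source value missing from the map is expected to cause the row to be skipped, so it is worth deciding explicitly how such records should be treated.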