14 KiB
Migrating addresses into Drupal
Today we will learn how to migrate addresses into Drupal. We are going to use the field provided by the Address module which depends on the third-party library `commerceguys/addressing. When migrating addresses you need to be careful with the data that Drupal expects. The address components can change per country. The way to store those components also varies per country. These and other important consideration will be explained. Let's get started.
Getting the code
You can get the full code example at https://github.com/dinarcon/ud_migrations The module to enable is UD address
whose machine name is ud_migrations_address
. The migration to execute is udm_address
. Notice that this migration writes to a content type called UD Address
and one field: field_ud_address
. This content type and field will be created when the module is installed. They will also be removed when the module is uninstalled. The demo module itself depends on the following modules: address
and migrate
.
Note: Configuration placed in a module's config/install
directory will be copied to Drupal's active configuration. And if those files have a dependencies/enforced/module
key, the configuration will be removed when the listed modules are uninstalled. That is how the content type and fields are automatically created and deleted.
The recommended way to install the Address module is using composer: composer require drupal/address
. This will grab the Drupal module and the commerceguys/addressing
library that it depends on. If your Drupal site is not composer-based, an alternative is to use the Ludwig module. Read this article if you want to learn more about this option. In the example, it is assumed that the module and its dependency were obtained via composer. Also, keep an eye on the Composer Support in Core Initiative as they make progress.
Source and destination sections
The example will migrate three addresses from the following countries: Nicaragua, Germany, and the United States of America (USA). This makes it possible to show how different countries expect different address data. As usual, for any migration you need to understand the source. The following code snippet shows how the source and destination sections are configured:
source:
plugin: embedded_data
data_rows:
- unique_id: 1
first_name: 'Michele'
last_name: 'Metts'
company: 'Agaric LLC'
city: 'Boston'
state: 'MA'
zip: '02111'
country: 'US'
- unique_id: 2
first_name: 'Stefan'
last_name: 'Freudenberg'
company: 'Agaric GmbH'
city: 'Hamburg'
state: ''
zip: '21073'
country: 'DE'
- unique_id: 3
first_name: 'Benjamin'
last_name: 'Melançon'
company: 'Agaric SA'
city: 'Managua'
state: 'Managua'
zip: ''
country: 'NI'
ids:
unique_id:
type: integer
destination:
plugin: 'entity:node'
default_bundle: ud_address
Note that not every address component is set for all addresses. For example, the Nicaraguan address does not contain a ZIP code. And the German address does not contain a state. Also, the Nicaraguan state is fully spelled out: Managua
. On the contrary, the USA state is a two letter abbreviation: MA
for Massachusetts
. One more thing that might not be apparent is that the USA ZIP code belongs to the state of Massachusetts. All of this is important because the module does validation of addresses. The destination is the custom ud_address
content type created by the module.
Available subfields
The Address field has 13 subfields available. They can be found in the schema method of the AddresItem class. Fields are not required to have a one-to-one mapping between their schema and the form widgets used for entering content. This is particularly true for addresses because input elements, labels, and validations change dynamically based on the selected country. The following is a reference list of all subfields for addresses:
langcode
for language code.country_code
for country.administrative_area
for administrative area (e.g., state or province).locality
for locality (e.g. city).dependent_locality
for dependent locality (e.g. neighbourhood).postal_code
for postal or ZIP code.sorting_code
for sorting code.address_line1
for address line 1.address_line2
for address line 2.organization
for company.given_name
for first name.additional_name
for middle name.family_name
for last name:
Properly describing an address is not trivial. For example, there are discussions to add a third address line component. Check this issue if you need this functionality or would like to participate in the discussion.
Address subfield mappings
In the example, only 9 out of the 13 subfields will be mapped. The following code snippet shows how to do the processing of the address field:
field_ud_address/given_name: first_name
field_ud_address/family_name: last_name
field_ud_address/organization: company
field_ud_address/address_line1:
plugin: default_value
default_value: 'It is a secret ;)'
field_ud_address/address_line2:
plugin: default_value
default_value: 'Do not tell anyone :)'
field_ud_address/locality: city
field_ud_address/administrative_area: state
field_ud_address/postal_code: zip
field_ud_address/country_code: country
The mapping is relatively simple. You specify a value for each subfield. The tricky part is to know the name of the subfield and the value to store in it. The format for an address component can change among countries. The easiest way to see what components are expected for each country is to create a node for a content type that has an address field. With this example, you can go to /node/add/ud_address
and try it yourself. For simplicity sake, let's consider only 3 countries:
- For USA, city, state, and ZIP code are all required. And for state, you have a specific list form which you need to select from.
- For Germany, the company is moved above first and last name. The ZIP code label changes to Postal code and it is required. The city is also required. It is not possible to set a state.
- For Nicaragua, the Postal code is optional. The State label changes to Department. It is required and offers a predefined list to choose from. The city is also required.
Pay very close attention. The available subfields will depend on the country. Also, the form labels change per country or language settings. They do not necessarily match the subfield names. Moreover, the values that you see on the screen might not match what is stored in the database. For example, a Nicaraguan address will store the full department name like Managua
. On the other hand, a USA address will only store a two-letter code for the state like MA
for Massachusetts
.
Something else that is not apparent even from the user interface is data validation. For example, let's consider that you have a USA address and select Massachusetts
as the state. Entering the ZIP code 55111
will produce the following error: Zip code field is not in the right format.
At first glance, the format is correct, a five-digits code. The real problem is that the Address module is validating if that ZIP code is valid for the selected state. It is not valid for Massachusetts. 55111
is a ZIP code for the state of Minnesota which makes the validation fail. Unfortunately, the error message does not indicate that. Nine-digits ZIP codes are accepted as long as they belong to the state that is selected.
Note: If you are upgrading from Drupal 7, the D8 Address module offers a process plugin to upgrade from the D7 Address Field module.
Finding expected values
Values for the same subfield can vary per country. How can you find out which value to use? There are a few ways, but they all require varying levels of technical knowledge or access to resources:
- You can inspect the source code of the address field widget. When the country and state components are rendered as select input fields (dropdowns), you can have a look at the
value
attribute for theoption
that you want to select. This will contain the two-letter code for countries, the two-letter abbreviations for USA states, and the fully spelled string for Nicaraguan departments. - You can use the Devel module. Create a node containing an address. Then use the
devel
tab of the node to inspect how the values are stored. It is not recommended to have thedevel
module in a production site. In fact, do not deploy the code even if the module is not enabled. This approach should only be used in a local development environment. Make sure no module or configuration is committed to the repo nor deployed. - You can inspect the database. Look for the records in a table named
node__field_[field_machine_name]
, if migrating nodes. First create some example nodes via the user interface and then query the table. You will see how Drupal stores the values in the database.
If you know a better way, please share it in the comments.
The commerceguys addressing library
With version 8 came many changes in the way Drupal is developed. Now there is an intentional effort to integrate with the greater PHP ecosystem. This involves using already existing libraries and frameworks, like Symfony. But also, making code written for Drupal available as external libraries that could be used by other projects. commerceguys\addressing
is one example of a library that was made available as an external library. That being said, the Address module also makes use of it.
Explaining how the library works or where its fetches its database is beyond the scope of this article. Refer to the library documentation for more details on the topic. We are only going to point out some things that are relevant for the migration. For example, the ZIP code validation happens at the validatePostalCode method of the AddressFormatConstraintValidator class. There is no need to know this for a migration project. But the key thing to remember is that the migration can be affected by third-party libraries outside of Drupal core or contributed modules. Another example, is the value for the state subfield. Address module expects a subdivision
code as listed in one of the files in the resources/subdivision
directory.
Does the validation really affect the migration? We have already mentioned that the Migrate API bypasses Form API validations. And that is true for address fields as well. You can migrate a USA address with state Florida
and ZIP code 55111
. Both are invalid because you need to use the two-letter state code FL
and use a valid ZIP code within the state. Notwithstanding, the migration will not fail in this case. In fact, if you visit the migrated node you will see that Drupal happily shows the address with the data that you entered. The problems arrives when you need to use the address. If you try to edit the node you will see that the state will not be preselected. And if you try to save the node after selecting Florida
you will get the validation error for the ZIP code.
This validation issues might be hard to track because no error will be thrown by the migration. The recommendation is to migrate a sample combination of countries and address components. Then, manually check if editing a node shows the migrated data for all the subfields. Also check that the address passes Form API validations upon saving. This manual testing can save you a lot of time and money down the road. After all, if you have an ecommerce site, you do not want to be shipping your products to wrong or invalid addresses. ;-)
Technical note: The commerceguys/addressing
library actually follows ISO standards. Particularly, ISO 3166 for country and state codes. It also uses CLDR and Google's address data. The dataset is stored as part of the library's code in JSON format.
Migrating country and zone fields
The Address module offer two more fields types: Country
and Zone
. Both have only one subfield value
which is selected by default. For country, you store the two-letter country code. For zone, you store a serialized version of a Zone object.
What did you learn in today's blog post? Have you migrated address before? Did you know the full list of subcomponents available? Did you know that data expectations change per country? Please share your answers in the comments. Also, I would be grateful if you shared this blog post with others.