How to scan MRZs with OCR: Considerations and first steps

Kevin January 12, 2023 9 mins read
MRZ OCR (machine-readable zone)

The machine-readable zones (MRZs) on passports and ID documents from all over the globe are fundamental to modern identity verification.

In this article, you will learn how to use modern OCR software to accurately extract data from MRZs – in just a few steps.

What is an MRZ?

Machine-readable zones (MRZs) consist of lines of alphanumeric characters and are primarily found on the bottom of ID documents and passports. They encode information about the document holder in a standardized format designed for easy processing by machines.

At border crossings, airports, and other touchpoints, MRZs enable fast, accurate, and secure verification of the document holder’s identity. 

The development of machine-readable zones is closely intertwined with the introduction of standardized travel documents in many countries. In 1944, toward the end of WWII, the Chicago Convention established the International Civil Aviation Organization (ICAO), which later became a specialized agency of the United Nations. Its purpose is to coordinate air transport on an international scale and improve the safety of air navigation.

The ICAO’s Doc 9303, published in 1980, describes standards for international travel documents. These documents could be read automatically using optical character recognition (OCR) – a relatively new technology at that time. 

In the 1980s, the first countries began issuing machine-readable passports (MRP) with MRZs. Since then, the ICAO standards have evolved to include biometric features and electronic data storage, resulting in today’s e-passports.

Different types of MRZs

Nowadays, machine-readable zones are present on all kinds of identity documents, not just passports. They encode the most important information (but not all) recorded in a document’s visual inspection zone (VIZ).

Depending on the size of the document, the MRZ’s formatting varies slightly, which reading devices must account for. There are five standard document formats, each with a different type of MRZ: TD1, TD2, TD3, MRV-A, and MRV-B.

TD1

This format is used for credit card-sized documents, like ID cards. The convenient size has the downside that the MRZ is located on the back, which means both sides of the document must be scanned.

The TD1 MRZ consists of three lines of 30 characters each, including check digits. If the encoded data in a line does not fill it entirely, filler characters (<) are used to complete it. The MRZ code includes the following information:

  1. Document type
  2. Issuing state
  3. Document number
  4. Check digit for document number
  5. (Space for optional data)
  6. Date of birth
  7. Check digit for date of birth
  8. Gender (can be omitted by replacing with “<”)
  9. Expiry date of document
  10. Check digit for expiry date
  11. Nationality
  12. (Space for optional data)
  13. Check digit for first two MRZ lines
  14. Last name(s) (or other primary identifier)
  15. First name(s) (or other secondary identifier)
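
To make the field order more concrete, here is a minimal JavaScript sketch that slices a TD1 MRZ into named fields. The function name parseTd1 and the property names are invented for this illustration and are not part of any SDK. The character positions are assumed from the TD1 layout in ICAO Doc 9303, and the sample lines are the fictional "Utopia" specimen commonly used to illustrate that standard.

```javascript
// Sketch: slice a TD1 MRZ (3 lines of 30 characters) into named fields.
// parseTd1 and its property names are illustrative, not an SDK API.
function parseTd1(lines) {
  if (lines.length !== 3 || lines.some((line) => line.length !== 30)) {
    throw new Error("A TD1 MRZ must have 3 lines of 30 characters each");
  }
  const [line1, line2, line3] = lines;

  // Drop trailing filler characters; fillers inside a value become spaces.
  const clean = (s) => s.replace(/<+$/, "").replace(/</g, " ");

  // Primary and secondary identifiers are separated by a double filler "<<".
  const [lastName = "", firstName = ""] = line3.split("<<");

  return {
    documentType: clean(line1.slice(0, 2)),
    issuingState: clean(line1.slice(2, 5)),
    documentNumber: clean(line1.slice(5, 14)),
    documentNumberCheckDigit: line1[14],
    optionalData1: clean(line1.slice(15, 30)),
    dateOfBirth: line2.slice(0, 6), // YYMMDD
    dateOfBirthCheckDigit: line2[6],
    gender: clean(line2[7]), // empty string if omitted with "<"
    expiryDate: line2.slice(8, 14), // YYMMDD
    expiryDateCheckDigit: line2[14],
    nationality: clean(line2.slice(15, 18)),
    optionalData2: clean(line2.slice(18, 29)),
    compositeCheckDigit: line2[29],
    lastName: clean(lastName),
    firstName: clean(firstName),
  };
}

const sample = parseTd1([
  "I<UTOD231458907<<<<<<<<<<<<<<<",
  "7408122F1204159UTO<<<<<<<<<<<6",
  "ERIKSSON<<ANNA<MARIA<<<<<<<<<<",
]);
console.log(sample.lastName, "/", sample.firstName); // ERIKSSON / ANNA MARIA
```

The sketch only splits the text by position. It does not verify check digits or interpret the two-digit years, which a production scanner has to handle as well.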

TD2

The TD2 document format is slowly being replaced by the TD1 standard, but is still being used for ID cards and visas in some countries. Since the format is bigger, the MRZ fits on the same side as the human-readable information. It also occupies just two lines of 36 characters each.

The encoded data is the same as for TD1, but in a different order. Notably, the holder’s name is on the first line rather than the last.

TD3

[Image: MRZ example on a passport]

This format is used for most international passports, specifically the data page at the beginning of the booklet. Just like TD2, it has two MRZ lines on the front of the document, but with 44 characters each. The first line begins with a “P” (for “passport”). The second line contains an additional check digit for the optional data field, followed by the overall check digit at the very end.

MRV-A

MRV stands for machine-readable visa. This document type is indicated by a “V” at the beginning of the first MRZ line. The order and types of information encoded in the MRZ of an MRV-A document are similar to those of TD2 and TD3, but without an overall check digit at the end of the second line. Each line is 44 characters long.

MRV-B

MRV-B documents are slightly smaller than MRV-A ones, thus the two MRZ lines are just 36 characters long. The information encoded is the same, however.

The MRZ formats at a glance

Format | Document type(s) | MRZ lines | Characters per line | MRZ location | Other features
TD1 | ID cards, passport cards | 3 | 30 | Back of card | Space for optional data, compact size
TD2 | Visas, some ID cards | 2 | 36 | Front of document | Used for visa stickers and larger ID cards
TD3 | Passport booklets | 2 | 44 | Bottom of data page | Most common for passports worldwide
MRV-A | Visa labels (large) | 2 | 44 | On visa label | Used when more space is available on page
MRV-B | Visa labels (small) | 2 | 36 | On visa label | Suited for smaller passport pages
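
Since the five formats differ in line count, line length, and leading character, a reader can tell them apart before parsing any fields. The following JavaScript sketch shows one way to do this. The helper name detectMrzFormat is hypothetical, and the logic relies only on the properties listed above: the number of lines, the characters per line, and the “V” that marks a visa.

```javascript
// Sketch: guess the MRZ format from line count, line length, and first character.
// detectMrzFormat is an illustrative helper, not an SDK function.
function detectMrzFormat(lines) {
  // All lines of an MRZ must have the same length.
  const lengths = new Set(lines.map((line) => line.length));
  if (lengths.size !== 1) return null;

  const length = lines[0].length;
  const isVisa = lines[0].startsWith("V");

  if (lines.length === 3 && length === 30) return "TD1";
  if (lines.length === 2 && length === 36) return isVisa ? "MRV-B" : "TD2";
  if (lines.length === 2 && length === 44) return isVisa ? "MRV-A" : "TD3";
  return null; // unknown or incomplete MRZ
}

const passportLines = [
  "P<UTOERIKSSON<<ANNA<MARIA<<<<<<<<<<<<<<<<<<<",
  "L898902C36UTO7408122F1204159ZE184226B<<<<<10",
];
console.log(detectMrzFormat(passportLines)); // TD3
```

Returning null for anything that does not match lets the caller treat a partially recognized MRZ as a failed scan instead of parsing it with the wrong layout.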

How to scan an MRZ with OCR

Today, just like when machine-readable passports were introduced in the 1980s, MRZs are scanned using OCR (optical character recognition). For enhanced readability, they are printed in OCR-B, a font designed specifically to be read by electronic devices.

OCR is a technology for converting images of text, such as scanned MRZs, into machine-readable, editable data. The result is suitable for automated processing. 

Traditionally, OCR engines checked characters against a database of fonts or topological features. Modern OCR software, however, achieves superior pattern recognition by using neural networks.

There are three steps to scanning and decoding an MRZ with OCR:

  1. MRZ scanner software captures an image of the identity document. This image can later be exported alongside the information contained in the MRZ.
  2. The MRZ is read using OCR. Some software also extracts the information as key-value pairs.
  3. The software validates the extracted information against the check digits in the MRZ. If this fails, there is an error and the scan must be repeated.
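
The validation in step 3 can be sketched in a few lines of JavaScript. The calculation below follows the check digit scheme described in ICAO Doc 9303: each character is converted to a number (digits keep their value, A to Z count as 10 to 35, and the filler “<” counts as 0), multiplied by the repeating weights 7, 3, 1, and the sum is taken modulo 10. The function names are chosen for this example only, and the sample values come from the fictional specimen used in that standard.

```javascript
// Sketch of the ICAO Doc 9303 check digit calculation.
// Function names are illustrative, not part of any SDK.
function charValue(ch) {
  if (ch === "<") return 0; // filler character
  if (ch >= "0" && ch <= "9") return ch.charCodeAt(0) - 48;
  if (ch >= "A" && ch <= "Z") return ch.charCodeAt(0) - 55; // A = 10 ... Z = 35
  throw new Error(`Invalid MRZ character: ${ch}`);
}

function mrzCheckDigit(field) {
  const weights = [7, 3, 1];
  let sum = 0;
  for (let i = 0; i < field.length; i++) {
    sum += charValue(field[i]) * weights[i % 3];
  }
  return sum % 10;
}

// Compare the computed digit with the one printed in the MRZ.
function isFieldValid(field, printedCheckDigit) {
  return String(mrzCheckDigit(field)) === printedCheckDigit;
}

console.log(mrzCheckDigit("D23145890")); // 7 (document number)
console.log(mrzCheckDigit("740812")); // 2 (date of birth)
console.log(isFieldValid("120415", "9")); // true (expiry date)
```

A single misread character usually changes the weighted sum, which is why a failed comparison is a reliable signal that the scan should be repeated.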

Modern text recognition software can extract information from the visual inspection zone as well. With a scanned image of the travel document, the information from the VIZ, and the MRZ data, scanning errors can be ruled out almost entirely.

Common problems with scanning MRZs

Traditional MRZ scanners are stationary with a level scanning surface. They are still being used in places like airports or administrative offices. While inflexible, they have some advantages for OCR: First, the document lies completely flat while being scanned. Second, the document does not move during the scanning process. And third, lighting is controlled by the device.

On the other hand, their inflexibility has several disadvantages. During peak times with higher MRZ scanning volume, stationary devices can become bottlenecks, causing long queues and delays. Simply buying additional MRZ scanners is inefficient, as they would sit idle most of the time.

Mobile MRZ OCR scanner software, conversely, enables you to handle a growing scanning volume effortlessly. Integrated into a mobile or web app, it delivers fast and accurate scan results on any smart device with a camera.

While mobile MRZ scanners offer much more flexibility than fixed hardware, they also have to handle a greater number of variables. The software must deal with typical challenges of mobile scanning, such as skewed documents due to scanning angles, blur caused by hand movement, and varying lighting conditions.

This is in addition to the challenges of MRZ scanning specifically. Variations in print quality, slight deviations in old documents, and the usual wear and tear – scratches, peeling lamination, fading ink – can cause scan failures or data extraction errors. 

This is why using powerful MRZ scanning software is crucial.

Integrating an MRZ OCR scanner into your mobile or web app

Software development kits (SDKs) are like comprehensive toolboxes for app development. They contain software tools, libraries, documentation, and code samples, among other useful resources. 

The Scanbot MRZ Scanner SDK for iOS, Android, cross-platform, and web apps gives you all the features you need to quickly and reliably scan MRZs on ID and travel documents. It extracts all relevant MRZ data fields and returns them as key-value pairs, ideal for automated processing. All of this works without connection to third-party servers, ensuring maximum data security and full functionality in areas without connectivity.

[Image: MRZ scan results with the Scanbot MRZ Scanner SDK]

The Ready-to-Use UI Components for MRZ scanning offer a pre-built, user-friendly interface that still provides extensive customization options. It can be integrated into your app within minutes.

Start MRZ scanning today

You can set up a fully functional MRZ OCR scanner app with just a few lines of code – even in a single, self-contained HTML file, thanks to the Scanbot Web MRZ Scanner SDK.

<!DOCTYPE html>
<html lang="en">

<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0, maximum-scale=1.0, user-scalable=0" />
    <title>Web MRZ Scanner</title>
</head>

<body style="margin: 0">
    <button id="scan-mrz">Scan MRZ</button>
    <p id="mapped"></p>
    <script type="module">
        import "https://cdn.jsdelivr.net/npm/scanbot-web-sdk@7.2.0/bundle/ScanbotSDK.ui2.min.js";
        const sdk = await ScanbotSDK.initialize({
            enginePath:
                "https://cdn.jsdelivr.net/npm/scanbot-web-sdk@7.2.0/bundle/bin/complete/"
        });
        document
            .getElementById("scan-mrz")
            .addEventListener("click", async () => {
                // Open the ready-to-use MRZ scanner screen with its default configuration.
                const config = new ScanbotSDK.UI.Config.MrzScannerScreenConfiguration();
                const result = await ScanbotSDK.UI.createMrzScanner(config);

                // Clear any output from a previous scan.
                const mappedParagraph = document.getElementById("mapped");
                mappedParagraph.innerText = "";

                // No result was returned, so there is nothing to display.
                if (!result) {
                    mappedParagraph.innerText = "No result. Please try again.";
                    return;
                }

                if (!result.mrzDocument) {
                    mappedParagraph.innerText = "No MRZ document found. Please try again.";
                    return;
                }

                // Turn every recognized MRZ field into a "name: value" line.
                const mapped = result.mrzDocument.fields.map(field => {
                    return `${field.type.name}:  ${field.value?.text ?? ""}`;
                });

                // innerText renders "\n" as line breaks and does not interpret scanned text as HTML.
                mappedParagraph.innerText = mapped.join("\n");
            });
    </script>
</body>

</html>

Feel free to copy the code into an HTML file and open it in your browser. When moving to production, we recommend you download the SDK and host it on your server instead of using a CDN.
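
If you prefer to process the result as key-value pairs rather than display it, the field list can be converted into a plain object. The sketch below assumes only the field shape used in the sample above (field.type.name and field.value?.text). The helper name fieldsToObject and the field names in the mock data are made up for illustration and may differ from what the SDK actually returns.

```javascript
// Sketch: convert a list of MRZ fields into a plain key-value object.
// Assumes the field shape from the sample above; fieldsToObject is a hypothetical helper.
function fieldsToObject(fields) {
  return Object.fromEntries(
    fields.map((field) => [field.type.name, field.value?.text ?? null])
  );
}

// Mock data for illustration only; real field names come from the SDK.
const mockFields = [
  { type: { name: "Surname" }, value: { text: "ERIKSSON" } },
  { type: { name: "Nationality" }, value: { text: "UTO" } },
  { type: { name: "OptionalData" }, value: null },
];
console.log(JSON.stringify(fieldsToObject(mockFields)));
// {"Surname":"ERIKSSON","Nationality":"UTO","OptionalData":null}
```

Fields without a recognized value are kept with a null value so that downstream code can distinguish "not present" from an empty string.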

You can run the Scanbot SDK from your project for 60 seconds per session without a license. For more in-depth testing, generate a free trial license.

Do you want to learn more about the SDK? Our experts are always happy to chat. Simply message us at sdk@scanbot.io.

What does MRZ mean?

MRZ stands for “machine-readable zone”. It is a standardized part of a travel or identity document that can be read automatically for fast identity verification.

What is an MRZ scanner?

An MRZ scanner is hardware or software designed to scan documents with machine-readable zones and extract the information.

Where is the MRZ on a passport?

The MRZ is located at the bottom of the data page at the beginning of your passport booklet.

Where is the MRZ on a green card?

Depending on your green card’s format, the MRZ is located either on the lower half of the front side or on the back.

How do I find my MRZ code?

MRZs use a unique font called OCR-B and standardized formatting. You can easily recognize the MRZ code on your document by looking for the “<” characters used to fill the empty space between the encoded information.
