  • Verifai Serverside OCR

    Serverside version of the OCR component of Verifai.

    Introduction

    OCR stands for Optical Character Recognition. It is a technology that allows machines to read text from images. In this case it only reads Machine Readable Zones (MRZ).

    Because a picture is worth a thousand words, let's start with a high-level overview of the serverside solution we offer. This overview differs from the one in the Verifai Serverside Classifier documentation, which leaves out the OCR component (because it is optional).

    index-page

    The Verifai Serverside OCR is shown at the top right of the overview. This document describes how to get it up and running and how to send data to it.

    As the image shows, the document itself is never sent to our servers. After classification, we only receive metadata about the document to help you process it.

    Prerequisites

    Setting up the service

    Get credentials via the "Verifai Dashboard" > "Downloads" > "Docker Registry"

    $ docker login docker-registry.verifai.com -u <username> -p <password>
    $ docker pull docker-registry.verifai.com/serverside-ocr:latest
    
    version: "3"
    
    services:
      verifai-classifier:
        image: docker-registry.verifai.com/serverside-ocr:latest
        container_name: verifai-ocr
        volumes:
          - './:/data'
        hostname: "<application identifier from the dashboard>"
        ports:
          - "5001:80"
    

    The service reads the required credentials from a licence file, which you can obtain through the Verifai Dashboard.

    The data folder must contain a licence.txt file holding the licence from the Verifai Dashboard. The example docker-compose file mounts the current directory into the container as /data, so licence.txt belongs in the directory you start the service from.

    The most convenient way to run the service in your development environment is Docker Compose. You can use the example docker-compose file to get the service up and running quickly.
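
    Before starting the container, it can help to confirm that the licence file is where the service expects it. The following is a minimal sketch, assuming the `./:/data` mount from the example docker-compose file; the helper name `licence_file_ready` is hypothetical, and the check only covers presence, not validity.

```python
from pathlib import Path


def licence_file_ready(directory="."):
    """Return True if licence.txt exists in `directory` and is not empty.

    Assumption: `directory` is the folder mounted as /data in the container.
    This only checks presence; the service itself validates the licence.
    """
    licence = Path(directory) / "licence.txt"
    return licence.is_file() and licence.stat().st_size > 0


if __name__ == "__main__":
    if licence_file_ready():
        print("licence.txt found")
    else:
        print("licence.txt is missing or empty")
```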

    Starting the service

    $ docker-compose up
    

    Console output should be something like this:

    @TODO paste log in here
    

    Running docker-compose up should be all that is needed. Once startup has finished, the log should show the web service being exposed at http://localhost:5001/.
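
    If you start the service from a script, you may want to wait until it answers before sending images. The sketch below polls the URL mentioned above using only the standard library. The function name `wait_for_service` and the timeout values are illustrative assumptions; any HTTP response, including an error status, is treated as a sign that the web service is up.

```python
import time
import urllib.error
import urllib.request


def wait_for_service(url="http://localhost:5001/", timeout=30.0):
    """Poll `url` until it answers with any HTTP response or `timeout` seconds pass."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=2.0):
                return True
        except urllib.error.HTTPError:
            # The server answered, just not with a 2xx status.
            return True
        except OSError:
            # Connection refused, reset, or timed out: not ready yet.
            # (urllib.error.URLError is a subclass of OSError.)
            time.sleep(0.5)
    return False
```

    Calling `wait_for_service()` right after `docker-compose up -d` returns `True` as soon as the service responds, or `False` if it has not come up within the timeout, in which case the logs are the next place to look.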

    As you can see in the log, the service goes through several steps:

    Sending an MRZ image to the classifier

    $ curl -F "file=@dutch-id-mrz-sample.jpg" http://localhost:5001/api/ocr/
    
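    import requests
    
    image = 'dutch-id-mrz-sample.jpg'
    
    # Open the image in binary mode so its contents (not just its name) are uploaded
    with open(image, 'rb') as f:
        r = requests.post(
            'http://localhost:5001/api/ocr/',
            files={'file': f}
        )
    # Print the raw JSON response
    print(r.text)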
    
    <?php
    $image = 'dutch-id-mrz-sample.jpg';
    
    $ch = curl_init('http://localhost:5001/api/ocr/');
    $data = array(
        // CURLFile (PHP 5.5+) uploads the file contents; the old '@' prefix no longer works in PHP 7+
        'file' => new CURLFile($image),
    );
    curl_setopt($ch, CURLOPT_POSTFIELDS, $data);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $output = curl_exec($ch);
    curl_close($ch);
    
    # Print the raw JSON response
    print($output);
    ?>
    

    It will return the following:

       {
          "message": "",
          "result": {
            "checksums": {
              "score": 1,
              "valid_composite": true,
              "valid_date_of_birth": true,
              "valid_expiration_date": true,
              "valid_number": true
            },
            "fields": {
              "check_composite": "8",
              "check_date_of_birth": "1",
              "check_expiration_date": "6",
              "check_number": "2",
              "country": "NLD",
              "date_of_birth": "650310",
              "expiration_date": "240309",
              "mrz_type": "TD1",
              "names": "WILLEKE LISELOTTE",
              "nationality": "NLD",
              "number": "SPECI2014",
              "optional1": "999999990     8",
              "optional2": "",
              "sex": "F",
              "surname": "DE BRUIJN",
              "type": "I"
            },
            "fields_raw": {
              "check_composite": "8",
              ...
            },
            "raw": "HS\nI<NLDSPECI20142999999990<<<<<8\n6503101F2403096NLD<<<<<<<<<<<8\nDE<BRUIJN<<WILLEKE<LISELOTTE<<\n\n4",
            "rotation": -2
          },
          "status": "SUCCESS"
        }
    

    dutch-sample-mrz

    You can send files to the API endpoint (/api/ocr/) with, for example, curl, and the service replies with a JSON response. Response time depends on the speed of your machine. As the response shows, the MRZ was read successfully.
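
    Once you have the raw JSON text, reading it takes only a few lines. The sketch below works on a trimmed copy of the sample response so that it runs without the service. The helper name `summarize` is hypothetical, and the fallback to an empty dict is an assumption about what a failed read might look like, since only the success case is shown here.

```python
import json

# Trimmed copy of the sample response from this page.
sample = '''
{
  "message": "",
  "result": {
    "checksums": {"score": 1, "valid_number": true},
    "fields": {
      "mrz_type": "TD1",
      "number": "SPECI2014",
      "surname": "DE BRUIJN",
      "names": "WILLEKE LISELOTTE"
    }
  },
  "status": "SUCCESS"
}
'''


def summarize(response_text):
    """Return (status, fields) from the raw JSON text of an /api/ocr/ response."""
    body = json.loads(response_text)
    # Assumption: "result" may be null or missing on failure, so fall back to {}.
    fields = (body.get("result") or {}).get("fields", {})
    return body.get("status"), fields


status, fields = summarize(sample)
if status == "SUCCESS":
    print(fields["surname"], fields["names"], fields["number"])
```

    In the Python request example, `r.text` can be passed to `summarize` in place of `sample`.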

    Processing the response

    Several fields are returned, and the exact set depends on the type of MRZ. In the table below, X means the field is returned for that MRZ type and - means it is not.

    field                  example            TD1  TD2  TD3
    check_composite        8                  X    X    X
    check_date_of_birth    1                  X    X    X
    check_expiration_date  6                  X    X    X
    check_number           2                  X    X    X
    check_personal_number  3                  -    -    X
    country                NLD                X    X    X
    date_of_birth          650310             X    X    X
    expiration_date        240309             X    X    X
    mrz_type               TD1                X    X    X
    names                  WILLEKE LISELOTTE  X    X    X
    nationality            NLD                X    X    X
    number                 SPECI2014          X    X    X
    optional1              999999990 8        X    X    -
    personal_number        999999990          -    -    X
    optional2              ""                 X    -    -
    sex                    F                  X    X    X
    surname                DE BRUIJN          X    X    X
    type                   I                  X    X    X
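
    The date fields come back as six-digit YYMMDD strings, so the century has to be inferred. The following sketch shows one possible heuristic, not something the service prescribes: birth dates are placed in 20YY unless that lies in the future, in which case 19YY is used, and all other dates are placed in 20YY. The function name `parse_mrz_date` and its parameters are hypothetical.

```python
from datetime import date


def parse_mrz_date(value, is_birth_date, today=None):
    """Convert a YYMMDD string into a date.

    Assumption: the two-digit year is resolved with a simple heuristic.
    Birth dates use 20YY unless that is in the future, then 19YY.
    All other dates (such as expiration dates) use 20YY.
    """
    today = today or date.today()
    year, month, day = int(value[0:2]), int(value[2:4]), int(value[4:6])
    candidate = date(2000 + year, month, day)
    if is_birth_date and candidate > today:
        candidate = date(1900 + year, month, day)
    return candidate


print(parse_mrz_date("650310", is_birth_date=True))
print(parse_mrz_date("240309", is_birth_date=False))
```

    The heuristic misplaces people older than 100, so treat the result as a convenience value and keep the raw string alongside it.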

    Fields that start with check_ hold the check digit for another field. For example, check_number is the check digit for number.
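
    To illustrate how a check digit relates to its field, the sketch below recomputes it with the 7-3-1 weighting from ICAO Doc 9303. It assumes the service follows that standard scheme, which the TD1/TD2/TD3 naming suggests but this page does not state. The function name `mrz_check_digit` is hypothetical, and the composite check digit is left out because it spans several parts of the MRZ.

```python
def mrz_check_digit(value):
    """Compute a check digit with the 7-3-1 weighting from ICAO Doc 9303."""
    weights = (7, 3, 1)
    total = 0
    for index, char in enumerate(value):
        if "0" <= char <= "9":
            number = int(char)
        elif "A" <= char <= "Z":
            number = ord(char) - ord("A") + 10  # A=10 ... Z=35
        elif char == "<":
            number = 0  # filler character counts as zero
        else:
            raise ValueError(f"unexpected MRZ character: {char!r}")
        total += number * weights[index % 3]
    return str(total % 10)


# Values taken from the sample response on this page.
fields = {
    "number": "SPECI2014", "check_number": "2",
    "date_of_birth": "650310", "check_date_of_birth": "1",
    "expiration_date": "240309", "check_expiration_date": "6",
}
for name in ("number", "date_of_birth", "expiration_date"):
    print(name, mrz_check_digit(fields[name]) == fields["check_" + name])
```

    For the sample values, each recomputed digit matches the corresponding check_ field, which is consistent with the valid_* flags in the response.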

    Interpreting the result.checksums

    Different checksums are calculated depending on result.fields.mrz_type.

    result.checksums.score is a float representing the fraction of characters that were successfully checked. The maximum score is 1.0.
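
    When the score is below the maximum, the individual valid_* flags show which check failed. The following sketch lists them for a checksums object shaped like the sample response; the helper name `failed_checks` is hypothetical.

```python
def failed_checks(checksums):
    """Return the sorted names of all valid_* flags that are not true."""
    return sorted(
        name
        for name, ok in checksums.items()
        if name.startswith("valid_") and ok is not True
    )


# The checksums object from the sample response on this page.
checksums = {
    "score": 1,
    "valid_composite": True,
    "valid_date_of_birth": True,
    "valid_expiration_date": True,
    "valid_number": True,
}
print(failed_checks(checksums))
```

    An empty list means every reported check passed; a non-empty list names the fields worth reviewing by hand.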

    When result.checksums.score is 1.0, the status field will be SUCCESS. Complete list of status options:

    Issues

    Service starting errors

    Check the logs before you do anything else.

    $ docker-compose logs
    

    If the licence is not valid, the service exits during startup, so check the logs first for anything unusual.