Verifai for Python (v 2.0.0)

    You landed on the full documentation page of Verifai for Python. Our main development language is Python, so this SDK is very "pythonic".

    We tried to keep the implementation as simple as possible for you as a developer. Our goal is that you should be able to implement this in a matter of minutes.

    Changed in this version

    New

    Older versions

    If you're looking for documentation for previous versions of Verifai then check below:

    Version Changed Link
    1.0.0 Confidence of classification, Manual mode, Security Features Python v1.0.0 SDK docs
    0.2.0 Added OCR, refactor and API design changes Python v0.2.0 SDK docs
    0.1.1 Initial version Python v0.1.1 SDK docs

    Quick introduction

    index-page

    The basic idea is that all of your users' or clients' private data stays within your own network. Data we do not have, we cannot lose.

    Therefore the heavy lifting takes place in the "Verifai Server-side Classifier" and the SDK. The SDK sends a JPEG image to it via an HTTP POST request, and it responds with a JSON result. The SDK processes that response for you. You only have to tell it where the "Server-side Classifier" is within your network.

    When you need more information, like the kind of document you are dealing with, the name, or what data is where on the document, it fetches that from the Verifai servers.

    If you need to read the Machine Readable Zone (MRZ) you can also run the Verifai Server-side OCR service. It operates entirely on its own, just like the Server-side Classifier. It uses advanced OCR software to read the text on the cropped MRZ of the document, and postprocesses the data.

    No personal information is ever sent to us.

    Prerequisites

    Install

    Install via pip

    $ pip install verifai-sdk
    

    Check if it is installed:

    Python 3.6.1 (v3.6.1:69c0db5050, May 24 2017, 01:21:04)
    [GCC 4.2.1 (Apple Inc. build 5666) (dot 3)] on darwin
    
    >>> import verifai_sdk
    >>> verifai_sdk.VERSION
    '2.0.0'
    
    >>>
    

    Set up the SDK before use. You can install it via pip.

    The SDK has a few dependencies:

    Initialize SDK

    from verifai_sdk import VerifaiService
    
    # Setup the service with your API token
    # the token is used for `SDK` <-> `Verifai API` communication.
    service = VerifaiService(token='<API TOKEN IN HERE>')
    
    # Tell the service where on your network the "Verifai Server-side
    # Classifier" can be found.
    # See the Verifai Server-side Classifier docs for info about how to
    # set it up.
    service.add_classifier_url('http://localhost:5000/api/classify/')
    

    Once the setup has been tested we can continue with initializing the SDK. From now on we will assume that you have initialized the SDK before use.

    Instead of <API TOKEN IN HERE> you fill in the API token that you got from the Verifai Dashboard.

    Please note that the SDK will check whether the classifier endpoint is actually working. It will raise a ValueError("No server available at that URL") if there is nothing there. If you go to the bare URL of the classifier you should see an example webpage with a basic drag-and-drop implementation.
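    As a sketch, you can guard the setup with a try/except. The stub below stands in for service.add_classifier_url() so the example runs on its own; the error message is the one the SDK raises.

```python
def add_classifier_url(url):
    # Stand-in (stub) for VerifaiService.add_classifier_url(); the real
    # method raises this ValueError when nothing answers at the URL.
    raise ValueError("No server available at that URL")

try:
    add_classifier_url('http://localhost:5000/api/classify/')
    classifier_ready = True
except ValueError as error:
    classifier_ready = False
    print('Classifier not reachable: {}'.format(error))

print(classifier_ready)  # False
```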

    Classify a JPEG image

    import os
    import json
    from verifai_sdk import VerifaiService
    
    # Setup
    service = VerifaiService(token='<API TOKEN IN HERE>')
    service.add_classifier_url('http://localhost:5000/api/classify/')
    
    sample_dir = 'docs/sample_images/'
    image_path = os.path.join(sample_dir, 'dutch-id-front-sample.jpg')
    
    # Classify
    document, confidence = service.classify_image_path(image_path)
    print(document)  # <verifai_sdk.VerifaiDocument object at 0x10e44e710>
    print(confidence)  # 0.9984574578
    
    print(document.model)  # "Dutch ID 011"
    print(document.country)  # "NL"
    

    You now know that there is a "Dutch ID 011" in this image. Let's get its position now.

    print(
        json.dumps(document.position_in_image)
    )
    # {
    #   'xmax': 0.6118464469909668,
    #   'xmin': 0.21217313408851624,
    #   'ymax': 0.8527788519859314,
    #   'ymin': 0.45336633920669556
    # }
    

    There are Dutch ID document sample images in the docs/sample_images/ directory. We will be using the dutch-id-front-sample.jpg here.

    The position is relative (0.5 means halfway across the image); the top-left and bottom-right corners are given. You can use this for further processing if you like.

    The Verifai Server-side Classifier docs contain a section about how to interpret the coordinates.
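    Since the coordinates are relative, converting them to pixel values only requires multiplying by the image dimensions. A minimal sketch (the helper name and image size are ours, not part of the SDK):

```python
def to_pixel_box(position, width, height):
    # Convert relative coordinates (0.0 - 1.0) to an absolute pixel box.
    return (
        round(position['xmin'] * width),
        round(position['ymin'] * height),
        round(position['xmax'] * width),
        round(position['ymax'] * height),
    )

# The position printed above, applied to a hypothetical 1280x960 photo:
position = {'xmin': 0.2122, 'ymin': 0.4534, 'xmax': 0.6118, 'ymax': 0.8528}
print(to_pixel_box(position, 1280, 960))  # (272, 435, 783, 819)
```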

    dutch-sample-processed

    Getting more information

    Classifying an image

    document, confidence = service.classify_image_path('docs/sample_images/dutch-id-front-sample.jpg')
    print(document)
    # <verifai_sdk.VerifaiDocument object at 0x10e44e710>
    

    Reading some properties of a VerifaiDocument

    print(document.model)
    # "Dutch ID 011"
    
    print(document.country)
    # "NL"
    

    The classification service returns a VerifaiDocument. It is the proxy we use to perform operations on the result.

    The VerifaiDocument holds a reference to the image, and the data collected on it. For example the id_uuid, which is an internal reference to every type of ID document known to Verifai.

    Properties of the VerifaiDocument:

    property description
    id_uuid The internal unique reference for every ID document known to Verifai
    id_side The side of the document, either F for Front or B for Back
    zones List of zones on the document
    image A PIL.Image object, used for all the operations
    model The model, for example "Dutch ID 011"
    country Country that issued this document, for example: "NL"
    position_in_image The coordinates of where the document is in the image

    Getting the zones of a VerifaiDocument

    print(document.zones)
    # [<verifai_sdk.VerifaiDocumentZone object at 0x10d5a5898>, ...]
    

    Displaying some information about a VerifaiDocumentZone

    print(document.zones[0].title)  # "Photo"
    print(document.zones[0].side)  # "F"
    

    Documents have zones, for example where the photo is located. They are listed in zones. Be aware that the zones are mixed between front and back. Some methods of the VerifaiDocument object apply filters by themselves to prevent, for example, masking the wrong parts.

    Properties of the VerifaiDocumentZone:

    property description
    document Reference to the parent VerifaiDocument object
    title Name of the zone, for example: MRZ
    side The side it is on, either F for Front or B for Back
    position_in_image The position of the zone within the bounds of the document
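    Filtering the mixed zones by side is straightforward. Below is a sketch with plain dicts standing in for VerifaiDocumentZone objects (real zones expose .title and .side as attributes instead of keys):

```python
# Hypothetical zone data shaped like the title/side properties above
zones = [
    {'title': 'Photo', 'side': 'F'},
    {'title': 'MRZ', 'side': 'B'},
    {'title': 'Place of birth', 'side': 'F'},
]

def zones_for_side(zones, side):
    # Keep only the zones on the requested side: 'F' (Front) or 'B' (Back)
    return [zone for zone in zones if zone['side'] == side]

print([zone['title'] for zone in zones_for_side(zones, 'F')])  # ['Photo', 'Place of birth']
```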

    Getting raw data from the backend

    Get raw data from API

    import json
    
    # Assuming you initialized VerifaiService as `service`
    
    data = service.get_model_data('c6beadb8-ef56-43d4-b69f-e0df8163279f')
    print(json.dumps(data))
    

    It should result in a raw API response like this:

    {
            "uuid": "c6beadb8-ef56-43d4-b69f-e0df8163279f",
            "type": "IT",
            "model": "NLD-HO-02003",
            "country": "NL",
            "width_mm": 86.0,
            "height_mm": 54.0,
            "sample_front": "https://media.verifai.com/backen....nt_sample.jpg",
            "has_mrz": true,
            "zones": [
                {
                    "title": "Photo",
                    "side": "FRONT",
                    "x": 0.08,
                    "y": 0.34,
                    "width": 0.26,
                    "height": 0.54,
                    "block": true
                },
                ...
            ]
    }
    

    If you have a workflow that does not match the way VerifaiDocument and the VerifaiDocumentZones work, you can access the raw API via the VerifaiService.

    You can call service.get_model_data(uuid) to get the raw response from the API and process it however you like.

    You could process it into a VerifaiDocument. It has an __init__ method that allows you to create the object by passing this response, the binary JPEG image, and a reference to the service. In the source code of the SDK you can check how the internals work.

    Using a binary of a JPEG image

    To accommodate more use cases there is also a classify_image method in VerifaiService. It returns the same result as classify_image_path.

    You can use this when processing binary data that you don't want to save to a temporary file first.
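    Since the classifier only accepts JPEG data, a cheap sanity check before calling classify_image is to look for the JPEG start-of-image marker. The helper below is ours, not part of the SDK:

```python
def looks_like_jpeg(data):
    # Every JPEG file starts with the SOI marker FF D8.
    return data[:2] == b'\xff\xd8'

# Stand-in for an in-memory upload; in practice the bytes would come
# from your web framework or another binary source.
jpeg_bytes = b'\xff\xd8\xff\xe0' + b'\x00' * 16

print(looks_like_jpeg(jpeg_bytes))  # True
# If it looks valid, hand the bytes straight to the service:
# document, confidence = service.classify_image(jpeg_bytes)
```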

    Using other image formats and PDFs

    from verifai_sdk import VerifaiService, VerifaiPdfProcessor
    service = VerifaiService(token='<API TOKEN IN HERE>')
    service.add_classifier_url('https://serverside-classifier.verifai.io/api/classify/')
    
    # A 3-page PDF with an ID document on page 2
    processor = VerifaiPdfProcessor('NLD-AO-04002p-vertical-out-046.pdf', service)
    
    page_2_doc, page_2_confidence = processor.get_result_for_page(1)
    print(page_2_doc, page_2_confidence)
    # <verifai_sdk.document.VerifaiDocument object at 0x10efc7550> 0.9952084422111511
    
    results = processor.get_results_for_all_pages()
    print(results)
    # [(None, 0.0), (<verifai_sdk.document.VerifaiDocument object at 0x10efd5b38>, 0.9952084422111511), (None, 0.0)]
    
    # Don't forget to clean up! It should do this automatically when the
    # object gets destroyed, but better safe than sorry with personal data.
    processor.cleanup()
    

    Sometimes you have source data in a different format than JPEG. We provide several convenience methods to convert that data for you.

    There are two major types of input: paginated documents (PDF) and bitmap-like images. Because PDFs can have multiple pages, the results have to be processed page by page.

    PDF documents

    The PDF processor VerifaiPdfProcessor requires the following tools to be on the system:

    You can initialize the VerifaiPdfProcessor with a path to the PDF, and an instance of VerifaiService. See the example code. You can then get a result for each individual page. So for example, if you know that the ID should be on page 5, you can run the classifier on page 4 only (the index is zero-based).

    Bitmap images

    from PIL import Image
    from verifai_sdk import VerifaiService, VerifaiPdfProcessor
    service = VerifaiService(token='<API TOKEN IN HERE>')
    service.add_classifier_url('https://serverside-classifier.verifai.io/api/classify/')
    
    im = Image.open("id_document.png")
    document, confidence = service.classify_pil_image(im)
    

    The reason you have to use JPEG is that the neural network only has a JPEG decoder on board, which is optimized for GPU performance.

    These convenience functions convert the image for you, but this adds a delay and uses processing power on your side.

    The classify_pil_image() function of the VerifaiService takes in a PIL.Image instance, converts it to a JPEG, and then sends it to the classifier.

    See Pillow docs for more info about how to open all kinds of files. A list of supported images can be found on the image file formats page.

    Cropping Documents

    Saving the cropped document and the photo

    from verifai_sdk import VerifaiService
    
    # Setup
    service = VerifaiService(token='<API TOKEN IN HERE>')
    service.add_classifier_url('http://localhost:5000/api/classify/')
    
    # Classify
    document, confidence = service.classify_image_path('tmp/uploaded_file.jpg')
    
    if document:  # Check if it has been classified
        # Get and save the cropped document
        document.get_cropped_image().save('storage/cropped_document.jpg')

        # Find the MRZ and save it
        # We add 3% tolerance around the MRZ so we get a slightly wider crop
        for zone in document.zones:
            if zone.title == 'MRZ':
                mrz_image = document.get_part_of_card_image(
                    zone.coordinates, tolerance=.03
                )
                mrz_image.save('storage/mrz.jpg')
        # cropped_document.jpg and mrz.jpg are saved.
    else:
        print('No document found')
    

    Maybe you need to save a specific part of the document, for example the photo. Or you just want the cropped document from the image.

    Because of the modular design of the SDK you can do both.

    In the example we classify first. If that is successful we continue by saving the cropped document. After that we save the MRZ from the document.

    The cropped document: cropped-document

    The MRZ: cropped-mrz-document

    Masking documents (GDPR/Privacy filter)

    Masking the documents

    from verifai_sdk import VerifaiService
    
    # Setup
    service = VerifaiService(token='<API TOKEN IN HERE>')
    service.add_classifier_url('http://localhost:5000/api/classify/')
    
    # Classify
    document, confidence = service.classify_image_path('tmp/uploaded_file.jpg')
    
    # Process
    masking_whitelist = ['Photo', 'Place of birth', 'MRZ', 'V-NR']
    zones_to_mask = []
    
    for zone in document.zones:
        if zone.title in masking_whitelist:
            zones_to_mask.append(zone)  # You don't have to worry about front and back zones
    
    # Now we can pass the zones to the masking function
    masked_image = document.mask_zones(zones_to_mask)
    
    # And save the masked image
    masked_image.save('masked.jpg')
    

    You should not store any data about users or customers that you do not need. That best practice is the basis of the GDPR legislation.

    In the example code we build a GDPR filter so that a few zones get masked. However, every industry has different needs. On our website we have a list of fields you are not allowed to save.

    cropped-mrz-document

    Reading the MRZ (Machine Readable Zone)

    Setup with OCR webservice

    import os
    from verifai_sdk import VerifaiService
    
    # Setup
    service = VerifaiService(token='<API TOKEN IN HERE>')
    service.add_classifier_url('http://localhost:5000/api/classify/')
    
    # This line provides the URL of the OCR endpoint.
    service.add_ocr_url('http://localhost:5001/api/ocr/')
    

    The SDK itself can't read the MRZ. It uses the additional Verifai Server-side OCR webservice for that.

    In the following examples we assume that you have it running on your development machine.

    We will continue with the same example that we used before, the image is provided with the SDK and available on GitHub.

    Getting the MRZ data

    Getting the MRZ data

    # Classify
    sample_dir = 'docs/sample_images/'
    image_path = os.path.join(sample_dir, 'dutch-id-back-sample.jpg')
    document, confidence = service.classify_image_path(image_path)
    
    # You can check if the document has an MRZ. Returns None if not available.
    print(document.mrz_zone)  # <verifai_sdk.VerifaiDocumentZone object at 0x...>
    
    # Lazy property that creates a VerifaiDocumentMrz object
    print(document.mrz)  # <verifai_sdk.VerifaiDocumentMrz object at 0x...>
    
    # Check whether the MRZ has been read successfully (triggers the webservice)
    print(document.mrz.is_successful)  # True if worked 100%
    
    # Get the fields from the MRZ
    for key, value in document.mrz.fields.items():
        print('{0}: {1}'.format(key, value))  # "names: WILLEKE LISELOTTE" etc...
    
    # Getting the score and checksum fields
    print(document.mrz.checksums)  # {'score': 1.0, 'valid_composite': True, 'valid_date_of_birth': True, 'valid_expiration_date': True, 'valid_number': True}
    
    # The OCR engine tries to find the optimal rotation to read the MRZ
    print(document.mrz.rotation)  # -2 degrees rotated to straighten the MRZ
    
    # If you would like, you can also access the fields before they were processed.
    print(document.mrz.fields_raw)
    

    Obviously you are doing this operation in order to use the results.

    There are several fields you can use. We postprocess the results for you, but you can still get the raw data by using VerifaiDocumentMrz.read_mrz().

    You can check if the MRZ is successful by checking the VerifaiDocumentMrz.is_successful bool.

    The fields are different for every document. Below is a list of all possible fields:

    field example TD1 TD2 TD3
    check_composite 8 X X X
    check_date_of_birth 1 X X X
    check_expiration_date 6 X X X
    check_number 2 X X X
    check_personal_number 3 - - X
    country NLD X X X
    date_of_birth 650310 X X X
    expiration_date 240309 X X X
    mrz_type TD1 X X X
    names WILLEKE LISELOTTE X X X
    nationality NLD X X X
    number SPECI2014 X X X
    optional1 999999990 8 X X -
    personal_number 999999990 - - X
    optional2 `` X - -
    sex F X X X
    surname DE BRUIJN X X X
    type I X X X

    Fields that start with check_ are used as checksum for other fields.

    The VerifaiDocumentMrz.checksums['score'] value is a float giving the fraction of checked characters (by length) that is correct. Everything correct is 1.0. If is_successful is False, that does not mean there are no results at all. You can always access the raw response from the OCR service.
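    A small sketch of acting on that data; the dict mirrors the checksums output shown above, and the helper name is ours, not part of the SDK:

```python
# Checksum data as returned by document.mrz.checksums in the example
checksums = {
    'score': 1.0,
    'valid_composite': True,
    'valid_date_of_birth': True,
    'valid_expiration_date': True,
    'valid_number': True,
}

def mrz_fully_valid(checksums):
    # A perfect score plus every valid_* flag set means all checks passed.
    return checksums['score'] == 1.0 and all(
        value for key, value in checksums.items() if key.startswith('valid_')
    )

print(mrz_fully_valid(checksums))  # True
```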

    Manual mode

    Let's say you would like to build a manual flow like in the serverside demo. You will have to implement the following steps:

    Below you find the SDK functions to call to make all of this happen.

    Supported Countries

    from verifai_sdk import VerifaiService
    
    service = VerifaiService(token='<API TOKEN IN HERE>')
    countries = service.get_supported_countries()
    
    print(countries)  
    
    # Outputs: 
    # [
    #   {'flag': '🇦🇩', 'name': 'Andorra', 'code': 'AD'}, 
    #   {'flag': '🇦🇪', 'name': 'United Arab Emirates', 'code': 'AE'}, 
    #   ...
    # ] 
    

    To get a listing of supported countries you can simply ask an initialized VerifaiService.

    When you call VerifaiService.get_supported_countries() it will return a list of countries. See the table below for the fields.

    property description
    flag The Unicode flag of the country, for example 🇳🇱
    name The name of the country, for example Netherlands
    code The ALPHA-2 code of the country, for example NL

    ID models for country

    from verifai_sdk import VerifaiService
    service = VerifaiService(token='<API TOKEN IN HERE>')
    
    id_models = service.get_id_models_for_country('NL')
    
    print(id_models[0])  # <verifai_sdk.id_model.IdModel object at 0x10efd51d0>
    print(id_models[0].model)  # NLD-FO-07001v
    print(id_models[0].sample_front)  # https://media.verifai.com/backend/id-samples/NLD-FO-06001-biodata.jpg
    

    When the user has selected the country you have to present a list of ID documents to choose from. You can use the service for that too.

    When you call VerifaiService.get_id_models_for_country() it returns a list with IdModel objects.

    The IdModel has several properties:

    property description
    uuid UUID of the ID model that you will need later in the process
    type The type of document, P for passport etc
    model The model name, this can be used for user display
    country The ALPHA-2 country code
    width_mm Width of the document in mm
    height_mm Height of the document in mm
    sample_front Image of the front of the document
    has_mrz Bool whether the document has an MRZ
    zones Raw dump of the zones from the API

    Building a VerifaiDocument by hand

    from verifai_sdk import VerifaiService, VerifaiDocument
    
    service = VerifaiService(token='<API TOKEN IN HERE>')
    
    # This data is built from user input (e.g. a POST request)
    uuid = '3553ad49-d91c-4e14-8cda-c4296e70193d'
    side = 'F'
    xmin = 0.21
    ymin = 0.12
    xmax = 0.92
    ymax = 0.95
    file = open('docs/sample_images/dutch-id-back-sample.jpg', 'rb')
    
    # Building a VerifaiDocument by hand
    document = VerifaiDocument(None, file.read(), service)
    document.set_model_data(uuid, side)
    document.set_coordinates(xmin, ymin, xmax, ymax)
    
    print(document)  # <verifai_sdk.document.VerifaiDocument object at 0x20eef51d0>
    
    # Don't forget to close open files
    file.close()
    

    When you have received the results from the user, and an image has been uploaded, you can process it by hand.

    You need to build a VerifaiDocument by hand. You can do that by initializing a new VerifaiDocument(None, b_jpeg_image, service) and calling set_model_data(id_uuid, id_side) and set_coordinates(xmin, ymin, xmax, ymax).

    The VerifaiDocument will populate all the data and features of the SDK by itself, giving you the exact same functionality as in the code samples above.

    Security Features

    Setup SDK

    import os
    from verifai_sdk import VerifaiService
    
    # Setup
    service = VerifaiService(token='<API TOKEN IN HERE>')
    service.add_classifier_url('http://localhost:5000/api/classify/')
    

    example-serverside

    Security features are special properties of a document that allow a person or a machine to check whether the document is authentic.

    There are several types of features: ones you can feel, and ones you can see. With a lens you can also see details like microprints.

    The screenshot above is taken from the serverside demo project. This SDK is the basis for that demo; you have all the ingredients to build something similar.

    Getting the Security Features of a VerifaiDocument

    Getting the Security Features data

    sample_dir = 'docs/sample_images/'
    image_path = os.path.join(sample_dir, 'dutch-id-front-sample.jpg')
    document, confidence = service.classify_image_path(image_path)
    
    # Get the Security Feature zones from the VerifaiDocument
    sfs = document.security_features
    for i, sf in enumerate(sfs):
        # Print data from the VerifaiDocumentSecurityFeatureZone objects
        print("Security Feature {}".format(i))
        print(" - Position: min(x {xmin}, y {ymin}), max(x {xmax}, y {ymax})".format(**sf.coordinates))
        print(" - Score: {}".format(sf.score))
        print(" - Check type: {}".format(sf.check_type))
        print(" - Check question: {}".format(sf.check_question))
        print(" - Ref Image: {}".format(sf.reference_image))
        print(" - Type: {}".format(sf.type))
        print(" - Properties:")
        for key, value in sf.properties.items():
            print("   - {} = \"{}\"".format(key, value))
    
    # This outputs the following:
    #
    # Security Feature 0
    #  - Position: min(x 0.54, y 0.55), max(x 0.65, y 0.64)
    #  - Score: 0.3
    #  - Check type: None
    #  - Check question: Is the year of birth tactile?
    #  - Ref Image: None
    #  - Type: LaserEngraving
    #  - Properties:
    #    - content_type = "Year of birth"
    #    - kinegram_type = "Multi colour"
    #    - effect = "Kinematic"
    #    - colors = "PlainColor Black (#000000)"
    #    - method = "Raised tactile"
    # ...
    

    The VerifaiDocumentSecurityFeatureZone is a sibling of the VerifaiDocumentZone. They share a common abstract parent.

    Unlike a normal zone, a security feature zone has a lot of extra parameters like the score, check_question and check_type.

    property description
    type Raw object type that is the basis for this SF
    check_question Instruction to the user what to do
    score For example, three 0.33 scores together could say a document is valid
    check_type Category of check type. For example "TILT" or "VISUAL"
    reference_image URL to an image (can be None)
    properties A dict of key value properties
    document Reference to the VerifaiDocument
    side The side the feature is on. "F" or "B"
    coordinates Dict of xmin, ymin, xmax, ymax coordinates

    Scoring system

    When you get the security features of a document, each one has a score field. You can use this to guide a user through checking the security features.

    Keeping score is quite easy. If you achieve a score of 1.0 the document is probably real. From our services you always receive enough security features to reach at least 1.0, but if you let the user do all the checks, it might reach well over that value.

    The scoring system is in place to prevent spending too much time checking all of a document's features. It can also be used to randomise the checks, so a fraudster can't know what to expect.
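    The randomised accumulation described above can be sketched like this (the scores are made up; real values come from the score field of each security feature):

```python
import random

feature_scores = [0.3, 0.33, 0.33, 0.2, 0.4]  # hypothetical per-feature scores

def pick_checks(scores, target=1.0):
    # Shuffle the features and keep presenting checks to the user
    # until the accumulated score reaches the target.
    order = list(range(len(scores)))
    random.shuffle(order)
    picked, total = [], 0.0
    for index in order:
        picked.append(index)
        total += scores[index]
        if total >= target:
            break
    return picked, total

picked, total = pick_checks(feature_scores)
# picked is a random subset; total always reaches 1.0 here
# because the scores sum to 1.56.
print(picked, total >= 1.0)
```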

    Issues & FAQ

    Python 2.7 issues

    The SDK is only available for Python 3.5 and higher. Most customers run the software in containers to control the dependencies.

    It can't find the classifier / ocr server

    Check if you can curl from the container/server to the services. Most of the time it is a routing or firewall issue.

    It doesn't seem to find the document

    Check the serverside classifier directly, and debug it from there on.

    I require feature X or Y

    You should contact support. They can tell you about upcoming releases and, if a feature is not on the roadmap, ask the development team to add it.