Skip to content

Add callback to optionally "repair" fields #24

@mjl

Description

@mjl

I'm not really sure this functionality belongs here, but as the knowledge of the MRZ internal structure is only present in this module, why not... let me know what you think!

I work with scanned MRZ, and as comes with the process, the OCR sometimes mis-reads similar characters. For example, I have seen countries read as "R0U" or a name "SZ0BO5ZLAI". And the MRZ checker correctly warns that the nationality or the identifier is not valid. However, if you could add a method repair() to the checkers

def __init__(self, mrz_code: str, check_expiry=False, compute_warnings=False, precheck=True):
        precheck and check.precheck("TD1", mrz_code, 92)
        lines = mrz_code.splitlines()
        self._document_type = self.repair('document type', lines[0][0: 2])
        self._country = self.repair('country', lines[0][2: 5])
        [...]

def repair(self, field_name: str, field_content: str):
        return field_content

that would allow me to do things like:

class MyChecker(TD1CodeChecker):
    def repair(self, name, content):
        if name in ('country', 'identifier', ...):
            # I know those can only contain alphas
            return self.replace_often_mistaken_numbers_by_alphas(content)

        if name in ('expiry date', 'birth date'):
            return self.replace_often_mistaken_alphas_by_numbers(content)

    def replace_often_mistaken_numbers_by_alphas(self, s):
        return s.replace('5', 'S').replace('1', 'I').replace('0', 'O')

This would make the checker more useful when presented with badly scanned data.

The alternative would be that I somehow preprocess the MRZ, but then I would have to re-implement the MRZ structure definition in my code too. As said above, I'm not a big fan of shoehorning that functionality into this module, but I don't see any other place that has enough knowledge of the MRZ structure.

Metadata

Metadata

Assignees

No one assigned

    Labels

    CHECKERMRZ.CHECKER IssuesenhancementNew feature or request

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions