Add callback to optionally "repair" fields

I'm not really sure this functionality belongs here, but as the knowledge of the MRZ internal structure is only present in this module, why not... let me know what you think!

I work with scanned MRZs and, as is typical with OCR, similar-looking characters are sometimes misread. For example, I have seen a country read as "R0U" and a name read as "SZ0BO5ZLAI". The MRZ checker correctly warns that the nationality or the identifier is not valid. However, if you could add a `repair()` method to the checkers and pass each field through it as it is extracted:

```
def __init__(self, mrz_code: str, check_expiry=False, compute_warnings=False, precheck=True):
    precheck and check.precheck("TD1", mrz_code, 92)
    lines = mrz_code.splitlines()
    # Each field goes through repair() before being stored
    self._document_type = self.repair('document type', lines[0][0:2])
    self._country = self.repair('country', lines[0][2:5])
    [...]

def repair(self, field_name: str, field_content: str):
    # Default hook: return the field unchanged
    return field_content
```

that would allow me to do things like:

```
class MyChecker(TD1CodeChecker):
    def repair(self, name, content):
        if name in ('country', 'identifier', ...):
            # I know those can only contain alphas
            return self.replace_often_mistaken_numbers_by_alphas(content)

        if name in ('expiry date', 'birth date'):
            # ...and those can only contain digits
            return self.replace_often_mistaken_alphas_by_numbers(content)

        # Leave every other field untouched
        return content

    def replace_often_mistaken_numbers_by_alphas(self, s):
        return s.replace('5', 'S').replace('1', 'I').replace('0', 'O')

    def replace_often_mistaken_alphas_by_numbers(self, s):
        return s.replace('S', '5').replace('I', '1').replace('O', '0')
```

This would make the checker more useful when presented with badly scanned data. 
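To make the idea concrete without depending on the library itself, here is a minimal, self-contained sketch of the hook pattern. `BaseChecker` and `RepairingChecker` are hypothetical stand-ins I made up for illustration, not the module's real classes, and only the two field slices quoted above are used.

```python
class BaseChecker:
    """Hypothetical stand-in for a checker such as TD1CodeChecker."""

    def __init__(self, mrz_code: str):
        lines = mrz_code.splitlines()
        # Every field passes through repair() before it is stored.
        self._document_type = self.repair('document type', lines[0][0:2])
        self._country = self.repair('country', lines[0][2:5])

    def repair(self, field_name: str, field_content: str) -> str:
        # Default hook: leave the field untouched.
        return field_content


class RepairingChecker(BaseChecker):
    # Digits that OCR often confuses with letters: 5 -> S, 1 -> I, 0 -> O.
    DIGITS_TO_ALPHAS = str.maketrans('510', 'SIO')

    def repair(self, field_name: str, field_content: str) -> str:
        if field_name == 'country':
            return field_content.translate(self.DIGITS_TO_ALPHAS)
        return field_content


# First line of a made-up TD1-style MRZ with a misread country code.
line = 'I<R0U' + '<' * 25
plain = BaseChecker(line)
fixed = RepairingChecker(line)
print(plain._country, fixed._country)
```

With the default hook the behaviour is unchanged, so existing users would not be affected; only subclasses that override `repair()` see different field values.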

The alternative would be that I somehow preprocess the MRZ, but then I would have to re-implement the MRZ structure definition in my code too. As said above, I'm not a big fan of shoehorning that functionality into this module, but I don't see any other place that has enough knowledge of the MRZ structure.
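For comparison, a rough sketch of what that preprocessing alternative might look like. The `country` offset comes from the snippet above; the other offsets are my own assumptions about the TD1 layout, and the table names and `preprocess()` are hypothetical. The point is that these slices would have to be duplicated outside the module and kept in sync by hand.

```python
# Hypothetical field table: (line index, start, end) for a few TD1 fields.
# Only 'country' is taken from the snippet above; the rest are assumptions.
ALPHA_FIELDS = {
    'country': (0, 2, 5),
    'nationality': (1, 15, 18),
}
NUMERIC_FIELDS = {
    'birth date': (1, 0, 6),
    'expiry date': (1, 8, 14),
}
TO_ALPHAS = str.maketrans('510', 'SIO')
TO_DIGITS = str.maketrans('SIO', '510')


def preprocess(mrz_code: str) -> str:
    """Repair selected fields in place before handing the MRZ to a checker."""
    lines = mrz_code.splitlines()
    for table, mapping in ((ALPHA_FIELDS, TO_ALPHAS), (NUMERIC_FIELDS, TO_DIGITS)):
        for line_no, start, end in table.values():
            line = lines[line_no]
            # Translate only the slice belonging to this field.
            lines[line_no] = line[:start] + line[start:end].translate(mapping) + line[end:]
    return '\n'.join(lines)
```

This works, but every format (TD1, TD2, TD3, ...) would need its own copy of the offsets, which is exactly the structure definition the module already contains.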

