Description
I'm not really sure this functionality belongs here, but as the knowledge of the MRZ internal structure is only present in this module, why not... let me know what you think!
I work with scanned MRZs and, as comes with the process, the OCR sometimes misreads similar-looking characters. For example, I have seen a country read as "R0U" or a name read as "SZ0BO5ZLAI". The MRZ checker correctly warns that the nationality or the identifier is not valid. However, if you could add a repair() method to the checkers:
    def __init__(self, mrz_code: str, check_expiry=False, compute_warnings=False, precheck=True):
        precheck and check.precheck("TD1", mrz_code, 92)
        lines = mrz_code.splitlines()
        self._document_type = self.repair('document type', lines[0][0: 2])
        self._country = self.repair('country', lines[0][2: 5])
        [...]

    def repair(self, field_name: str, field_content: str):
        return field_content
that would allow me to do things like:
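    class MyChecker(TD1CodeChecker):
        def repair(self, name, content):
            if name in ('country', 'identifier', ...):
                # I know those can only contain alphas
                return self.replace_often_mistaken_numbers_by_alphas(content)
            if name in ('expiry date', 'birth date'):
                # I know those can only contain digits
                return self.replace_often_mistaken_alphas_by_numbers(content)
            # Any other field is passed through unchanged
            return content

        def replace_often_mistaken_numbers_by_alphas(self, s):
            return s.replace('5', 'S').replace('1', 'I').replace('0', 'O')

        def replace_often_mistaken_alphas_by_numbers(self, s):
            # Assumed to be the inverse of the mapping above
            return s.replace('S', '5').replace('I', '1').replace('O', '0')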
This would make the checker more useful when presented with badly scanned data.
The alternative would be that I somehow preprocess the MRZ, but then I would have to re-implement the MRZ structure definition in my code too. As said above, I'm not a big fan of shoehorning that functionality into this module, but I don't see any other place that has enough knowledge of the MRZ structure.
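To make the proposal concrete without depending on the library, here is a minimal, self-contained sketch of the hook pattern. SketchTD1Checker is a hypothetical stand-in, not the module's real TD1CodeChecker; it only slices four fields and does no validation. The line-1 slices are the ones from the snippet above, the date slices on line 2 are my reading of the TD1 layout, and the sample MRZ is made up (its check digits are arbitrary and never verified here).

```python
# Translation tables for the three look-alike pairs mentioned above.
DIGITS_TO_ALPHAS = str.maketrans('510', 'SIO')
ALPHAS_TO_DIGITS = str.maketrans('SIO', '510')


class SketchTD1Checker:
    """Hypothetical stand-in: slices a few TD1 fields through repair()."""

    def __init__(self, mrz_code: str):
        lines = mrz_code.splitlines()
        # Every field goes through the hook before it is stored.
        self.document_type = self.repair('document type', lines[0][0:2])
        self.country = self.repair('country', lines[0][2:5])
        self.birth_date = self.repair('birth date', lines[1][0:6])
        self.expiry_date = self.repair('expiry date', lines[1][8:14])

    def repair(self, field_name: str, field_content: str) -> str:
        # Default hook: change nothing, so existing behaviour is preserved.
        return field_content


class RepairingChecker(SketchTD1Checker):
    ALPHA_FIELDS = ('country',)
    NUMERIC_FIELDS = ('birth date', 'expiry date')

    def repair(self, field_name: str, field_content: str) -> str:
        if field_name in self.ALPHA_FIELDS:
            return field_content.translate(DIGITS_TO_ALPHAS)
        if field_name in self.NUMERIC_FIELDS:
            return field_content.translate(ALPHAS_TO_DIGITS)
        return field_content


# Made-up TD1 sample with simulated OCR errors ("R0U", "8OO1O1", "3OI231").
SAMPLE = "\n".join([
    "IDR0U1234567894" + "<" * 15,
    "8OO1O14F3OI2317ROU" + "<" * 11 + "2",
    "SZABO<<ILONA" + "<" * 18,
])

plain = SketchTD1Checker(SAMPLE)
repaired = RepairingChecker(SAMPLE)
print(plain.country, plain.birth_date, plain.expiry_date)
print(repaired.country, repaired.birth_date, repaired.expiry_date)
```

Since repair() runs before the fields are stored, anything the checker does afterwards would see the repaired values, and callers who don't override the hook get exactly the current behaviour.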