Improve translation memory management #7346

@rhofer

Description

Situation

In our self-hosted Weblate setup, translation components are permanently onboarded. To benefit from cross-component and cross-project suggestions, e.g. in "Automatic suggestions", both default machineries are activated (Weblate: live component look-up; Weblate Translation Memory: TM look-up).

When translating, one essential task is to harmonize the terminology in use across components or even across projects. For example, we may start with a first translation and later need to revise it for the sake of harmonized terminology.
As a result, the TM accumulates old, obsolete strings alongside the latest, harmonized and approved string.

Over time, this leads to a polluted TM, where the Weblate Translation Memory machinery provides outdated, obsolete or (by now) even forbidden strings. In the "Automatic suggestions" tab, for example, the suggestions have become a mix of old TM results, the latest TM results and live results from active components.

This heavily confuses translators and even leads to mistakes, because translators pick outdated or even forbidden terms from the TM.

Goal

As a translator, I don't want to see a history of translation memory entries. More specifically, I only want to see automatic suggestions based on texts that are currently valid, approved translations.

Problem

Today, Weblate provides no means to clean up TMs, manually or automatically, in order to get rid of "old" entries and thereby avoid translation mistakes when translators rely on TM results. With translations happening continuously and today's automatic enrichment of the TM, the pollution keeps growing.

This issue collects options for improving TM management. To make a specific option implementable, each one is (or will be) carved out into its own individual issue.

Option 1: improve global TM management - delete and recreate

This option is described in #7347.

This would be very helpful for counteracting TM pollution manually, as a kind of "mass operation" where clean-up happens on full TM scopes rather than on individual strings.

Option 2: individual deletion of TM entries

This option is described in #6440.

This is helpful in selective situations. Currently, for my situation, Option 1 would be sufficient.

Option 3: automatic TM maintenance based on review state

This option has not yet been moved to an individual issue, since it requires discussion first.

  • Only approved strings shall go to TM
  • If a string is changed and approved again, the new string shall replace the old one in the TM (automatic clean-up). The old string is removed from the TM, so that NOT both are proposed as automatic suggestions
  • This shall be an automated solution, no manual intervention required.
  • Provide a configuration option in order to enable/disable this automatic clean-up.
    • allow to configure it for shared, project and personal TM
    • Default value to be set by an environment variable (Docker use case)
    • If the environment variable is not set, default to disabled for backwards compatibility
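
The behaviour requested in the list above could look roughly like the following Python sketch. It is only an illustration to support the discussion, not Weblate's actual TM code: the class `TranslationMemory`, its methods, the "upload" origin marker and the environment variable name `WEBLATE_TM_AUTO_CLEANUP` are all hypothetical, and a single toy in-memory store stands in for the shared, project and personal TMs.

```python
import os
from dataclasses import dataclass, field

# Hypothetical variable name, used for illustration only;
# it is not an existing Weblate setting.
ENV_VAR = "WEBLATE_TM_AUTO_CLEANUP"


def auto_cleanup_default(environ=os.environ) -> bool:
    """Default for the toggle; disabled when the variable is not set."""
    value = environ.get(ENV_VAR, "0")
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass
class TranslationMemory:
    """Toy in-memory TM keyed by (origin, source, language)."""

    auto_cleanup: bool = False
    # key -> list of target strings currently offered as suggestions
    entries: dict = field(default_factory=dict)

    def import_manual(self, source, language, target):
        # Manually uploaded entries get their own origin,
        # so the automatic clean-up never touches them.
        key = ("upload", source, language)
        self.entries.setdefault(key, []).append(target)

    def record(self, origin, source, language, target, approved):
        """Called whenever a translation of a string is saved."""
        key = (origin, source, language)
        if not self.auto_cleanup:
            # Behaviour as described for today: every saved
            # translation is added, so old ones pile up.
            self.entries.setdefault(key, []).append(target)
            return
        if not approved:
            return  # only approved strings go to the TM
        # Replace instead of append: the old string is dropped,
        # so old and new are never both suggested.
        self.entries[key] = [target]

    def suggestions(self, source, language):
        return [
            target
            for (_, src, lang), targets in self.entries.items()
            if src == source and lang == language
            for target in targets
        ]
```

With `auto_cleanup=True`, saving an unapproved revision leaves the TM unchanged, and approving a revision replaces the previously approved string for the same origin. With the default `auto_cleanup=False`, every saved string is kept, which mirrors the backwards-compatible behaviour asked for above.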

Excluded:

  • Manually uploaded TMs, on either project or personal level, shall not be affected. These shall continue to work as-is.

Affects:

  • Shared TM
  • Project TM
  • Personal TM

Pros

  • Automated solution, no manual interaction

Cons

  • The algorithm for the automatic action might be difficult to define without missing a use case or even breaking one.
  • Automatic TM management actions are executed globally. Selectivity depends on the algorithm implemented.

Option 4: provide option to switch off automatic TM enrichment

This option is described in #7348.

In our use case, we primarily build on the Weblate machinery, which provides a live look-up across all connected components. Once this option is available, we would switch off automatic TM enrichment.

Remark on all options

All of this only affects the automatic enrichment of TMs.
What shall still be possible (as today):

  • Manually upload an external TM
  • TM machinery offers results from any existing TM (e.g. in the "Automatic suggestions" tab)

Preferred solution approach

For our use case, the following options are the best fit for a next improvement step:

  • Option 1 and Option 4

Metadata

Labels

  • Waiting for: Implementation: Added to a milestone, will be resolved according to the milestone timeline.
  • enhancement: Adding or requesting a new feature.
