Most Common False Positive Issues in Arabic Translation QA Checks

Our clients often wonder why Arabic translation QA reports contain more false positive (FP) errors than reports for other languages, especially languages written in Latin script. As an Arabic translation and localization services provider, we have seen the same scenario over and over again: clients come back to us in a panic, asking us to review and implement long QA reports generated by translation QA tools. They are usually surprised, and of course relieved, when we send back the reports with all or most of the issues marked as false positives. We usually include general comments with the annotated reports explaining why these errors are FPs, and those comments are what prompted this blog post.

It is very important that clients, especially those who are not native speakers of the language, are aware of at least the most common reasons for the many false positive results in Arabic translation QA reports. This saves time and effort for both the Arabic language service provider and the client.

Below are some of the most common issues:

1- Punctuation issues: missing spaces before and after tags/placeholders

a) This happens when a tag follows the conjunction "and", which is translated as the Arabic conjunction letter "و". This letter should not be followed by a space; it is attached directly to the next word.
b) It can also happen because English and Arabic sentences are structured differently, which changes the order of words and tags.
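
As an illustration of case (a), here is a minimal sketch of a spacing rule that does not flag a tag attached directly to a standalone "و". The tag syntax (`{NAME}` or `<tag>`) and the function name are assumptions made for this example, not the behavior of any particular QA tool, and the sketch does not cover case (b).

```python
import re

# Hypothetical tag syntax: {PLACEHOLDER} or <inline-tag>.
TAG = re.compile(r"\{[A-Za-z_]+\}|<[^>]+>")
WAW = "\u0648"  # Arabic conjunction letter "و"


def missing_space_before_tag(target):
    """Return offsets of tags with no preceding space.

    Tags attached to a standalone conjunction "و" are not reported,
    because Arabic orthography attaches "و" to whatever follows it.
    """
    offsets = []
    for match in TAG.finditer(target):
        before = target[:match.start()]
        # A tag at the start of the segment or after whitespace is fine.
        if not before or before[-1].isspace():
            continue
        # A lone "و" (preceded by start of text or whitespace) is allowed.
        if re.search(r"(?:^|\s)" + WAW + r"$", before):
            continue
        offsets.append(match.start())
    return offsets
```

A rule like this only suppresses the conjunction case; a tag glued to any other word is still reported.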

2- Missing numbers

a) This happens when numbers are translated into words.
b) It also happens with the numbers 1 and 2, which are usually dropped in translations of phrases like "1 file" and "2 files". The singular form in Arabic does not require the number "1", since the word "file" on its own already conveys it. The same applies to "2 files": the dual form "ملفان" already conveys the number "2".
c) Missing numbers are also reported when Hindi digits (٠١٢٣, also called Arabic-Indic digits) are used where Latin digits (0123) were expected. The choice between Latin and Hindi digits depends on client preference.
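
Cases (b) and (c) can be sketched as a number check that converts Hindi digits to Latin digits before comparing, and that tolerates a missing "1" or "2". The function names and the `implied` parameter are hypothetical, chosen for this example only. Case (a), numbers written out as words, is not covered here.

```python
import re

# Map Arabic-Indic (U+0660..U+0669) and Extended Arabic-Indic
# (U+06F0..U+06F9) digits to their ASCII equivalents.
_DIGIT_MAP = {}
for i in range(10):
    _DIGIT_MAP[0x0660 + i] = str(i)
    _DIGIT_MAP[0x06F0 + i] = str(i)


def extract_numbers(text):
    """Return all digit runs in text, with Hindi digits converted to Latin."""
    return re.findall(r"[0-9]+", text.translate(_DIGIT_MAP))


def missing_numbers(source, target, implied=("1", "2")):
    """Return source numbers that have no counterpart in the target.

    Numbers listed in `implied` are not reported, since the Arabic
    singular and dual forms can express them without a digit.
    """
    remaining = extract_numbers(target)
    missing = []
    for number in extract_numbers(source):
        if number in remaining:
            remaining.remove(number)  # each target number matches once
        elif number not in implied:
            missing.append(number)
    return missing
```

Whether "1" and "2" should be treated as implied, and which digit set is expected, would in practice be driven by the client's style guide.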

3- Terminology - non-matching glossary items

This type of false positive is typically caused by any of the following:
a) Diacritics being used in either the glossary term translation or the target translation being checked.
b) Different encodings for some letters, even though they look exactly the same and are both correct.
c) Plural and singular forms being used in either the glossary term translation or the target translation being checked. The spelling differs, which results in a false positive glossary issue.
d) Articles being used in either the glossary term translation or the target translation being checked, which again produces a different spelling.
e) Gender differentiation, as in adjectives, which also results in different spellings.
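
Causes (a) and (b) lend themselves to a normalization step applied to both the glossary term and the target before matching. The sketch below uses Python's standard `unicodedata` module; the look-alike letter map is an assumption that would need tuning per project, and the function names are hypothetical. Plural, gender and most article variations (c to e) need morphological analysis and are not handled, although the substring match does accept a term that is simply prefixed with the article.

```python
import unicodedata

TATWEEL = "\u0640"  # elongation character, purely decorative

# Assumed look-alike pairs: Farsi yeh -> Arabic yeh, keheh -> Arabic kaf.
LOOKALIKES = {0x06CC: "\u064A", 0x06A9: "\u0643"}


def normalize_arabic(text):
    """Fold encoding variants and strip diacritics for comparison only."""
    # NFKC turns presentation forms (U+FE70..U+FEFF) into base letters.
    text = unicodedata.normalize("NFKC", text)
    # Drop combining marks such as fatha, kasra, damma, shadda, sukun.
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return text.translate(LOOKALIKES).replace(TATWEEL, "")


def term_in_target(term, target):
    """True if the normalized glossary term occurs in the normalized target."""
    return normalize_arabic(term) in normalize_arabic(target)
```

The normalized strings are meant for matching only; the delivered translation keeps its original diacritics and encoding.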

4- Inconsistency

a) You might have two source segments with exactly the same text except for the numbers. In this case, you should not always expect the Arabic translations to be identical apart from the numbers. For example, take "2 files found.", "3 files found." and "11 files found.". The word "files" is translated differently in each of the three segments, as "ملفين", "ملفات" and "ملف" respectively, and this is reported as a consistency issue.
b) Different translations for the same source. For example, the word "Open" may appear in two different contexts, as a button name and as a status.

5- Tag Issues

a) Tag order change: Arabic sentences are structured differently from English ones, which can change the order of tags in the Arabic translation. For example, "On {DATE}, {NO_OF_POINTS} points will be added to your credit." is translated into Arabic to read "{NO_OF_POINTS} points will be added to your credit on {DATE}." in order to better follow Arabic grammar.
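
One way to avoid this false positive is to compare tags as a multiset first and treat a pure reordering as a lower-severity note. This is a minimal sketch, assuming placeholders of the form `{UPPER_CASE}` as in the example above; the function name and return labels are hypothetical.

```python
import re
from collections import Counter

# Assumed placeholder syntax, matching the {DATE} style used above.
PLACEHOLDER = re.compile(r"\{[A-Z_]+\}")


def check_tags(source, target):
    """Classify the tag relationship between source and target.

    "mismatch"  : a tag is missing, extra, or repeated a different number of times
    "reordered" : the same tags appear, but in a different order
    "ok"        : the same tags appear in the same order
    """
    source_tags = PLACEHOLDER.findall(source)
    target_tags = PLACEHOLDER.findall(target)
    if Counter(source_tags) != Counter(target_tags):
        return "mismatch"
    if source_tags != target_tags:
        return "reordered"
    return "ok"
```

With the example above, the source and its reordered translation would be classified as "reordered" rather than as a tag error, which a reviewer can then accept at a glance.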

To avoid so many false positive issues in Arabic translation QA checks, translation QA tools should be configured to handle such cases. Although this is not an easy job, we can set rules that cover at least some of the most frequently recurring cases, such as diacritic handling, number formats, articles, different character encodings, spacing and conjunctions. This is what we are currently doing with our translation QA tool, developed internally by Saudisoft.

Of course, translation QA tools report other false positive issues as well, some of which are specific to client preferences. Our team at Saudisoft will be happy to discuss the above in detail, answer any questions you may have, or offer language consultancy, not only for Arabic but also for several other languages.