The foreign language copycat catcher
A team led by Alberto Barron-Cedeno at the Polytechnic University of Catalonia, Spain, used several statistical methods to analyse suspicious-looking documents. One involved breaking each text down into fragments five sentences long and looking for parts of words, such as short runs of letters, that were similar in the two languages.
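To make the fragment-and-compare idea concrete, here is a minimal Python sketch. It is not the team's actual implementation: the article does not say which word elements or similarity measure were used, so the character trigrams, the cosine score and the function names (`fragments`, `char_ngrams`, `cosine`) are all assumptions chosen for illustration, as are the sample sentences.

```python
import re
from collections import Counter
from math import sqrt


def fragments(text, size=5):
    """Split text into consecutive fragments of `size` sentences each."""
    # Naive sentence splitter (an assumption): break after ., ! or ? plus whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    return [" ".join(sentences[i:i + size]) for i in range(0, len(sentences), size)]


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, ignoring case, digits and punctuation."""
    letters = "".join(ch if ch.isalpha() else " " for ch in text.lower())
    cleaned = " ".join(letters.split())
    return Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))


def cosine(a, b):
    """Cosine similarity between two n-gram count profiles (0.0 if either is empty)."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


# Made-up sample sentences, not data from the study.
english = "The national university published an important international report."
spanish = "La universidad nacional publicó un importante informe internacional."
unrelated = "Dogs bark loudly when strangers walk by the old wooden gate."

related_score = cosine(char_ngrams(english), char_ngrams(spanish))
unrelated_score = cosine(char_ngrams(english), char_ngrams(unrelated))
print(related_score > unrelated_score)
```

In a fuller pipeline, each five-sentence fragment of a suspect text would be scored against fragments of candidate sources, and only high-scoring pairs passed on for review. An approach like this relies on the two languages sharing spelling patterns, so it would be expected to work best for related languages.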
Another method used a bilingual dictionary to automatically check how many words in one text had direct equivalents in the other. The documents could also be translated into a language with a common root to make the analysis easier.
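The dictionary check can be sketched in the same spirit. The tiny Spanish-to-English dictionary `ES_EN`, the helper names `tokens` and `dictionary_overlap`, and the scoring rule (the share of source words with a translation found in the other text) are hypothetical stand-ins, since the article does not describe the real resources or scoring.

```python
import re

# Toy Spanish-to-English dictionary: a hypothetical stand-in for a real bilingual lexicon.
ES_EN = {
    "la": {"the"},
    "universidad": {"university"},
    "publicó": {"published"},
    "un": {"a", "an"},
    "informe": {"report"},
    "importante": {"important"},
}


def tokens(text):
    """Lowercase the text and return its runs of letters as words."""
    return re.findall(r"[^\W\d_]+", text.lower())


def dictionary_overlap(source_text, target_text, dictionary):
    """Fraction of source words with a dictionary translation present in the target."""
    source_words = tokens(source_text)
    if not source_words:
        return 0.0
    target_words = set(tokens(target_text))
    matched = sum(1 for w in source_words if dictionary.get(w, set()) & target_words)
    return matched / len(source_words)


# Made-up sample sentences, not data from the study.
spanish = "La universidad publicó un informe importante."
english = "The university published an important report."

print(dictionary_overlap(spanish, english, ES_EN))
print(dictionary_overlap(spanish, "Dogs bark at the gate.", ES_EN))
```

A word-by-word lookup like this ignores word order and inflection, which is one reason a score of this kind would serve only as a signal for closer inspection rather than as proof of copying.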
The results surprised even the researchers: their technique showed "remarkable performance" not only in identifying entire documents that had been copied but also in spotting passages that made excessive use of paraphrasing (Knowledge-Based Systems, doi.org/nqc). If the system flags a document as being similar to another, human experts can then take a closer look.
This article appeared in print under the headline "Cheating is cheating – in any language"