
Where is the history of machine translation going?

Machine translation has not always been as thriving as it is today. Indeed, not long after the Georgetown-IBM experiment of 1954, the first successful attempt at automatic translation from Russian into English, something happened.

“Machine Translation” presumably means going by algorithm from machine-readable source text to useful target text, without recourse to human translation or editing. In this context, there has been no machine translation of general scientific text, and none is in immediate prospect. – ALPAC, “Language and Machines: Computers in Translation and Linguistics”[1]

In 1966, this paragraph alone (along with the other 138 pages of the report it came from) was enough to make the U.S. government reconsider the funding it intended to devote to the study of machine translation.

Despite the ALPAC report, however, a few pioneering companies and researchers continued studying machine translation, until the real paradigm shift came in the late 1980s with the creation of statistical machine translation.

The new technology was more than a geek-friendly, futuristic toy: the change it brought about went beyond academic research, and beyond the language and localization industry. It changed the history of machine translation.

State of the art for researchers

The rule-based approach, in use from the end of World War II until the late eighties, worked by explicitly teaching the machine the rules, terminology, and quirks of the two languages it had to operate with. This meant that the first challenge was getting linguists and programmers to talk to and understand each other, and that had to happen every time the system was updated or a new language pair was added.

The new data-driven engines, on the other hand, required far less communication between humans. By leveraging machine learning algorithms, the machine was taught to learn by itself from the data it was fed during training. It was like teaching a child how to cook by repeatedly showing them that, statistically, cream is far more likely to appear in desserts than in carbonara[2].
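To make the idea of “learning from data” concrete, here is a deliberately tiny sketch in Python. It is not how real statistical MT engines worked (they relied on far more sophisticated word-alignment and phrase models), and the miniature English-Italian corpus is invented for illustration; it only shows the core intuition that translation knowledge can be read off co-occurrence counts rather than written down as rules.

```python
# A toy illustration of the statistical intuition, not a real SMT system:
# translation knowledge is read off co-occurrence counts in a parallel corpus
# instead of being hand-coded as rules.
from collections import Counter, defaultdict

# Invented miniature English-Italian parallel corpus (for illustration only).
parallel_corpus = [
    ("the cat sleeps", "il gatto dorme"),
    ("the cat eats", "il gatto mangia"),
    ("the dog eats", "il cane mangia"),
    ("a cat sleeps", "un gatto dorme"),
]

# Count how often each English word appears in the same sentence pair
# as each Italian word.
cooccurrence = defaultdict(Counter)
for english, italian in parallel_corpus:
    for en_word in english.split():
        for it_word in italian.split():
            cooccurrence[en_word][it_word] += 1

def translation_candidates(en_word):
    """Relative co-occurrence frequencies as a crude stand-in for translation probabilities."""
    counts = cooccurrence[en_word]
    total = sum(counts.values())
    return {it_word: count / total for it_word, count in counts.items()}

print(translation_candidates("cat"))  # "gatto" gets the highest score
```

Nobody wrote a dictionary entry for “cat”, yet “gatto” comes out on top simply because the data says so.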

State of the art for us mere mortals

Not long after the new statistical, data-driven machine translation systems were invented, something crucial happened in the history of machine translation: it was released to the general public. The first versions of well-known systems like Google Translate became available online for everyone to use, and the concept of automatic translation began leaving the sci-fi world to become a handy everyday tool for an increasingly digitalized world.

Unfortunately, the quality of those early engines was rather low, and anyone who studied languages and/or translation before 2016, like me, will have heard the dreaded comment “Is this Google Translate?” used to underline a particularly unsuccessful attempt at making sense of a foreign language.

Enter neural machine translation

Why did I mention pre-2016 students specifically?

Not just to point out that I feel old…

In 2016, a paper titled “Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation”[3] by Google researchers launched a new type of machine translation: Neural Machine Translation. This was a milestone in the history of machine translation.

Since then, with a few crucial refinements, such as the attention mechanism at the heart of modern Transformer architectures, neural models have become the most widely used type of automatic translation and have replaced most of the old online translators, including the infamous Google Translate of yore.
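For readers curious about what “attention” actually computes, here is a minimal sketch in Python/NumPy of scaled dot-product attention, the operation at the core of Transformer models. The dimensions and random token vectors are made up for illustration; real NMT systems stack many such layers with learned weight matrices.

```python
# A minimal sketch of scaled dot-product attention, the mechanism behind
# Transformers. Toy dimensions and random values, for illustration only.
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    d_k = queries.shape[-1]
    # Similarity between each query and each key, scaled for numerical stability.
    scores = queries @ keys.T / np.sqrt(d_k)
    # Softmax turns the scores into attention weights that sum to 1 per query.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Each output is a weighted mix of the value vectors.
    return weights @ values

# Three token representations of dimension 4 (toy numbers).
rng = np.random.default_rng(0)
tokens = rng.standard_normal((3, 4))
print(scaled_dot_product_attention(tokens, tokens, tokens).shape)  # (3, 4)
```

The key idea is that each output position is a weighted mix of all the input positions, with the weights computed on the fly from the data itself rather than fixed in advance.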

What this means, in practical terms, is that the quality of the automatic translation anyone can access has increased dramatically: students may indeed get away with automatically translated homework, I am 90% sure that if I ask a Japanese person where the station is I will not end up at the zoo, and you may even want to ask a machine to translate 𒀝𒅗𒁺𒌑 for you.

(Notice I am only 90% sure, because I still retain a 10% doubt that any machine really knows the difference between a station and the zoo.)

But then came ChatGPT

At the end of 2022, a company called OpenAI released ChatGPT (GPT stands for Generative Pre-trained Transformer). The humongous amount of data and mathematical operations this neural network is trained on allows it to give users the impression that it understands both the questions it is asked and the answers it gives.

Whether this is true and to what degree (and in what way) is still open for debate.

But one thing is for sure: we are now in the era of LLMs, the Large Language Models (a model is an AI algorithm; language models are algorithms designed to work with human language; and large because… well, they are big).

The indisputable feature that potentially makes this new knickknack revolutionary in the history of machine translation is that questions can be formulated in natural language. This means you can interact with LLMs without knowing anything about programming, and possibly even without knowing English (though the actual multilingualism of such AI is still far from perfect).
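Even when an LLM is called from code rather than from a chat window, the “question” is still just a natural-language instruction. The sketch below assumes the OpenAI Python SDK and an API key are available; the model name is only an example, and the same pattern works with other providers.

```python
# A hedged sketch: asking an LLM for a translation is just sending it a
# natural-language instruction. Assumes the OpenAI Python SDK is installed
# and the OPENAI_API_KEY environment variable is set.
from openai import OpenAI

client = OpenAI()  # picks up the API key from the environment

prompt = "Translate the following sentence into Italian: 'Where is the station?'"

response = client.chat.completions.create(
    model="gpt-4o-mini",  # example model name; use whichever model you have access to
    messages=[{"role": "user", "content": prompt}],
)

print(response.choices[0].message.content)
```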

So far, so good. So where is the catch?

There are, indeed, a few catches, and some of them are well hidden behind the veil of curiosity and fun that this out-of-the-box, user-ready AI offers.

If what Arthur C. Clarke said is true, that “any sufficiently advanced technology is indistinguishable from magic”, then I would argue, unfortunately, that such tools are really not that advanced: the numerous risks and issues they raise are already becoming apparent on many levels.

If you have not yet come across any of them, I’ll sum up the main ones next time, for your convenience.

 

[1] ALPAC Report

[2] Yes, no cream in the carbonara! Spaghetti alla Carbonara – Il Cucchiaio d’Argento

[3] Wu et al., Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation

 

Elena Murgolo – Language Technology R&D Lead at Orbital14

Elena started out as a conference interpreter for English, German and Italian, but grew attached to machines and ended up combining the two worlds by specialising in translation software, machine translation and language technology in general. In recent years, she has presented at various international conferences (MT Summit, NETTT, EAMT) and tried to pass on her passion by teaching in specialised master’s programmes (EM TTI). Her papers include: Murgolo E., Productivity Evaluation in MT Post-Editing and Fuzzy Matches Editing. Setting the Threshold; Murgolo E. et al., A Quality Estimation and Quality Evaluation Tool for the Translation Industry. However, to satisfy her evil side as well, she also reviews the papers of other experts as a member of the Programme Committees of the same conferences she speaks at (TRITON, Hit-IT, NLP4TIA).