In the myth about the Tower of Babel, people conspired to build a city and tower that would reach heaven. Their creator observed, “And now nothing will be restrained from them, which they have imagined to do.” According to the myth, God thwarted this effort by creating diverse languages so that they could no longer collaborate.
In our modern times, we’re experiencing a state of unprecedented connectivity thanks to technology. However, we’re still living under the shadow of the Tower of Babel. Language remains a barrier in business and marketing. Even though technological devices can quickly and easily connect, humans from different parts of the world often can’t.
Translation agencies step in, making presentations, contracts, outsourcing instructions, and advertisements comprehensible to all intended recipients. Some agencies also offer “localization” expertise. For instance, if a company is marketing in Quebec, the advertisements need to be in Québécois French, not European French. Risk-averse companies may be reluctant to invest in these translations. Consequently, these ventures haven’t achieved full market penetration.
Global markets are waiting, but AI-powered language translation isn’t ready yet, despite recent advancements in natural language processing and sentiment analysis. AI still has difficulty processing requests in a single language, even before the added complications of translation. In November 2016, Google added a neural network to its translation tool. However, some of its translations are still socially and grammatically odd. I spoke to technologists and a language professor to find out why.
“To Google’s credit, they made a pretty massive improvement that appeared almost overnight. You know, I don’t use it as much. I will say this. Language is hard,” said Michael Housman, chief data science officer at RapportBoost.AI and faculty member of Singularity University.
He explained that the ideal scenario for machine learning and artificial intelligence is something with fixed rules and a clear-cut measure of success or failure. He named chess as an obvious example, and noted machines were able to beat the best human Go player. This happened faster than anyone anticipated because of the game’s very clear rules and limited set of moves.
Housman elaborated, “Language is almost the opposite of that. There aren’t as clearly-cut and defined rules. The conversation can go in an infinite number of different directions. And then of course, you need labeled data. You need to tell the machine to do it right or wrong.”
Housman noted that it’s inherently difficult to assign these informative labels. “Two translators won’t even agree on whether it was translated properly or not,” he said. “Language is kind of the wild west, in terms of data.” […]