Wednesday, November 22, 2006

By language, it's her fault...



In the Swiss news headline in the picture, which roughly translates into English as “She let them rape her”, the rape is framed as if the alleged victim had caused the criminals to commit the crime. That’s not much different from saying that “almost naked women deserve to be raped since they are all ‘exposed meat’”, as recently stated by a Muslim leader in a public speech in Australia.

Using language to frame situations and concepts in different (and often opposite) ways is common in current political discourse, as George Lakoff points out throughout his work on metaphor, and in particular in two of his books: “Moral Politics: How Liberals and Conservatives Think” and “Don’t Think of an Elephant”.

Language can be used as a weapon that kills freedom, much as terrorism does. I believe that people need to understand language better, and this is only possible through education. Education does not always equate to schools. By education I basically mean awareness and access to information. Since there is no unique way of “framing” information, the solution would be to have access to multiple (inevitably subjective) views of the same information. Moreover, it is necessary to understand how constructions in different languages already provide a frame of reference that is unconsciously rooted in the speakers of a specific language. The solution to this problem would be to learn different languages and share cultural values across national boundaries. A condition for doing so is, of course, the elimination of physical borders and free access to information.

Since most people are too lazy to learn foreign languages, automatic translation might help. The problem, however, is that machine translation systems do not currently embody knowledge about the different framings of the same situation in different languages. In other words, if we translate the above headline from French into Italian we get a truly different connotation of the fact: “Si è fatta violentare”. Of course, common sense resolves the misinterpretation here. But what about a direct French to Arabic translation, where the wrongly depicted situation would have a legal status? Isn’t this misinformation?

One way to (technologically) address the problem is to build better translation dictionaries where constructions (multiword expressions) are correctly translated. Moreover, these constructions should be considered in the context in which they appear in order to be linked to the right corresponding construction in the target language.
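To make the idea concrete, here is a minimal sketch of a construction-aware dictionary lookup. It is only an illustration under stated assumptions: the `LEXICON` entries, the context labels ("idiom", "literal", "default") and the `translate` function are all hypothetical, and a real system would have to infer the context rather than receive it as an argument. The sketch prefers the longest matching multiword expression and then picks the translation that fits the given context.

```python
# Toy English-to-French lexicon (illustrative entries, not a real resource).
# Keys are token tuples; values map a context label to a target expression.
LEXICON = {
    ("kick", "the", "bucket"): {
        "idiom": "casser sa pipe",
        "literal": "donner un coup de pied dans le seau",
    },
    ("heavy", "rain"): {"default": "forte pluie"},
    ("heavy",): {"default": "lourd"},
    ("rain",): {"default": "pluie"},
}


def translate(tokens, context, lexicon=LEXICON):
    """Greedy longest-match lookup of multiword expressions.

    `context` is a hypothetical label supplied by the caller. Tokens that
    are not covered by the lexicon are copied to the output unchanged.
    """
    max_len = max(len(key) for key in lexicon)
    output = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate expression first, then shorter ones.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            senses = lexicon.get(tuple(tokens[i:i + n]))
            if senses is None:
                continue
            target = senses.get(context, senses.get("default"))
            if target is not None:
                output.append(target)
                i += n
                break
        else:
            # No entry matched at this position: keep the source token.
            output.append(tokens[i])
            i += 1
    return output


print(translate(["kick", "the", "bucket"], "idiom"))
print(translate(["kick", "the", "bucket"], "literal"))
print(translate(["heavy", "rain"], "literal"))
```

The point of the sketch is that “heavy rain” is looked up as a unit rather than as “heavy” plus “rain”, and that the same source expression can map to different target constructions depending on context.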

A major effort in this direction is under way within the FrameNet project at the International Computer Science Institute, Berkeley. This project is led by Prof. Charles Fillmore and is part of a larger initiative on Cognitive Linguistics supported by the University of California at Berkeley.

Another relevant line of work toward reliable automatic translation is that of Violeta Seretan on collocations. Collocations are groups of words that tend to go together in a language. Collocations are usually translated differently across languages, especially when the languages stem from different roots (e.g. Latin, Germanic, Semitic, Asiatic). Words are not isolated entities in language. Rather, they are like molecules with different valences. They tend to bind with other words (molecules) according to a given context (the solution). That’s why it is important to take words together and study their interconnections. It is not just a matter of syntactic well-formedness. It is more about conventions, cultural biases and historical development.
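As a rough illustration of words “binding” to each other, the sketch below scores adjacent word pairs with pointwise mutual information (PMI), a simple and widely used association measure. This is not the method of the work cited above; it is a purely statistical toy, and the `CORPUS` sentences, the `pmi_scores` function and the `min_count` threshold are assumptions made up for the example.

```python
import math
from collections import Counter

# Tiny made-up corpus, only meant to show the mechanics.
CORPUS = [
    "heavy rain fell on the town",
    "the heavy rain stopped",
    "strong tea and heavy rain",
    "she drank strong tea",
    "the town likes strong tea",
]


def pmi_scores(sentences, min_count=2):
    """Return PMI (in bits) for adjacent word pairs seen at least min_count times."""
    unigrams = Counter()
    bigrams = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))

    total_unigrams = sum(unigrams.values())
    total_bigrams = sum(bigrams.values())

    scores = {}
    for (first, second), count in bigrams.items():
        if count < min_count:
            continue  # rare pairs give unreliable estimates
        p_pair = count / total_bigrams
        p_first = unigrams[first] / total_unigrams
        p_second = unigrams[second] / total_unigrams
        # PMI compares the observed pair probability with what independence predicts.
        scores[(first, second)] = math.log2(p_pair / (p_first * p_second))
    return scores


for pair, score in sorted(pmi_scores(CORPUS).items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

A high score means that the two words occur together more often than their individual frequencies would predict, which is one hint (among others) that the pair should be treated and translated as a unit.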

Language is a complex phenomenon that deserves more (scientific) investigation. Unfortunately, it does not always get it, since many projects in human language technology focus on practical solutions to problems of limited scope. Naïve machine translation (such as the kind you can experience through Google) works 60-70% of the time and does not take into account language subtleties such as those discussed here.

1 comment:

Anonymous said...

Indeed, language is a complex phenomenon, because it is nothing but a reflection of the real world as reflected, in its turn, in the human brain. Objective reality is the same regardless of time and location, but it may be reflected in our perception in different ways, and human language is one of the means of expressing our perception. Each nation, speaking its own language, extracts from reality the phenomena that are most important from its point of view. That's why we can find tens of words denoting different kinds of snow in the language of the Eskimos and, perhaps, no word with this meaning in the language of some African tribes. The same thing can be observed in the grammatical structure of a language, with the only difference that grammatical laws are less obvious, more "hidden", and more regular. That's why translation from one natural language into another is so hard for a human translator who, having studied a foreign language, does not have enough knowledge of the culture, traditions, history etc. of the people who speak it. It's impossible to translate a text without fully understanding the situation described in it.

As for machine translation, it doesn't matter what technology it is based on (traditional rule-based technology such as PROMT or Systran, statistical MT such as Google, or Translation Memory such as Trados): the accuracy of the target text is far from perfect (although in most cases not bad at all, and sufficient to get a rough understanding), simply because the computer has no notion of the objective world, of the links not between words but between the notions denoted by these words. Surely, this problem can be partly resolved by enhanced dictionaries and translation algorithms, but that cannot change the situation radically. I agree with the author that the cognitive approach may be really useful for finding and applying some regularities of external reality, but modern cognitive research is too abstract to be used directly by MT developers, who, on the other hand, cannot invest in academic research, because creating MT software is very expensive in itself. However, it's the only way to reach the goal - I mean accurate and reliable machine translation.