How to Become a Translator - 3
How to Become a Translator on the Internet - 3
This course on becoming a translator on the Internet starts with:
www.squidoo.com/internettranslator
www.squidoo.com/internettranslator2
This is the third part of the volume. We continue studying the different aspects of the machine translation of languages.
The Table of Contents will show where you are in the course.
TABLE OF CONTENTS
www.sidacgroup.com
CHAPTER I - STUDIES OR KNOWLEDGE?
What does it take to become a translator?
CHAPTER II - THE TOOLS
1) THE COMPUTER
2) THE DICTIONARIES
3) THE SOFTWARE
4) THE TRANSLATION TOOLS
4.1) Machine Translation
Translation Process
4.1.2) Approaches
4.1.2.1) Rule-based
4.1.2.1.1) Transfer-based machine translation
Overview
How it works
Analysis and transformation
Transfer types
4.1.2.1.2) Interlingual
4.1.2.1.3) Dictionary-based
4.1.2.2) Statistical
Benefits
Better use of resources
More natural translations
Word-based translation
Phrase-based translation
Challenges with statistical machine translation
Syntax
4.1.2.3) Example-based machine translation
4.2) Computer Assisted Translation (CAT)
What is a CAT Tool?
4.2.1) SDL/Trados
How does it work?
The Translation Memory
What is a translation memory?
How does a translation memory work?
When would I use a translation memory?
What are the benefits of using an SDL Trados translation memory?
How does a translation memory tool differ from a terminology tool?
How does translation memory software differ from machine translation?
4.2.2) Wordfast
4.2.3) FELIX
4.2.4) Deja Vu
4.2.5) MEMO Q
What features does MemoQ basically have?
What is the difference between MemoQ and other solutions?
5) CONCLUSION
4.1.2.1.2) Interlingual
Interlingual machine translation is one of the classic approaches to machine translation. In this approach, the source language, i.e. the text to be translated, is transformed into an interlingua, i.e. an abstract, language-independent representation. The target language is then generated from the interlingua. Within the rule-based machine translation paradigm, the interlingual approach is an alternative to the direct approach and the transfer approach.
In the direct approach, words are translated directly without passing through an additional representation. In the transfer approach, the source language is transformed into an abstract, less language-specific representation. Linguistic rules specific to the language pair then transform the source-language representation into an abstract target-language representation, from which the target sentence is generated.
The interlingual approach to machine translation has advantages and disadvantages. The advantage in multilingual machine translation is that no transfer component has to be created for each language pair. The obvious disadvantage is that defining an interlingua is difficult and may even be impossible for a wider domain. The ideal context for interlingual machine translation is thus multilingual machine translation in a very specific domain.
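To illustrate the idea, here is a toy sketch in which each language has one analyser (words to concepts) and one generator (concepts to words). The lexicons and concept IDs are hypothetical, and a real interlingua would also have to encode meaning, tense, roles and word order, not just a word-to-concept mapping. The sketch also counts modules: a transfer system needs a pair-specific transfer module for each of the n(n-1) translation directions, while an interlingua system needs only one analyser and one generator per language.

```python
# Toy interlingua sketch. The lexicons and concept IDs are hypothetical.
LEXICONS = {
    "en": {"cat": "CAT", "drinks": "DRINK", "milk": "MILK"},
    "fr": {"chat": "CAT", "boit": "DRINK", "lait": "MILK"},
    "de": {"Katze": "CAT", "trinkt": "DRINK", "Milch": "MILK"},
}

def analyse(sentence, lang):
    """Analysis: map source words to language-independent concept IDs."""
    lexicon = LEXICONS[lang]
    return [lexicon[word] for word in sentence.split()]

def generate(concepts, lang):
    """Generation: produce target words from concept IDs (inverse lookup)."""
    inverse = {concept: word for word, concept in LEXICONS[lang].items()}
    return " ".join(inverse[c] for c in concepts)

def translate(sentence, source, target):
    # No pair-specific component: any source can reach any target via concepts.
    return generate(analyse(sentence, source), target)

def components_needed(n_languages):
    """Return (transfer modules for a transfer system,
    analyser + generator modules for an interlingua system)."""
    transfer = n_languages * (n_languages - 1)  # one per translation direction
    interlingua = 2 * n_languages               # one analyser + one generator each
    return transfer, interlingua

print(translate("chat boit lait", "fr", "de"))  # Katze trinkt Milch
print(components_needed(10))                    # (90, 20)
```

Adding an eleventh language to this toy only requires one new lexicon, which is the multilingual advantage described above; the hard part in practice is designing the concept inventory itself.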
4.1.2.1.3) Dictionary-based
Machine translation can use a method based on dictionary entries, which means that the words are translated as a dictionary does: word by word, usually without much correlation of meaning between them. Dictionary lookups may be done with or without morphological analysis or lemmatisation. While this approach to machine translation is probably the least sophisticated, dictionary-based machine translation is ideally suited to translating long lists of phrases on the subsentential (i.e., not a full sentence) level, e.g. inventories or simple catalogs of products and services. It could also be used to expedite manual translation if the person carrying it out is fluent in both languages and therefore capable of correcting syntax and grammar.
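A minimal sketch of word-by-word dictionary lookup, with and without a lemmatisation step, might look like this. The English-French dictionary and the crude plural-stripping "lemmatiser" are hypothetical stand-ins for real resources.

```python
# Dictionary-based (word-by-word) translation sketch.
# DICTIONARY and lemmatise() are hypothetical, for illustration only.
DICTIONARY = {"red": "rouge", "chair": "chaise", "table": "table", "wooden": "en bois"}

def lemmatise(word):
    """Very crude lemmatisation: strip an English plural 's' if that gives a known entry."""
    if word.endswith("s") and word[:-1] in DICTIONARY:
        return word[:-1]
    return word

def translate(phrase, use_lemmas=True):
    out = []
    for word in phrase.lower().split():
        key = lemmatise(word) if use_lemmas else word
        # Unknown words are passed through unchanged, marked with brackets.
        out.append(DICTIONARY.get(key, f"[{word}]"))
    return " ".join(out)

print(translate("red chair"))                        # rouge chaise
print(translate("wooden tables", use_lemmas=False))  # en bois [tables]
print(translate("wooden tables"))                    # en bois table
```

The output "rouge chaise" keeps the English adjective-noun order (French would normally say "chaise rouge") and lemmatisation loses the plural, which is why a bilingual reviewer is needed to correct syntax and grammar.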
4.1.2.2) Statistical
Statistical machine translation tries to generate translations using statistical methods based on bilingual text corpora, such as the Canadian Hansard corpus (the English-French record of the Canadian Parliament) and EUROPARL (the record of the European Parliament). Where such corpora are available, impressive results can be achieved when translating texts of a similar kind, but such corpora are still very rare. The first statistical machine translation software was CANDIDE from IBM. Google used SYSTRAN for several years, but switched to a statistical translation method in October 2007. Recently, Google improved its translation capabilities by training its system on approximately 200 billion words from United Nations materials, and the accuracy of its translations has improved.
Statistical machine translation (SMT) is a machine translation paradigm in which translations are generated on the basis of statistical models whose parameters are derived from the analysis of bilingual text corpora. The statistical approach contrasts with the rule-based approaches to machine translation as well as with example-based machine translation.
The first ideas of statistical machine translation were introduced by Warren Weaver in 1949, including the idea of applying Claude Shannon's information theory. Statistical machine translation was re-introduced in 1991 by researchers at IBM's Thomas J. Watson Research Center and has contributed to the significant resurgence of interest in machine translation in recent years. As of 2006, it was by far the most widely studied machine translation paradigm.
Benefits
The benefits of statistical machine translation over traditional paradigms that are most often cited are the following:
Better use of resources
There is a great deal of natural language in machine-readable format.
Generally, SMT systems are not tailored to any specific pair of languages.
Rule-based translation systems require the manual development of linguistic rules, which can be costly, and which often do not generalize to other languages.
More natural translations
The ideas behind statistical machine translation come out of information theory. Essentially, a document is translated according to the probability $p(e \mid f)$ that a string $e$ in the native language (for example, English) is the translation of a string $f$ in the foreign language (for example, French). Generally, these probabilities are estimated using techniques of parameter estimation.
Bayes' theorem is applied to $p(e \mid f)$ to get $p(e \mid f) \propto p(f \mid e)\,p(e)$, where the translation model $p(f \mid e)$ is the probability that the foreign string is the translation of the native string, and the language model $p(e)$ is the probability of seeing that native string. Mathematically speaking, the best translation $\tilde{e}$ is found by picking the one that gives the highest probability:
$$\tilde{e} = \arg\max_{e \in e^*} p(e \mid f) = \arg\max_{e \in e^*} p(f \mid e)\,p(e).$$
For a rigorous implementation of this, one would have to perform an exhaustive search by going through all strings $e^*$ in the native language. Performing the search efficiently is the work of a machine translation decoder, which uses the foreign string, heuristics and other methods to limit the search space while keeping acceptable quality. This trade-off between quality and time usage can also be found in speech recognition.
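To make the decision rule concrete, here is a minimal sketch of exhaustive noisy-channel decoding over a tiny candidate set. The French input "la maison", the three candidates and both probability tables are invented for illustration. Scores are added in log space to avoid underflow, and a real decoder cannot enumerate every native string, so it prunes the search instead (for example with beam search).

```python
import math

# Hypothetical toy models for translating the French string f = "la maison".
TRANSLATION_MODEL = {  # p(f | e): how well each English candidate explains f
    "the house": 0.4,
    "house the": 0.4,
    "the home": 0.2,
}
LANGUAGE_MODEL = {  # p(e): how plausible each candidate is as English
    "the house": 0.05,
    "house the": 0.0001,
    "the home": 0.03,
}

def decode(candidates):
    """Exhaustive search: return argmax_e p(f|e) * p(e), computed in log space."""
    def score(e):
        return math.log(TRANSLATION_MODEL[e]) + math.log(LANGUAGE_MODEL[e])
    return max(candidates, key=score)

print(decode(LANGUAGE_MODEL.keys()))  # the house
```

Both word orders receive the same translation-model score here, so it is the language model that rejects "house the".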
As the translation systems are not able to store all native strings and their translations, a document is typically translated sentence by sentence, but even this is not enough. Language models are typically approximated by smoothed n-gram models, and similar approaches have been applied to translation models, but there is additional complexity due to different sentence lengths and word orders in the languages.
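A smoothed n-gram language model can be sketched in a few lines. This version assumes bigrams with add-one (Laplace) smoothing, the simplest scheme; practical systems usually use stronger smoothing such as Kneser-Ney. The three training sentences are a made-up toy corpus.

```python
from collections import Counter

def train_bigram_lm(sentences):
    """Add-one (Laplace) smoothed bigram model; returns a function p(sentence)."""
    histories, bigrams = Counter(), Counter()
    for s in sentences:
        tokens = ["<s>"] + s.split() + ["</s>"]
        histories.update(tokens[:-1])            # how often each token is followed by something
        bigrams.update(zip(tokens, tokens[1:]))  # counts of adjacent token pairs
    # Possible next tokens: every training word plus the end-of-sentence marker.
    vocab_size = len({w for s in sentences for w in s.split()} | {"</s>"})

    def prob(sentence):
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        p = 1.0
        for prev, word in zip(tokens, tokens[1:]):
            # Smoothing gives unseen bigrams a small non-zero probability.
            p *= (bigrams[(prev, word)] + 1) / (histories[prev] + vocab_size)
        return p

    return prob

lm = train_bigram_lm(["the house is small", "the house is big", "a small house"])
print(lm("the house is small") > lm("house the is small"))  # True
```

Because every bigram keeps a non-zero probability, unseen word sequences are penalised rather than ruled out, which matters when the model has to score translations it has never seen.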
The statistical translation models were initially word-based (Models 1-5 from IBM), but significant advances were made with the introduction of phrase-based models. Recent work has incorporated syntax or quasi-syntactic structures.
Word-based translation
In word-based translation, the translated elements are words. Typically, the number of words in translated sentences differs because of compound words, morphology and idioms. The ratio of the lengths of sequences of translated words is called fertility, which tells how many foreign words each native word produces. Simple word-based translation cannot translate between language pairs with fertility rates different from one. To make word-based translation systems cope with, for instance, high fertility rates, the system can be allowed to map a single word to multiple words, but not vice versa. For instance, if we are translating from French to English, each word in English could produce zero or more French words, but there is no way to group two English words producing a single French word.
An example of a word-based translation system is the freely available GIZA++ package (GPL-licensed), which includes the IBM models.
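The simplest of the IBM models, Model 1, learns word translation probabilities from sentence pairs alone using expectation-maximisation (EM). Below is a from-scratch sketch of Model 1 on a hypothetical three-sentence French-English corpus; it is not GIZA++'s code. Model 1 ignores word order and fertility (fertility enters in the later IBM models), but it shows how co-occurrence statistics turn into a translation table.

```python
from collections import defaultdict

def ibm_model1(bitext, iterations=10):
    """Estimate word translation probabilities t(f|e) with IBM Model 1 EM.

    bitext: list of (foreign_sentence, native_sentence) string pairs.
    A NULL token on the native side lets foreign words align to nothing.
    """
    pairs = [(f.split(), ["NULL"] + e.split()) for f, e in bitext]
    f_vocab = {w for fs, _ in pairs for w in fs}
    t = defaultdict(lambda: 1.0 / len(f_vocab))  # uniform initialisation
    for _ in range(iterations):
        count = defaultdict(float)  # expected counts c(f, e)
        total = defaultdict(float)  # expected counts c(e)
        # E-step: share each foreign word among the native words of its sentence
        for fs, es in pairs:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    delta = t[(f, e)] / norm
                    count[(f, e)] += delta
                    total[e] += delta
        # M-step: renormalise expected counts into probabilities t(f|e)
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t

bitext = [("la maison", "the house"), ("la fleur", "the flower"), ("maison bleue", "blue house")]
t = ibm_model1(bitext)
french = ["la", "maison", "fleur", "bleue"]
print(max(french, key=lambda f: t[(f, "house")]))  # maison
print(max(french, key=lambda f: t[(f, "the")]))    # la
```

Nothing tells the model that "maison" means "house"; it infers this because the two words keep appearing together across sentence pairs.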
Phrase-based translation
Phrase-based translation tries to reduce the restrictions of word-based translation by translating sequences of words into sequences of words, where the lengths can differ. These sequences of words are called, for instance, blocks or phrases, but they are typically not linguistic phrases; they are phrases found in the corpus using statistical methods. Restricting the phrases to linguistic phrases has been shown to decrease translation quality.
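The core of phrase-based translation, choosing how to cut the source sentence into phrases of different lengths, can be sketched with dynamic programming. This is a deliberately simplified monotone decoder: it assumes a hypothetical hand-written phrase table, translates phrases left to right without reordering, and uses no language model. Real phrase-based systems add all of these and use beam search.

```python
import math

# Hypothetical phrase table: source phrase -> [(target phrase, p(target | source))].
PHRASE_TABLE = {
    ("je",): [("I", 0.9)],
    ("ne", "sais", "pas"): [("do not know", 0.7), ("don't know", 0.3)],
    ("sais",): [("know", 0.8)],
    ("pas",): [("not", 0.6)],
    ("ne",): [("not", 0.3)],
}
MAX_PHRASE_LEN = 3

def translate_monotone(sentence):
    """Segment the source into phrases left to right and return the segmentation
    with the highest product of phrase probabilities (None if none exists)."""
    words = tuple(sentence.split())
    n = len(words)
    # best[i] = (log probability, target phrases) for the first i source words
    best = [(-math.inf, [])] * (n + 1)
    best[0] = (0.0, [])
    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_PHRASE_LEN), end):
            score_so_far, output = best[start]
            if score_so_far == -math.inf:
                continue  # no way to cover the first `start` words
            for target, p in PHRASE_TABLE.get(words[start:end], []):
                score = score_so_far + math.log(p)
                if score > best[end][0]:
                    best[end] = (score, output + [target])
    return " ".join(best[n][1]) if best[n][0] > -math.inf else None

print(translate_monotone("je ne sais pas"))  # I do not know
```

The three-word phrase "ne sais pas" beats the word-by-word path ("I not know not") because it is translated as one unit, which is the restriction phrase-based models were designed to lift.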
Challenges with statistical machine translation
Problems that statistical machine translation has to deal with include:
Compound words
Idioms
Morphology
Different word orders
Word order differs across languages. Some classification can be done by naming the typical order of subject (S), verb (V) and object (O) in a sentence, so one can talk, for instance, of SVO or VSO languages. There are also additional differences in word order, for instance, in where noun modifiers are placed.
In speech recognition, the speech signal and the corresponding textual representation can be mapped to each other in blocks, in order. This is not always the case with the same text in two languages. For SMT, the translation model is only able to translate small sequences of words, so word order has to be taken into account somehow. A typical solution has been reordering models, in which a distribution of location changes for each item of translation is approximated from aligned bitext. Different location changes can then be ranked with the help of the language model, and the best one can be selected.
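The idea of ranking location changes with a language model can be sketched as follows. As a simplification, this sketch uses a fixed distance-based distortion penalty instead of a distribution learned from aligned bitext, treats each word as its own phrase, and relies on a hypothetical English bigram table; the weight `alpha` is arbitrary. Here `target_words[i]` is the word-by-word translation of source word `i` (French "chaise rouge" gives "chair red").

```python
import math
from itertools import permutations

def distortion_cost(order):
    """Distance-based reordering cost: sum of |start_i - end_(i-1) - 1|, where
    `order` lists source positions in the order they are translated."""
    cost, prev_end = 0, -1
    for pos in order:
        cost += abs(pos - prev_end - 1)  # 0 when translation proceeds monotonically
        prev_end = pos
    return cost

# Hypothetical bigram log-probabilities for English; unseen bigrams get a floor.
BIGRAMS = {("<s>", "red"): math.log(0.2), ("red", "chair"): math.log(0.3),
           ("chair", "</s>"): math.log(0.3)}

def toy_lm(sentence):
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    return sum(BIGRAMS.get(b, math.log(1e-3)) for b in zip(tokens, tokens[1:]))

def rank_orders(target_words, lm_logprob, alpha=0.5):
    """Score every permutation: language model score minus a distortion penalty.
    Exhaustive, so only feasible for very short sentences."""
    scored = []
    for order in permutations(range(len(target_words))):
        sentence = " ".join(target_words[i] for i in order)
        scored.append((lm_logprob(sentence) - alpha * distortion_cost(order), sentence))
    return sorted(scored, reverse=True)

print(rank_orders(["chair", "red"], toy_lm)[0][1])  # red chair
```

The distortion penalty discourages reordering, but the language model gain for "red chair" is large enough to outweigh it; with a larger `alpha` the monotone order would win instead.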
Syntax
Out of vocabulary (OOV) words
SMT systems store different word forms as separate symbols without any relation to each other, so word forms or phrases that were not in the training data cannot be translated. The main reasons for out-of-vocabulary words are limited training data, domain changes and morphology.
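A short sketch shows how treating each surface form as a separate symbol produces OOV words. The two French training sentences are a made-up toy corpus. "maison" and "petite" both occur in training, but their plural forms do not, so they count as unknown.

```python
def oov_words(training_sentences, test_sentence):
    """Return test tokens never seen in training; each surface form is its own symbol."""
    vocab = {w for s in training_sentences for w in s.split()}
    return [w for w in test_sentence.split() if w not in vocab]

def oov_rate(training_sentences, test_sentence):
    """Fraction of test tokens that are out of vocabulary."""
    tokens = test_sentence.split()
    return len(oov_words(training_sentences, test_sentence)) / len(tokens)

train = ["la maison est grande", "la petite maison"]
print(oov_words(train, "la maison est petite"))     # []
print(oov_words(train, "les petites maisons"))      # ['les', 'petites', 'maisons']
print(oov_rate(train, "la maison et les maisons"))  # 0.6
```

Morphological analysis (for example, mapping "maisons" to "maison" plus a plural marker) is one common way to reduce this kind of OOV rate.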
To be continued...
http://www.squidoo.com/internettranslator4
by chrisger
I was born on December 7, 1941 in Hanoi, Vietnam, ex-French Indochina where I spent the first 5 years of my life.

