<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.machine.translate">
<title>Machine Translation<indexterm class="singular">
<primary>Machine Translation</primary>
</indexterm></title>
<section>
<title>Introduction<indexterm class="singular">
<primary>Machine Translation</primary>
<secondary>Introduction</secondary>
</indexterm></title>
<para>Unlike tools that rely on user-generated translation memories (as
<application>OmegaT</application> does), machine translation (MT) tools
create a translation of the source segment without the need for a
translation memory. Some use rule-based linguistic analysis; others use
statistical learning techniques, applied to source and target texts, to
build a translation model. Machine translation services have been achieving
good and steadily improving results in research evaluations.</para>
<para>To activate any of the machine translation services, go to
<guimenuitem>Options > Machine Translate ...</guimenuitem> and select
the desired service. Note that they are all web-based: you must be
online to use them.</para>
</section>
<section id="introduction">
<title>Google Translate<indexterm class="singular">
<primary>Machine Translation</primary>
<secondary>Google Translate</secondary>
</indexterm></title>
<para>Google Translate is a paid service offered by Google for
translating sentences, web sites and complete texts between an
ever-growing number of languages. At the time of writing the list includes
more than 50 languages, from Albanian to Yiddish, including all the major
languages. The current version of the service is billed by usage, at
20 USD per million characters at the time of writing.</para>
<para><emphasis role="bold">Important: </emphasis>Google Translate API v2
requires billing information for all accounts before you can start using
the service (see <ulink
url="https://developers.google.com/translate/v2/pricing?hl=en-US">Pricing
and Terms of Service</ulink> for more). To identify yourself as a valid
user of the Google services, use the private, unique key that Google
sends you when you register for the service. See chapter <link
linkend="chapter.installing.and.running">Installing and Running</link>,
section Launch command arguments, for details on how to add this key to
the OmegaT environment.</para>
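<para>As a minimal sketch, assuming that OmegaT is started from its JAR
file and that the key is read from a Java system property named
<code>google.api.key</code> (the property name, the JAR file name and the
placeholder value are assumptions; the Installing and Running chapter
remains the authoritative reference), the launch command might look like
this:</para>
<programlisting># Hypothetical launch command: replace YOUR_GOOGLE_API_KEY with your own key.
java -Dgoogle.api.key=YOUR_GOOGLE_API_KEY -jar OmegaT.jar</programlisting>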
<para>The quality of a Google Translate translation depends partly on the
pool of target-language texts and the availability of their bilingual
versions, and partly on the quality of the models built from them. While
the quality may be insufficient in some cases, it is likely to improve
over time.</para>
</section>
<section>
<title><application>OmegaT</application> users and Google
Translate</title>
<para><application>OmegaT</application> users are not obliged to use
Google Translate. If it is used, neither the user's decision to accept the
translation nor the final translation is made available to Google. The
following window shows an example of a) the English source, b) the Spanish
and c) the Slovenian Google Translate translation.</para>
<figure id="moby.dick">
<title>Google Translate - example</title>
<mediaobject>
<imageobject role="html">
<imagedata fileref="images/MobyDick.png"/>
</imageobject>
<imageobject role="fo">
<imagedata fileref="images/MobyDick.png" width="60%"/>
</imageobject>
</mediaobject>
</figure>
<para>The Spanish translation is better than the Slovenian one. Note that
in Spanish, <emphasis>interesar</emphasis> and <emphasis>navegar</emphasis>
correctly render the verbs interest and sail respectively. In the
Slovenian version, both words have been translated as nouns. It is quite
likely that the Spanish translation is based at least partially on an
existing translation of the book.</para>
<para>Once you have activated the service, a suggestion for the
translation will appear in the Machine Translate pane every time a new
source segment is opened. If you find the suggestion acceptable, press
<keycombo>
<keycap><indexterm class="singular">
<primary>Shortcuts</primary>
<secondary>Machine Translate - Ctrl+M</secondary>
</indexterm>Ctrl</keycap>
<keycap>M</keycap>
</keycombo> to replace the target part of the opened segment with the
suggestion. In the above segment, for instance, <keycombo>
<keycap>Ctrl</keycap>
<keycap>M</keycap>
</keycombo> would replace the Spanish version with the Slovenian
suggestion.</para>
<para>If you do not want <application>OmegaT</application> to send your
source segments to Google for translation, untick the Google Translate
menu entry in Options.</para>
<para>Note that nothing but your source segment is sent to the MT service.
The online version of Google Translate allows the user to correct the
suggestion and submit the corrected segment. This feature, however, is
not implemented in OmegaT.</para>
</section>
<section>
<title>Belazar<indexterm class="singular">
<primary>Machine Translation</primary>
<secondary>Belazar</secondary>
</indexterm></title>
<para><ulink url="http://belazar.info/">Belazar</ulink> is a machine
translation tool for the Russian-Belarusian language pair.</para>
</section>
<section>
<title>Apertium<indexterm class="singular">
<primary>Machine Translation</primary>
<secondary>Apertium</secondary>
</indexterm></title>
<para><ulink url="http://www.apertium.org/">Apertium</ulink> is a
free/open-source machine translation platform, initially aimed at
related-language pairs, such as CA, ES, GA, PT, OC and FR, but recently
expanded to deal with more divergent language pairs (such as
English-Catalan). Check the web site for the latest list of implemented
language pairs.</para>
<para>The platform provides:</para>
<itemizedlist>
<listitem>
<para>a language-independent machine translation engine</para>
</listitem>
<listitem>
<para>tools to manage the linguistic data necessary to build a machine
translation system for a given language pair</para>
</listitem>
<listitem>
<para>linguistic data for a growing number of language pairs</para>
</listitem>
</itemizedlist>
<para>Apertium uses a shallow-transfer machine translation engine which
processes the input text in stages, as in an assembly line: de-formatting,
morphological analysis, part-of-speech disambiguation, shallow structural
transfer, lexical transfer, morphological generation, and
re-formatting.</para>
<para>It is possible to use Apertium to build machine translation systems
for a variety of language pairs; to that end, Apertium uses simple
XML-based standard formats to encode the linguistic data needed (either by
hand or by converting existing data), which are compiled using the
provided tools into the high-speed formats used by the engine.</para>
</section>
<section>
<title>Microsoft Translator<indexterm class="singular">
<primary>Machine Translation</primary>
<secondary>Microsoft Translator</secondary>
</indexterm></title>
<para>To get credentials for MS Translator, follow these steps:</para>
<orderedlist>
<listitem>
<para>Log in to Microsoft Azure Marketplace: <ulink
url="http://datamarket.azure.com/">http://datamarket.azure.com/</ulink></para>
<para>If you do not already have an Azure Marketplace account, you will
need to register one first.</para>
</listitem>
<listitem>
<para>Click on the My Account option at the top of the page.</para>
</listitem>
<listitem>
<para>Near the bottom you will see entries and values for:
<itemizedlist>
<listitem>
<para>Primary Account Key (corresponds to the
<code>microsoft.api.client_secret</code> command-line
parameter)</para>
</listitem>
<listitem>
<para>Customer ID (corresponds to the
<code>microsoft.api.client_id</code> command-line
parameter)</para>
</listitem>
</itemizedlist></para>
</listitem>
</orderedlist>
<para>To enable MS Translator in OmegaT, pass these two values to OmegaT
at startup: either edit the OmegaT launcher, or see chapter
<link linkend="chapter.installing.and.running">Installing and
Running</link> to learn how to start OmegaT from the command line.</para>
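<para>A minimal sketch of such a command line, assuming that OmegaT is
started from its JAR file and that the two parameters are passed as Java
system properties with the <code>-D</code> prefix (the JAR file name, the
prefix and the placeholder values are assumptions):</para>
<programlisting># Hypothetical launch command: replace both placeholders with the values
# shown on your Azure Marketplace account page.
java -Dmicrosoft.api.client_id=YOUR_CUSTOMER_ID \
     -Dmicrosoft.api.client_secret=YOUR_PRIMARY_ACCOUNT_KEY \
     -jar OmegaT.jar</programlisting>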
</section>
<section>
<title>Yandex Translate<indexterm class="singular">
<primary>Machine Translation</primary>
<secondary>Yandex Translate</secondary>
</indexterm></title>
<para>To use Yandex Translate in OmegaT, you need to
<ulink url="http://api.yandex.com/key/form.xml?service=trnsl">obtain an
API key from Yandex</ulink>.</para>
<para>The API key must be passed to OmegaT at startup through the
<code>yandex.api.key</code> command-line parameter. To do that, edit the
OmegaT launcher or see chapter <link
linkend="chapter.installing.and.running">Installing and Running</link> to
learn how to start OmegaT from the command line.</para>
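<para>For illustration only, assuming that OmegaT is started from its JAR
file and that the parameter is passed as a Java system property with the
<code>-D</code> prefix (the JAR file name, the prefix and the placeholder
value are assumptions), the command line might look like this:</para>
<programlisting># Hypothetical launch command: replace YOUR_YANDEX_API_KEY with your own key.
java -Dyandex.api.key=YOUR_YANDEX_API_KEY -jar OmegaT.jar</programlisting>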
</section>
<section id="Google.Translate.troubleshooting">
<title>Machine translation - troubleshooting<indexterm class="singular">
<primary>Machine Translation</primary>
<secondary>Troubleshooting</secondary>
</indexterm></title>
<para>If nothing appears in the Machine Translate pane, check the
following:</para>
<itemizedlist>
<listitem>
<para>Are you online? You need to be online to use an MT tool.</para>
</listitem>
<listitem>
<para>Which language pair do you need? Check whether the selected
service offers it.</para>
</listitem>
<listitem>
<para>Google Translate does not work: have you signed up for the <ulink
url="https://developers.google.com/translate/v2/faq">Translate API
service</ulink>? Note that the Google Translate service is not free of
charge; see chapter <link
linkend="chapter.installing.and.running">Installing and Running</link>
(runtime parameters) for more on that.</para>
</listitem>
<listitem>
<para>"Google Translate returned HTTP response code: 403 ...": check
that the 38-character key, entered in the pinfo.list file, is
correct. Check that the <ulink
url="https://developers.google.com/translate/v2/faq">Translate API
service</ulink> has been activated.</para>
</listitem>
<listitem>
<para>Google Translate does not work even though the Google API key
has been entered as requested: check in Options > Machine Translate
that Google Translate V2 is checked.</para>
</listitem>
<listitem>
<para>Google Translate V2 reports "Bad request": check the source and
target languages for your project. Having no languages defined elicits
this kind of response.</para>
</listitem>
</itemizedlist>
</section>
</chapter>