<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.translation.memories">
<title>Translation memories<indexterm class="singular">
<primary>Translation memories</primary>
</indexterm><indexterm class="singular">
<primary>TMX</primary>
<see>Translation memories</see>
</indexterm></title>
<section id="OmegaT.and.tmx.files">
<title>Translation memories in OmegaT</title>
<section id="tmx.files.location.and.purpose">
<title>tmx folders - location and purpose</title>
<para><application>OmegaT</application> projects can have translation
memory files - i.e. files with the extension tmx - in several different
places:</para>
<variablelist>
<varlistentry>
<term><indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Subfolder omegat</secondary>
<seealso>Project files</seealso>
</indexterm>omegat folder</term>
<listitem>
<para>The omegat folder contains the
<filename>project_save.tmx</filename> and possibly a number of
backup TMX files. The <filename>project_save.tmx</filename> file
contains all the segments that have been recorded in memory since
you started the project. This file always exists in the project.
Its contents will always be sorted alphabetically by the source
segment.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Project main folder</secondary>
</indexterm>main project folder</term>
<listitem>
<para>The main project folder contains three tmx files,
<filename>project_name-omegat.tmx</filename>,
<filename>project_name-level1.tmx</filename> and
<filename>project_name-level2.tmx</filename> (project_name being
the name of your project).</para>
<itemizedlist>
<listitem>
<para>The level1 file contains only textual
information.</para>
</listitem>
<listitem>
<para>The level2 file encapsulates
<application>OmegaT</application> specific tags in correct tmx
tags so that the file can be used with its formatting
information in a translation tool that supports tmx level 2
memories, or <application>OmegaT</application> itself.</para>
</listitem>
<listitem>
<para>The <application>OmegaT</application> file includes
<application>OmegaT</application> specific formatting tags so
that the file can be used in other
<application>OmegaT</application> projects.</para>
</listitem>
</itemizedlist>
<para>These files are copies of the file
<filename>project_save.tmx</filename>, i.e. of the project's main
translation memory, excluding the so-called orphan segments. They
carry appropriately changed names, so that their contents remain
identifiable when used elsewhere, for instance in the
<filename>tm</filename> subfolder of some other project (see
below).</para>
</listitem>
</varlistentry>
<varlistentry>
<term><filename><indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Subfolder tm</secondary>
<seealso>Project files</seealso>
</indexterm>tm</filename> folder</term>
<listitem>
<para>The /tm/ folder can contain any number of ancillary
translation memories - i.e. tmx files. Such files can be created
in any of the three varieties indicated above. Note that other CAT
tools can export (and import as well) tmx files, usually in all
three forms. The best thing of course is to use OmegaT-specific
TMX files (see above), so that the in-line formatting within the
segment is retained.</para>
<para>The contents of translation memories in the tm subfolder
serve to generate suggestions for the text(s) to be translated.
Any text, already translated and stored in those files, will
appear among the fuzzy matches, if it is sufficiently similar to
the text currently being translated.</para>
<para>If the source segment in one of the ancillary TMs is
identical to the text being translated, OmegaT acts as defined in
the <menuchoice>
<guimenu>Options</guimenu>
<guimenuitem>Editing Behavior...</guimenuitem>
</menuchoice> dialog window. For instance (if the default is
accepted), the translation from the ancillary TM is inserted and
prefixed with <emphasis>[fuzzy]</emphasis>, so that the translator
can review these translations at a later stage and check whether the
segments tagged this way have been translated correctly (see the
<link linkend="chapter.translation.editing">Editing
behavior</link> chapter).</para>
<para>Translation memories in the
<filename>tm</filename> subfolder may contain segments with identical
source text but different targets. TMX files are read in the order of
their names, and the segments within a given TMX file are read line by
line. The last segment with a given source text thus prevails. It is
of course better to avoid such conflicts in the first place.</para>
<para>Note that the TMX files in the tm folder can be compressed
with gzip.<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>compressed</secondary>
</indexterm></para>
</listitem>
</varlistentry>
<varlistentry>
<term><indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Subfolder tm/auto</secondary>
<seealso>Project files</seealso>
</indexterm>tm/auto folder<indexterm class="singular">
<primary>Project</primary>
<secondary>Pretranslation</secondary>
</indexterm></term>
<listitem>
<para>If it is clear from the very start that the translations in a
given TM (or TMs) are all correct, you can put them into
the <emphasis role="bold">tm/auto</emphasis> folder and avoid
confirming a lot of <emphasis>[fuzzy]</emphasis> cases. This will
effectively <emphasis role="bold">pre-translate</emphasis> the
source text: all the segments in the source text for which
translations can be found in those "auto" TMs will land in the
main TM of the project without any user intervention.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>tm/enforce folder</term>
<listitem>
<para>If you have no doubt that a TMX is more accurate than the
<filename>project_save.tmx</filename> of OmegaT, put this TMX in
/tm/enforce to overwrite existing default translations
unconditionally.</para>
</listitem>
</varlistentry>
<varlistentry>
<term>tm/mt folder</term>
<listitem>
<para>In the editor pane, when a match is inserted from a TMX
contained in a folder named <emphasis role="bold">mt</emphasis>,
the background of the active segment is changed to red. The
background is restored to normal when the segment is left.</para>
</listitem>
</varlistentry>
<varlistentry>
<term><indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Subfolders tm/penalty-xxx</secondary>
<seealso>Project files</seealso>
</indexterm>tm/penalty-xxx folders</term>
<listitem>
<para>Sometimes it is useful to distinguish between high-quality
translation memories and those that are less reliable because of
their subject matter, client, revision status, etc. For
translation memories in folders named "penalty-xxx" (with
xxx between 0 and 100), matches are degraded according to the
name of the folder: a 100% match in any TM residing in a
folder called penalty-30, for instance, is lowered to a 70%
match. The penalty applies to all three match percentages: matches
of 75, 80 and 90 are in this case lowered to 45, 50 and 60.</para>
</listitem>
</varlistentry>
</variablelist>
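<para>As an illustrative sketch only, a project that keeps the default
folder locations might be laid out as shown here.
<filename>project_name</filename> is a placeholder, backup files are
omitted, and the actual locations can differ because they are
user-defined:</para>
<programlisting>project_name/
    project_name-omegat.tmx
    project_name-level1.tmx
    project_name-level2.tmx
    omegat/
        project_save.tmx
    tm/
        auto/
        enforce/
        mt/
        penalty-xxx/</programlisting>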
<para>Optionally, you can let <application>OmegaT</application> have an
additional tmx file (<application>OmegaT</application>-style) anywhere
you specify, containing all translatable segments of the project. See
pseudo-translated memory below.</para>
<para>Note that all the translation memories are loaded into memory when
the project is opened. Back-ups of the project translation memory are
produced regularly (see next chapter), and
<filename>project_save.tmx</filename> is also saved/updated when the
project is closed or loaded again. This means for instance that you do
not need to exit a project you are currently working on if you decide to
add another ancillary TM to it: you simply reload the project, and the
changes you have made will be included.</para>
<para>The locations of the various translation memories for a
given project are user-defined (see the Project dialog window in <link
linkend="chapter.project.properties">Project properties</link>).</para>
<para>Depending on the situation, different strategies are thus
possible, for instance:</para>
<para><emphasis role="bold">several projects on the same subject:
</emphasis>keep the project structure, and change source and target
folders (source = source/order1, target = target/order1, etc.). Note that
segments from order1 that are not present in order2 and other
subsequent jobs will be tagged as orphan segments; however, they will
still be useful for getting fuzzy matches.</para>
<para><emphasis role="bold">several translators working on the same
project:</emphasis> split the source files into source/Alice,
source/Bob... and allocate them to team members (Alice, Bob ...). They
can then create their own projects and deliver their own
<filename>project_save.tmx</filename> when finished or when a given
milestone has been reached. The <filename>project_save.tmx</filename>
files are then collected, and possible conflicts, for instance as
regards terminology, are resolved. A new version of the master TM is then
created, either to be put in team members'
<emphasis>tm/auto</emphasis> subfolders or to replace their
<filename>project_save.tmx</filename> files. The team can also use the
same subfolder structure for the target files. This allows them, for
instance, to check at any moment whether the target version of the
complete project is still OK.</para>
</section>
<section id="tmx.backup">
<title>tmx backup<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Backup</secondary>
</indexterm></title>
<para>As you translate your files, <application>OmegaT</application>
stores your work continually in <filename>project_save.tmx</filename> in
the project's /<filename>omegat</filename> subfolder.</para>
<para><application>OmegaT</application> also backs up the translation memory
to <filename>project_save.tmx.YEARMMDDHHNN.bak</filename> in the same
subfolder whenever a project is opened or reloaded. YEAR is the 4-digit
year, MM the month, DD the day of the month, and HH and NN are the hours and
minutes when the previous translation memory was saved.</para>
<para>If you believe you have lost translation data, proceed as
follows:</para>
<orderedlist>
<listitem>
<para>Close the project</para>
</listitem>
<listitem>
<para>Rename the current <filename>project_save.tmx</filename> file
(e.g. to <filename>project_save.tmx.temporary</filename>)</para>
</listitem>
<listitem>
<para>Select the backup translation memory that is most likely
(e.g. the most recent one, or the last version from the day before)
to contain the data you are looking for</para>
</listitem>
<listitem>
<para>Copy it to <filename>project_save.tmx</filename></para>
</listitem>
<listitem>
<para>Open the project</para>
</listitem>
</orderedlist>
</section>
<section id="tmx.files.and.language">
<title>tmx files and language<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Language</secondary>
</indexterm></title>
<para>Tmx files contain translation units, made of a number of
equivalent segments in several languages. A translation unit comprises
at least two translation unit variants (TUV). Either can be used as the
source or target.</para>
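<para>As a minimal, hypothetical sketch (the header values and the
segment text are invented for illustration and are not taken from an
actual <application>OmegaT</application> export), a TMX 1.4 file with one
translation unit holding two variants could look like this:</para>
<programlisting><![CDATA[<?xml version="1.0" encoding="UTF-8"?>
<tmx version="1.4">
  <!-- header attribute values are placeholders -->
  <header creationtool="ExampleTool" creationtoolversion="1.0"
          segtype="sentence" o-tmf="ExampleFormat" adminlang="EN-US"
          srclang="EN-US" datatype="plaintext"/>
  <body>
    <!-- one translation unit (tu) with two variants (tuv) -->
    <tu>
      <tuv xml:lang="EN-US">
        <seg>Open the project.</seg>
      </tuv>
      <tuv xml:lang="FR-FR">
        <seg>Ouvrez le projet.</seg>
      </tuv>
    </tu>
  </body>
</tmx>]]></programlisting>
<para>In a project translating from EN-US to FR-FR, the first variant
would act as the source segment and the second as the target; with the
project languages swapped, the roles would be reversed.</para>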
<para>The settings in your project indicate which is the source and
which the target language. OmegaT thus takes the TUV segments
corresponding to the project's source and target language codes and uses
them as the source and target segments respectively. OmegaT recognizes
the language codes using the following two standard conventions:</para>
<itemizedlist>
<listitem>
<para>2 letters (e.g. JA for Japanese), or</para>
</listitem>
<listitem>
<para>2- or 3-letter language code followed by the 2-letter country
code (e.g. EN-US - See <xref linkend="appendix.languages"/> for a
partial list of language and country codes).</para>
</listitem>
</itemizedlist>
<para>If the project language codes and the tmx language codes fully
match, the segments are loaded in memory. If the languages match but the
countries do not, the segments still get loaded. If neither the language
code nor the country code matches, the segments will be ignored.</para>
<para><indexterm class="singular">
<primary>Translation memories</primary>
<secondary>multilingual, handling of</secondary>
</indexterm>TMX files can generally contain translation units with
several candidate languages. If for a given source segment there is no
entry for the selected target language, all other target segments are
loaded, regardless of the language. For instance, if the language pair
of the project is DE-FR, it can still be of some help to see hits in
the DE-EN translation if there are none in the DE-FR pair.</para>
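<para>For illustration, consider a hypothetical translation unit like the
one below (the segment text is invented). It has DE and EN variants but
no FR variant, so in a DE-FR project the EN segment would be offered as a
hit for the DE source:</para>
<programlisting><![CDATA[<tu>
  <tuv xml:lang="DE"><seg>Projekt öffnen</seg></tuv>
  <tuv xml:lang="EN"><seg>Open project</seg></tuv>
</tu>]]></programlisting>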
</section>
<section>
<title>Orphan segments<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Orphan segments</secondary>
</indexterm></title>
<para>The file <filename>project_save.tmx</filename> contains all the
segments that have been translated since you started the project. If you
modify the project segmentation or delete files from the source, some
matches may appear as <emphasis role="bold">orphan strings</emphasis> in
the Match Viewer: such matches refer to segments that do not exist any
more in the source documents, as they correspond to segments translated
and recorded before the modifications took place.</para>
</section>
</section>
<section id="using.translation.memories.from.previous.projects">
<title>Reusing translation memories<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Reusing translation memories</secondary>
</indexterm></title>
<para>Initially, that is when the project is created, the main TM of the
project, <filename>project_save.tmx</filename>, is empty. This TM is
gradually filled during the translation. To speed up this process, existing
translations can be reused. If a given sentence has already been
translated once, and translated correctly, there is no need for it to be
retranslated. Translation memories may also contain reference
translations: multinational legislation, such as that of the European
Community, is a typical example.</para>
<para>When you create the target documents in an
<application>OmegaT</application> project, the translation memory of the
project is output in the form of three files in the root folder of your
<application>OmegaT</application> project (see the above description). You
can regard these three tmx files (<filename>-omegat.tmx</filename>,
<filename>-level1.tmx</filename> and <filename>-level2.tmx</filename>) as
an "export translation memory", i.e. as an export of your current
project's content in bilingual form.</para>
<para>Should you wish to reuse a translation memory from a previous
project (for example because the new project is similar to the previous
project, or uses terminology which might have been used before), you can
use these translation memories as "input translation memories", i.e. for
import into your new project. In this case, place the translation memories
you wish to use in the <emphasis>/tm</emphasis> or
<emphasis>/tm/auto</emphasis> folder of your new project: in the former
case you will get hits from these translation memories in the fuzzy
matches viewer, and in the latter case these TMs will be used to
pre-translate your source text.</para>
<para>By default, the /tm folder is below the project's root folder (e.g.
...<emphasis>/MyProject/tm</emphasis>), but you can choose a different
folder in the project properties dialog if you wish. This is useful if you
frequently use translation memories produced in the past, for example
because they are on the same subject or for the same customer. In this
case, a useful procedure would be:</para>
<itemizedlist>
<listitem>
<para>Create a folder (a "repository folder") in a convenient location
on your hard drive for the translation memories for a particular
customer or subject.</para>
</listitem>
<listitem>
<para>Whenever you finish a project, copy one of the three "export"
translation memory files from the root folder of the project to the
repository folder.</para>
</listitem>
<listitem>
<para>When you begin a new project on the same subject or for the same
customer, navigate to the repository folder in the
<guimenuitem>Project &gt; Properties &gt; Edit Project</guimenuitem>
dialog and select it as the translation memory
folder.</para>
</listitem>
</itemizedlist>
<para>Note that all the tmx files in the /tm repository are parsed when
the project is opened, so putting every TM you may have on hand
into this folder may slow OmegaT down unnecessarily. You may even consider
removing those that are no longer required once you have used their
contents to fill up the <filename>project_save.tmx</filename> file.</para>
<section id="importing.and.exporting.translation.memories">
<title>Importing and exporting translation memories<indexterm
class="singular">
<primary>Translation memories</primary>
<secondary>Importing and exporting</secondary>
</indexterm></title>
<para>OmegaT supports imported tmx versions 1.1-1.4b (both level 1 and
level 2). This enables the translation memories produced by other tools
to be read by OmegaT. However, OmegaT does not fully support imported
level 2 tmx files (these store not only the translation, but also the
formatting). Level 2 tmx files will still be imported and their textual
content can be seen in OmegaT, but the quality of fuzzy matches will be
somewhat lower.</para>
<para>OmegaT follows very strict procedures when loading translation
memory (tmx) files. If an error is found in such a file, OmegaT will
indicate the position within the defective file at which the error is
located.</para>
<para>Some tools are known to produce invalid tmx files under certain
conditions. If you wish to use such files as reference translations in
OmegaT, they must be repaired, or OmegaT will report an error and fail
to load them. Fixes are usually simple, and the related error message
helps with troubleshooting. You can ask the user
group for advice if you have problems.</para>
<para>OmegaT exports version 1.4 TMX files (both level 1 and level 2).
The level 2 export is not fully compliant with the level 2 standard, but
is sufficiently close and will generate correct matches in other
translation memory tools supporting TMX Level 2. If you only need
textual information (and not formatting information), use the level 1
file that OmegaT has created.</para>
</section>
<section id="Creating.a.translation.memory.for.selected.documents">
<title>Creating a translation memory for selected documents</title>
<para>If translators need to share their TMX bases while
excluding some parts of them, or including just the translations of certain
files, sharing the complete <filename>ProjectName-omegat.tmx</filename>
is out of the question. The following recipe is just one of the
possibilities, but it is simple to follow and poses no danger to
the assets.</para>
<itemizedlist>
<listitem>
<para>Create a project, separate from other projects, in the desired
language pair, with an appropriate name - note that the TMXs created
will include this name.</para>
</listitem>
<listitem>
<para>Copy the documents you need the translation memory for into
the source folder of the project.</para>
</listitem>
<listitem>
<para>Copy the translation memories containing the translations of
the documents above into the <filename>tm/auto</filename> subfolder of
the new project.</para>
</listitem>
<listitem>
<para>Start the project. Check for possible tag errors with
<keycap>Ctrl+T</keycap> and untranslated segments with
<keycap>Ctrl+U</keycap>. To check that everything is as expected, you may
press <keycap>Ctrl+D</keycap> to create the target documents and
check their contents.</para>
</listitem>
<listitem>
<para>When you exit the project, the TMX files in the main project
folder (see above) contain the translations in the selected
language pair for the files you have copied into the source
folder. Copy them to a safe place for future reference.</para>
</listitem>
<listitem>
<para>To avoid reusing the project and thus possibly polluting
future cases, delete the project folder or archive it away from your
workplace.</para>
</listitem>
</itemizedlist>
</section>
<section id="sharing.translation.memories">
<title>Sharing translation memories<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Sharing</secondary>
<seealso>Project,Download Team Project...</seealso>
</indexterm></title>
<para>In cases where a team of translators is involved, translators will
prefer to share common translation memories rather than distribute their
local versions.</para>
<para>OmegaT interfaces to SVN and Git, two common team software
versioning and revision control systems (RCS), available under an open
source license. In the case of OmegaT, complete project folders - in other
words the translation memories involved as well as source folders,
project settings etc. - are managed by the selected RCS. see more in
Chapter</para>
</section>
<section>
<title>Using TMX with alternative language pairs<indexterm
class="singular">
<primary>Translation memories</primary>
<secondary>Alternative language pairs</secondary>
</indexterm></title>
<para>There may be cases where you have done a project with e.g. Dutch
sources and a translation into, say, English. Then you need a translation
into e.g. Chinese, but your translator does not understand Dutch; she
does, however, understand English perfectly. In this case the NL-EN
translation memory can serve as a go-between to help generate the NL to ZH
translation.</para>
<para>The solution in our example is to copy the existing translation
memory into the tm/tmx2source/ subfolder and rename it ZH_CN.tmx to indicate the
target language of the tmx. The translator will be shown English
translations for source segments in Dutch and use them to create the
Chinese translation.</para>
<para><emphasis role="bold">Important: </emphasis>the supporting TMX
must be renamed XX_YY.tmx, where XX_YY is the target language of the
tmx, for instance to ZH_CN.tmx in the example above. The project and TMX
source languages should of course be identical - NL in our example. Note
that only one TMX for a given language pair is possible, so if several
translation memories should be involved, you will need to merge them all
into the XX_YY.tmx.</para>
</section>
</section>
<section>
<title>Sources with existing translations<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>PO and OKAPI TTX files</secondary>
<seealso>Translation memories Subfolder tm/auto</seealso>
</indexterm></title>
<para>Some types of source files (for instance PO, TTX, etc.) are
bilingual, i.e. they serve both as a source and as a translation memory.
In such cases, an existing translation found in the file is included in
the <filename>project_save.tmx</filename>. It is treated as a default
translation if no match has been found, or as an alternative translation
if the same source segment exists, but with a different target text. The
result will thus depend on the order in which the source segments have been
loaded.</para>
<para>All translations from source documents are also displayed in the
Comment pane, in addition to the Match pane. In the case of PO files, a 20%
penalty is applied to the alternative translation (i.e., a 100% match becomes
an 80% match). The word [Fuzzy] is displayed on the source segment.</para>
<para>When you load a segmented TTX file, segments with source = target
will be included, if "Allow translation to be equal to source" in Options
→ Editing Behavior... has been checked. This may be confusing, so you may
consider unchecking this option in this case.</para>
</section>
<section id="pseudo.translated.memory">
<title>Pseudo-translated memory<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Pseudotranslation</secondary>
</indexterm></title>
<note>
<para>Of interest for advanced users only!</para>
</note>
<para>Before segments get translated, you may wish to pre-process them or
address them in some other way than is possible with OmegaT. For example,
if you wish to create a pseudo-translation for testing purposes, OmegaT
enables you to create an additional tmx file that contains all segments of
the project. The translation in this tmx can be either</para>
<itemizedlist>
<listitem>
<para>translation equals source (default)</para>
</listitem>
<listitem>
<para>translation segment is empty</para>
</listitem>
</itemizedlist>
<para>The tmx file can be given any name you specify. A pseudo-translated
memory can be generated with the following command line parameters:</para>
<para><literal>java -jar omegat.jar --pseudotranslatetmx=&lt;filename&gt;
[--pseudotranslatetype=[equal|empty]]</literal></para>
<para>Replace <literal>&lt;filename&gt;</literal> with the name of the
file you wish to create, either absolute or relative to the working folder
(the folder you start <application>OmegaT</application> from). The second
argument <literal>--pseudotranslatetype</literal> is optional. Its value
is either <literal>equal</literal> (default value, for source=target) or
<literal>empty</literal> (target segment is empty). You can process the
generated tmx with any tool you want. To reuse it in
<application>OmegaT</application>, rename it to
<emphasis>project_save.tmx</emphasis> and place it in the
<literal>omegat</literal> folder of your project.</para>
</section>
<section id="upgrading.translation.memories">
<title>Upgrading translation memories<indexterm class="singular">
<primary>Translation memories</primary>
<secondary>Upgrading to sentence segmentation</secondary>
</indexterm></title>
<para>Very early versions of <application>OmegaT</application> were
capable of segmenting source files into paragraphs only and were
inconsistent when numbering formatting tags in HTML and Open Document
files. <application>OmegaT</application> can detect and upgrade such tmx
files on the fly to increase fuzzy matching quality and leverage your
existing translation better, saving you the work of doing this
manually.</para>
<para>A project's tmx will be upgraded only once, and will be written in
upgraded form into the <filename>project_save.tmx</filename>; legacy tmx
files will be upgraded on the fly each time the project is loaded. Note
that in some cases changes in file filters in
<application>OmegaT</application> may lead to totally different
segmentation; in such rare cases you will have to upgrade your translation
manually.</para>
</section>
</chapter>