Skip to content
Permalink
8bed31719c
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
465 lines (382 sloc) 16.8 KB
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.files.to.translate">
<title>Files to translate</title>
<section id="file.formats">
<title>File formats<indexterm class="singular">
<primary>Source files</primary>
<secondary>File formats</secondary>
</indexterm><indexterm class="singular">
<primary>Target files</primary>
<secondary>File formats</secondary>
</indexterm></title>
<para><application>You can use OmegaT</application> to translate files in
a number of file formats. There are basically two types of file formats,
plain text and formatted text.</para>
<section>
<title>Plain text files<indexterm class="singular">
<primary>Target files</primary>
<secondary>Plain text files</secondary>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Plain text files</secondary>
</indexterm></title>
<para>Plain text files contain text only, so their translation is as
simple as typing the translation. There are several methods to specify
the file's encoding so that its contents are not garbled when opened in
<application>OmegaT</application>. Such files do not contain any
formatting information beyond the "white space" used to align text,
indicate paragraphs or insert page breaks. They are not able to contain
or retain information regarding the color, font etc of the text.
Currently, <application>OmegaT</application> supports the following
plain text formats:<indexterm class="singular">
<primary>File formats</primary>
<secondary>Unformatted</secondary>
<seealso>Source files</seealso>
</indexterm></para>
<itemizedlist>
<listitem>
<para>ASCII text (.txt, etc.)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>Encoded text (*.UTF8)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>Java resource bundles (*.properties)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>PO files (*.po)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>INI (key=value) files (*.ini)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>DTD files (*.DTD)</para>
</listitem>
<listitem>
<para>DokuWiki files (*.txt)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>SubRip title files (*.srt)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>Magento CE Locale CSV files (*.csv)</para>
</listitem>
</itemizedlist>
<para>Other plain text file types can be handled by
<application>OmegaT</application> by associating their file extension to
a supported file type (for example, .pod files can be associated to the
ASCII text filter) and by pre-processing them with specific segmentation
rules.</para>
<para>PO files<indexterm class="singular">
<primary>Source files</primary>
<secondary>PO as bilingual files</secondary>
</indexterm> can contain both the source and the target text. Seen
from this point of view, they are plain text files<emphasis>
plus</emphasis> translation memories. If for a given source segment
there is as yet no existing translation in the project translation
memory (project_save.tmx), the current translation will be saved in the
project_save.tmx as the default translation. In case, however, the same
source segment already exists with a different translation, the new
translation will be saved as an alternative.</para>
</section>
<section>
<title>Formatted text files<indexterm class="singular">
<primary>Target files</primary>
<secondary>Formatted text files</secondary>
<seealso>Tagged text</seealso>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Formatted text files</secondary>
</indexterm></title>
<para>Formatted text files contain information such as font type, size,
color etc. as well as text. They are commonly created in word processors
or HTML editors. Such file formats are designed to hold formatting
information. The formatting information can be as simple as “this is
bold”, or as complex as table data with different font sizes, colors,
positions, etc. In most translation jobs, it is considered important for
the formatting of the original text to be retained in the translation.
OmegaT allows you to do this by marking the characters/words that have a
special formatting with easy-to-handle tags. Simplifying the original
text formatting greatly contributes to reducing the number of tags.
Where possible, unifying the fonts, font sizes, colors, etc. used in the
document simplifies the task of translation and reduces the possible
number of tag errors. Each file type is handled differently in OmegaT.
Specific behavior can be set up in the file filters. At the time of
writing, OmegaT supports the following formatted text formats:
<indexterm class="singular">
<primary>File formats</primary>
<secondary>formatted</secondary>
<seealso>Source files</seealso>
</indexterm></para>
<para><itemizedlist>
<listitem>
<para>ODF - OASIS Open Document Format (*.ods, *.ots, *.odt,
*.ott, *.odp, *.otp)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>Microsoft Office Open XML (*.docx, *.dotx, *.xlsx, *.xltx,
*.pptx)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>(X)HTML (*.html, *.xhtml,*.xht)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>HTML Help Compiler (*.hhc, *.hhk)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>DocBook (*.xml)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>XLIFF (*.xlf, *.xliff, *.sdlxliff) - of the source=target
variety</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>QuarkXPress CopyFlowGold (*.tag, *.xtg)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>ResX files (*.resx)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>Android resource (*.xml)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>LaTex (*.tex, *.latex)</para>
</listitem>
<listitem>
<para>Help (*.xml) and Manual (*.hmxp) files</para>
</listitem>
<listitem>
<para>Typo3 LocManager (*.xml)</para>
</listitem>
</itemizedlist><itemizedlist>
<listitem>
<para>WiX Localization (*.wxl)</para>
</listitem>
<listitem>
<para>Iceni Infix (*.xml)</para>
</listitem>
<listitem>
<para>Flash XML export (*.xml)</para>
</listitem>
<listitem>
<para>Wordfast TXML (*.txml)</para>
</listitem>
<listitem>
<para>Camtasia for Windows (*.camproj)</para>
</listitem>
<listitem>
<para>Visio (*.vxd)</para>
</listitem>
</itemizedlist></para>
<para>Other formatted text file types may also be handled by OmegaT by
associating their file extensions to a supported file type, assuming
that the corresponding segmentation rules will segment them
correctly.</para>
</section>
</section>
<section id="other.file.formats">
<title>Other file formats<indexterm class="singular">
<primary>Target files</primary>
<secondary>Other file formats</secondary>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Other file formats</secondary>
</indexterm></title>
<para>Other plain text or formatted text file formats suitable for
processing in OmegaT may also exist.</para>
<para><indexterm class="singular">
<primary>Target files</primary>
<secondary>File conversion tools</secondary>
</indexterm>External tools can be used to convert files to supported
formats. The translated files will then need to be converted back to the
original format. For example, if you have an outdated Microsoft Word
version, that does not handle the ODT format, here's a round trip for Word
files with the DOC extension:</para>
<itemizedlist>
<listitem>
<para>import the file into ODF writer</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>save the file in ODT format</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>translate it into the target ODT file</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>load the target file in ODF writer</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>save the file as a DOC file</para>
</listitem>
</itemizedlist>
<para>The quality of formatting of the translated file will depend on the
quality of the round-trip conversion. Before proceeding with such
conversions, be sure to test all options. Check the <ulink
url="http://www.omegat.org">OmegaT home page</ulink> for an up-to-date
listing of auxiliary translation tools.</para>
</section>
<section id="right.to.left.languages">
<title>Right to left languages<indexterm class="singular">
<primary>Right to left languages</primary>
</indexterm><indexterm class="singular">
<primary>Target files</primary>
<secondary>Right to left languages</secondary>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Right to left languages</secondary>
</indexterm></title>
<para>Justification of source and target segments depends upon the project
languages. By default, left justification is used for Left-To-Right (LTR)
languages and right justification for Right-To-Left (RTL) languages. You
can toggle between different display modes by pressing <keycombo>
<keycap>Shift</keycap>
<keycap>Ctrl</keycap>
<keycap>O</keycap>
</keycombo> (this is the letter O and not the numeral 0). The <keycombo>
<keycap>Shift</keycap>
<keycap>Ctrl</keycap>
<keycap>O</keycap>
</keycombo> toggle has three states:</para>
<itemizedlist>
<listitem>
<para>default justification, that is as defined by the language</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>left justification</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>right justification</para>
</listitem>
</itemizedlist>
<para>Using the RTL mode in <application>OmegaT</application> has no
influence whatsoever on the display mode of the translated documents
created in <application>OmegaT</application>. The display mode of the
translated documents must be modified within the application (such as
Microsoft Word) commonly used to display or modify them (check the
relevant manuals for details). Using <keycombo>
<keycap>Shift</keycap>
<keycap>Ctrl</keycap>
<keycap>O</keycap>
</keycombo> causes both text input and display in
<application>OmegaT</application> to change. It can be used separately for
all three panes (Editor, Fuzzy Matches and Glossary) by clicking on the
pane and toggling the display mode. It can also be used in all the input
fields found in <application>OmegaT</application> - in the search window,
for segmentation rules etc.</para>
<para>Mac OS X users, note: use <keycombo>
<keycap>Shift</keycap>
<keycap>Ctrl</keycap>
<keycap>O</keycap>
</keycombo> shortcut and <emphasis role="bold">not
</emphasis>cmd+Ctrl+O.</para>
<section>
<title>Mixing RTL and LTR strings in segments<indexterm class="singular">
<primary>Right to left languages</primary>
<secondary>Mixing RTL and LTR strings</secondary>
</indexterm><indexterm class="singular">
<primary>Target files</primary>
<secondary>Mixing RTL and LTR strings</secondary>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Mixing RTL and LTR strings</secondary>
</indexterm></title>
<para>When writing purely RTL text, the default (LTR) view may be used.
In many cases, however, it is necessary to embed LTR text in RTL text.
For example, in OmegaT tags, product names that must be left in the LTR
source language, place holders in localization files, and numbers in
text. In cases like these it becomes necessary to switch to RTL mode, so
that the RTL (in fact bidirectional) text is displayed correctly. It
should be noted that when <application>OmegaT</application> is in RTL
mode, both source and target are displayed in RTL mode. This means that
if the source language is LTR and the target language is RTL, or vice
versa, it may be necessary to toggle back and forth between RTL and LTR
modes to view the source and enter the target easily in their respective
modes.</para>
</section>
<section>
<title><application>OmegaT</application> tags in RTL segments<indexterm
class="singular">
<primary>Right to left languages</primary>
<secondary>OmegaT tags in RTL languages</secondary>
</indexterm></title>
<para>As stated above, OmegaT tags are LTR strings. When translating
between RTL and LTR languages, correctly reading the tags from the
source and entering them properly in the target may require the
translator to toggle between LTR and RTL modes numerous times.</para>
<para>If the document allows, the translator is strongly encouraged to
remove style information from the original document so that as few tags
as possible appear in the OmegaT interface. Follow the indications given
in Hints for tags management. Frequently validate tags (see Tag
validation) and produce translated documents (see below and Menu) at
regular intervals to make it easier to catch any problems that arise. A
hint: translating a plain text version of the text and adding the
necessary style in the relevant application at a later stage may turn
out to be less hassle.</para>
</section>
<section>
<title>Creating translated RTL documents<indexterm class="singular">
<primary>Right to left languages</primary>
<secondary>Creating RTL target files</secondary>
</indexterm><indexterm class="singular">
<primary>Right to left languages</primary>
<secondary>Target files</secondary>
</indexterm><indexterm class="singular">
<primary>Right to left languages</primary>
<secondary>Creating RTL target text</secondary>
</indexterm></title>
<para>When the translated document is created, its display direction
will be the same as that of the original document. If the original
document was LTR, the display direction of the target document must be
changed manually to RTL in its viewing application. Each output format
has specific ways of dealing with RTL display; check the relevant
application manuals for details.</para>
<para>For .docx files, a number of changes are however done automatically:
<itemizedlist>
<listitem>Paragraphs, sections and tables are set to bidi</listitem>
<listitem>Runs (text elements) are set to RTL</listitem>
</itemizedlist>
</para>
<para>To avoid changing the target files display parameters each time
the files are opened, it may be possible to change the source file
display parameters such that such parameters are inherited by the target
files. Such modifications are possible in ODF files for example.</para>
</section>
</section>
</chapter>