<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.formatted.text">
<title>Working with formatted text<indexterm class="singular">
<primary>Target files</primary>
<secondary>Formatted text</secondary>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Formatted text</secondary>
</indexterm></title>
<para>Formatting information present in the source file usually needs to be
reproduced in the target file. OmegaT presents the in-line formatting
information of the supported formats (at the time of writing, in particular
DocBook, HTML, XHTML, Open Document Format (ODF) and Office Open XML (MS
Office 2007 and later)) as tags. Tags are normally ignored when the
similarity between texts is assessed for matching purposes. Tags reproduced
in the translated segment will be present in the translated document.</para>
<section id="formatting.tags">
<title>Formatting tags<indexterm class="singular">
<primary>Tags</primary>
</indexterm></title>
<para><indexterm class="singular">
<primary>Tags</primary>
<secondary>Naming</secondary>
</indexterm><emphasis role="bold">Tag naming:</emphasis></para>
<para>A tag consists of one to three characters and a number. The unique
number groups corresponding tags together and distinguishes tags that share
the same shortcut characters but are in fact different. The shortcut
characters try to reflect the underlying meaning of the tag (e.g. b for
bold, i for italics, etc.).</para>
<para><indexterm class="singular">
<primary>Tags</primary>
<secondary>Numbering</secondary>
</indexterm><emphasis role="bold">Tag numbering:</emphasis></para>
<para>Tags are numbered incrementally by tag group. A "tag group" in this
context is either a single tag (such as &lt;br1&gt;) or a tag pair (such as
&lt;i0&gt; and &lt;/i0&gt;). Within a segment, the first group (pair or
single) receives the number 0, the second the number 1, and so on. The first
example below has three tag groups (a pair, a single, and then another
pair); the second example has only one group (a pair).</para>
<para><indexterm class="singular">
<primary>Tags</primary>
<secondary>Pairs and singles</secondary>
</indexterm><emphasis role="bold">Pairs and singles:</emphasis></para>
<para>Tags are always either singles or paired. Single tags indicate
formatting information that does not affect the surrounding text (an extra
space or line break for example).</para>
<para><literal>&lt;b0&gt;&lt;Ctr+N&gt;&lt;/b0&gt;,
&lt;br1&gt;&lt;b2&gt;&lt;Enter&gt;&lt;/b2&gt;&lt;segment 2132&gt;
</literal></para>
<para>&lt;br1&gt; is a single tag and does not affect any surrounding
text. Paired tags usually indicate style information that applies to the
text between the opening tag and the closing tag of a pair. &lt;b0&gt; and
&lt;/b0&gt; below are paired and affect the text log.txt. Note that the
opening tag must always come before the corresponding closing tag:</para>
<para><literal>Log file (&lt;b0&gt;log.txt&lt;/b0&gt;) for tracking
operations and errors.&lt;segment 3167&gt;</literal></para>
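<para>The naming and numbering rules above can be illustrated with a short
Python sketch. It is not OmegaT code: the regular expression is an assumed
approximation of the tag shape described here (an optional slash, one to
three lowercase letters and a number), and <literal>tag_groups</literal> is
a hypothetical helper name.</para>
<programlisting language="python"><![CDATA[
```python
import re

# Assumed tag shape: optional "/", one to three lowercase letters, a number.
# Examples: <b0>, </b0>, <br1>. Markers such as <segment 2132> do not match.
TAG = re.compile(r"<(/?)([a-z]{1,3})(\d+)>")


def tag_groups(segment):
    """Map each tag number in the segment to (shortcut, kind)."""
    opened, closed, names = set(), set(), {}
    for slash, name, number in TAG.findall(segment):
        n = int(number)
        names[n] = name
        (closed if slash else opened).add(n)

    kinds = {}
    for n in sorted(names):
        if n in opened and n in closed:
            kind = "pair"
        elif n in opened:
            # Also covers the opening half of a pair split by segmentation.
            kind = "single"
        else:
            kind = "unmatched"  # closing tag whose opening tag is elsewhere
        kinds[n] = (names[n], kind)
    return kinds


example = "<b0><Ctr+N></b0>, <br1><b2><Enter></b2><segment 2132>"
for number, (shortcut, kind) in tag_groups(example).items():
    print(number, shortcut, kind)
```
]]></programlisting>
<para>For the first example segment the sketch reports three groups: pair
0, single 1 and pair 2, which matches the numbering described
above.</para>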
<para><application>OmegaT</application> creates its tags before the text is
segmented into sentences. Depending on the segmentation rules, a pair of
tags may therefore be split across two consecutive segments; tag validation
then errs on the side of caution and marks both segments.</para>
</section>
<section id="tag.operations">
<title>Tag operations<indexterm class="singular">
<primary>Tags</primary>
<secondary>Operations</secondary>
</indexterm></title>
<para>Care must be exercised with tags. If they are accidentally changed,
the formatting of the final file may be corrupted. The basic rule is that
the sequence of tags must be preserved in the same order. However, it is
possible, if certain rules are strictly followed, to deviate from this
basic rule.</para>
<para><emphasis role="bold"><indexterm class="singular">
<primary>Tags</primary>
<secondary>Duplication</secondary>
</indexterm>Tag duplication: </emphasis></para>
<para>To duplicate a tag group, copy it to the position of your choice.
Keep in mind that in a pair group, the opening tag must come before the
closing tag. The formatting represented by the group you have duplicated
will be applied to both sections.</para>
<para>Example:</para>
<para><literal>&lt;b0&gt;This formatting&lt;/b0&gt; is going to be
duplicated here.&lt;segment 0001&gt; </literal></para>
<para>After duplication:</para>
<para><literal>&lt;b0&gt;This formatting&lt;/b0&gt; has been
&lt;b0&gt;duplicated here&lt;/b0&gt;.&lt;segment 0001&gt;
</literal></para>
<para><emphasis role="bold"><indexterm class="singular">
<primary>Tags</primary>
<secondary>Group deletion</secondary>
</indexterm>Tag group deletion:</emphasis></para>
<para>To delete a tag group, remove it from the segment. Keep in mind that
a pair group must have both its opening and its closing tag deleted to
ensure that all traces of the formatting are properly erased; otherwise the
translated file may become corrupted. Deleting a tag group removes the
related formatting from the translated file.</para>
<para>Example:</para>
<para><literal>&lt;b0&gt;This formatting&lt;/b0&gt; is going to be
deleted.&lt;segment 0001&gt; </literal></para>
<para>After deletion:</para>
<para><literal>This formatting has been deleted.&lt;segment 0001&gt;
</literal></para>
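<para>The deletion rule (remove both halves of a pair, never just one) can
be sketched in Python as follows. The function name
<literal>remove_tag_group</literal> is hypothetical and the code only
illustrates the rule; in OmegaT itself the tags are simply deleted in the
Editor.</para>
<programlisting language="python"><![CDATA[
```python
import re


def remove_tag_group(segment, tag):
    """Delete every opening and closing occurrence of one tag, e.g. 'b0'.

    Removing both halves at once avoids leaving an orphaned tag behind.
    """
    return re.sub(r"</?" + re.escape(tag) + r">", "", segment)


source = "<b0>This formatting</b0> is going to be deleted."
print(remove_tag_group(source, "b0"))
```
]]></programlisting>
<para>Because the pattern requires the closing angle bracket directly after
the tag name, removing <literal>b0</literal> leaves other tags such as
&lt;b1&gt; untouched.</para>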
</section>
<section id="tag.group.nesting">
<title>Tag group nesting<indexterm class="singular">
<primary>Tags</primary>
<secondary>Group nesting</secondary>
</indexterm></title>
<para>Modifying the order of tag groups may result in one tag group being
nested within another. This is acceptable, provided the enclosing group
completely encloses the enclosed group. In other words, when moving paired
tags, ensure that the opening and the closing tag are either both inside or
both outside any other tag pair, or the translated file may be corrupted
and fail to open.</para>
<para>Example:</para>
<para><literal>&lt;b0&gt;Formatting&lt;/b0&gt; &lt;b1&gt;one&lt;/b1&gt; is
going to be nested inside formatting zero.&lt;segment 0001&gt;
</literal></para>
<para>After nesting:</para>
<para><literal>&lt;b0&gt;Formatting &lt;b1&gt;one&lt;/b1&gt;&lt;/b0&gt;
has been nested inside formatting zero.&lt;segment
0001&gt;</literal></para>
</section>
<section id="tag.group.overlapping">
<title>Tag group overlapping<indexterm class="singular">
<primary>Tags</primary>
<secondary>Group overlapping</secondary>
</indexterm></title>
<para>Overlapping results from mishandling tag pairs. It is guaranteed to
corrupt the formatting and sometimes prevents the translated file from
opening at all.</para>
<para>Example:</para>
<para><literal>&lt;b0&gt;Formatting&lt;/b0&gt; &lt;b1&gt;one&lt;/b1&gt; is
going to be messed up.&lt;segment 0001&gt; </literal></para>
<para>After a bad manipulation:</para>
<para><literal>&lt;b0&gt;Formatting &lt;b1&gt;one&lt;/b0&gt; &lt;/b1&gt;is
very messed up now.&lt;segment 0001&gt;</literal></para>
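<para>The difference between acceptable nesting and harmful overlapping can
be checked mechanically with a stack. The Python sketch below is an
illustration under assumptions, not OmegaT's validation code: the tag
pattern and the name <literal>is_well_nested</literal> are hypothetical, and
any tag that never appears in closing form is treated as a single
tag.</para>
<programlisting language="python"><![CDATA[
```python
import re

# Assumed tag shape: optional "/", one to three lowercase letters, a number.
TAG = re.compile(r"<(/?)([a-z]{1,3}\d+)>")


def is_well_nested(segment):
    """True if every closing tag closes the most recently opened pair."""
    tags = TAG.findall(segment)
    closers = {name for slash, name in tags if slash}
    stack = []
    for slash, name in tags:
        if name not in closers:
            continue  # single tag such as <br1>: nothing to nest
        if not slash:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # overlap, or closing tag before its opening tag
    return not stack  # leftover opening tags were never closed


nested = "<b0>Formatting <b1>one</b1></b0> has been nested."
overlapping = "<b0>Formatting <b1>one</b0> </b1>is very messed up now."
print(is_well_nested(nested), is_well_nested(overlapping))
```
]]></programlisting>
<para>The nested example passes the check, while the overlapping example
fails at &lt;/b0&gt; because the most recently opened pair at that point is
b1.</para>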
</section>
<section>
<title>Tag validation options</title>
<para>To customize how tags are handled, you can set some of the rules
in the <guimenuitem>Options &gt; Tag validation...</guimenuitem>
window:</para>
<mediaobject>
<imageobject>
<imagedata fileref="images/OptionsTagValidation_25.png"/>
</imageobject>
</mediaobject>
<para>The behaviour configured here applies to all source files, not just
to certain file types such as formatted text.</para>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Printf variables - do not check, check
simple, check all</emphasis></para>
<para>OmegaT can check that programming variables (like %s for
instance) in the source exist in the translation. You can decide not
to check at all, to check for simple printf variables (like %s, %d,
etc.) or to check for printf variables of all types.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Check simple java MessageFormat
patterns</emphasis></para>
<para>Activating this check box will cause OmegaT to check whether simple
Java MessageFormat tags (like {0}) are processed correctly.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Custom tag(s) regular
expression</emphasis></para>
<para>A regular expression entered here will cause OmegaT to treat the
detected instances as custom tags. It checks that the number of tags
and their order are identical, just as it does for OmegaT
tags.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Fragment(s) that should be removed from
the translation regular expression</emphasis></para>
<para>You can enter a regular expression for unwanted content in the
target. Any matches in the target segment will then be painted red,
making them easy to identify and correct. When looking for fuzzy matches
the remove pattern is ignored. A fixed penalty of 5 is added if the
removed part does not match some other segment, so the match does not
show up as 100%.</para>
</listitem>
</itemizedlist>
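<para>As a rough illustration of the printf and MessageFormat checks
described above, the following Python sketch reports placeholders that
occur in the source but are missing from the translation. The two regular
expressions are deliberately simplified assumptions and are not the
patterns OmegaT uses; <literal>missing_placeholders</literal> is a
hypothetical helper name.</para>
<programlisting language="python"><![CDATA[
```python
import re
from collections import Counter

# Simplified, assumed patterns; OmegaT's own expressions may differ.
SIMPLE_PRINTF = re.compile(r"%[sdf]")     # e.g. %s, %d
MESSAGE_FORMAT = re.compile(r"\{\d+\}")   # e.g. {0}, {1}


def missing_placeholders(source, target, pattern):
    """Return placeholders that occur more often in source than in target."""
    missing = Counter(pattern.findall(source)) - Counter(pattern.findall(target))
    return sorted(missing.elements())


print(missing_placeholders("Copied %d files to %s", "Copie vers %s", SIMPLE_PRINTF))
print(missing_placeholders("{0} of {1}", "{1} / {0}", MESSAGE_FORMAT))
```
]]></programlisting>
<para>The sketch compares counts only, so a translation that reorders
placeholders is accepted as long as none of them is dropped.</para>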
</section>
<section id="tag.group.validation">
<title>Tag group validation<indexterm class="singular">
<primary>Tags</primary>
<secondary>Group validation</secondary>
</indexterm></title>
<para>The validate tags function detects changes to tag sequences (whether
deliberate or accidental) and shows the affected segments. Launching this
function (<guimenuitem>Ctrl+Shift+V<indexterm class="singular">
<primary>Shortcuts</primary>
<secondary>Tag validation - Ctrl+Shift+V</secondary>
</indexterm></guimenuitem>) opens a window listing all segments in the file
whose translation contains suspected broken or bad tags. Repairing the tags
and recreating the target documents is easy with the validate tags function.
The window features a 3-column table with a link to the segment, the
original segment and the target segment.</para>
<figure id="tag.validation">
<title>Tag validation entry</title>
<mediaobject>
<imageobject role="html">
<imagedata fileref="images/TagValidation.png"/>
</imageobject>
<imageobject role="fo">
<imagedata fileref="images/TagValidation.png" width="90%"/>
</imageobject>
</mediaobject>
</figure>
<para>The tags are highlighted in bold blue for easy comparison between
the original and the translated contents. Click on the link to activate
the segment in the Editor. Correct the error if necessary (in the case
above it is the missing &lt;i2&gt;&lt;/i2&gt; pair) and press
<guimenuitem>Ctrl+Shift+V</guimenuitem> to return to the tag validation
window to correct other errors. A tag error is any translation that does
not reproduce the same tags, in the same number and order, as the original
segment. Some tag manipulations are necessary and benign; others will cause
problems when the translated document is created.</para>
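<para>The definition of a tag error given above amounts to comparing the
tag sequence of the source with that of the translation. The Python sketch
below shows this comparison under an assumed tag pattern; it is not the
OmegaT implementation, and <literal>tags_match</literal> is a hypothetical
name.</para>
<programlisting language="python"><![CDATA[
```python
import re

# Assumed tag shape: optional "/", one to three lowercase letters, a number.
TAG = re.compile(r"</?[a-z]{1,3}\d+>")


def tag_sequence(segment):
    """Return the tags of a segment in order of appearance."""
    return TAG.findall(segment)


def tags_match(source, target):
    """True if target has the same tags, in the same order, as source."""
    return tag_sequence(source) == tag_sequence(target)


source = "Press <b0>Enter</b0> to open <i1>log.txt</i1>."
target = "Appuyez sur <b0>Entree</b0> pour ouvrir log.txt."
print(tags_match(source, target))
```
]]></programlisting>
<para>A strict comparison like this one also flags deliberate and harmless
changes such as duplicating or deleting a tag group, which is why flagged
segments need to be reviewed rather than corrected blindly.</para>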
</section>
<section id="hints.for.tag.management">
<title>Hints for tag management<indexterm class="singular">
<primary>Tags</primary>
<secondary>Hints</secondary>
</indexterm></title>
<para><emphasis role="bold">Simplify the original text</emphasis></para>
<para>Tags generally represent some form of formatting in the original
text. Simplifying the original formatting greatly reduces the number of
tags. Where circumstances permit, consider unifying the fonts, font sizes,
colors, etc. used in the document, as this can simplify the translation and
reduce the potential for tag errors. Read the tag operations section to see
what can be done with tags. Remember that if you find tags a problem in
OmegaT and formatting is not particularly important for the current
translation, removing tags may be the easiest way out of
problems.</para>
<para><emphasis role="bold">Pay extra attention to tag
pairs</emphasis></para>
<para>If you need to see tags in OmegaT but do not need to retain most of
the formatting in the translated document you are free not to include tags
in the translation. In this case pay extra attention to tag pairs since
deleting one side of the pair but forgetting to delete the other is
guaranteed to corrupt your document's formatting. Since tags are included
in the text itself, it is possible to use segmentation rules to create
segments with fewer tags. This is an advanced feature and some experience
is required in order for it to be applied properly.</para>
<para>OmegaT is not yet able to detect formatting mistakes fully
automatically, so it will not prompt you if you make an error or change
formatting to fit your target language better. As a result, your translated
file may sometimes look strange and, in the worst case, may even refuse to
open.</para>
</section>
</chapter>