Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
2015OmegaT/OmegaT/doc_src/en/FormattedText.xml
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
342 lines (267 sloc)
13.8 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" | |
"../../../docbook-xml-4.5/docbookx.dtd"> | |
<chapter id="chapter.formatted.text"> | |
<title>Working with formatted text<indexterm class="singular"> | |
<primary>Target files</primary> | |
<secondary>Formatted text</secondary> | |
</indexterm><indexterm class="singular"> | |
<primary>Source files</primary> | |
<secondary>Formatted text</secondary> | |
</indexterm></title> | |
<para>Formatting information present in the source file usually needs to be | |
reproduced in the target file. The in-line formatting information made | |
possible by the supported formats (in particular DocBook, HTML, XHTML, Open | |
Document Format(ODF) and Office Open XML (MS Office 2007 and later) at the | |
time of writing) is presented as tags in OmegaT. Normally tags are ignored | |
when considering the similarity between different texts for matching | |
purposes. Tags reproduced in the translated segment will be present in the | |
translated document.</para> | |
<section id="formatting.tags"> | |
<title>Formatting tags<indexterm class="singular"> | |
<primary>Tags</primary> | |
</indexterm></title> | |
<para><indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Naming</secondary> | |
</indexterm><emphasis role="bold">Tag naming:</emphasis></para> | |
<para>The tags consist of one to three characters and a number. Unique | |
numbering allows tags, corresponding to each other to be grouped together | |
and differentiates between tags, that have the same shortcut character, | |
but are in fact different. The shortcut characters used try to reflect the | |
underlying meaning of the tag (e.g. b for bold, i for italics, | |
etc.)</para> | |
<para><indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Numbering</secondary> | |
</indexterm><emphasis role="bold">Tag numbering:</emphasis></para> | |
<para>Tags are numbered incrementally by tag group. "Tag groups" in this | |
context are a single tag (such as <i0> and </i0>). Within a | |
segment, the first group (pair or single) receives the number 0, the | |
second the number 1 etc. The first example below has 3 tag groups (a pair, | |
a single, and then another pair), the second example has one group only (a | |
pair).</para> | |
<para><indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Pairs and singles</secondary> | |
</indexterm><emphasis role="bold">Pairs and singles:</emphasis></para> | |
<para>Tags are always either singles or paired. Single tags indicate | |
formatting information that does not affect the surrounding text (an extra | |
space or line break for example).</para> | |
<para><literal><b0><Ctr+N></b0>, | |
<br1><b2><Enter></b2><segment 2132> | |
</literal></para> | |
<para><br1> is a single tag and does not affect any surrounding | |
text. Paired tags usually indicate style information that applies to the | |
text between the opening tag and the closing tag of a pair. <b0> and | |
</b0> below are paired and affect the text log.txt. Note that the | |
opening tag must always come before the corresponding closing tag:</para> | |
<para><<literal>Log file (<b0>log.txt</b0>) for tracking | |
operations and errors.<segment 3167></literal></para> | |
<para><application>OmegaT</application> creates its tags before the | |
process of sentence segmenting. Depending upon the segmenting rules, the | |
pair of tags may get separated into two consecutive segments and the tag | |
validation will err on the side of caution and mark the two | |
segments.</para> | |
</section> | |
<section id="tag.operations"> | |
<title>Tag operations<indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Operations</secondary> | |
</indexterm></title> | |
<para>Care must be exercised with tags. If they are accidentally changed, | |
the formatting of the final file may be corrupted. The basic rule is that | |
the sequence of tags must be preserved in the same order. However, it is | |
possible, if certain rules are strictly followed, to deviate from this | |
basic rule.</para> | |
<para><emphasis role="bold"><indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Duplication</secondary> | |
</indexterm>Tag duplication: </emphasis></para> | |
<para>To duplicate tag groups, just copy them in the position of your | |
choice. Keep in mind that in a pair group, the opening tag must come | |
before the closing tag. The formatting represented by the group you have | |
duplicated will be applied to both sections.</para> | |
<para>Example:</para> | |
<para><literal><b0>This formatting</b0> is going to be | |
duplicated here.<segment 0001> </literal></para> | |
<para>After duplication:</para> | |
<para><literal><b0>This formatting</b0> has been | |
<b0>duplicated here</b0>.<segment 0001> | |
</literal></para> | |
<para><emphasis role="bold"><indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Group deletion</secondary> | |
</indexterm>Tag group deletion:</emphasis></para> | |
<para>To delete tag groups, just remove them from the segment. Keep in | |
mind that a pair group must have both its opening and its closing tag | |
deleted to ensure that all traces of the formatting are properly erased, | |
otherwise the translated file may become corrupted. By deleting a tag | |
group you will remove the related formatting from the translated | |
file.</para> | |
<para>Example:</para> | |
<para><literal><b0>This formatting</b0> is going to be | |
deleted.<segment 0001> </literal></para> | |
<para>After deletion:</para> | |
<para><literal>This formatting has been deleted.<segment 0001> | |
</literal></para> | |
</section> | |
<section id="tag.group.nesting"> | |
<title>Tag group nesting<indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Group nesting</secondary> | |
</indexterm></title> | |
<para>Modifying tag group order may result in the nesting of a tag group | |
within another tag group. This is acceptable, provided the enclosing group | |
totally encloses the enclosed group. In other words, when moving paired | |
tags, ensure that both the opening and the closing tag are both either | |
inside or outside other tag pairs, or the translated file may be corrupted | |
and fail to open.</para> | |
<para>Example:</para> | |
<para><literal><b0>Formatting</b0> <b1>one</b1> is | |
going to be nested inside formatting zero.<segment 0001> | |
</literal></para> | |
<para>After nesting:</para> | |
<para><literal><b0>Formatting <b1>one</b1></b0> | |
has been nested inside formatting zero.<segment | |
0001></literal></para> | |
</section> | |
<section id="tag.group.overlapping"> | |
<title>Tag group overlapping<indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Group overlapping</secondary> | |
</indexterm></title> | |
<para>Overlapping is the result of bad manipulations of tag pairs and is | |
guaranteed to result in formatting corruption and sometimes in the | |
translated file not opening at all.</para> | |
<para>Example:</para> | |
<para><literal><b0>Formatting</b0> <b1>one</b1> is | |
going to be messed up.<segment 0001> </literal></para> | |
<para>After a bad manipulation:</para> | |
<para><literal><b0>Formatting <b1>one</b0> </b1>is | |
very messed up now.<segment 0001></literal></para> | |
</section> | |
<section> | |
<title>Tag validation options</title> | |
<para>To customize the work with tags, one can set down some of the rules | |
in the <guimenuitem>Options > Tag validation...</guimenuitem> | |
window:</para> | |
<mediaobject> | |
<imageobject> | |
<imagedata fileref="images/OptionsTagValidation_25.png"/> | |
</imageobject> | |
</mediaobject> | |
<para>The behaviour, stated here, applies to all the source files and not | |
just to some of the file types, like formatted text.</para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Printf variables - do not check, check | |
simple, check all</emphasis></para> | |
<para>OmegaT can check that programming variables (like %s for | |
instance) in the source exist in the translation. You can decide not | |
to check at all, check for simple printf variables (like %s %d etc) or | |
for print variables of all types.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Check simple java MessageFormat | |
patterns</emphasis></para> | |
<para>Activating this check box will cause OmegaT to check if simple | |
java MessageFormat tags (like {0}) are processed correctly.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Custom tag(s) regular | |
expression</emphasis></para> | |
<para>A regular expression entered here will cause OmegaT treat the | |
detected instances as customer tags. It checks that the number of tags | |
and their order is identical, just like it is the case for | |
omegat-tags.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Fragment(s) that should be removed from | |
the translation regular expression</emphasis></para> | |
<para>One can enter a regular expression for unwanted contents in the | |
target. Any matches in the target segment will then be painted red, | |
i.e. easy to identify and correct. When looking for fuzzy matches the | |
remove pattern is ignored. A fixed penalty of 5 is added if the | |
removed part does not match some other segment, so the match does not | |
show up as 100%</para> | |
</listitem> | |
</itemizedlist> | |
</section> | |
<section id="tag.group.validation"> | |
<title>Tag group validation<indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Group validation</secondary> | |
</indexterm></title> | |
<para>The validate tags function detects changes to tag sequences (whether | |
deliberate or accidental), and shows the affected segments. Launching this | |
function – <guimenuitem>Ctrl+Shift+V<indexterm class="singular"> | |
<primary>Shortcuts</primary> | |
<secondary>Tag validation - Ctrl+T</secondary> | |
</indexterm></guimenuitem> - opens a window containing all segments in | |
the file containing suspected broken or bad tags in the translation. | |
Repairing the tags and recreating the target documents is easy with the | |
validate tags function. The window that opens when | |
<guimenuitem>Ctrl+Shift+V</guimenuitem> is pressed features a 3-column | |
table with a link to the segment, the original segment and the target | |
segment</para> | |
<figure id="tag.validation"> | |
<title>Tag validation entry</title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata fileref="images/TagValidation.png"/> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata fileref="images/TagValidation.png" width="90%"/> | |
</imageobject> | |
</mediaobject> | |
</figure> | |
<para>The tags are highlighted in bold blue for easy comparison between | |
the original and the translated contents. Click on the link to activate | |
the segment in the Editor. Correct the error if necessary (in the case | |
above it is the missing <i2></i2> pair) and press | |
<guimenuitem>Ctrl+Shift+V</guimenuitem> to return to the tag validation | |
window to correct other errors. Tag errors are tag sequences in the | |
translation in which the same tag order and number as in the original | |
segment is not reproduced. Some tag manipulations are necessary and are | |
benign, others will cause problems when the translated document is | |
created.</para> | |
</section> | |
<section id="hints.for.tag.management"> | |
<title>Hints for tags management<indexterm class="singular"> | |
<primary>Tags</primary> | |
<secondary>Hints</secondary> | |
</indexterm></title> | |
<para><emphasis role="bold">Simplify the original text</emphasis></para> | |
<para>Tags generally represent formatting in some form of the original | |
text. Simplifying the original formatting greatly contributes to reducing | |
the number of tags. Where circumstances permit, unifying used fonts, font | |
sizes, colors, etc. should be considered, as it could simplify the | |
translation and reduce the potential for tag errors. Read the tag | |
operations section to see what can be done with tags. Remember that if you | |
find tags a problem in OmegaT and formatting is not extremely relevant for | |
the current translation, removing tags may be the easiest way out of | |
problems.</para> | |
<para><emphasis role="bold">Pay extra attention to tag | |
pairs</emphasis></para> | |
<para>If you need to see tags in OmegaT but do not need to retain most of | |
the formatting in the translated document you are free not to include tags | |
in the translation. In this case pay extra attention to tag pairs since | |
deleting one side of the pair but forgetting to delete the other is | |
guaranteed to corrupt your document's formatting. Since tags are included | |
in the text itself, it is possible to use segmentation rules to create | |
segments with fewer tags. This is an advanced feature and some experience | |
is required in order for it to be applied properly.</para> | |
<para>OmegaT is not yet able to detect mistakes in formatting fully | |
automatically, so it will not prompt you if you make an error or change | |
formatting to fit your target language better. Sometimes, however, your | |
translated file may look strange, and – in the worst case – may even | |
refuse to open.</para> | |
</section> | |
</chapter> |