FormattedText.xml

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.formatted.text">
  <title>Working with formatted text<indexterm class="singular">
      <primary>Target files</primary>

      <secondary>Formatted text</secondary>
    </indexterm><indexterm class="singular">
      <primary>Source files</primary>

      <secondary>Formatted text</secondary>
    </indexterm></title>

  <para>Formatting information present in the source file usually needs to be
  reproduced in the target file. The in-line formatting information made
  possible by the supported formats (at the time of writing, in particular
  DocBook, HTML, XHTML, Open Document Format (ODF) and Office Open XML (MS
  Office 2007 and later)) is presented as tags in OmegaT. Tags are normally
  ignored when the similarity between texts is calculated for matching
  purposes. Tags reproduced in the translated segment will be present in the
  translated document.</para>

  <section id="formatting.tags">
    <title>Formatting tags<indexterm class="singular">
        <primary>Tags</primary>
      </indexterm></title>

    <para><indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Naming</secondary>
      </indexterm><emphasis role="bold">Tag naming:</emphasis></para>

    <para>A tag consists of one to three characters and a number. The unique
    numbering allows tags that correspond to each other to be grouped together,
    and differentiates between tags that have the same shortcut character but
    are in fact different. The shortcut characters try to reflect the
    underlying meaning of the tag (e.g. b for bold, i for italics,
    etc.).</para>

    <para><indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Numbering</secondary>
      </indexterm><emphasis role="bold">Tag numbering:</emphasis></para>

    <para>Tags are numbered incrementally by tag group. A "tag group" in this
    context is either a single tag (such as &lt;br1&gt;) or a pair of tags
    (such as &lt;i0&gt; and &lt;/i0&gt;). Within a segment, the first group
    (pair or single) receives the number 0, the second the number 1, and so
    on. The first example below has three tag groups (a pair, a single, and
    then another pair); the second example has only one group (a pair).</para>

    <para><indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Pairs and singles</secondary>
      </indexterm><emphasis role="bold">Pairs and singles:</emphasis></para>

    <para>Tags are always either singles or paired. Single tags indicate
    formatting information that does not affect the surrounding text (an extra
    space or line break for example).</para>

    <para><literal>&lt;b0&gt;&lt;Ctr+N&gt;&lt;/b0&gt;,
    &lt;br1&gt;&lt;b2&gt;&lt;Enter&gt;&lt;/b2&gt;&lt;segment 2132&gt;
    </literal></para>

    <para>&lt;br1&gt; is a single tag and does not affect any surrounding
    text. Paired tags usually indicate style information that applies to the
    text between the opening tag and the closing tag of a pair. &lt;b0&gt; and
    &lt;/b0&gt; below are paired and affect the text log.txt. Note that the
    opening tag must always come before the corresponding closing tag:</para>

    <para><literal>Log file (&lt;b0&gt;log.txt&lt;/b0&gt;) for tracking
    operations and errors.&lt;segment 3167&gt;</literal></para>

    <para><application>OmegaT</application> creates its tags before it
    segments the text into sentences. Depending on the segmentation rules, the
    two tags of a pair may end up in two consecutive segments; tag validation
    will then err on the side of caution and flag both segments.</para>
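
    <para>The naming and numbering scheme described above can be illustrated
    with a short script. The following Python sketch is not part of OmegaT;
    the regular expression and the function name
    <literal>tag_groups</literal> are assumptions made for illustration. It
    collects the tags in a segment and reports each group as a pair or a
    single:</para>

    <programlisting language="python"><![CDATA[
```python
import re

# Assumed tag shape, taken from the description above: one to three
# letters followed by a number, with "/" marking a closing tag.
TAG_RE = re.compile(r"<(/?)([A-Za-z]{1,3})(\d+)>")


def tag_groups(segment):
    """Return {group number: (shortcut, kind)}, kind being 'pair' or 'single'."""
    opened = {}
    closed = set()
    for match in TAG_RE.finditer(segment):
        slash, shortcut, number = match.groups()
        if slash:
            closed.add(int(number))
        else:
            opened[int(number)] = shortcut
    # A group whose number also appears on a closing tag is a pair.
    return {
        number: (shortcut, "pair" if number in closed else "single")
        for number, shortcut in opened.items()
    }


print(tag_groups("<b0><Ctr+N></b0>, <br1><b2><Enter></b2>"))
```
]]></programlisting>

    <para>Applied to the first example above, the sketch reports group 0 (b)
    as a pair, group 1 (br) as a single and group 2 (b) as a pair. Text such
    as &lt;Enter&gt; is not counted, because it carries no number.</para>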
  </section>

  <section id="tag.operations">
    <title>Tag operations<indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Operations</secondary>
      </indexterm></title>

    <para>Care must be exercised with tags. If they are accidentally changed,
    the formatting of the final file may be corrupted. The basic rule is that
    the sequence of tags must be preserved in the same order. However, it is
    possible, if certain rules are strictly followed, to deviate from this
    basic rule.</para>

    <para><emphasis role="bold"><indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Duplication</secondary>
      </indexterm>Tag duplication: </emphasis></para>

    <para>To duplicate a tag group, copy it to the position of your choice.
    Keep in mind that in a pair group, the opening tag must come before the
    closing tag. The formatting represented by the duplicated group will be
    applied to both sections.</para>

    <para>Example:</para>

    <para><literal>&lt;b0&gt;This formatting&lt;/b0&gt; is going to be
    duplicated here.&lt;segment 0001&gt; </literal></para>

    <para>After duplication:</para>

    <para><literal>&lt;b0&gt;This formatting&lt;/b0&gt; has been
    &lt;b0&gt;duplicated here&lt;/b0&gt;.&lt;segment 0001&gt;
    </literal></para>

    <para><emphasis role="bold"><indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Group deletion</secondary>
      </indexterm>Tag group deletion:</emphasis></para>

    <para>To delete tag groups, just remove them from the segment. Keep in
    mind that a pair group must have both its opening and its closing tag
    deleted to ensure that all traces of the formatting are properly erased,
    otherwise the translated file may become corrupted. By deleting a tag
    group you will remove the related formatting from the translated
    file.</para>

    <para>Example:</para>

    <para><literal>&lt;b0&gt;This formatting&lt;/b0&gt; is going to be
    deleted.&lt;segment 0001&gt; </literal></para>

    <para>After deletion:</para>

    <para><literal>This formatting has been deleted.&lt;segment 0001&gt;
    </literal></para>
  </section>

  <section id="tag.group.nesting">
    <title>Tag group nesting<indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Group nesting</secondary>
      </indexterm></title>

    <para>Modifying tag group order may result in the nesting of a tag group
    within another tag group. This is acceptable, provided the enclosing group
    totally encloses the enclosed group. In other words, when moving paired
    tags, ensure that the opening and the closing tag are either both inside
    or both outside any other tag pair; otherwise the translated file may be
    corrupted and fail to open.</para>

    <para>Example:</para>

    <para><literal>&lt;b0&gt;Formatting&lt;/b0&gt; &lt;b1&gt;one&lt;/b1&gt; is
    going to be nested inside formatting zero.&lt;segment 0001&gt;
    </literal></para>

    <para>After nesting:</para>

    <para><literal>&lt;b0&gt;Formatting &lt;b1&gt;one&lt;/b1&gt;&lt;/b0&gt;
    has been nested inside formatting zero.&lt;segment
    0001&gt;</literal></para>
  </section>

  <section id="tag.group.overlapping">
    <title>Tag group overlapping<indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Group overlapping</secondary>
      </indexterm></title>

    <para>Overlapping is the result of bad manipulations of tag pairs and is
    guaranteed to result in formatting corruption and sometimes in the
    translated file not opening at all.</para>

    <para>Example:</para>

    <para><literal>&lt;b0&gt;Formatting&lt;/b0&gt; &lt;b1&gt;one&lt;/b1&gt; is
    going to be messed up.&lt;segment 0001&gt; </literal></para>

    <para>After a bad manipulation:</para>

    <para><literal>&lt;b0&gt;Formatting &lt;b1&gt;one&lt;/b0&gt; &lt;/b1&gt;is
    very messed up now.&lt;segment 0001&gt;</literal></para>
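
    <para>The difference between nesting and overlapping can be checked
    mechanically with a stack. The following Python sketch is an illustration
    only and does not reproduce OmegaT's own validation code; the regular
    expression and the function name <literal>is_well_nested</literal> are
    assumptions:</para>

    <programlisting language="python"><![CDATA[
```python
import re

# Assumed tag shape: "<" or "</", one to three letters, a number, ">".
TAG_RE = re.compile(r"<(/?)([A-Za-z]{1,3}\d+)>")


def is_well_nested(segment):
    """Return True if no tag pair in the segment overlaps another pair."""
    tags = [(bool(slash), name) for slash, name in TAG_RE.findall(segment)]
    # Names that occur on a closing tag belong to pairs; the rest are singles.
    paired = {name for is_close, name in tags if is_close}
    stack = []
    for is_close, name in tags:
        if name not in paired:
            continue  # single tags cannot overlap anything
        if not is_close:
            stack.append(name)
        elif not stack or stack.pop() != name:
            return False  # closed out of order, or closed before being opened
    return not stack  # leftovers are opening tags that were never closed


print(is_well_nested("<b0>Formatting <b1>one</b1></b0> has been nested."))
print(is_well_nested("<b0>Formatting <b1>one</b0> </b1>is very messed up now."))
```
]]></programlisting>

    <para>The nested example from the previous section passes this check,
    whereas the overlapping example above fails it, because &lt;/b0&gt;
    arrives while &lt;b1&gt; is still open.</para>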
  </section>

  <section>
    <title>Tag validation options</title>

    <para>Some of the rules for handling tags can be customized in the
    <guimenuitem>Options &gt; Tag validation...</guimenuitem> window:</para>

    <mediaobject>
      <imageobject>
        <imagedata fileref="images/OptionsTagValidation_25.png"/>
      </imageobject>
    </mediaobject>

    <para>The behaviour set here applies to all source files, not only to
    particular file types such as formatted text.</para>

    <itemizedlist>
      <listitem>
        <para><emphasis role="bold">Printf variables - do not check, check
        simple, check all</emphasis></para>

        <para>OmegaT can check that programming variables (such as %s) present
        in the source also exist in the translation. You can decide not to
        check at all, to check only simple printf variables (such as %s, %d,
        etc.), or to check printf variables of all types.</para>
      </listitem>
    </itemizedlist>
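
    <para>The kind of comparison behind the printf option can be sketched in
    a few lines of Python. The exact patterns OmegaT uses for "simple" and
    "all" are not given here, so the two regular expressions below, as well
    as the function name <literal>missing_variables</literal>, are
    assumptions made purely for illustration:</para>

    <programlisting language="python"><![CDATA[
```python
import re
from collections import Counter

# Hypothetical patterns: illustrative guesses, not OmegaT's own expressions.
SIMPLE_PRINTF = re.compile(r"%(?:\d+\$)?[sdioxXucf]")
ALL_PRINTF = re.compile(r"%(?:\d+\$)?[-+0#]*\d*(?:\.\d+)?[a-zA-Z]")


def missing_variables(source, translation, pattern=SIMPLE_PRINTF):
    """Return the variables found in source but absent from translation."""
    needed = Counter(pattern.findall(source))
    found = Counter(pattern.findall(translation))
    # Counter subtraction keeps only variables the translation lacks.
    return sorted((needed - found).elements())


print(missing_variables("Copied %d files to %s", "Nach %s kopiert"))
```
]]></programlisting>

    <para>Because only the presence of each variable is compared, a
    translation that reorders %d and %s is accepted. A variable with width
    and precision, such as %5.2f, is only detected by the broader
    pattern.</para>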

    <itemizedlist>
      <listitem>
        <para><emphasis role="bold">Check simple java MessageFormat
        patterns</emphasis></para>

        <para>Activating this check box will cause OmegaT to check whether
        simple Java MessageFormat tags (such as {0}) are processed
        correctly.</para>
      </listitem>
    </itemizedlist>
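
    <para>One plausible reading of this check is that every simple
    placeholder in the source must also appear in the translation. The
    following Python sketch works under that assumption; the pattern and the
    function name <literal>placeholder_mismatch</literal> are hypothetical
    and do not come from OmegaT:</para>

    <programlisting language="python"><![CDATA[
```python
import re

# Assumption: a "simple" pattern is a bare argument index in braces, such as
# {0} or {12}; formatted variants like {0,number} are deliberately ignored.
SIMPLE_PLACEHOLDER = re.compile(r"\{(\d+)\}")


def placeholder_mismatch(source, translation):
    """Return (missing, extra) argument indexes, each as a sorted list."""
    wanted = {int(n) for n in SIMPLE_PLACEHOLDER.findall(source)}
    used = {int(n) for n in SIMPLE_PLACEHOLDER.findall(translation)}
    return sorted(wanted - used), sorted(used - wanted)


print(placeholder_mismatch("File {0} of {1}", "Datei {0} von {2}"))
```
]]></programlisting>

    <para>Comparing sets rather than sequences allows the translation to
    reorder placeholders, which is often necessary when word order differs
    between languages.</para>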

    <itemizedlist>
      <listitem>
        <para><emphasis role="bold">Custom tag(s) regular
        expression</emphasis></para>

        <para>A regular expression entered here will cause OmegaT to treat the
        detected instances as custom tags. OmegaT checks that the number of
        tags and their order are identical in source and translation, just as
        it does for OmegaT tags.</para>
      </listitem>
    </itemizedlist>

    <itemizedlist>
      <listitem>
        <para><emphasis role="bold">Fragment(s) that should be removed from
        the translation regular expression</emphasis></para>

        <para>One can enter a regular expression for unwanted content in the
        target. Any matches in the target segment will then be shown in red,
        making them easy to identify and correct. When looking for fuzzy
        matches, the remove pattern is ignored. A fixed penalty of 5 is added
        if the removed part does not match some other segment, so the match
        does not show up as 100%.</para>
      </listitem>
    </itemizedlist>
  </section>

  <section id="tag.group.validation">
    <title>Tag group validation<indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Group validation</secondary>
      </indexterm></title>

    <para>The validate tags function detects changes to tag sequences (whether
    deliberate or accidental) and shows the affected segments. Launching this
    function (<guimenuitem>Ctrl+Shift+V<indexterm class="singular">
        <primary>Shortcuts</primary>

        <secondary>Tag validation - Ctrl+Shift+V</secondary>
      </indexterm></guimenuitem>) opens a window listing all segments in
    the file whose translation contains suspected broken or bad tags.
    The validate tags function makes it easy to repair the tags and recreate
    the target documents. The window features a three-column table with a
    link to the segment, the original segment and the target
    segment.</para>

    <figure id="tag.validation">
      <title>Tag validation entry</title>

      <mediaobject>
        <imageobject role="html">
          <imagedata fileref="images/TagValidation.png"/>
        </imageobject>

        <imageobject role="fo">
          <imagedata fileref="images/TagValidation.png" width="90%"/>
        </imageobject>
      </mediaobject>
    </figure>

    <para>The tags are highlighted in bold blue for easy comparison between
    the original and the translated contents. Click on the link to activate
    the segment in the Editor. Correct the error if necessary (in the case
    above it is the missing &lt;i2&gt;&lt;/i2&gt; pair) and press
    <guimenuitem>Ctrl+Shift+V</guimenuitem> to return to the tag validation
    window to correct other errors. A tag error is any tag sequence in the
    translation that does not reproduce the order and number of tags in the
    original segment. Some tag manipulations are necessary and benign; others
    will cause problems when the translated document is created.</para>
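
    <para>The definition of a tag error given above amounts to comparing two
    tag sequences. The following Python sketch shows the idea under simple
    assumptions; it is not OmegaT's implementation, and the regular
    expression and the function names <literal>tag_sequence</literal> and
    <literal>has_tag_error</literal> are invented for illustration:</para>

    <programlisting language="python"><![CDATA[
```python
import re

# Assumed tag shape: "<" or "</", one to three letters, a number, ">".
TAG_RE = re.compile(r"</?[A-Za-z]{1,3}\d+>")


def tag_sequence(segment):
    """Return the tags in a segment, in order of appearance."""
    return TAG_RE.findall(segment)


def has_tag_error(source, translation):
    """Flag a translation whose tag order or tag count differs from the source."""
    return tag_sequence(source) != tag_sequence(translation)


source = "<b0>Formatting</b0> <b1>one</b1> is going to be nested."
print(has_tag_error(source, "<b0>Formatting <b1>one</b1></b0> has been nested."))
```
]]></programlisting>

    <para>A comparison of this kind also reports deliberate changes such as
    nesting or the deletion of a pair. This is consistent with the behaviour
    described above: every changed sequence is shown, and it is left to the
    translator to decide whether the change is benign.</para>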
  </section>

  <section id="hints.for.tag.management">
    <title>Hints for tag management<indexterm class="singular">
        <primary>Tags</primary>

        <secondary>Hints</secondary>
      </indexterm></title>

    <para><emphasis role="bold">Simplify the original text</emphasis></para>

    <para>Tags generally represent some form of formatting in the original
    text. Simplifying the original formatting greatly reduces the number of
    tags. Where circumstances permit, consider unifying fonts, font sizes,
    colors, etc., as this can simplify the translation and reduce the
    potential for tag errors. Read the tag operations section to see what can
    be done with tags. Remember that if you find tags a problem in OmegaT and
    formatting is not particularly relevant for the current translation,
    removing tags may be the easiest way out.</para>

    <para><emphasis role="bold">Pay extra attention to tag
    pairs</emphasis></para>

    <para>If you need to see tags in OmegaT but do not need to retain most of
    the formatting in the translated document, you are free not to include
    tags in the translation. In this case, pay extra attention to tag pairs,
    since deleting one side of a pair but forgetting to delete the other is
    guaranteed to corrupt your document's formatting. Since tags are included
    in the text itself, it is possible to use segmentation rules to create
    segments with fewer tags. This is an advanced feature, and some experience
    is required to apply it properly.</para>

    <para>OmegaT is not yet able to detect formatting mistakes fully
    automatically, so it will not prompt you if you make an error or change
    the formatting to suit your target language better. As a result, your
    translated file may sometimes look strange and, in the worst case, may
    even refuse to open.</para>
  </section>
</chapter>