Skip to content
Permalink
8bed31719c
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
609 lines (505 sloc) 22.3 KB
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.file.filters">
<title>File Filters</title>
<para>OmegaT features highly customizable filters, enabling you to configure
numerous aspects. File filters are pieces of code capable of:</para>
<itemizedlist>
<listitem>
<para>Reading the document in some specific file format. For instance,
plain text files.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>Extracting the translatable content out of the file.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>Automating modifications of the translated document file names by
replacing translatable contents with its translation.</para>
</listitem>
</itemizedlist>
<para>To see which file formats can be handled by OmegaT, see the menu
<guimenuitem>Options &gt; File Filters ... </guimenuitem></para>
<para>Most users will find the default file filter options sufficient. If
this is not the case, open the main dialog by selecting<emphasis
role="bold"> Options → File Filters...</emphasis> from the main menu. You
can also enable project-specific file filters, which will only be used on
the current project, by selecting the <emphasis role="bold">File
Filters...</emphasis> option in Project Properties.</para>
<para>You can enable project specific filters via the <emphasis
role="bold">Project → Properties...</emphasis>. Click on <guibutton>File
Filters</guibutton> button and activate the check box <guimenuitem>Enable
project specific filters</guimenuitem><indexterm class="singular">
<primary>File filters</primary>
<secondary>Project specific file filters</secondary>
</indexterm>. A copy of the filters configuration will be stored with the
project in this case. If you later change filters, only the project filters
will be updated, while the user filters stay unchanged.</para>
<para><emphasis role="bold">Warning!</emphasis> Should you change filter
options whilst a project is open, you must reload the project in order for
the changes to take effect.</para>
<section id="file.filters.dialog">
<title>File filters dialog<indexterm class="singular">
<primary>File filters</primary>
<secondary>Dialog</secondary>
</indexterm></title>
<para>This dialog lists available file filters. Should you wish not to use
OmegaT to translate files of a certain type, you can turn off the
corresponding filter by deactivating the check box beside its name. OmegaT
will then omit the appropriate files while loading projects, and will copy
them unmodified when creating target documents. When you wish to use the
filter again, just tick the check box. Click <emphasis
role="bold">Defaults</emphasis> to reset the file filters to the default
settings. To edit which files in which encodings the filter is to process,
select the filter from the list and click <emphasis
role="bold">Edit.</emphasis></para>
<para>The dialog allows to enable or disable the following options:</para>
<itemizedlist>
<listitem>
<para>Remove leading and trailing tags: uncheck this option to display
all the tags including the leading and trailing ones. Warning: in
Microsoft Open XML formats (docx, xlsx, etc.), if all tags are
displayed, DO NOT write text before the first tag (it is a technical
tag that must always begin the segment).</para>
</listitem>
<listitem>
<para>Remove leading and trailing whitespace in non-segmented
projects: by default, OmegaT removes leading and trailing whitespace.
In non-segmented projects, it is possible to keep it by unchecking this option.</para>
</listitem>
<listitem>
<para>Preserve spaces for all tags: check this option if the source
documents contain significant spaces (for layout matters) that must
not be ignored.</para>
</listitem>
<listitem>
<para>Ignore file context when identifying segments with alternate
translations: by default, OmegaT uses the source file name as part of the
identification of an alternate translation.
if the option is checked, the source file name will not be used, and
alternative translations will take effect in any file as long
as the other context (previous/next
segments, or some sort of ID depending on the file format) matches.</para>
</listitem>
</itemizedlist>
</section>
<section id="filters.options">
<title>Filter options<indexterm class="singular">
<primary>File filters</primary>
<secondary>Options</secondary>
</indexterm></title>
<para>Several filters (Text files, XHTML files, HTML and XHTML files,
OpenDocument files and Microsoft Open XML files) have one or more specific
options. To modify the options select the filter from the list and click
on <emphasis role="bold">Options</emphasis>. The available options
are:</para>
<para><emphasis role="bold">Text files</emphasis></para>
<itemizedlist>
<listitem>
<para><emphasis>Paragraph segmentation on line breaks, empty lines or
never:</emphasis></para>
<para>if sentence segmentation rules are active, the text will further
be segmented according to the option selected here.</para>
</listitem>
</itemizedlist>
<para><emphasis role="bold">PO files</emphasis></para>
<itemizedlist>
<listitem>
<para><emphasis>Allow blank translations in the target
file</emphasis>:</para>
<para>If on, when a PO segment (which may be a whole paragraph) is not
translated, the translation will be empty in the target file.
Technically speaking, the <code>msgstr</code> segment in the PO target
file, if created, will be left empty. As this is the standard behavior
for PO files, it is on by default. If the option is off, the source
text will be copied to the target segment.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Skip PO header</emphasis></para>
<para>PO header will be skipped and left unchanged, if this option is
checked.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Auto replace 'nplurals=INTEGER; plural=EXPRESSION;' in
header</emphasis></para>
<para><emphasis>The option allows OmegaT to override the
specification in the PO file header and use the default for the
selected target language. </emphasis></para>
</listitem>
</itemizedlist>
<para><emphasis role="bold">XHTML Files</emphasis></para>
<itemizedlist>
<listitem>
<para><emphasis>Add or rewrite encoding declaration in HTML and XHTML
files</emphasis>: frequently the target files must have the encoding
character set different from the one in the source file (wether it is
explicitly defined or implied). Using this option the translator can
specify, whether the target files are to have the encoding declaration
included. For instance, if the file filter specifies UTF8 as the
encoding scheme for the target files, selecting Always will assure
that this information is included in the translated files.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Translate the following attributes</emphasis>: the
selected attributes will appear as segments in the Editor
window.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Start a new paragraph on: the &lt;br&gt;</emphasis>
HTML tag will constitute a paragraph for segmentation purposes.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Skip text matching regular expression</emphasis>: the
text matching the regular expression gets skipped. It is shown
rendered red in the tag validator. Text in source segment that matches
is shown in italic.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Do not translate the content attribute of meta-tags
... :</emphasis> The following meta-tags will not be
translated.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Do not translate the content of tags with the
following attribute key-value pairs (separate with commas)</emphasis>:
a match in the list of key-value pairs will cause the content of tags
to be ignored</para>
<para>It is sometimes useful to be able make some tags untranslatable
based on the value of attributes. For example, <literal>&lt;div
class="hide"&gt; &lt;span translate="no"&gt;</literal> You can define
key-value pairs for tags to be left untranslated. For the example
above, the field would contain: <literal>class=hide, translate=no
</literal></para>
</listitem>
</itemizedlist>
<para><emphasis role="bold">Microsoft Office Open XML
files</emphasis></para>
<para>You can select which elements are to be translated. They will appear
as separate segments in the translation.</para>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Word:</emphasis> non-visible instruction
text, comments, footnotes, endnotes, footers</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Excel: </emphasis>comments, sheet
names</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Power Point</emphasis>: slide comments,
slide masters, slide layouts</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Global:</emphasis> charts, diagrams,
drawings, WordArt</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis role="bold">Other Options: </emphasis></para>
<itemizedlist>
<listitem>
<para><emphasis>Aggregate tags</emphasis>: if checked, tags
without translatable text between them will be aggregated into
single tags.</para>
</listitem>
<listitem>
<para><emphasis>Preserve spaces for all tags</emphasis>: if
checked, "white space" (i.e., spaces and newlines) will be
preserved, even if not set technically in the document</para>
</listitem>
</itemizedlist>
</listitem>
</itemizedlist>
<para><emphasis role="bold">HTML and XHTML files</emphasis></para>
<itemizedlist>
<listitem>
<para><emphasis>Add or rewrite encoding declaration in HTML and XHTML
files</emphasis>: Always (default), Only if (X)HTML file has a header,
Only if (X)HTML file has an encoding declaration, Never</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Translate the following attributes</emphasis>: the
selected attributes will appear as segments in the Editor
window.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Start a new paragraph on</emphasis>: the &lt;br&gt;
HTML tag will constitute a paragraph for segmentation purposes.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Skip text matching regular expression</emphasis>: The
text, matching the regular expression, will be skipped.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Do not translate the content attribute of meta-tags
... :</emphasis> The following meta-tags will not be
translated.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><emphasis>Do not translate the content of tags with the
following attribute key-value pairs (separate with commas)</emphasis>:
a match in the list of key-value pairs will cause the content of tags
to be ignored</para>
</listitem>
</itemizedlist>
<para><emphasis role="bold">Text files</emphasis></para>
<itemizedlist>
<listitem>
<para><emphasis>Paragraph segmentation on line breaks, empty lines or
never</emphasis>:</para>
<para>if sentence segmentation rules are active, the text will further
be segmented according to the option selected here.</para>
</listitem>
</itemizedlist>
<para><emphasis role="bold">Open Document Format (ODF)
files</emphasis></para>
<itemizedlist>
<listitem>
<para>You can select which of the following items are to be
translated:</para>
<para>index entries, bookmarks, bookmark references, notes, comments,
presentation notes, links (URL), sheet names</para>
</listitem>
</itemizedlist>
</section>
<section id="edit.filter.dialog">
<title>Edit filter dialog<indexterm class="singular">
<primary>File filters</primary>
<secondary>Editing</secondary>
</indexterm></title>
<para>This dialog enables you to set up the source filename patterns of
files to be processed by the filter, customize the filenames of translated
files, and select which encodings should be used for loading the file and
saving its translated counterpart. To modify a file filter pattern, either
modify the fields directly or click <emphasis role="bold">Edit.</emphasis>
To add a new file filter pattern, click <emphasis
role="bold">Add</emphasis>. The same dialog is used to add a pattern or to
edit a particular pattern. The dialog is useful because it includes a
special target filename pattern editor with which you can customize the
names of output files.</para>
<section id="source.filetype.and.filename.pattern">
<title><indexterm class="singular">
<primary>Source files</primary>
<secondary>File type and name pattern</secondary>
</indexterm>Source file type, filename pattern<indexterm
class="singular">
<primary>File filters</primary>
<secondary>File type and name pattern</secondary>
</indexterm></title>
<para>When OmegaT encounters a file in its source folder, it attempts to
select the filter based upon the file's extension. More precisely,
OmegaT attempts to match each filter's source filename patterns against
the filename. For example, the pattern <literal>*.xhtml
</literal>matches any file with the <literal>.xhtml</literal> extension.
If the appropriate filter is found, the file is assigned to it for
processing. For example, by default, XHTML filters are used for
processing files with the .xhtml extension. You can change or add
filename patterns for files to be handled by each file. Source filename
patterns use wild card characters similar to those used in <emphasis
role="bold">Searches. </emphasis>The '*' character matches zero or more
characters. The '?' character matches exactly one character. All other
characters represent themselves. For example, if you wish the text
filter to handle readme files (<literal>readme, read.me</literal>, and
<literal>readme.txt</literal>) you should use the pattern
<literal>read*</literal>.</para>
</section>
<section id="source.and.target.files.encoding">
<title>Source and Target file encoding</title>
<indexterm class="singular">
<primary>Source files</primary>
<secondary>Encoding</secondary>
</indexterm>
<indexterm class="singular">
<primary>Target files</primary>
<secondary>Encoding</secondary>
</indexterm>
<indexterm class="singular">
<primary>File filters</primary>
<secondary>Source, target - encoding</secondary>
</indexterm>
<para>Only a limited number of file formats specify a mandatory
encoding. File formats that do not specify their encoding will use the
encoding you set up for the extension that matches their name. For
example, by default <literal>.txt</literal> files will be loaded using
the default encoding of your operating system. You may change the source
encoding for each different source filename pattern. Such files may also
be written out in any encoding. By default, the translated file encoding
is the same as the source file encoding. Source and target encoding
fields use combo boxes with all supported encodings included.
&lt;auto&gt; leaves the encoding choice to
<application>OmegaT</application>. This is how it works:</para>
<itemizedlist>
<listitem>
<para>OmegaT identifies the source file encoding by using its
encoding declaration, if present (HTML files, XML based
files)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>OmegaT is instructed to use a mandatory encoding for certain
file formats (Java properties etc)</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para>OmegaT uses the default encoding of the operating system for
text files.</para>
</listitem>
</itemizedlist>
</section>
<section id="target.name">
<title>Target filename<indexterm class="singular">
<primary>Target files</primary>
<secondary>Filenames</secondary>
</indexterm></title>
<para>Sometimes you may wish to rename the files you translate
automatically, for example adding a language code after the file name.
The target filename pattern uses a special syntax, so if you wish to
edit this field, you must click <emphasis
role="bold">Edit...</emphasis>and use the <indexterm class="singular">
<primary>File filters</primary>
<secondary>Dialog</secondary>
</indexterm>Edit Pattern Dialog. If you wish to revert to default
configuration of the filter, click <emphasis
role="bold">Defaults.</emphasis> You may also modify the name directly
in the target filename pattern field of the file filters dialog. The
Edit Pattern Dialog offers among others the following options:</para>
<itemizedlist>
<listitem>
<para>Default is <literal>${filename}</literal>– full filename of
the source file with extension: in this case the name of the
translated file is the same as that of the source file.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>${nameOnly}</literal>– allows you to insert only the
name of the source file without the extension.</para>
</listitem>
<listitem>
<para><literal>${extension} </literal>- the original file
extension</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>${targetLocale}</literal>– target locale code (of a
form "xx_YY").</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>${targetLanguage}</literal>– the target language and
country code together (of a form "XX-YY").</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>${targetLanguageCode}</literal> – the target language
- only "XX"</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>${targetCountryCode}</literal>– the target country -
only "YY"</para>
</listitem>
<listitem>
<para><literal>${timestamp-????}</literal> – system date time at
generation time in various patterns</para>
<para>See <ulink
url="http://docs.oracle.com/javase/1.4.2/docs/api/java/text/SimpleDateFormat.html">Oracle
documentation</ulink> for examples of the "SimpleDateFormat"
patterns</para>
</listitem>
<listitem>
<para><literal>${system-os-name}</literal> - operating system of the
computer used</para>
</listitem>
<listitem>
<para><literal>${system-user-name}</literal> - system user
name</para>
</listitem>
<listitem>
<para><literal>${system-host-name}</literal> - system host
name</para>
</listitem>
<listitem>
<para><literal>${file-source-encoding}</literal> - source file
encoding</para>
</listitem>
<listitem>
<para><literal>${file-target-encoding}</literal> - target file
encoding</para>
</listitem>
<listitem>
<para><literal>${targetLocaleLCID}</literal> - Microsoft target
locale</para>
</listitem>
</itemizedlist>
<para>Additional variants are available for variables ${nameOnly} and
${Extension}. In case the file name has ambivalent name, one can apply
variables of the form <literal>${name only</literal><emphasis>-extension
number</emphasis>} and
<literal>${extension</literal>-<emphasis>extension number} </emphasis>.
If for example the original file is named Document.xx.docx, the
following variables will give the following results:</para>
<itemizedlist>
<listitem>
<para><literal>${nameOnly-0}</literal> Document</para>
</listitem>
<listitem>
<para><literal>${nameOnly-1}</literal> Document.xx</para>
</listitem>
<listitem>
<para><literal>${nameOnly-2}</literal> Document.xx.docx</para>
</listitem>
<listitem>
<para><literal>${extension-0}</literal> docx</para>
</listitem>
<listitem>
<para><literal>${extension-1}</literal> xx.docx</para>
</listitem>
<listitem>
<para><literal>${extension-2}</literal> Document.xx.docx</para>
</listitem>
</itemizedlist>
</section>
</section>
</chapter>