Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
2015OmegaT/OmegaT/doc_src/en/FileFilters.xml
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
609 lines (505 sloc)
22.3 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" | |
"../../../docbook-xml-4.5/docbookx.dtd"> | |
<chapter id="chapter.file.filters"> | |
<title>File Filters</title> | |
<para>OmegaT features highly customizable filters, enabling you to configure | |
numerous aspects. File filters are pieces of code capable of:</para> | |
<itemizedlist> | |
<listitem> | |
<para>Reading the document in some specific file format. For instance, | |
plain text files.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>Extracting the translatable content out of the file.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>Automating modifications of the translated document file names by | |
replacing translatable contents with its translation.</para> | |
</listitem> | |
</itemizedlist> | |
<para>To see which file formats can be handled by OmegaT, see the menu | |
<guimenuitem>Options > File Filters ... </guimenuitem></para> | |
<para>Most users will find the default file filter options sufficient. If | |
this is not the case, open the main dialog by selecting<emphasis | |
role="bold"> Options → File Filters...</emphasis> from the main menu. You | |
can also enable project-specific file filters, which will only be used on | |
the current project, by selecting the <emphasis role="bold">File | |
Filters...</emphasis> option in Project Properties.</para> | |
<para>You can enable project specific filters via the <emphasis | |
role="bold">Project → Properties...</emphasis>. Click on <guibutton>File | |
Filters</guibutton> button and activate the check box <guimenuitem>Enable | |
project specific filters</guimenuitem><indexterm class="singular"> | |
<primary>File filters</primary> | |
<secondary>Project specific file filters</secondary> | |
</indexterm>. A copy of the filters configuration will be stored with the | |
project in this case. If you later change filters, only the project filters | |
will be updated, while the user filters stay unchanged.</para> | |
<para><emphasis role="bold">Warning!</emphasis> Should you change filter | |
options whilst a project is open, you must reload the project in order for | |
the changes to take effect.</para> | |
<section id="file.filters.dialog"> | |
<title>File filters dialog<indexterm class="singular"> | |
<primary>File filters</primary> | |
<secondary>Dialog</secondary> | |
</indexterm></title> | |
<para>This dialog lists available file filters. Should you wish not to use | |
OmegaT to translate files of a certain type, you can turn off the | |
corresponding filter by deactivating the check box beside its name. OmegaT | |
will then omit the appropriate files while loading projects, and will copy | |
them unmodified when creating target documents. When you wish to use the | |
filter again, just tick the check box. Click <emphasis | |
role="bold">Defaults</emphasis> to reset the file filters to the default | |
settings. To edit which files in which encodings the filter is to process, | |
select the filter from the list and click <emphasis | |
role="bold">Edit.</emphasis></para> | |
<para>The dialog allows to enable or disable the following options:</para> | |
<itemizedlist> | |
<listitem> | |
<para>Remove leading and trailing tags: uncheck this option to display | |
all the tags including the leading and trailing ones. Warning: in | |
Microsoft Open XML formats (docx, xlsx, etc.), if all tags are | |
displayed, DO NOT write text before the first tag (it is a technical | |
tag that must always begin the segment).</para> | |
</listitem> | |
<listitem> | |
<para>Remove leading and trailing whitespace in non-segmented | |
projects: by default, OmegaT removes leading and trailing whitespace. | |
In non-segmented projects, it is possible to keep it by unchecking this option.</para> | |
</listitem> | |
<listitem> | |
<para>Preserve spaces for all tags: check this option if the source | |
documents contain significant spaces (for layout matters) that must | |
not be ignored.</para> | |
</listitem> | |
<listitem> | |
<para>Ignore file context when identifying segments with alternate | |
translations: by default, OmegaT uses the source file name as part of the | |
identification of an alternate translation. | |
if the option is checked, the source file name will not be used, and | |
alternative translations will take effect in any file as long | |
as the other context (previous/next | |
segments, or some sort of ID depending on the file format) matches.</para> | |
</listitem> | |
</itemizedlist> | |
</section> | |
<section id="filters.options"> | |
<title>Filter options<indexterm class="singular"> | |
<primary>File filters</primary> | |
<secondary>Options</secondary> | |
</indexterm></title> | |
<para>Several filters (Text files, XHTML files, HTML and XHTML files, | |
OpenDocument files and Microsoft Open XML files) have one or more specific | |
options. To modify the options select the filter from the list and click | |
on <emphasis role="bold">Options</emphasis>. The available options | |
are:</para> | |
<para><emphasis role="bold">Text files</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Paragraph segmentation on line breaks, empty lines or | |
never:</emphasis></para> | |
<para>if sentence segmentation rules are active, the text will further | |
be segmented according to the option selected here.</para> | |
</listitem> | |
</itemizedlist> | |
<para><emphasis role="bold">PO files</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Allow blank translations in the target | |
file</emphasis>:</para> | |
<para>If on, when a PO segment (which may be a whole paragraph) is not | |
translated, the translation will be empty in the target file. | |
Technically speaking, the <code>msgstr</code> segment in the PO target | |
file, if created, will be left empty. As this is the standard behavior | |
for PO files, it is on by default. If the option is off, the source | |
text will be copied to the target segment.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Skip PO header</emphasis></para> | |
<para>PO header will be skipped and left unchanged, if this option is | |
checked.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Auto replace 'nplurals=INTEGER; plural=EXPRESSION;' in | |
header</emphasis></para> | |
<para><emphasis>The option allows OmegaT to override the | |
specification in the PO file header and use the default for the | |
selected target language. </emphasis></para> | |
</listitem> | |
</itemizedlist> | |
<para><emphasis role="bold">XHTML Files</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Add or rewrite encoding declaration in HTML and XHTML | |
files</emphasis>: frequently the target files must have the encoding | |
character set different from the one in the source file (wether it is | |
explicitly defined or implied). Using this option the translator can | |
specify, whether the target files are to have the encoding declaration | |
included. For instance, if the file filter specifies UTF8 as the | |
encoding scheme for the target files, selecting Always will assure | |
that this information is included in the translated files.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Translate the following attributes</emphasis>: the | |
selected attributes will appear as segments in the Editor | |
window.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Start a new paragraph on: the <br></emphasis> | |
HTML tag will constitute a paragraph for segmentation purposes.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Skip text matching regular expression</emphasis>: the | |
text matching the regular expression gets skipped. It is shown | |
rendered red in the tag validator. Text in source segment that matches | |
is shown in italic.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Do not translate the content attribute of meta-tags | |
... :</emphasis> The following meta-tags will not be | |
translated.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Do not translate the content of tags with the | |
following attribute key-value pairs (separate with commas)</emphasis>: | |
a match in the list of key-value pairs will cause the content of tags | |
to be ignored</para> | |
<para>It is sometimes useful to be able make some tags untranslatable | |
based on the value of attributes. For example, <literal><div | |
class="hide"> <span translate="no"></literal> You can define | |
key-value pairs for tags to be left untranslated. For the example | |
above, the field would contain: <literal>class=hide, translate=no | |
</literal></para> | |
</listitem> | |
</itemizedlist> | |
<para><emphasis role="bold">Microsoft Office Open XML | |
files</emphasis></para> | |
<para>You can select which elements are to be translated. They will appear | |
as separate segments in the translation.</para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Word:</emphasis> non-visible instruction | |
text, comments, footnotes, endnotes, footers</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Excel: </emphasis>comments, sheet | |
names</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Power Point</emphasis>: slide comments, | |
slide masters, slide layouts</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Global:</emphasis> charts, diagrams, | |
drawings, WordArt</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis role="bold">Other Options: </emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Aggregate tags</emphasis>: if checked, tags | |
without translatable text between them will be aggregated into | |
single tags.</para> | |
</listitem> | |
<listitem> | |
<para><emphasis>Preserve spaces for all tags</emphasis>: if | |
checked, "white space" (i.e., spaces and newlines) will be | |
preserved, even if not set technically in the document</para> | |
</listitem> | |
</itemizedlist> | |
</listitem> | |
</itemizedlist> | |
<para><emphasis role="bold">HTML and XHTML files</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Add or rewrite encoding declaration in HTML and XHTML | |
files</emphasis>: Always (default), Only if (X)HTML file has a header, | |
Only if (X)HTML file has an encoding declaration, Never</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Translate the following attributes</emphasis>: the | |
selected attributes will appear as segments in the Editor | |
window.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Start a new paragraph on</emphasis>: the <br> | |
HTML tag will constitute a paragraph for segmentation purposes.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Skip text matching regular expression</emphasis>: The | |
text, matching the regular expression, will be skipped.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Do not translate the content attribute of meta-tags | |
... :</emphasis> The following meta-tags will not be | |
translated.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Do not translate the content of tags with the | |
following attribute key-value pairs (separate with commas)</emphasis>: | |
a match in the list of key-value pairs will cause the content of tags | |
to be ignored</para> | |
</listitem> | |
</itemizedlist> | |
<para><emphasis role="bold">Text files</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para><emphasis>Paragraph segmentation on line breaks, empty lines or | |
never</emphasis>:</para> | |
<para>if sentence segmentation rules are active, the text will further | |
be segmented according to the option selected here.</para> | |
</listitem> | |
</itemizedlist> | |
<para><emphasis role="bold">Open Document Format (ODF) | |
files</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para>You can select which of the following items are to be | |
translated:</para> | |
<para>index entries, bookmarks, bookmark references, notes, comments, | |
presentation notes, links (URL), sheet names</para> | |
</listitem> | |
</itemizedlist> | |
</section> | |
<section id="edit.filter.dialog"> | |
<title>Edit filter dialog<indexterm class="singular"> | |
<primary>File filters</primary> | |
<secondary>Editing</secondary> | |
</indexterm></title> | |
<para>This dialog enables you to set up the source filename patterns of | |
files to be processed by the filter, customize the filenames of translated | |
files, and select which encodings should be used for loading the file and | |
saving its translated counterpart. To modify a file filter pattern, either | |
modify the fields directly or click <emphasis role="bold">Edit.</emphasis> | |
To add a new file filter pattern, click <emphasis | |
role="bold">Add</emphasis>. The same dialog is used to add a pattern or to | |
edit a particular pattern. The dialog is useful because it includes a | |
special target filename pattern editor with which you can customize the | |
names of output files.</para> | |
<section id="source.filetype.and.filename.pattern"> | |
<title><indexterm class="singular"> | |
<primary>Source files</primary> | |
<secondary>File type and name pattern</secondary> | |
</indexterm>Source file type, filename pattern<indexterm | |
class="singular"> | |
<primary>File filters</primary> | |
<secondary>File type and name pattern</secondary> | |
</indexterm></title> | |
<para>When OmegaT encounters a file in its source folder, it attempts to | |
select the filter based upon the file's extension. More precisely, | |
OmegaT attempts to match each filter's source filename patterns against | |
the filename. For example, the pattern <literal>*.xhtml | |
</literal>matches any file with the <literal>.xhtml</literal> extension. | |
If the appropriate filter is found, the file is assigned to it for | |
processing. For example, by default, XHTML filters are used for | |
processing files with the .xhtml extension. You can change or add | |
filename patterns for files to be handled by each file. Source filename | |
patterns use wild card characters similar to those used in <emphasis | |
role="bold">Searches. </emphasis>The '*' character matches zero or more | |
characters. The '?' character matches exactly one character. All other | |
characters represent themselves. For example, if you wish the text | |
filter to handle readme files (<literal>readme, read.me</literal>, and | |
<literal>readme.txt</literal>) you should use the pattern | |
<literal>read*</literal>.</para> | |
</section> | |
<section id="source.and.target.files.encoding"> | |
<title>Source and Target file encoding</title> | |
<indexterm class="singular"> | |
<primary>Source files</primary> | |
<secondary>Encoding</secondary> | |
</indexterm> | |
<indexterm class="singular"> | |
<primary>Target files</primary> | |
<secondary>Encoding</secondary> | |
</indexterm> | |
<indexterm class="singular"> | |
<primary>File filters</primary> | |
<secondary>Source, target - encoding</secondary> | |
</indexterm> | |
<para>Only a limited number of file formats specify a mandatory | |
encoding. File formats that do not specify their encoding will use the | |
encoding you set up for the extension that matches their name. For | |
example, by default <literal>.txt</literal> files will be loaded using | |
the default encoding of your operating system. You may change the source | |
encoding for each different source filename pattern. Such files may also | |
be written out in any encoding. By default, the translated file encoding | |
is the same as the source file encoding. Source and target encoding | |
fields use combo boxes with all supported encodings included. | |
<auto> leaves the encoding choice to | |
<application>OmegaT</application>. This is how it works:</para> | |
<itemizedlist> | |
<listitem> | |
<para>OmegaT identifies the source file encoding by using its | |
encoding declaration, if present (HTML files, XML based | |
files)</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>OmegaT is instructed to use a mandatory encoding for certain | |
file formats (Java properties etc)</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>OmegaT uses the default encoding of the operating system for | |
text files.</para> | |
</listitem> | |
</itemizedlist> | |
</section> | |
<section id="target.name"> | |
<title>Target filename<indexterm class="singular"> | |
<primary>Target files</primary> | |
<secondary>Filenames</secondary> | |
</indexterm></title> | |
<para>Sometimes you may wish to rename the files you translate | |
automatically, for example adding a language code after the file name. | |
The target filename pattern uses a special syntax, so if you wish to | |
edit this field, you must click <emphasis | |
role="bold">Edit...</emphasis>and use the <indexterm class="singular"> | |
<primary>File filters</primary> | |
<secondary>Dialog</secondary> | |
</indexterm>Edit Pattern Dialog. If you wish to revert to default | |
configuration of the filter, click <emphasis | |
role="bold">Defaults.</emphasis> You may also modify the name directly | |
in the target filename pattern field of the file filters dialog. The | |
Edit Pattern Dialog offers among others the following options:</para> | |
<itemizedlist> | |
<listitem> | |
<para>Default is <literal>${filename}</literal>– full filename of | |
the source file with extension: in this case the name of the | |
translated file is the same as that of the source file.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><literal>${nameOnly}</literal>– allows you to insert only the | |
name of the source file without the extension.</para> | |
</listitem> | |
<listitem> | |
<para><literal>${extension} </literal>- the original file | |
extension</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><literal>${targetLocale}</literal>– target locale code (of a | |
form "xx_YY").</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><literal>${targetLanguage}</literal>– the target language and | |
country code together (of a form "XX-YY").</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><literal>${targetLanguageCode}</literal> – the target language | |
- only "XX"</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para><literal>${targetCountryCode}</literal>– the target country - | |
only "YY"</para> | |
</listitem> | |
<listitem> | |
<para><literal>${timestamp-????}</literal> – system date time at | |
generation time in various patterns</para> | |
<para>See <ulink | |
url="http://docs.oracle.com/javase/1.4.2/docs/api/java/text/SimpleDateFormat.html">Oracle | |
documentation</ulink> for examples of the "SimpleDateFormat" | |
patterns</para> | |
</listitem> | |
<listitem> | |
<para><literal>${system-os-name}</literal> - operating system of the | |
computer used</para> | |
</listitem> | |
<listitem> | |
<para><literal>${system-user-name}</literal> - system user | |
name</para> | |
</listitem> | |
<listitem> | |
<para><literal>${system-host-name}</literal> - system host | |
name</para> | |
</listitem> | |
<listitem> | |
<para><literal>${file-source-encoding}</literal> - source file | |
encoding</para> | |
</listitem> | |
<listitem> | |
<para><literal>${file-target-encoding}</literal> - target file | |
encoding</para> | |
</listitem> | |
<listitem> | |
<para><literal>${targetLocaleLCID}</literal> - Microsoft target | |
locale</para> | |
</listitem> | |
</itemizedlist> | |
<para>Additional variants are available for variables ${nameOnly} and | |
${Extension}. In case the file name has ambivalent name, one can apply | |
variables of the form <literal>${name only</literal><emphasis>-extension | |
number</emphasis>} and | |
<literal>${extension</literal>-<emphasis>extension number} </emphasis>. | |
If for example the original file is named Document.xx.docx, the | |
following variables will give the following results:</para> | |
<itemizedlist> | |
<listitem> | |
<para><literal>${nameOnly-0}</literal> Document</para> | |
</listitem> | |
<listitem> | |
<para><literal>${nameOnly-1}</literal> Document.xx</para> | |
</listitem> | |
<listitem> | |
<para><literal>${nameOnly-2}</literal> Document.xx.docx</para> | |
</listitem> | |
<listitem> | |
<para><literal>${extension-0}</literal> docx</para> | |
</listitem> | |
<listitem> | |
<para><literal>${extension-1}</literal> xx.docx</para> | |
</listitem> | |
<listitem> | |
<para><literal>${extension-2}</literal> Document.xx.docx</para> | |
</listitem> | |
</itemizedlist> | |
</section> | |
</section> | |
</chapter> |