Permalink
Cannot retrieve contributors at this time
Name already in use
A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
2015OmegaT/OmegaT/doc_src/en/Glossaries.xml
Go to fileThis commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
314 lines (254 sloc)
12.1 KB
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
<?xml version="1.0" encoding="UTF-8"?> | |
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN" | |
"../../../docbook-xml-4.5/docbookx.dtd"> | |
<chapter id="chapter.glossaries"> | |
<title>Glossaries<indexterm class="singular"> | |
<primary>Windows and panes in OmegaT</primary> | |
<secondary>Glossary pane</secondary> | |
</indexterm><indexterm class="singular"> | |
<primary>Glossaries</primary> | |
</indexterm></title> | |
<para>Glossaries are files created and updated manually for use in | |
<application>OmegaT</application>.</para> | |
<para>If an <application>OmegaT</application> project contains one or more | |
glossaries, any terms in the glossary which are also found in the current | |
segment will be automatically displayed in the Glossary viewer.</para> | |
<para>You define its location and name in the project properties dialog. The | |
extension must be <filename>.txt</filename> or <filename>.utf8</filename> | |
(if not, it will be added). The location of the file must be within the | |
<filename>/glossary</filename> folder, but it can be in a deeper folder | |
(e.g., <filename>glossary/sub/glossary.txt</filename>). The file does not | |
need to exist when setting it, it will be created (if necessary) when adding | |
a glossary entry. If the file already exists, no attempt is done to verify | |
the format or the character set of the file: the new entries will always be | |
in tab-separated format and UTF-8. As the existing content will not be | |
touched, damage to an existing file would be limited.</para> | |
<section> | |
<title>Usage</title> | |
<para>To use an existing glossary, simply place it in the<indexterm | |
class="singular"> | |
<primary>Project files</primary> | |
<secondary>Glossary subfolder</secondary> | |
</indexterm> <filename>/glossary</filename> folder after creating the | |
project. <application>OmegaT</application> automatically detects glossary | |
files in this folder when a project is opened. Terms in the current | |
segment which <application>OmegaT</application> finds in the glossary | |
file(s) are displayed in the Glossary pane:</para> | |
<indexterm class="singular"> | |
<primary>Glossaries, Glossary pane</primary> | |
</indexterm> | |
<figure> | |
<title>Glossary pane</title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata fileref="images/GlossaryPane_25.png"/> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata fileref="images/GlossaryPane_25.png" width="60%"/> | |
</imageobject> | |
</mediaobject> | |
</figure> | |
<para>The word before the = sign is the source term, and its translation | |
is (or are) the words after =. The vocabulary entry can have a comment | |
added. The glossary function only finds exact matches with the glossary | |
entry (e.g. does not find inflected forms etc.). New terms can be added | |
manually to the glossary file(s) during translation, for example in a text | |
editor. Newly added terms will not be recognized once the changes in the | |
text file have been saved.</para> | |
<para>The source term does not have to be a single-word item, as the next | |
example shows:</para> | |
<figure> | |
<title>multiple words entries in glossaries - example<indexterm | |
class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>Glossary pane</secondary> | |
<tertiary>multiple-words entries</tertiary> | |
</indexterm></title> | |
<mediaobject> | |
<imageobject role="html"> | |
<imagedata fileref="images/MultiTerm_Glossary_25.png"/> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata fileref="images/MultiTerm_Glossary_25.png" width="80%"/> | |
</imageobject> | |
</mediaobject> | |
</figure> | |
<para>The underlined item "pop-up menu" can be found in the glossary pane | |
as "pojavni menu". Highlighting it in the Glossary pane and then | |
rightclicking insets at the cursor position in the target | |
segment.<footnote> | |
<para>Note that in the above case, this is just half (or even less) of | |
the story, as the target language (Slovenian) uses declension. So the | |
inserted "pojavni meni" in the nominative form - has to be changed to | |
"pojavnem meniju" , i.e. to the locative. So it is probably faster to | |
type the term correctly right away without bothering with the glossary | |
and its shortcuts.</para> | |
</footnote></para> | |
</section> | |
<section> | |
<title><indexterm class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>File format</secondary> | |
</indexterm>File format<indexterm class="singular"> | |
<primary>Project files</primary> | |
<secondary>User files</secondary> | |
<seealso>Glossaries</seealso> | |
</indexterm></title> | |
<para>Glossary files are simple plain text files containing three-column, | |
tab-delimited lists with the source and target terms in the first and | |
second columns respectively. The third column can be used for additional | |
information. You can have entries with the target column missing, i.e. | |
just containing the source term and the comment.</para> | |
<para>Glossary files can be either in system default encoding (and | |
indicated by the extension .tab) or in UTF-8 (the extension .utf8 or .txt) | |
or in UTF-16 LE (the extension .txt). Also supported is the CSV format. | |
This format is the same as the tab separated one: source term, target | |
term. Comment fields are separated by a comma ','. Strings can be enclosed | |
by quotes ", which allows having a comma inside a string:</para> | |
<para><literal>"This is a source term, which contains a comma","c'est un | |
terme, qui contient une virgule"</literal></para> | |
<para>In addition to the plain text format, TBX format <indexterm | |
class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>TBX format</secondary> | |
</indexterm> is also supported as a read-only glossary format. The | |
location of the <filename>.tbx</filename> file must be within the | |
<filename>/glossary</filename> folder, but it can be in a deeper folder | |
(e.g., <filename>glossary/sub/MyGlossary.tbx</filename>).</para> | |
<para>TBX - Term Base eXchange - is the open, XML-based standard | |
for exchanging structured terminological data, TBX has been approved as an | |
international standard by LISA and ISO. If you have an existing | |
terminology handling system it is quite possible it offers the export of | |
terminology data via TBX format. <ulink | |
url="http://www.microsoft.com/Language/en-US/Terminology.aspx">Microsoft | |
Terminology Collection</ulink> <indexterm class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>Microsoft Terminology collection</secondary> | |
</indexterm> can be downloaded in nearly 100 languages and can serve as | |
a cornerstone IT glossary. </para> | |
<para>Note: the <filename>.tbx</filename> output of MultiTerm seems to not | |
be reliable (November 2013), it is better to use the | |
<filename>.tab</filename> output of MultiTerm instead. </para> | |
</section> | |
<section> | |
<title>How to create glossaries<indexterm class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>Creating a glossary</secondary> | |
</indexterm></title> | |
<para>The project setting allows one can enter a name for a writable | |
glossary file (see beginning of this chapter). Right-click in the glossary | |
pane or press <keycap>Ctrl+Shift+G</keycap> to add a new entry. A dialog | |
opens, allowing you to enter the source term, target term and any comments | |
you may have:</para> | |
<mediaobject role="html"> | |
<imageobject> | |
<imagedata fileref="images/GlossaryEntry_25.png"/> | |
</imageobject> | |
<imageobject role="fo"> | |
<imagedata fileref="images/GlossaryEntry_25.png" width="80%"/> | |
</imageobject> | |
</mediaobject> | |
<para>The contents of glossary files are kept in memory and are loaded | |
when the project is opened or reloaded. Updating a glossary file is thus | |
rather simple: press <keycap>Ctrl+Shift+G</keycap> and enter the new term, | |
its translation and any comments you may have (ensuring you press tab | |
between the fields) and save the file. The contents of the glossary pane | |
will be updated accordingly.</para> | |
<para><indexterm class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>Location of the writable glossary file</secondary> | |
</indexterm>The location of the writable glossary file can be set in the | |
<guimenuitem>Project > Properties ... </guimenuitem>dialog. The | |
recognized extensions are <filename>TXT</filename> and | |
<filename>UTF8</filename></para> | |
<para><emphasis role="bold">Note: </emphasis>Of course there are other | |
ways and means to create a simple file with tab delimited entries. Nothing | |
speaks against using Notepad++ on Windows, GEdit on Linux for instance or | |
some spreadsheet program for this purpose: any application, that can | |
handle UTF-8 (or UTF-16 LE) and that can show white space (so that you do | |
not miss the required <keycap>TAB</keycap> characters) can be used.</para> | |
</section> | |
<section> | |
<title><indexterm class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>Priorities</secondary> | |
</indexterm>Priority glossary</title> | |
<para>The results from the priority glossary (by default, | |
glossary/glossary.txt) appear in first places in the Glossary pane and in | |
TransTips.</para> | |
<para>As entries can mix words from priority and non priority glossaries, | |
the words from the priority glossary are displayed in bold.</para> | |
</section> | |
<section> | |
<title><indexterm class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>Trados MultiTerm</secondary> | |
</indexterm>Using Trados MultiTerm</title> | |
<para>Data exported from Trados MultiTerm can be used as | |
<application>OmegaT</application> glossaries without further modification, | |
provided they are given the file extension <filename>.tab</filename> and | |
the source and target term fields are the first two fields respectively. | |
If you export using the system option "Tab-delimited export", you will | |
need to delete the first 5 columns (Seq. Nr, Date created etc).</para> | |
</section> | |
<section> | |
<title><indexterm class="singular"> | |
<primary>Glossaries</primary> | |
<secondary>Problems with glossaries</secondary> | |
</indexterm>Common glossary problems</title> | |
<para><emphasis role="bold">Problem: No glossary terms are displayed - | |
possible causes:</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para>No glossary file found in the "glossary" folder.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>The glossary file is empty.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>The items are not separated with a TAB character.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>The glossary file does not have the correct extension (.tab, | |
.utf8 or .txt).</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>There is no EXACT match between the glossary entry and the | |
source text in your document - for instance plurals.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>The glossary file does not have the correct encoding.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>There are no terms in the current segment which match any terms | |
in the glossary.</para> | |
</listitem> | |
</itemizedlist> | |
<itemizedlist> | |
<listitem> | |
<para>One or more of the above problems may have been fixed, but the | |
project has not been reloaded.</para> | |
</listitem> | |
</itemizedlist> | |
<para><emphasis role="bold">Problem: In the glossary pane, some characters | |
are not displayed properly</emphasis></para> | |
<itemizedlist> | |
<listitem> | |
<para>...but the same characters are displayed properly in the Editing | |
pane: the extension and the file encoding do not match.</para> | |
</listitem> | |
</itemizedlist> | |
</section> | |
</chapter> |