Skip to content
Permalink
8bed31719c
Switch branches/tags

Name already in use

A tag already exists with the provided branch name. Many Git commands accept both tag and branch names, so creating this branch may cause unexpected behavior. Are you sure you want to create this branch?
Go to file
 
 
Cannot retrieve contributors at this time
172 lines (144 sloc) 6.5 KB
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.plain.text">
<title>Working with plain text<indexterm class="singular">
<primary>Source files</primary>
<secondary>Plain text files</secondary>
</indexterm></title>
<section id="default.encoding">
<title>Default encoding<indexterm class="singular">
<primary>Encoding</primary>
<secondary>Plain text files</secondary>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Encoding</secondary>
</indexterm></title>
<para>Plain text files - in most cases files with a txt extension -
contain just textual information and offer no clearly defined way to
inform the computer which language they contain. The most that OmegaT can
do in such a case, is to assume that the text is written in the same
language the computer itself uses. This is no problem for files encoded in
Unicode using a 16 bit character encoding set. If the text is encoded in 8
bits, however, one can be faced with the following awkward situation:
instead of displaying the text, for Japanese characters...</para>
<mediaobject>
<imageobject role="html">
<imagedata fileref="images/OmT_Japanese.png"/>
</imageobject>
<imageobject role="fo">
<imagedata fileref="images/OmT_Japanese.png" width="60%"/>
</imageobject>
</mediaobject>
<para>...the system will display it like this for instance:</para>
<mediaobject>
<imageobject role="html">
<imagedata fileref="images/OmT_Cyrillic.png"/>
</imageobject>
<imageobject role="fo">
<imagedata fileref="images/OmT_Cyrillic.png" width="60%"/>
</imageobject>
</mediaobject>
<para>The computer, running OmegaT, has Russian as the default language,
and thus shows the characters in the Cyrillic alphabet and not in
Kanji.</para>
</section>
<section id="OmegaT.solution">
<title>The <application>OmegaT</application> solution</title>
<para>There are basically three ways to address this problem in
<application>OmegaT</application>. They all involve the application of
file filters in the<emphasis role="bold"> Options </emphasis>menu.</para>
<variablelist>
<varlistentry>
<term>Change the encoding of your files to Unicode</term>
<listitem>
<para>open your source file in a text editor that correctly
interprets its encoding and save the file in <emphasis
role="bold">"UTF-8"</emphasis> encoding. Change the file extension
from <literal>.txt</literal> to <literal>.utf8.</literal>
<application>OmegaT</application> will automatically interpret the
file as a UTF-8 file. This is the most common-sense alternative,
sparing you problems in the long run.</para>
</listitem>
</varlistentry>
</variablelist>
<variablelist>
<varlistentry>
<term>Specify the encoding for your plain text files</term>
<listitem>
<para>- i.e. files with a <filename>.txt </filename>extension - : in
the <emphasis role="bold">Text files </emphasis>section of the file
filters dialog, change the <emphasis role="bold">Source File
Encoding</emphasis> from &lt;auto&gt; to the encoding that
corresponds to your source <filename>.txt</filename> file, for
instance to .jp for the above example.</para>
</listitem>
</varlistentry>
</variablelist>
<variablelist>
<varlistentry>
<term>Change the extensions of your plain text source files</term>
<listitem>
<para>for instance from <filename>.txt</filename> to
<filename>.jp</filename> for Japanese plain texts: in the <emphasis
role="bold">Text files</emphasis> section of the file filters
dialog, add new <emphasis role="bold">Source Filename
Pattern</emphasis> (<filename>*.jp</filename> for this example) and
select the appropriate parameters for the source and target
encoding</para>
</listitem>
</varlistentry>
</variablelist>
<para><application>OmegaT</application> has by default the following short
list available to make it easier for you to deal with some plain text
files:</para>
<itemizedlist>
<listitem>
<para><literal>.txt</literal> files are automatically (&lt;auto&gt;)
interpreted by <application>OmegaT</application> as being encoded in
the computer's default encoding.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>.txt1</literal> files are files in ISO-8859-1, covering
most <emphasis role="bold">Western Europe</emphasis>
languages.<indexterm class="singular">
<primary>Encoding</primary>
<secondary>Western</secondary>
</indexterm></para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>.txt2</literal> files are files in ISO-8859-2, that
covers most <emphasis role="bold">Central and Eastern
Europe</emphasis> languages<indexterm class="singular">
<primary>Encoding</primary>
<secondary>Central and Eastern European</secondary>
</indexterm></para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>.utf8</literal> files are interpreted by
<application>OmegaT</application> as being encoded in UTF-8 (an
encoding that covers almost all languages in the world).<indexterm
class="singular">
<primary>Encoding</primary>
<secondary>Unicode</secondary>
</indexterm></para>
</listitem>
</itemizedlist>
<para>You can check that yourself by selecting the item <emphasis
role="bold">File Filters</emphasis> in the menu <emphasis
role="bold">Options</emphasis>. For example, when you have a Czech text
file (very probably written in the <emphasis
role="bold">ISO-8859-2</emphasis> code) you just need to change the
extension<literal> .txt</literal> to <literal>.txt2 </literal>and
<application>OmegaT</application> will interpret its contents correctly.
And of course, if you wish to be on the safe side, consider converting
this kind of file to Unicode, i.e. to the <literal>.utf8 </literal>file
format.</para>
</section>
</chapter>