2015OmegaT/OmegaT/doc_src/en/PlainText.xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
"../../../docbook-xml-4.5/docbookx.dtd">
<chapter id="chapter.plain.text">
<title>Working with plain text<indexterm class="singular">
<primary>Source files</primary>
<secondary>Plain text files</secondary>
</indexterm></title>
<section id="default.encoding">
<title>Default encoding<indexterm class="singular">
<primary>Encoding</primary>
<secondary>Plain text files</secondary>
</indexterm><indexterm class="singular">
<primary>Source files</primary>
<secondary>Encoding</secondary>
</indexterm></title>
<para>Plain text files (in most cases, files with a
<filename>.txt</filename> extension) contain only text and offer no
clearly defined way to tell the computer which language, and hence which
character encoding, they contain. The most OmegaT can do in such a case
is to assume that the text is written in the language the computer itself
uses. This is no problem for files encoded in Unicode using a 16-bit
character encoding. If the text uses an 8-bit encoding, however, you may
face the following awkward situation: instead of displaying the text as
Japanese characters...</para>
<mediaobject>
<imageobject role="html">
<imagedata fileref="images/OmT_Japanese.png"/>
</imageobject>
<imageobject role="fo">
<imagedata fileref="images/OmT_Japanese.png" width="60%"/>
</imageobject>
</mediaobject>
<para>...the system will display it like this for instance:</para>
<mediaobject>
<imageobject role="html">
<imagedata fileref="images/OmT_Cyrillic.png"/>
</imageobject>
<imageobject role="fo">
<imagedata fileref="images/OmT_Cyrillic.png" width="60%"/>
</imageobject>
</mediaobject>
<para>The computer running OmegaT has Russian as its default language,
and thus displays the characters in the Cyrillic alphabet rather than in
kanji.</para>
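<para>The effect can be reproduced outside OmegaT. The following is a
minimal Python sketch, not OmegaT code: the sample string and the choice of
Shift_JIS and Windows-1251 are assumptions made for illustration, since the
text does not say which encodings the screenshots involve.</para>
<programlisting language="python"><![CDATA[
```python
# Hypothetical Japanese sample text (not taken from the screenshots).
original = "日本語のテキスト"

# Store it the way a legacy Japanese editor might: as Shift_JIS bytes.
raw_bytes = original.encode("shift_jis")

# A system whose default is a Cyrillic code page reads the same bytes
# differently. errors="replace" substitutes a placeholder for any byte
# that has no meaning in Windows-1251 instead of raising an error.
misread = raw_bytes.decode("cp1251", errors="replace")

# Decoding with the encoding that was actually used recovers the text.
recovered = raw_bytes.decode("shift_jis")

print(misread)                 # unreadable, mostly Cyrillic letters
print(recovered == original)   # True
```
]]></programlisting>
<para>The bytes in the file never change; only the assumption about their
encoding does.</para>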
</section>
<section id="OmegaT.solution">
<title>The <application>OmegaT</application> solution</title>
<para>There are basically three ways to address this problem in
<application>OmegaT</application>. They all involve the file filters,
available from the <emphasis role="bold">Options</emphasis> menu.</para>
<variablelist>
<varlistentry>
<term>Change the encoding of your files to Unicode</term>
<listitem>
<para>Open your source file in a text editor that correctly
interprets its encoding and save the file in <emphasis
role="bold">UTF-8</emphasis> encoding. Change the file extension
from <literal>.txt</literal> to <literal>.utf8</literal>.
<application>OmegaT</application> will then automatically interpret the
file as a UTF-8 file. This is the most sensible option, and it spares
you problems in the long run.</para>
</listitem>
</varlistentry>
</variablelist>
<variablelist>
<varlistentry>
<term>Specify the encoding for your plain text files</term>
<listitem>
<para>That is, for files with a <filename>.txt</filename> extension: in
the <emphasis role="bold">Text files</emphasis> section of the file
filters dialog, change the <emphasis role="bold">Source File
Encoding</emphasis> from &lt;auto&gt; to the encoding that
corresponds to your source <filename>.txt</filename> file, for
instance a Japanese encoding such as Shift_JIS or EUC-JP for the above
example.</para>
</listitem>
</varlistentry>
</variablelist>
<variablelist>
<varlistentry>
<term>Change the extensions of your plain text source files</term>
<listitem>
<para>For instance, change it from <filename>.txt</filename> to
<filename>.jp</filename> for Japanese plain texts: in the <emphasis
role="bold">Text files</emphasis> section of the file filters
dialog, add a new <emphasis role="bold">Source Filename
Pattern</emphasis> (<filename>*.jp</filename> for this example) and
select the appropriate parameters for the source and target
encoding.</para>
</listitem>
</varlistentry>
</variablelist>
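<para>If no suitable text editor is at hand, the first option (converting
to UTF-8 and renaming to <literal>.utf8</literal>) can also be scripted.
The following Python sketch is one possible way to do it; the helper name
<literal>convert_to_utf8</literal>, the file name and the Shift_JIS source
encoding are hypothetical, and you must already know the real encoding of
your file.</para>
<programlisting language="python"><![CDATA[
```python
import tempfile
from pathlib import Path


def convert_to_utf8(source: Path, source_encoding: str) -> Path:
    """Re-encode a plain text file as UTF-8, saved next to it as *.utf8."""
    # Read the file using the encoding it was really written in.
    text = source.read_text(encoding=source_encoding)
    # Same name, but with the extension replaced by .utf8.
    target = source.with_suffix(".utf8")
    target.write_text(text, encoding="utf-8")
    return target


# Demonstration on a throwaway file (hypothetical name and content).
with tempfile.TemporaryDirectory() as tmp:
    legacy = Path(tmp) / "sample.txt"
    legacy.write_bytes("日本語".encode("shift_jis"))
    converted = convert_to_utf8(legacy, "shift_jis")
    converted_name = converted.name
    converted_text = converted.read_text(encoding="utf-8")

print(converted_name)
```
]]></programlisting>
<para>The original <literal>.txt</literal> file is left untouched, so you
can compare both versions before adding the converted file to your
project.</para>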
<para>By default, <application>OmegaT</application> provides the following
short list of extensions to make it easier to deal with some plain text
files:</para>
<itemizedlist>
<listitem>
<para><literal>.txt</literal> files are automatically (&lt;auto&gt;)
interpreted by <application>OmegaT</application> as being encoded in
the computer's default encoding.</para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>.txt1</literal> files are interpreted as ISO-8859-1, which
covers most <emphasis role="bold">Western European</emphasis>
languages.<indexterm class="singular">
<primary>Encoding</primary>
<secondary>Western</secondary>
</indexterm></para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>.txt2</literal> files are interpreted as ISO-8859-2, which
covers most <emphasis role="bold">Central and Eastern
European</emphasis> languages.<indexterm class="singular">
<primary>Encoding</primary>
<secondary>Central and Eastern European</secondary>
</indexterm></para>
</listitem>
</itemizedlist>
<itemizedlist>
<listitem>
<para><literal>.utf8</literal> files are interpreted by
<application>OmegaT</application> as being encoded in UTF-8 (an
encoding that covers almost all languages in the world).<indexterm
class="singular">
<primary>Encoding</primary>
<secondary>Unicode</secondary>
</indexterm></para>
</listitem>
</itemizedlist>
<para>You can check this yourself by selecting <emphasis
role="bold">File Filters</emphasis> in the <emphasis
role="bold">Options</emphasis> menu. For example, if you have a Czech text
file (very probably encoded in <emphasis
role="bold">ISO-8859-2</emphasis>), you just need to change the
extension from <literal>.txt</literal> to <literal>.txt2</literal> and
<application>OmegaT</application> will interpret its contents correctly.
And of course, if you wish to be on the safe side, consider converting
this kind of file to Unicode, i.e. to the <literal>.utf8</literal> file
format.</para>
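<para>The default extension list can be pictured as a simple lookup table.
The Python sketch below is an illustration only and is not how
<application>OmegaT</application> is implemented: the names
<literal>DEFAULT_ENCODINGS</literal> and <literal>guess_encoding</literal>,
the file name and the Czech sample sentence are all hypothetical.</para>
<programlisting language="python"><![CDATA[
```python
import locale
from pathlib import Path

# Hypothetical table mirroring the default extensions listed above.
DEFAULT_ENCODINGS = {
    ".txt1": "iso-8859-1",  # Western European languages
    ".txt2": "iso-8859-2",  # Central and Eastern European languages
    ".utf8": "utf-8",       # Unicode
}


def guess_encoding(filename: str) -> str:
    """Return the encoding implied by the file extension."""
    suffix = Path(filename).suffix.lower()
    # Simplification: .txt and any other extension fall back to the
    # computer's default encoding, like the <auto> setting.
    return DEFAULT_ENCODINGS.get(suffix, locale.getpreferredencoding(False))


# A Czech sentence stored as ISO-8859-2 bytes and renamed to .txt2.
czech = "Příliš žluťoučký kůň"
data = czech.encode("iso-8859-2")
decoded = data.decode(guess_encoding("dopis.txt2"))

print(decoded == czech)  # True
```
]]></programlisting>
<para>Renaming the file changes only which table entry applies; the bytes
of the file stay the same.</para>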
</section>
</chapter>