Issue 14330 - regrouping options for Thai characters set for HTML export
Status: CLOSED WONT_FIX
Alias: None
Product: Internationalization
Classification: Code
Component: code
Version: 644m11
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: falko.tesch
QA Contact: issues@l10n
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-05-10 17:59 UTC by arthit
Modified: 2013-08-07 15:00 UTC

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Description arthit 2003-05-10 17:59:40 UTC
at pull-down menu:
Tools --> Options

in Options dialog:
Load/Save --> HTML Compatibility

at Export section:
in Character set drop-down list:
for the Thai character set, there are 2 options:
1) "Thai (ISO-8859-11 / TIS-620)"
2) "Thai (Windows-874)"

----

in option (1),

I think ISO-8859-11 and TIS-620 should be separated
into two options.

- "Thai (ISO-8859-11)"
- "Thai (TIS-620)"

Because when users export a file to HTML,
they want to specify an exact character set
(ISO-8859-11 or TIS-620).

For an import task,
ISO-8859-11 and TIS-620 can be combined,
since they are very similar.

But for an export task,
we need to distinguish them clearly,
since we never know the purpose of the user's export.

----

in option (2),

"Windows-874" is not a standard encoding for Thai characters.
It is not certified by IANA for use in HTML/XML or over HTTP.

OOo should remove the "Thai (Windows-874)" option.

"Windows-874" is required only for import,
not for export.

----

Conclusion:

- remove Windows-874
- separate TIS-620 and ISO-8859-11

As a result, after a fix,
the options for the Thai character set should be

1) "Thai (ISO-8859-11)"
2) "Thai (TIS-620)"
Comment 1 Dieter.Loeschky 2003-06-05 11:34:18 UTC
DL->FT: Would you please take over?
Comment 2 falko.tesch 2003-08-15 13:01:24 UTC
This fits pretty much in an already ongoing issue in our internal
bugtracking system.
Comment 3 falko.tesch 2003-10-08 12:01:28 UTC
FT->MIB: Is that somehow a duplicate of any of your issues?
Comment 4 michael.brauer 2003-10-13 09:37:43 UTC
No, it is not.
Regarding issue (1): The various encodings are implemented in the SAL
layer. SB has to look into why we are not separating
ISO-8859-11 and TIS-620 for the export. For import, there
is nothing to do.
Regarding issue (2): The encodings one can choose are the ones that are
widely used, regardless of whether they are certified or not. That's what
OOo users expect. So there is nothing to do here either.
Comment 5 Stephan Bergmann 2003-10-27 13:06:00 UTC
From what I know
(http://www.inet.co.th/cyberclub/trin/thairef/index.html, mail
communication with Arthit in May 2002), there officially is no such
thing as ISO-8859-11, but it is in unofficial use to mean about the
same as TIS-620.  Also windows-874, though not officially registered
with IANA, is supposed to be in use.  The outcome of the mentioned
mail communication was the decision to internally (in sal/textenc)
support two text encodings: TIS-620 (with ISO-8859-11 being identical
to it) and windows-874 (a superset of TIS-620).
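
The relationship described above can be checked with a minimal sketch. It assumes Python's standard codecs `tis_620`, `iso8859_11` and `cp874` as stand-ins for the encodings in sal/textenc; the helper names are hypothetical and none of this is OOo code.

```python
# Bytes that TIS-620 assigns to Thai characters: 0xA1-0xDA and 0xDF-0xFB.
THAI_BYTES = [b for b in range(0xA1, 0xFC) if not 0xDB <= b <= 0xDE]


def decode_byte(value, codec):
    """Decode one byte with the given codec; return None if it is unassigned."""
    try:
        return bytes([value]).decode(codec)
    except UnicodeDecodeError:
        return None


def thai_block_matches(codec_a, codec_b):
    """True if both codecs decode every Thai byte to the same character."""
    return all(
        decode_byte(b, codec_a) == decode_byte(b, codec_b) for b in THAI_BYTES
    )


def printable_extras(codec):
    """Printable characters a codec assigns in the 0x80-0x9F area."""
    found = {}
    for b in range(0x80, 0xA0):
        ch = decode_byte(b, codec)
        if ch is not None and ch.isprintable():
            found[b] = ch
    return found


if __name__ == "__main__":
    print(thai_block_matches("tis_620", "iso8859_11"))
    print(thai_block_matches("tis_620", "cp874"))
    print(printable_extras("cp874"))
```

In this sketch the Thai block is the same in all three codecs, and the additional characters of cp874 (such as the euro sign at 0x80) sit outside that block, which is consistent with calling windows-874 a superset of TIS-620.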

Regarding issue 1 (separating ISO-8859-11 and TIS-620):  Since there
is no official ISO-8859-11, I see no reason to let the user decide
whether to label exported HTML documents as "TIS-620" or
"ISO-8859-11".  But I guess only Arthit knows enough about Thai
conventions to decide whether the unofficial ISO-8859-11 is in
widespread enough use to warrant an extra entry, distinct from TIS-620
(whatever the use for this might be).  If we indeed want to have both
entries, with different behaviour on HTML export (one producing
documents with a "TIS-620" charset designator, the other one documents
with an "ISO-8859-11" charset designator), and if it would be too
difficult to achieve this without adding a new RTL_TEXTENCODING (which
in principle it should not be), I could imagine adding a new
RTL_TEXTENCODING_ISO_8859_11, besides the existing
RTL_TEXTENCODING_TIS_620 (but only then).
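
The "two entries, one encoding" idea can be sketched as follows. This is only an illustration under stated assumptions: `EXPORT_OPTIONS` and `export_html` are hypothetical names, Python's `tis_620` codec stands in for RTL_TEXTENCODING_TIS_620, and the real HTML export filter is not shown here.

```python
from html import escape

# Hypothetical export options: UI entry -> (codec used, charset designator).
# Both entries share one codec; only the label written to the file differs.
EXPORT_OPTIONS = {
    "Thai (TIS-620)": ("tis_620", "TIS-620"),
    "Thai (ISO-8859-11)": ("tis_620", "ISO-8859-11"),
}


def export_html(body_text, option):
    """Return the exported page as bytes, labelled per the chosen option."""
    codec, designator = EXPORT_OPTIONS[option]
    page = (
        "<html><head>"
        '<meta http-equiv="Content-Type" '
        f'content="text/html; charset={designator}">'
        f"</head><body>{escape(body_text)}</body></html>"
    )
    return page.encode(codec)


if __name__ == "__main__":
    print(export_html("\u0e44\u0e17\u0e22", "Thai (TIS-620)"))
    print(export_html("\u0e44\u0e17\u0e22", "Thai (ISO-8859-11)"))
```

With this design the two outputs differ only in the charset designator; the bytes of the Thai text are the same, so no second text encoding is needed for the sketch to work.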

Regarding issue 2 (dropping windows-874):  I have no knowledge about
Thai conventions to comment on this.
Comment 6 michael.brauer 2003-10-30 09:11:40 UTC
It seems to me that there are no reasons to change anything.
ISO-8859-11 and TIS-620 are technically the same, so it doesn't matter
how the encoding is named in the file. Removing Windows-874 is of
course possible, but since no one really has to use it, there is no
reason to change this either.
Comment 7 michael.brauer 2003-10-30 09:23:26 UTC
.
Comment 8 falko.tesch 2004-11-26 08:54:09 UTC
Case closed.
Comment 9 arthit 2004-11-26 15:40:46 UTC
so,
what is the charset that the user will expect to see in this line of the
exported HTML file?

  <meta http-equiv="Content-Type" content="text/html; charset=SOME_ENCODING">

should "SOME_ENCODING" be "ISO-8859-11"
or should it be "TIS-620"?

or can it be "ISO-8859-11 / TIS-620" ???
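
One way to probe the question is to ask a codec registry whether each candidate label resolves to an encoding. The sketch below uses Python's `codecs.lookup` purely as an example registry; it says nothing about how browsers or IANA treat these names, and `is_known_charset` is a hypothetical helper.

```python
import codecs


def is_known_charset(label):
    """True if Python's codec registry can resolve the label to an encoding."""
    try:
        codecs.lookup(label)
        return True
    except LookupError:
        return False


if __name__ == "__main__":
    for label in ("TIS-620", "ISO-8859-11", "ISO-8859-11 / TIS-620"):
        print(label, is_known_charset(label))
```

In this registry the two single names resolve, while the combined UI string does not, so the designator written to the file would have to be one name or the other.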

Comment 10 falko.tesch 2004-11-29 12:27:07 UTC
Currently it is iso8859-15
Comment 11 arthit 2004-11-29 13:44:19 UTC
From my understanding,
ISO 8859-15 is a Latin character set,
designed for the alphabets of Western languages (like English, French,
German, Spanish and Portuguese),
not for the Thai alphabet.
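
This point can be illustrated with a small sketch, assuming Python's standard `iso8859_15` and `tis_620` codecs; `can_encode` is a hypothetical helper and the sample strings are arbitrary.

```python
def can_encode(text, codec):
    """True if every character of text has a byte in the given codec."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


if __name__ == "__main__":
    thai = "\u0e44\u0e17\u0e22"  # the word "Thai" written in Thai script
    print(can_encode(thai, "iso8859_15"))
    print(can_encode(thai, "tis_620"))
    print(can_encode("Fran\u00e7ais", "iso8859_15"))
```

Thai text cannot be represented in ISO 8859-15 at all, so a page labelled with that charset could not carry Thai characters as plain bytes.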