Apache OpenOffice (AOO) Bugzilla – Issue 14330
regrouping options for Thai characters set for HTML export
Last modified: 2013-08-07 15:00:08 UTC
In the pull-down menu Tools --> Options, in the Options dialog under Load/Save --> HTML Compatibility, in the Export section, the Character set drop-down list offers two options for the Thai character set:
1) "Thai (ISO-8859-11 / TIS-620)"
2) "Thai (Windows-874)"
----
For option (1), I think ISO-8859-11 and TIS-620 should be separated into two options:
- "Thai (ISO-8859-11)"
- "Thai (TIS-620)"
When users export a file to HTML, they want to specify an exact character set (ISO-8859-11 or TIS-620). For import, ISO-8859-11 and TIS-620 can be combined, since they are very similar. For export, however, we need to distinguish them clearly, since we never know the purpose of the user's export.
----
For option (2), "Windows-874" is not a standard encoding for Thai characters. It is not certified by IANA for use in HTML/XML or over HTTP. OOo should remove the "Thai (Windows-874)" option. "Windows-874" is required only for import, not for export.
----
Conclusion:
- remove Windows-874
- separate TIS-620 and ISO-8859-11
After the fix, the options for the Thai character set should be:
1) "Thai (ISO-8859-11)"
2) "Thai (TIS-620)"
DL->FT: Would you please take over?
This fits pretty much in an already ongoing issue in our internal bugtracking system.
FT->MIB: Is that somehow a duplicate of any of your issues?
No, it is not. Regarding issue (1): The various encodings are implemented in the SAL layer. SB has to look into why we are not separating ISO-8859-11 and TIS-620 for export. For import, there is nothing to do. Regarding issue (2): The encodings one can choose are the ones that are widely used, regardless of whether they are certified or not. That's what OOo users expect. So there is nothing to do here either.
From what I know (http://www.inet.co.th/cyberclub/trin/thairef/index.html, mail communication with Arthit in May 2002), there officially is no such thing as ISO-8859-11, but the name is in unofficial use to mean about the same as TIS-620. Also windows-874, though not officially registered with IANA, is supposed to be in use. The outcome of the mentioned mail communication was the decision to internally (in sal/textenc) support two text encodings: TIS-620 (with ISO-8859-11 being identical to it) and windows-874 (a superset of TIS-620).

Regarding issue 1 (separating ISO-8859-11 and TIS-620): Since there is no official ISO-8859-11, I see no reason to let the user decide whether to label exported HTML documents as "TIS-620" or "ISO-8859-11". But I guess only Arthit knows enough about Thai conventions to decide whether the unofficial ISO-8859-11 is in widespread enough use to warrant an extra entry, distinct from TIS-620 (whatever the use for this might be). If we indeed want to have both entries, with different behaviour on HTML export (one producing documents with a "TIS-620" charset designator, the other one documents with an "ISO-8859-11" charset designator), and if it would be too difficult to achieve this without adding a new RTL_TEXTENCODING (which in principle it should not be), I could imagine adding a new RTL_TEXTENCODING_ISO_8859_11, besides the existing RTL_TEXTENCODING_TIS_620 (but only then).

Regarding issue 2 (dropping windows-874): I do not know enough about Thai conventions to comment on this.
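The relationship described above (same bytes for Thai text, windows-874 as a superset) can be illustrated with a minimal sketch. It assumes that Python's standard codecs `tis_620`, `iso8859_11` and `cp874` are a fair stand-in for the sal/textenc tables, which this thread does not confirm; `encode_all` is a hypothetical helper.

```python
# Python codec names used as stand-ins for TIS-620, ISO-8859-11 and windows-874
# (an assumption; the OOo tables in sal/textenc are not consulted here).
THAI_CODECS = ("tis_620", "iso8859_11", "cp874")

def encode_all(text):
    """Return {codec: bytes}, with None where the text cannot be encoded."""
    result = {}
    for codec in THAI_CODECS:
        try:
            result[codec] = text.encode(codec)
        except UnicodeEncodeError:
            result[codec] = None
    return result

thai = encode_all("สวัสดี")  # plain Thai text
euro = encode_all("€")       # a character outside the Thai block

print(thai)
print(euro)
```

Under these assumptions, plain Thai text produces the same bytes in all three codecs, while the euro sign is only encodable in the windows-874 stand-in.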
It seems to me that there are no reasons to change anything. ISO-8859-11 and TIS-620 are technically the same, so it doesn't matter how the encoding is named in the file. Removing Windows-874 is of course possible, but since no one really has to use it, there is no reason to change this either.
.
Case closed.
So, what charset will the user expect to see in this line of the exported HTML file? <meta http-equiv="Content-Type" content="text/html; charset=SOME_ENCODING"> Should "SOME_ENCODING" be "ISO-8859-11" or "TIS-620"? Or can it be "ISO-8859-11 / TIS-620"?
Currently it is ISO-8859-15.
From my understanding, ISO 8859-15 is a Latin character set, designed for the alphabets of Western languages (like English, French, German, Spanish and Portuguese), not for the Thai alphabet.
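To make the point about the charset label concrete, here is a minimal sketch of an export step in which the label in the meta line and the byte encoding are chosen together. It is not OOo's export code: `LABEL_TO_CODEC`, `export_html` and `can_export` are hypothetical names, and Python's standard codecs are assumed as stand-ins for the real encoding tables.

```python
# Hypothetical mapping from the HTML charset label to a Python codec name.
LABEL_TO_CODEC = {
    "TIS-620": "tis_620",
    "ISO-8859-11": "iso8859_11",
    "windows-874": "cp874",
    "ISO-8859-15": "iso8859_15",
}

def export_html(body, charset):
    """Encode a minimal HTML page so that the meta label matches the bytes."""
    meta = ('<meta http-equiv="Content-Type" '
            f'content="text/html; charset={charset}">')
    page = f"<html><head>{meta}</head><body>{body}</body></html>"
    return page.encode(LABEL_TO_CODEC[charset])

def can_export(body, charset):
    """Report whether the body is representable in the chosen charset."""
    try:
        export_html(body, charset)
        return True
    except UnicodeEncodeError:
        return False

thai_page = export_html("สวัสดี", "TIS-620")
print(can_export("สวัสดี", "ISO-8859-11"))
print(can_export("สวัสดี", "ISO-8859-15"))
```

In this sketch the "TIS-620" and "ISO-8859-11" exports of plain Thai text differ only in the label, whereas the Latin ISO-8859-15 codec cannot represent the Thai characters at all.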