Issue 126421 - OpenOffice does not recognize the encoding of txt files.
Summary: OpenOffice does not recognize the encoding of txt files.
Status: UNCONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: open-import
Version: 4.1.1
Hardware: All Windows, all
Importance: P5 (lowest) Major
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2015-07-24 03:24 UTC by zahra
Modified: 2015-09-23 23:36 UTC
CC List: 2 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
file with Hindi characters encoded with UTF-8 (531 bytes, text/plain)
2015-09-22 23:47 UTC, Pablo Canseco
file with Japanese characters encoded with UTF-8 (731 bytes, text/plain)
2015-09-22 23:47 UTC, Pablo Canseco
file with Persian characters encoded with Unicode Big Endian (1.82 KB, text/plain)
2015-09-22 23:48 UTC, Pablo Canseco

Description zahra 2015-07-24 03:24:34 UTC
OpenOffice does not recognize the encoding of txt files.
When I click on a txt file to open it, or use the "Open with" option in Windows and select OpenOffice, it does not recognize the encoding of the file and opens it by default with the last encoding used, Western Europe (Windows-1252/WinLatin 1). Persian characters, for example, are not shown correctly, and a Persian txt file becomes unreadable.
English txt files are more difficult because many encodings are possible for them.
In these cases we have to know the encoding of the txt file; when we do not know it, the file may be opened with an incorrect encoding and become unreadable.
OpenOffice only recognizes the txt encoding when we open Writer, press Control+O or select Open in the File menu, and then import the txt file into Writer.
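
The symptom described above can be reproduced outside OpenOffice. The following is a minimal Python sketch, not OpenOffice code: it assumes a short Persian sample string (not taken from the report) and shows that the same bytes are readable when decoded as UTF-8 and unreadable when decoded as Windows-1252. The helper name `decode_as` is hypothetical.

```python
# Illustrative sketch only; the sample text is an assumption, not from the report.
SAMPLE = "سلام"  # Persian word, four letters

def decode_as(raw: bytes, encoding: str) -> str:
    """Decode raw bytes with the given encoding, replacing undecodable bytes."""
    return raw.decode(encoding, errors="replace")

raw = SAMPLE.encode("utf-8")       # two bytes per Persian letter in UTF-8
right = decode_as(raw, "utf-8")    # the original text
wrong = decode_as(raw, "cp1252")   # every byte becomes a separate Latin symbol

print(right)
print(wrong)
```

Because Windows-1252 maps each byte to one character, the wrongly decoded string has twice as many characters as the original, none of them Persian.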
Comment 1 Pablo Canseco 2015-09-22 23:47:26 UTC
Created attachment 84945
file with Hindi characters encoded with UTF-8
Comment 2 Pablo Canseco 2015-09-22 23:47:50 UTC
Created attachment 84946
file with Japanese characters encoded with UTF-8
Comment 3 Pablo Canseco 2015-09-22 23:48:11 UTC
Created attachment 84947
file with Persian characters encoded with Unicode Big Endian
Comment 4 Pablo Canseco 2015-09-22 23:49:31 UTC
I was unable to replicate this issue on Windows 10.

System Information:
OO : Apache OpenOffice 4.1.1 - AOO411m6(Build:9775)  -  Rev. 1617669
OS : Windows 10 64-bit
CPU: AMD FX8120 3.10 GHz
GPU: AMD Radeon R9 290x 4GB

Steps taken to replicate the issue:
1. Using Windows' Notepad utility, create a file with Persian characters and save it with Unicode Big Endian encoding.
2. Create another file, this time with Japanese characters, and save it with UTF-8 encoding.
3. Create one more file, this time with Hindi characters, and save it with UTF-8 encoding.
4. Open each file with OpenOffice Writer and check whether the text still looks the same.

I have attached the three files I tested with, and I found no encoding problems leading to illegible text. While I am not familiar with these languages, the characters look the same in Notepad and in Writer.
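
For anyone who wants comparable test files without Notepad, here is a rough Python sketch. It assumes that Notepad of that era prefixed both "UTF-8" and "Unicode big endian" files with a byte-order mark; the sample strings and file names are illustrative assumptions and are not the contents of the attachments. The function names are hypothetical.

```python
import codecs
from pathlib import Path

# Assumed sample contents; the real attachments are not reproduced here.
SAMPLES = {
    "persian.txt": ("سلام", "utf-16-be"),
    "japanese.txt": ("こんにちは", "utf-8"),
    "hindi.txt": ("नमस्ते", "utf-8"),
}

# Byte-order marks assumed to be written by Notepad for each save option.
_BOM = {"utf-8": codecs.BOM_UTF8, "utf-16-be": codecs.BOM_UTF16_BE}

def notepad_style_bytes(text: str, encoding: str) -> bytes:
    """Encode text and prepend the matching byte-order mark."""
    return _BOM[encoding] + text.encode(encoding)

def write_samples(directory: str) -> list:
    """Write the three sample files into an existing directory."""
    written = []
    for name, (text, encoding) in SAMPLES.items():
        path = Path(directory) / name
        path.write_bytes(notepad_style_bytes(text, encoding))
        written.append(path)
    return written
```

Calling `write_samples(".")` creates the three files in the current directory; each can then be opened in Writer by both routes discussed in this issue.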
Comment 5 Cem Kaner 2015-09-23 16:42:22 UTC
Open Office 4.2.0 weekly build AOO420m1(Build:9800)  -  Rev. 1692551
Windows 8.1

Click on the three files above. In each case, OOo Writer shows a text-encoding dialog first, then I choose the encoding, then OOo displays the file. If I choose an 8-bit encoding (e.g. UTF-8), the Hindi and Japanese files look OK but the Persian text is clearly not displayed correctly. If I choose a 16-bit encoding (Unicode, i.e. UTF-16), then the Persian text is displayed correctly.

I'm not sure that I see this as an error in OOo.

When I open this with MS-Word, it displays an encoding-selection window that shows me what the text will look like in the encoding that I select. This is a very nice feature, but I am not surprised to not find it in OpenOffice. For me, it would be a welcome enhancement.
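
The preview idea can be illustrated with a small Python sketch. This is not how MS-Word or OpenOffice implements it; it only shows the general approach of decoding the first bytes of a file under several candidate encodings so that the user can compare the results. The function name `preview` and the default candidate list are assumptions.

```python
def preview(raw: bytes, encodings=("utf-8", "utf-16", "cp1252"), limit=40) -> dict:
    """Return the first `limit` characters of raw under each candidate encoding.

    An encoding that cannot decode the bytes at all maps to None.
    """
    results = {}
    for encoding in encodings:
        try:
            results[encoding] = raw.decode(encoding)[:limit]
        except UnicodeDecodeError:
            results[encoding] = None
    return results

# Example with an assumed Hindi sample encoded as UTF-8.
for name, text in preview("नमस्ते".encode("utf-8")).items():
    print(name, "->", text)
```

A dialog built on this would show one line per encoding and let the user pick the one that reads correctly.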
Comment 6 Pablo Canseco 2015-09-23 23:36:14 UTC
System Information:
OO : Apache OpenOffice 4.2.0 - AOO420m1(Build:9800)  -  Rev. 1692551
OS : Windows 10 64-bit

I did some follow-up testing based on information from zahra and Cem Kaner.

I found that for the three attached files:

- If the file is opened by right-click -> Open with -> OpenOffice Writer, the ASCII Filter Options dialog comes up and the user needs to select the right encoding in order to read the file. If Western Europe (ASCII/US) is chosen for the Persian file attachment, the result is illegible. The same thing happens if that encoding is chosen for the two other files. If Unicode is chosen for the Persian file, it comes up fine, and likewise for the Japanese and Hindi files if UTF-8 is chosen.

- If the file is opened using the Open File dialog (or Ctrl+O), as pointed out by zahra, OpenOffice Writer chooses the correct encoding automatically and the ASCII Filter Options dialog doesn't pop up.

OpenOffice Writer behaves differently depending on whether the file is opened through Windows' right-click -> Open with or through the Open File (Ctrl+O) dialog.
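
For reference, automatic detection of the kind seen on the Open File route can be approximated by checking for a byte-order mark. The Python sketch below is only an illustration of that technique under stated assumptions; it is not OpenOffice's detection code, and the function name `sniff_encoding` and the Windows-1252 fallback are hypothetical choices.

```python
import codecs

# UTF-32 marks are checked before UTF-16 because the UTF-32 LE mark
# begins with the same two bytes as the UTF-16 LE mark.
_BOMS = (
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32"),
    (codecs.BOM_UTF32_BE, "utf-32"),
    (codecs.BOM_UTF16_LE, "utf-16"),
    (codecs.BOM_UTF16_BE, "utf-16"),
)

def sniff_encoding(raw: bytes, fallback: str = "cp1252") -> str:
    """Guess a Python codec name from a byte-order mark, then by trial decoding."""
    for bom, name in _BOMS:
        if raw.startswith(bom):
            return name
    try:
        raw.decode("utf-8")   # BOM-less UTF-8 (includes plain ASCII)
        return "utf-8"
    except UnicodeDecodeError:
        return fallback       # assumed legacy default; may still be wrong
```

With files like the three attachments, all of which are assumed to start with a byte-order mark, this check alone would be enough to pick a codec that decodes them correctly.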