Issue 9391 - DXF import filter doesn't correctly display Japanese text.
Summary: DXF import filter doesn't correctly display Japanese text.
Status: CLOSED FIXED
Alias: None
Product: Draw
Classification: Application
Component: code
Version: OOo 1.0.1
Hardware: PC Linux, all
Importance: P1 (highest) Trivial
Target Milestone: OOo 1.1.1
Assignee: khendricks
QA Contact: issues@sw
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2002-11-20 02:10 UTC by tora3
Modified: 2004-02-27 13:37 UTC
6 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
A bugdoc prepared with JW_CAD http://www.jwcad.net/ ; MIME type is image/vnd.dxf (4.71 KB, application/octet-stream)
2002-11-21 02:41 UTC, tora3
An original image on JW_CAD (15.82 KB, image/png)
2002-11-21 02:44 UTC, tora3
Converted image on OpenOffice.org (24.73 KB, image/png)
2002-11-21 02:44 UTC, tora3
a patch to solve this problem (partially) (782 bytes, patch)
2003-10-25 02:40 UTC, maho.nakata
untested patch that attempts to parse $DWGCODEPAGE (5.00 KB, patch)
2003-10-26 15:40 UTC, khendricks
a revised patch that seems to work on my machine (5.20 KB, patch)
2003-10-26 16:08 UTC, khendricks
A perl script to insert $DWGCODEPAGE in DXF file. (2.56 KB, text/plain)
2003-10-26 19:47 UTC, tora3
A DXF sample file that includes $DWGCODEPAGE = DOS932, prepared by the tool above. (4.75 KB, application/octet-stream)
2003-10-26 19:51 UTC, tora3
Patch updated, verified with SS6.1 Beta2 Solaris (ja) (6.10 KB, patch)
2003-10-27 19:53 UTC, tora3
Fixed by following neighbor classes. Verified on Solaris. (5.04 KB, patch)
2003-10-31 13:02 UTC, tora3

Description tora3 2002-11-20 02:10:30 UTC
Problem:
 Japanese characters are not displayed correctly when importing a DXF file.

Reproduction:
 1. Prepare OOo1.0.1 Japanese.
 2. Open a bugdoc OO-Draw.dxf through DXF import filter.
 3. The converted document would look like OO-Draw-DXF.png
    An original image, however, is OO-Draw.png

This problem was originally reported by Mr. Miyazaki, a member of OpenOffice.org
Users Group Japan http://blow-away.net/openoffice/

The bugdoc and snapshots will be attached soon.
Comment 1 tora3 2002-11-21 02:41:20 UTC
Created attachment 3690 [details]
A bugdoc prepared with JW_CAD http://www.jwcad.net/ ; MIME type is image/vnd.dxf
Comment 2 tora3 2002-11-21 02:44:08 UTC
Created attachment 3691 [details]
An original image on JW_CAD
Comment 3 tora3 2002-11-21 02:44:51 UTC
Created attachment 3692 [details]
Converted image on OpenOffice.org
Comment 4 tora3 2002-11-21 02:46:49 UTC
Tora->sba: The files have been attached. The DXF file is encoded in
Shift_JIS.

Comment 5 tora3 2002-11-25 07:36:37 UTC
Regarding Header Section of DXF file format
http://www.autodesk.com/techpubs/autocad/acad2000/dxf/header_section_group_codes_dxf_02.htm

$DWGCODEPAGE can specify type of encoding in which the DXF file is
encoded.

In this bug case, the bugdoc does not contain such a definition.

Adding either of the following to the HEADER section of the bugdoc file
does not help, however.

---
$DWGCODEPAGE
  3
ANSI_932
  9
----
or
----
$DWGCODEPAGE
  3
DOC932
  9
----

Comment 6 stefan.baltzer 2003-07-28 16:45:54 UTC
SBA: DXF is a draw file format. Changed Component accordingly.
Reassigned to Wolfram.
Comment 7 stefan.baltzer 2003-07-28 16:47:10 UTC
SBA: Prio changed to 3 (no crash).
Comment 8 wolframgarten 2003-08-07 09:37:25 UTC
Set to new.
Comment 9 wolframgarten 2003-08-07 09:38:22 UTC
Reproducible in 1.0.1 and in a current internal version. Reassigned
to Sven. Please have a look.
Comment 10 sven.jacobi 2003-08-25 14:36:39 UTC
accepted
Comment 11 sven.jacobi 2003-09-08 11:21:55 UTC
changed target
Comment 12 sven.jacobi 2003-09-16 14:59:16 UTC
.
Comment 13 tora3 2003-09-18 22:53:23 UTC
Tora: How can we, members of ja.ooo, help you?
Comment 14 maho.nakata 2003-09-19 02:04:13 UTC
Is it possible to provide some diffs
or something to confirm by ourselves?
Comment 15 sven.jacobi 2003-09-22 18:13:32 UTC
Hi,
the bugfix itself should not be so difficult, but I currently do not
have enough time for this. It would be cool if anybody could provide a
patch; then it will make it into the next version.

The dxf source code is located at: 
graphics/goodies/source/filter.vcl/idxf/*

Every diff is appreciated,
Sven
Comment 16 tora3 2003-09-22 19:30:04 UTC
That is a good idea!
Comment 17 tora3 2003-09-22 21:48:06 UTC
Could you show us a direction?
Should we replace "char" with a Unicode related type 
such as "OUString" in dxfentrd.hxx or just apply
rtl_convertTextToUnicode() in a method EvaluateGroup(DXFGroupReader &
rDGR) ?
---
class DXFTextEntity : public DXFBasicEntity {
public:
	char sText[DXF_MAX_STRING_LEN+1];  //  1
---
Comment 18 tora3 2003-09-29 07:41:48 UTC
Replacing a type of encoding, as below, has given us 
a quick solution for a Japanese DXF file encoded in SHIFT_JIS.

In goodies/source/filter.vcl/idxf/dxf2mtf.cxx ,
replace 
  String aUString( aStr, RTL_TEXTENCODING_IBM_437 ); 
with
  String aUString( aStr, RTL_TEXTENCODING_SHIFT_JIS ); 
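
To illustrate why the choice of encoding matters here, the following is a minimal sketch, not code from the filter. It assumes the standard Shift_JIS lead-byte ranges (0x81-0x9F and 0xE0-0xFC); the function names are hypothetical. A single-byte encoding such as IBM 437 treats every byte as one character, whereas Shift_JIS pairs a lead byte with the byte that follows it.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if b is the first byte of a two-byte Shift_JIS character.
bool isShiftJisLeadByte(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Character count when every byte is its own character (as in IBM 437).
std::size_t countAsSingleByte(const std::string& bytes) {
    return bytes.size();
}

// Character count when a lead byte is combined with the following byte.
std::size_t countAsShiftJis(const std::string& bytes) {
    std::size_t count = 0;
    for (std::size_t i = 0; i < bytes.size(); ++i) {
        const unsigned char b = static_cast<unsigned char>(bytes[i]);
        if (isShiftJisLeadByte(b) && i + 1 < bytes.size())
            ++i;  // skip the trail byte of a two-byte character
        ++count;
    }
    return count;
}

// Hypothetical usage: the two kanji of "Nihon" encoded in Shift_JIS.
void demoCounts() {
    const std::string nihon = "\x93\xFA\x96\x7B";
    assert(countAsSingleByte(nihon) == 4);  // four unrelated glyphs
    assert(countAsShiftJis(nihon) == 2);    // two Japanese characters
}
```

Decoding the same four bytes byte by byte therefore yields four wrong glyphs instead of two kanji, which matches the kind of garbling seen in the converted image.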

For general implementation, we should recognize a keyword
"$DWGCODEPAGE" in the DXF file format and choose an
appropriate encoding.

There would be another issue. What actual font will be 
used for font family "FAMILY_SWISS" in an individual 
language version, which is specified in the source code?

StarSuite 7 Solaris (ja) plus the quick patch can solve this,
whereas OOo 1.1 RC5 Solaris (en) cannot.

Comment 19 sven.jacobi 2003-10-08 10:45:28 UTC
Sorry for the long delay, I was too busy.

Looking for "$DWGCODEPAGE" would be the solution, but there is another
problem, we need to have a codepage conversion table from DXF codepage
to our RTL_TEXTENCODING (rtl/textenc.h).

Currently I know that there might exist an "ANSI_932" which is most
likely to be mapped to RTL_TEXTENCODING_MS_932, but I need the
complete set of textencodings which are used within DXF, so we will
fix this bug also for the other languages. I was not able to find the
complete DXF textencoding table yet.
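
A conversion table of the kind described could look roughly like the sketch below. It is only an illustration: the `Encoding` enum is a stand-in for the rtl_TextEncoding constants, `encodingForCodePage` is a hypothetical name, and only the ANSI_932 pairing is taken from this comment. The other entries are assumptions and would need checking against the DXF documentation.

```cpp
#include <cassert>
#include <map>
#include <string>

// Stand-in for the rtl_TextEncoding constants declared in rtl/textenc.h.
enum class Encoding { Ibm437, Ms932, Ms1252 };

// Map a $DWGCODEPAGE value to an encoding. Unknown names fall back to
// IBM 437, the encoding the filter currently hard-codes.
Encoding encodingForCodePage(const std::string& codePage) {
    static const std::map<std::string, Encoding> table = {
        {"ANSI_932", Encoding::Ms932},    // pairing suggested in this comment
        {"DOS932", Encoding::Ms932},      // assumed entry
        {"ANSI_1252", Encoding::Ms1252},  // assumed entry
    };
    const auto it = table.find(codePage);
    return it != table.end() ? it->second : Encoding::Ibm437;
}

// Hypothetical usage.
void demoLookup() {
    assert(encodingForCodePage("ANSI_932") == Encoding::Ms932);
    assert(encodingForCodePage("UNKNOWN") == Encoding::Ibm437);
}
```

Keeping the fallback equal to the existing default means files without a recognised code page would behave as they do today.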

I see that each font in our DXF filter gets FAMILY_SWISS and the
FontName is not used; I think the default font depends on the current
platform. I don't know whether DXF supports font name and font family;
if it does, perhaps our filter needs to be improved.
Comment 20 tora3 2003-10-08 17:30:31 UTC
Thank you for your response.

[dev] How to obtain locale information?
http://www.openoffice.org/servlets/BrowseList?listName=dev&by=thread&from=44513

For FAMILY_SWISS, there might be nothing to do since the Japanese community
has confirmed this quick patch works with both OOo1.1 (ja) Windows and
Linux. 
Comment 21 marc.neumann 2003-10-16 08:48:41 UTC
"According to the OpenOffice.org roadmap
(http://tools.openoffice.org/releases) this issue was retargeted to
OOo Later."
Comment 22 maho.nakata 2003-10-25 02:40:54 UTC
Created attachment 10625 [details]
a patch to solve this problem (partially)
Comment 23 maho.nakata 2003-10-25 02:41:42 UTC
A patch for the problem is supplied.
--maho
Comment 24 maho.nakata 2003-10-25 03:10:11 UTC
This patch solves this issue for Windows.
Comment 25 khendricks 2003-10-26 15:39:31 UTC
Hi, 
 
I have attached a very very untested and tentative patch to handle this by parsing 
$DWGCODEPAGE and just special casing the instance of ANSI_932 or DOC932 
 
Will someone who has access to many dxf files that can manually add  
 
$DWGCODEPAGE 
 3 
ANSI_932 
 9 
 
to the header section of the file, please test the attached patch to make sure it works 
(all I have tested is that it will build). 
 
This is probably close to working (I hope). 
 
Kevin 
 
 
 
Comment 26 khendricks 2003-10-26 15:40:51 UTC
Created attachment 10659 [details]
untested patch that attempts to parse $DWGCODEPAGE
Comment 27 khendricks 2003-10-26 16:07:53 UTC
Hi, 
 
Please ignore my last patch, it had a bug in it. 
 
I have now fixed that I think. 
 
Please test out dxf_fix_rev2.patch 
 
(note: it has some debug output to stderr that should be removed after verifying it 
works). 
 
Kevin 
 
Comment 28 khendricks 2003-10-26 16:08:41 UTC
Created attachment 10661 [details]
a revised patch that seems to work on my machine
Comment 29 khendricks 2003-10-26 16:09:18 UTC
Adding myself to CC on this 
 
 
Comment 30 tora3 2003-10-26 19:47:57 UTC
Created attachment 10663 [details]
A perl script to insert $DWGCODEPAGE in DXF file.
Comment 31 tora3 2003-10-26 19:51:00 UTC
Created attachment 10664 [details]
A DXF sample file that includes $DWGCODEPAGE = DOS932, prepared by the tool above.
Comment 32 tora3 2003-10-26 20:00:40 UTC
Hi Kevin,

Thank you for the patch. Could you modify it a little?
 1. Replace "DOC932" with "DOS932"
 2. Substitute "_SHIFT_JIS" with "_MS_932"
 like:
  if (strcmp(rDGR.GetS(),"ANSI_932")==0 ||
      strcmp(rDGR.GetS(),"DOS932")==0)
    {
      setTextEncoding(RTL_TEXTENCODING_MS_932);
    }
Tora

Comment 33 tora3 2003-10-26 21:32:43 UTC
I was mistaken in the earlier part of this issue.
The correct one is:
---
  9
$DWGCODEPAGE
  3
ANSI_932
----
or
----
  9
$DWGCODEPAGE
  3
DOS932
----
Further information:
http://www.autodesk.com/techpubs/autocad/acad2000/dxf/header_group_codes_in_dxf_files_dxf_aa.htm

In addition, ANSI_932 is used for AutoCAD R14 or later;
DOS932 is for R13 or earlier.
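
The corrected layout (group code 9, then the variable name, then group code 3, then the value) can be read as a sequence of code/value line pairs. The sketch below is a simplified, hypothetical reader, not the filter's DXFGroupReader; it assumes the input starts on a group-code line and that codes and values alternate one per line.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <string>

// Strip spaces, tabs and carriage returns from both ends of a line.
std::string trim(const std::string& s) {
    const char* ws = " \t\r";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Read (group code, value) pairs and return the value of $DWGCODEPAGE,
// or an empty string if the variable is not present.
std::string findDwgCodePage(std::istream& in) {
    std::string code, value;
    bool sawVariable = false;
    while (std::getline(in, code) && std::getline(in, value)) {
        code = trim(code);
        value = trim(value);
        if (code == "9") {
            sawVariable = (value == "$DWGCODEPAGE");  // variable name
        } else if (sawVariable && code == "3") {
            return value;                             // its string value
        } else {
            sawVariable = false;
        }
    }
    return "";
}

// Hypothetical usage with a tiny HEADER section.
void demoFind() {
    std::istringstream dxf(
        "  0\nSECTION\n  2\nHEADER\n  9\n$DWGCODEPAGE\n  3\nANSI_932\n  0\nENDSEC\n");
    assert(findDwgCodePage(dxf) == "ANSI_932");
}
```

With the earlier, mistaken ordering the variable name lands in the group-code position, so a pair-based reader like this one never sees a group 9 entry named $DWGCODEPAGE.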
Comment 34 khendricks 2003-10-27 14:28:45 UTC
Hi Tora,

Thanks for testing it.

Please create an improved patch with your indicated changes and attach it to this 
issue.  

Then after it has been tested and verified by others we can try to get this retargeted 
to OOo 1.1.1 fix1.

Kevin
Comment 35 tora3 2003-10-27 19:53:12 UTC
Created attachment 10696 [details]
Patch updated, verified with SS6.1 Beta2 Solaris (ja)
Comment 36 tora3 2003-10-28 00:11:14 UTC
The patch idxf_fix_rev3.patch.txt will be working for a Japanese DXF
file that contains a variable $DWGCODEPAGE.  Otherwise, a tool that
inserts the variable should be applied to the DXF file before
importing it. 
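
For illustration only, such a pre-processing step might be sketched as follows. This is not the attached Perl script, whose logic is not shown in this issue; `insertCodePage` is a hypothetical function that assumes LF line endings, alternating code/value lines, and a file that does not already define the variable.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>

// Strip spaces, tabs and carriage returns from both ends of a line.
std::string trimmed(const std::string& s) {
    const char* ws = " \t\r";
    const std::size_t first = s.find_first_not_of(ws);
    if (first == std::string::npos) return "";
    const std::size_t last = s.find_last_not_of(ws);
    return s.substr(first, last - first + 1);
}

// Copy the DXF text and add a $DWGCODEPAGE variable directly after the
// (2, HEADER) pair that opens the HEADER section.
std::string insertCodePage(const std::string& dxf, const std::string& codePage) {
    std::istringstream in(dxf);
    std::ostringstream out;
    std::string code, value;
    bool inserted = false;
    while (std::getline(in, code)) {
        out << code << '\n';
        if (!std::getline(in, value)) break;
        out << value << '\n';
        if (!inserted && trimmed(code) == "2" && trimmed(value) == "HEADER") {
            out << "  9\n$DWGCODEPAGE\n  3\n" << codePage << '\n';
            inserted = true;
        }
    }
    return out.str();
}

// Hypothetical usage on a minimal, empty HEADER section.
void demoInsert() {
    const std::string before = "  0\nSECTION\n  2\nHEADER\n  0\nENDSEC\n";
    const std::string after =
        "  0\nSECTION\n  2\nHEADER\n  9\n$DWGCODEPAGE\n  3\nDOS932\n  0\nENDSEC\n";
    assert(insertCodePage(before, "DOS932") == after);
}
```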
Comment 37 Martin Hollmichel 2003-10-29 08:13:51 UTC
reset target milestone, reassign
Comment 38 khendricks 2003-10-30 00:31:30 UTC
Hi, 
 
Cleaned up patch (white space and debug print statements removed) committed to 
cws_src645_ooo111fix1 
 
So resolving this as fixed. 
 
Please verify and close once 1.1.1 becomes available. 
 
Thanks, 
 
Kevin 
 
Comment 39 pavel 2003-10-31 07:35:28 UTC
Please revert or fix immediately. On Windows:

guw.pl /cygdrive/c/Progra~1/Micros~3/VC98/Bin/cl.exe @/tmp/mkb01152
Command: /cygdrive/c/Progra~1/Micros~3/VC98/Bin/cl.exe
dxf2mtf.cxx
c:\OOo\pavel\BuildDir\ooo_1.1.0_src\goodies\source\filter.vcl\idxf\dxf2mtf.cxx(436) : error C2662: 'getTextEncoding' : cannot convert 'this' pointer from 'const class DXFRepresentation' to 'class DXFRepresentation &'
        Conversion loses qualifiers
c:\OOo\pavel\BuildDir\ooo_1.1.0_src\goodies\source\filter.vcl\idxf\dxf2mtf.cxx(495) : error C2662: 'getTextEncoding' : cannot convert 'this' pointer from 'const class DXFRepresentation' to 'class DXFRepresentation &'
        Conversion loses qualifiers
dmake:  Error code 2, while making '../../../wntmsci9.pro/slo/dxf2mtf.obj'

Comment 40 khendricks 2003-10-31 12:34:42 UTC
Hi Pavel, 
 
That is my change so the fault is mine.  The problem is that I do not understand why 
this is causing a problem under Windows at all. 
 
The class DXF2GDIMetaFile has a public member declared as follows: 
 
const DXFRepresentation * pDXF; 
 
 
We use that to access the DXFRepresentation object's getTextEncoding method 
as follows: 
 
       { 
                String aUString( aStr, pDXF->getTextEncoding() ); 
  
 
So we are using a pointer to access the object method.  This seems to work fine 
under Linux. 
 
Why does the Windows compiler try to convert that pointer at all? 
 
error C2662: 'getTextEncoding' : cannot convert 'this' pointer from 'const class DXFRepresentation' to 'class DXFRepresentation &' 
        Conversion loses qualifiers 
 
What am I missing here?  Does the "const" somehow prevent the object pointer from 
being used the way it can normally be used? 
 
To fix this would be a complete guess, since I do not understand why the error is 
occurring. 
 
Would you please try removing the const from the definition of that method to see if 
that matters? 
 
Some hint at what the compiler is trying to do here and why it can't do it would be 
nice? 
 
Thanks, 
 
Kevin 
 
 
 
 
Comment 41 tora3 2003-10-31 13:02:36 UTC
Created attachment 10837 [details]
Fixed by following neighbor classes. Verified on Solaris.
Comment 42 khendricks 2003-10-31 13:07:41 UTC
Hi Pavel, 
 
Using -Wall under Linux I got a message about dropping const when passed as 
"this" which may be related to what you are seeing.  I changed the dxfreprd.hxx 
header file to declare getTextEncoding outside of the class declaration and added 
const to it (since all getTextEncoding does is read and never write). 
 
This removed the dropping "const" message from -Wall under Linux. 
 
I hope it also fixes/prevents the funny conversion that Windows is trying to do here 
as well. 
 
I have committed that change to fix1. 
 
Please give it a try and let me know if it does the trick. 
 
Kevin 
 
Comment 43 khendricks 2003-10-31 13:12:07 UTC
Hi, 
 
Are the neighbor classes actually needed to make this compile under Windows?  
They should not be as long as we add const to the getTextEncoding method and 
move its definition outside of the class itself. 
 
Will you please check out and try what I just committed and if it does not build under 
Windows, we will go with your neighbor class solution. 
 
Thanks, 
 
Kevin 
 
Comment 44 tora3 2003-10-31 14:01:19 UTC
I would like to recommend adopting this new patch
idxf_fix_rev4.patch.txt since it follows the existing programming style
that has had no problems for a long time.  In addition, it does not
include unnecessary lines and has correct indentation.

I have verified this new patch with SS6.1 Beta2 on Solaris. 
Comment 45 khendricks 2003-10-31 15:40:36 UTC
Hi,

Thanks for your new patch.  The problem has already been fixed in OOo 1.1.1 fix1
and that fix should appear in anonymous cvs soon.

The original patch (rev3) was modified before commit to remove all white space 
changes and all of the debug print statements and in general to clean it up.

As it turns out, the error under Win was just related to the missing "const" in the 
definition of getTextEncoding and neighbor classes should not be needed here 
(especially since rtl_TextEncoding is in reality just a "short int"!).
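
The const mismatch can be reproduced in isolation. The following is a minimal sketch with stand-in names (`Representation`, `TextEncoding`, `encodingOf`), not the actual filter classes: a member function called through a pointer to const must itself be declared const.

```cpp
#include <cassert>

// Stand-in for rtl_TextEncoding, which is a small integer type.
typedef unsigned short TextEncoding;

class Representation {
public:
    Representation() : mEncoding(0) {}
    void setTextEncoding(TextEncoding e) { mEncoding = e; }
    // The trailing const promises not to modify the object, so the
    // function may be called on a const Representation.
    TextEncoding getTextEncoding() const;
private:
    TextEncoding mEncoding;
};

// Defined outside the class body; the const qualifier must be repeated.
TextEncoding Representation::getTextEncoding() const {
    return mEncoding;
}

// Mirrors the access pattern in the thread: the object is reached
// through a pointer to const.
TextEncoding encodingOf(const Representation* rep) {
    // If getTextEncoding were not const, this call would need to convert
    // 'this' from const Representation* to Representation*, which is the
    // qualifier loss the compiler rejects.
    return rep->getTextEncoding();
}

// Hypothetical usage.
void demoConst() {
    Representation rep;
    rep.setTextEncoding(42);
    assert(encodingOf(&rep) == 42);
}
```

Removing the const from the pointer would also compile, but marking the read-only accessor const keeps the pointer's guarantee intact.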

Please give the version now in cws_srx645_ooo111fix1 a look see.  If you are still 
unhappy with that change, please submit a new patch to the owner of that issue to 
modify it to make it better.

Thanks,

I am re-marking this as resolved/fixed.  Please verify when OOo 1.1.1 is released.

Kevin
Comment 46 pavel 2003-10-31 17:03:51 UTC
Thanks Kevin, build on Windows fixed.
Comment 47 tora3 2003-10-31 19:50:17 UTC
Thanks Kevin, you are on the right track.

Tora
Comment 48 Martin Hollmichel 2004-02-27 13:37:35 UTC
close issue