Apache OpenOffice (AOO) Bugzilla – Issue 93503
Calc export of non-gregorian date to Excel loses format
Last modified: 2017-05-20 11:11:12 UTC
Steps to reproduce: 1. Start OpenOffice.org Calc 2. Tool -> Option -> Language Setting -> Languages and set Locale setting to Thai 3. Type a date (ex. 15/05/2008) in a cell 4. Click the cell and choose Format -> Cells -> Numbers, then changes the following 3.1) Change Category to Date 3.2) Change Language to Thai 3.3) Change Fomat to 31 ธันวาคม 2542 5. Click OK 6. Save as Microsoft Excel 97/2000/XP (.xls) 7. Exit OpenOffice.org Calc 8. Open the xls file in Microsoft Excel Excel will show error dialog box "Some number formats may have been lost." 9. Observe the Date will be formatted as an integer number
Created attachment 56252 [details] Before save as Microsoft Excel 97/2000/XP (.xls)
Created attachment 56253 [details] After open in Excel
Created attachment 56254 [details] the bug document
Problem still exists in DEV300_m52 (OOo 3.2 code line). Confirming issue. Adjusting summary. Old summary "Calc export of Buddhist date to Excel is broken" could be misinterpreted as "this is a regression"...
Created attachment 70648 [details] fix date format export to excel
This the patch to fix this issue. diff -rup OOO320_m19_ori/sc/source/filter/excel/xestyle.cxx OOO320/sc/source/filter/excel/xestyle.cxx --- OOO320_m19_ori/sc/source/filter/excel/xestyle.cxx 2010-05-26 23:30:44.000000000 +0700 +++ OOO320/sc/source/filter/excel/xestyle.cxx 2010-07-16 17:50:08.188256700 +0700 @@ -1327,8 +1327,15 @@ String XclExpNumFmtBuffer::GetFormatCode } aFormatStr = pEntry->GetMappedFormatstring( *mpKeywordTable, *mxFormatter->GetLocaleData() ); + xub_StrLen pos; if( aFormatStr.EqualsAscii( "Standard" ) ) aFormatStr.AssignAscii( "General" ); + else if( ( pos = aFormatStr.SearchAscii("[~buddhist]", 0) ) != STRING_NOTFOUND ) + aFormatStr.ReplaceAscii( pos, 11, "[$-107041E]", 11 ); + else if( ( pos = aFormatStr.SearchAscii("[~hijri]", 0) ) != STRING_NOTFOUND ) + aFormatStr.ReplaceAscii( pos, 8, "[$-1060401]", 11); + else if( ( pos = aFormatStr.SearchAscii("[~persian]", 0) ) != STRING_NOTFOUND ) + aFormatStr.ReplaceAscii( pos, 10, "[$-3060429]", 11 ); } } else
reassigning for Daniel to evaluate the patch
The patch has a little drawback, it does not respect quoted string inside the number format, e.g. the format "[~buddhist]" 0 gives numbers like [~buddhist] 1234 instead of 1234 in the cell. But I think we can live with that, as correct handling would require a complete format code parser...
This should be done at the number formatter instead, as also Writer/Word use those format codes. SvNumberformat::GetMappedFormatstring() in svl/source/numbers/zformat.cxx would be the right place, and act upon NF_SYMBOLTYPE_CALENDAR in the switch(pType[j]). Simply replacing the calendar string with a fixed "[$-...]" notation may not always work as desired, as the value in that notation depends on the actual locale selected, not sure what happens in Excel if those don't match. Better would be to use the locale set at the number format and construct the proper string. For MS-LCID documentation in this context see for example http://office.microsoft.com/en-us/excel-help/creating-international-number-formats-HA001034635.aspx#BMcalendartype However, we may live with fixed values for a first attempt if that sounds too complicated.
Thank er, we will find a better solution by looking at SvNumberformat::GetMappedFormatstring() next.
Accept the issue. Trying to find the best solution for this problem.
Created attachment 70926 [details] fix buddhist calendar export in excel format
patch to fix export buddhist calendar diff -rup OOO320m19-ori/svtools/source/numbers/zformat.cxx OOO320m19/svtools/source/numbers/zformat.cxx --- OOO320m19-ori/svtools/source/numbers/zformat.cxx 2010-05-26 23:30:44.000000000 +0700 +++ OOO320m19/svtools/source/numbers/zformat.cxx 2010-08-02 17:35:47.000000000 +0700 @@ -4128,14 +4128,6 @@ String SvNumberformat::GetMappedFormatst } const SvNumberNatNum& rNum = NumFor[n].GetNatNum(); - // The Thai T NatNum modifier during Xcl export. - if (rNum.IsSet() && rNum.GetNatNum() == 1 && - rKeywords[NF_KEY_THAI_T].EqualsAscii( "T") && - MsLangId::getRealLanguage( rNum.GetLang()) == - LANGUAGE_THAI) - { - aPrefix += 't'; // must be lowercase, otherwise taken as literal - } USHORT nAnz = NumFor[n].GetnAnz(); if ( nSem && (nAnz || aPrefix.Len()) ) @@ -4195,6 +4187,34 @@ String SvNumberformat::GetMappedFormatst aStr += '"'; } break; + case NF_SYMBOLTYPE_CALDEL : + if ( pStr[j].EqualsAscii("[~") ) + { + if ( pStr[j+1].EqualsAscii("buddhist") ) + { + aStr.InsertAscii( "[$-", aStr.Len() ); + if ( rNum.IsSet() && rNum.GetNatNum() == 1 && + MsLangId::getRealLanguage( rNum.GetLang() ) == + LANGUAGE_THAI ) + { + aStr.InsertAscii( "0D07041E", aStr.Len() ); + } + else + { + aStr.InsertAscii( "0107041E", aStr.Len() ); + } + } + else // ignore other calendar + { + j++; // step to skip "]" + } + j++; + } + else + { + aStr += pStr[j]; + } + break; default: aStr += pStr[j]; }
I fixed this issue, Excel export with buddhist calendar, with this patch. This patch check "[~buddhist]" and [NatNum1] and convert it into appropriate MS-LCID eg. "[$-0107041E]". It works successfully when open the xls file in Excel. Please take a look at the patch to see whether we're on the right track. However, when I open the exported xls file with OpenOffice.org, the number format will be converted to "[$-41E]" by the import filter. However, it will not convert any native number or calendar options. I will fix this import filter issue in another issue report. BTW, can you guide me where the code is.
@tantai_thanakanok: Thanks, some nitpicks though: 0. Please always attach diffs as files of type text/patch, pasting diffs to the comment field may introduce unintended line breaks and other quirks. 1. The patch removes the Thai T NatNum modifier exported as t to Excel, that is unacceptable. This modifier is a specialty of the Thai Excel version we import and export. Any specific reason you removed that? 2. When encountering NF_SYMBOLTYPE_CALDEL there is no need to additionally check for "[~", NF_SYMBOLTYPE_CALDEL is only set for that context. For import the number format scanner/parser is used, see svl/source/numbers/zforscan.cxx with the strategic entry point ImpSvNumberformatScan::ScanFormat() Be warned, that is hard stuff ;-) Btw, I suggest you use a recent DEV300 milestone for development, things are quite changing between releases and the number formatter isn't even in module svtools anymore but in svl instead.
Created attachment 71799 [details] fix excel import and export
In the export filter, I remove the code that "check Thai T NatNum modifier and export as "t" to Excel" because in Excel, the "t" format will format Thai number using native digits correctly in Thai locale only. So I check NatNum specifier and export it as LCID to format Thai number shape correctly in all locale. In the import filter, I add the check for LCID and convert it into an appropriate calendarID and NatNum before checking for symboltype. I try to add support for other calendarID in the export and import filter. However, I have the following problems in the export filter :- - for [~hijri], in GetMappedFormatString(), I always get LANGUAGE_ENGLISH_US in the cell while actually it should be LANGUAGE_ARABIC_XXXX. So I cannot add the right numeral shape code into the LCID. - for [~persian], in GetMappedFormatString(), the celendarID [~-persion] disappear from the format code. I don't know when and why.
DR->ER: patch changes number formatter, please have a look
Created attachment 72126 [details] fixed the case that excel export date with locale id [$-xxx0000]
We have built Windows binary with the above patch and tested both internally and by a few volunteers. The patched binary work correctly. ER: can you please review the code?
@tantai_thanakanok: 1. What exactly does this change in SvNumberformat::ImpGetLanguageType() attempt to fix? - return (nNum && (cToken == ']' || nPos == nLen)) ? (LanguageType)nNum : + return ((cToken == ']' || nPos == nLen)) ? (LanguageType)nNum : 2. I don't like the change made to SvNumberformat::ImpNextSymbol() + xub_StrLen pos; + if ( ( pos = rString.SearchAscii( "[$-", 0 ) ) != STRING_NOTFOUND ) + { + if ( rString.GetChar(pos+10) == ']' ) + { + if ( rString.GetChar(4) == '0' && rString.GetChar(5) == '7' ) + { + rString.InsertAscii( "[~buddhist]", pos+11 ); + } + if ( rString.GetChar(3) == 'D' ) + { + rString.InsertAscii( "[NatNum1]", pos+11 ); + } + } + } because it tries to search for the "[$-" substring upon each and every call of the method, completely ignoring the state machine used otherwise in the method. The case of "[$-" is already detected with case SsGetBracketed and cToken case '$' and rString.GetChar(nPos) == '-' where eSymbolType = BRACKET_SYMBOLTYPE_LOCALE is set. Furthermore, since it always searches from position 0, it detects the condition multiple times and could insert the new strings each time the subconditions are met, which currently is not the case because the LanguageType value is only 16 bit and not a full LCID, hence the result the calling method reinserts (the scanned value in proper case) is shorter than the scanned D07xxxx (might in fact be considered a bug) and the problem doesn't occur. Anyhow, using the number formatter dialog and setting the Language to Thai and trying to use the code [$-D07041E]0 and pressing the green check mark button does not accept the code. If I understood correctly that should result in [$-41E][~buddhist][NatNum1]0 3. I still don't agree to the removal or conversion of - // The Thai T NatNum modifier during Xcl export. - if (rNum.IsSet() && rNum.GetNatNum() == 1 && - rKeywords[NF_KEY_THAI_T].EqualsAscii( "T") && - MsLangId::getRealLanguage( rNum.GetLang()) == - LANGUAGE_THAI) - { - aPrefix += 't'; // must be lowercase, otherwise taken as literal - } This condition happens only if either the user did input such a format or, more likely, the document was imported from a file originating from a Thai Excel version. For the latter case in round trip scenarios the 't' has to be preserved for older Excel versions. This was a requirement of the Thai user community. A conversion of NatNum==1 in LANGUAGE_THAI _without_ NF_KEY_THAI_T being set to "[$-D00041E]" may be fine though.
thank you. I'll take a good look at your comments
Created attachment 75035 [details] contain date with locale id 0 ( [$-1070000] )
Step to create date with locale id 0: 1.) open Microsoft excel 2.) type date eg. 18/11/2010 3.) right click > format cells > Number > Date , choose 14/03/2544 4.) check in category custom, the LCID is [$-1070000] 5.) click ok 6.) save in xls format 7.) open the xls file in OpenOffice.org Calc, the format code have a locale id 0 ([$-0])
@er > 1. What exactly does this change in SvNumberformat::ImpGetLanguageType() > attempt to fix? > - return (nNum && (cToken == ']' || nPos == nLen)) ? (LanguageType)nNum : > + return ((cToken == ']' || nPos == nLen)) ? (LanguageType)nNum : The reason for this fix is to allow nNum == 0, i.e. LocaleId == 0 which mean "system locale". The above attached file/use-case shows that it is possible. We've made the fix while also make sure that there's no other side-effect, e.g. cases from invalid localeid still return the same.
Created attachment 75211 [details] updated version
> 2. I don't like the change made to SvNumberformat::ImpNextSymbol() Corrected in the updated version of the patch. > Anyhow, using the number formatter dialog and setting the Language to Thai and > trying to use the code > [$-D07041E]0 > and pressing the green check mark button does not accept the code. If > I understood correctly that should result in > [$-41E][~buddhist][NatNum1]0 I guess the example should actually be written as [$-D07041E]dd/mm/yy because the number format '0' doesn't make sense for date. And the result, after applying the patch, will be [$-41E][NatNum1][~buddhist]dd/mm/yy Is this OK? Since in the current implementation, you can't use the format [$-41E][~buddhist][NatNum1]dd/mm/yy at all. The number format dialog will accept the format code if you enter it manually but after you click the green checkmark but it will do nothing. The sequence "[$-41E][~buddhist][NatNum1]dd/mm/yy" happens to be illegal in the current implementation. Using the sequence "[$-41E][NatNum1][~buddhist]dd/mm/yy" will produce the desired effect. We can fix this but it require a lot more modifications. > 3. I still don't agree to the removal or conversion of >- // The Thai T NatNum modifier during Xcl export. >- if (rNum.IsSet() && rNum.GetNatNum() == 1 && >- rKeywords[NF_KEY_THAI_T].EqualsAscii( "T") && >- MsLangId::getRealLanguage( rNum.GetLang()) == >- LANGUAGE_THAI) >- { >- aPrefix += 't'; >- } The above code is modified as + // The Thai T NatNum modifier during Xcl export. + if (rNum.IsSet() && rNum.GetNatNum() == 1 && + rKeywords[NF_KEY_THAI_T].EqualsAscii( "T") && + MsLangId::getRealLanguage( rNum.GetLang()) == + LANGUAGE_THAI && !LCIDInserted ) + { + + aStr.InsertAscii( "[$-D00041E]", 0 ); + } Which means that the filter will accept the number format "t0" (processed elsewhere) from Excel and convert to "[$41E][NatNum1]0" in Calc. When convert back to excel it will become "[$-D00041E]0" (not "t0") which when import into Calc will become "[$41E][NatNum1]0" again. The difference is that when export to Excel, the code will no longer generate "t0" but the more correct (right?) "[$-D00041E]0".
@er could you please review the patch.
I'm adding this comment to all open issues with Issue Type == PATCH. We have 220 such issues, many of them quite old. I apologize for that. We need your help in prioritizing which patches should be integrated into our next release, Apache OpenOffice 4.0. If you have submitted a patch and think it is applicable for AOO 4.0, please respond with a comment to let us know. On the other hand, if the patch is no longer relevant, please let us know that as well. If you have any general questions or want to discuss this further, please send a note to our dev mailing list: dev@openoffice.apache.org Thanks! -Rob
Reset assigne to the default "issues@openoffice.apache.org".