Apache OpenOffice (AOO) Bugzilla – Issue 111579
Opening large html excel document from SAS
Last modified: 2023-01-04 22:15:50 UTC
I've got a big Excel sheet coming from SAS that is actually an HTML file using the xmlns:x="urn:schemas-microsoft-com:office:excel" schema. It contains around 26900 rows, but at around row 5600 it breaks and the rest of the rows are merged into a single cell. Is there a known bug for this?
Created attachment 69445: the file that generates the problem
My MS EXCEL-Viewer will not open "dettaglio_rendiconti_dip_fittizio.xls" at all (without explaining what the problem might be), so I believe "opens with errors" is not such a bad result. My "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]" hangs when I try to open the sample document from Explorer (or maybe I was just too impatient). My Seamonkey 2.0.4 browser opens a renamed "dettaglio_rendiconti_dip_fittizio.html" without problems. "Ooo-Dev 3.2.1 multilingual version English UI WIN XP: [OOO320m16 (Build 9497)]" will open it as a (html?) text document with file type "*"; the document looks like a renamed .html document in OOo and shows the reported error. Related to Issue 89332? @gioppo: Pls. explain SAS - Scandinavian Airlines? Société par Actions Simplifiée? ;-) In what OOo component did you see the document - really CALC or WRITER?
SAS is a business analytics and business intelligence tool. The file opens correctly with Excel 2003. It is produced by a SAS reporting web application (it is an "Excel export"). This is a problem since big vendors usually do not ship an OOo export option, only Excel. The program used is Calc, since it was an Excel export. I tried renaming it to html and opening it with Writer: it takes A LOT of time to open (have a couple of coffees and a chat). It opens in Writer/Web but is SSSLLOOOWWWW, painfully slow, and presents the same error: at some point the table breaks and everything ends up in the same paragraph. I believe the relation to issue 89332 could be correct. It seems that on large documents the XML/HTML parser goes mad. I understand that the formatting is the M$ mess, but it is just a big table, something a browser digests with no problem, and so does any HTML editor. Could there be any Java out-of-memory problems involved?
I forgot to mention: "... open sample document from explorer" BY DOUBLE CLICK. I tried EXCEL VIEWER 8.0 and 12.0. Result with "Ooo 3.1.1 WIN XP DE[OOO310m19 (Build 9420)]" and right click "open with SCALC" from WIN EXPLORER: OOo hung (I stopped the attempt after 15 minutes).
Some new info. Firefox also takes a bit of time to open the file. Also, if I open the file with Excel (no problem and really quick) and save it in Excel format rather than HTML, Calc opens the file without any trouble. There is something wrong in the parsing of the file (HTML/XML).
You have to wait more than 15 minutes; just have lunch ;-) and have faith.
Also, if you leave the extension as xls, Calc proposes an import much like a CSV import ... no good. If the file is renamed to xlsx, it opens without any problem. OOo version used: 3.2m13 from go-oo; I will also check on the SUN (oops, Oracle) build.
The file is not in xls format. Excel detects this and selects an import filter for you. Because OOo is not only a spreadsheet application like Excel, but a suite with other modules, it cannot know which module you want, and it picks the module that best fits the content. Your file starts with <html>, so Writer/Web will be used. To open such files in Calc, you have to select the suitable filter yourself. Start OOo and open the file from inside OOo. First select the file and then select the file type. You need the type "Web Page Query (OpenOffice.org Calc) (*.htm;*.html)". The file will open at normal speed; at least it does in my DEV300m77. The dialog box that comes up helps ensure that numbers and dates are imported correctly. This is needed because, for example, in German 1.000 means one thousand and in US English only one; or in US English 2/3/2010 means February 3 and in GB English it means March 2. Issue 89332 is about the feature that OOo will automatically open files in Calc that are HTML in content but have xls as extension. But the solution is not integrated yet. *** This issue has been marked as a duplicate of 89332 ***
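To illustrate why the import dialog has to ask for a language, here is a minimal Python sketch of the ambiguity described above. It is not OOo's import code; the helper names parse_number and parse_date and their parameters are made up for this illustration.

```python
from datetime import date


def parse_number(text: str, decimal_sep: str, group_sep: str) -> float:
    """Parse a number string using explicitly given separators (hypothetical helper)."""
    # Drop the grouping separator first, then normalise the decimal separator.
    return float(text.replace(group_sep, "").replace(decimal_sep, "."))


def parse_date(text: str, day_first: bool) -> date:
    """Parse an 'a/b/yyyy' string; day_first selects the GB or US reading (hypothetical helper)."""
    a, b, year = (int(part) for part in text.split("/"))
    day, month = (a, b) if day_first else (b, a)
    return date(year, month, day)


# The same text yields different values depending on the assumed locale.
german_value = parse_number("1.000", decimal_sep=",", group_sep=".")
us_value = parse_number("1.000", decimal_sep=".", group_sep=",")
us_date = parse_date("2/3/2010", day_first=False)
gb_date = parse_date("2/3/2010", day_first=True)

print(german_value, us_value)  # 1000.0 1.0
print(us_date, gb_date)        # 2010-02-03 2010-03-02
```

The text alone does not say which reading is meant, so the importer needs the language from the user.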
Mind that the problem IS NOT in the opening of the file, but in the visualization. It breaks at row 6000 or so. This is the real problem, making Calc open it is just a detail.
OK, I see. It opens at normal speed using the Web Page Query filter, but reads only about 6712 lines of the original 26902 lines.
The file opens with all lines in Gnumeric too. The error remains for me on WinXP if I replace the Unix line endings with DOS line endings.
This is the infamous 16-bit paragraph limit from bug 57176, which only allows importing the first 65534 cells. Resolving as duplicate. Thank you for your bug report and sample file! A patch to fix this is available, and a release with it will hopefully be out soon. *** This issue has been marked as a duplicate of issue 57176 ***
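As a rough back-of-the-envelope check of what such a cap means for a wide table, here is a small Python sketch. It is a simplified model, not the actual import code: it assumes every row has the same number of cells. The column counts used below are hypothetical, since the thread does not state how many columns the attached file has.

```python
CELL_LIMIT = 65534  # figure quoted above; equals 2**16 - 2


def rows_imported(total_rows: int, columns: int, cell_limit: int = CELL_LIMIT) -> int:
    """Complete rows that fit under the cell cap, assuming uniform rows (simplified model)."""
    return min(total_rows, cell_limit // columns)


# Hypothetical column counts, for illustration only.
wide = rows_imported(total_rows=26902, columns=10)
narrow = rows_imported(total_rows=26902, columns=2)

print(wide)    # 6553
print(narrow)  # 26902
```

Under this model, a table of about ten columns would be cut off after roughly 6500 rows, which is the same order of magnitude as the truncation reported here, while a narrow table of the same length would import completely.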