Apache OpenOffice (AOO) Bugzilla – Issue 118379
FILESAVE saving old Word Files corrupts them
Last modified: 2016-11-03 19:48:59 UTC
Created attachment 76738: original file
Opening an old Word file (123k), changing a line, and saving it again (keeping the file format) damages the file. It is now only 57k and OpenOffice can't even open it. Error: not a Winword97 file. I've attached both the original file and the corrupted one.
Created attachment 76739: corrupted file
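For anyone comparing the two attachments, a quick check beyond file size is whether the saved file still starts with the container signature expected for its format. The following is a minimal sketch, not part of the original report: the helper names `describe` and `shrink_ratio` are made up for illustration, and it assumes binary .doc files are OLE2 compound files and that .docx/.odt files are ZIP packages. A surviving signature does not prove the file is readable; it only shows the first 8 bytes are intact.

```python
import os

# Signature that starts every OLE2 compound file (the container of binary .doc files).
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")
# Signature of a ZIP local file header (the container of .docx and .odt files).
ZIP_MAGIC = b"PK\x03\x04"


def describe(path):
    """Return (size_in_bytes, container_guess) for the file at `path`."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(8)
    if head.startswith(OLE2_MAGIC):
        kind = "ole2"
    elif head.startswith(ZIP_MAGIC):
        kind = "zip"
    else:
        kind = "unknown"
    return size, kind


def shrink_ratio(before_path, after_path):
    """Return saved size divided by original size (hypothetical helper)."""
    before, _ = describe(before_path)
    after, _ = describe(after_path)
    return after / before
```

Running `describe` on the original and on the re-saved file, and `shrink_ratio` on the pair, gives numbers that can be pasted into a comment instead of approximate sizes.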
getting rid of value "enhancement" for field "severity". For enhancement the field "issue type" shall be used.
I am using OpenOffice 3.4.1 (build 9593) on Mac OS X 10.8.4. Using the test file and following the steps to reproduce the bug, I encountered the same problem, with exactly the file sizes specified. I also created a .docx file with MS Office 2011 for Mac and repeated the same steps in OpenOffice.org, with the same result: the file was corrupted and could no longer be opened in OpenOffice.org.
Reset assignee on issues not touched by assignee in more than 2000 days.
I was able to replicate the bug on version 4.0 with Mac OS X 10.8.5. I tested the attachment from tmarx@digitaprojects.com and also created a .docx file myself.
1) I opened the attachment in Word to make sure that it was fine.
2) After making sure the file was okay, I opened the same file in OpenOffice (Writer), modified a random word, and hit the save icon.
3) I clicked "keep the file format" at the prompt and overwrote the file.
4) I opened the same file in Mac Office 2011, and it was corrupted.
I followed similar steps for the .docx file I created; however, there was no prompt asking whether I wanted to keep the file format, so I saved it as .odt and tried to open it in Mac Office 2011 again. Word didn't even open it.
I'm reviewing this report with Aral Tasher. He advises me of two other relevant items: (1) He replicated this on both 4.0 (build 9702) and version 3.4.1 (both Mac). (2) When he attempted to use OOo to reopen the odt file he had saved, OOo could not open it either. The file was unreadable to both OOo and MS-Word.
I was also able to replicate this bug on version 4.0.1, but on Windows 7 Ultimate 64-bit. My results otherwise match Tasher's. In addition, I tried opening the modified file with Microsoft Word 2013, and it was corrupted there as well.
I was able to replicate this bug using AOO version 4.0.1 (build 9714) on Windows 7 SP1, so it does not appear to be isolated to a single version or OS.
To replicate this bug: I created two separate copies of the test file (Microsoft Word 97 – 2003, 117k) from tmarx@digitalprojects.com, to compare AOO's behavior with that of Word (v2010). I followed the steps to replicate the bug on both copies with the respective applications. Note that Word would only open the file in 'Protected View' and would not allow any editing of the file.
When closing the file in AOO, a dialog was displayed indicating that the document may contain formatting or content that can't be saved in Word 6.0, and gave the option of keeping the current format or saving as ODF (native AOO). I kept the current format to replicate. When I attempted to re-open the file, a message appeared indicating a Read-Error and stating that the file is not a WinWord97 file. The size of the file had decreased to 53k.
This also appears to be related to previously reported bugs 124811 (Crash when saving as .doc) and 125017 (Crash, corrupted Word file). Its relationship to 125017 is that a similar process produces both bug instances: Save, Close, Re-open, Crash, indicating a Read-Error (not a WinWord97 file). Its relationship to 124811 also appears to involve similar steps to reproduce.
I was not able to replicate this bug with a document newly created in Word, saved as Word 97 – 2003, then re-opened in AOO and re-saved several times to increase the file size; there were no crashes.
I was able to replicate these results on my 64-bit Windows 8 machine. I tested both the file from tmarx@digitaprojects.com and a .docx file I created myself. My steps for testing the downloaded file were as follows:
1) Downloaded the document and opened it in OpenOffice Writer.
2) Edited a single line by either adding or deleting content (I tested both).
3) Saved the file and chose to keep the same format.
4) When I tried to open the file in OpenOffice, I received an error saying that it is not a WinWord97 file.
I was, however, unable to replicate this bug with my own .docx file by following the same steps. I am using OpenOffice 4.1.1.
Mac OS X 10.10.3, OpenOffice version 4.1.1. I have two corrupted files created from Word .docx files. One has duplicated attributes in its content.xml. The other is simply unreadable. This is a devastating bug.
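A duplicated attribute makes content.xml ill-formed XML, so any standard XML parser will reject it. The sketch below is one way to confirm that kind of damage, assuming the corrupted file is an ODF package (a ZIP archive containing content.xml). The function name `check_content_xml` is made up for illustration, and the files mentioned in this comment have not been examined with it.

```python
import zipfile
import xml.etree.ElementTree as ET


def check_content_xml(source):
    """Return None if content.xml parses, otherwise a short problem description.

    `source` may be a file path or a binary file-like object.
    """
    try:
        with zipfile.ZipFile(source) as zf:
            data = zf.read("content.xml")
    except zipfile.BadZipFile:
        return "not a ZIP package"
    except KeyError:
        # ZipFile.read raises KeyError when the member does not exist.
        return "content.xml missing"
    try:
        ET.fromstring(data)
    except ET.ParseError as exc:
        # Duplicate attributes are one of the errors reported here.
        return f"content.xml not well-formed: {exc}"
    return None
```

The parser error includes a line and column number, which helps locate the duplicated attribute inside content.xml.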
This bug still exists in 4.2.0. I attempted to replicate the bug on 9/22/2015.
System environment:
Apache OpenOffice Writer 4.2.0 - AOO420m1, Build: 9800, Rev. 1692551
OS: Windows 7 Home Premium - SP 1 (64-bit)
CPU: AMD FX-8350 Eight-Core Processor
GPU: NVIDIA GeForce GTX 970
Steps taken to replicate the issue:
1. Open the file "Dienstreiseantrag_Cebit.doc" provided by the original report.
2. Highlight "5. Voraussichtliche / geschätzte Reisekosten" with the cursor.
3. Delete the line and type "Lorem ipsum dolor sit amet" in the same field.
4. Click the save button.
5. A prompt will appear asking whether to keep the file's current format or save in ODF format. Leave the "Ask when not saving in ODF format" checkbox checked and select "Keep Current Format".
6. Close the file.
7. Re-open the file.
8. An error message will appear with the text: "Read-Error. This is not a WinWord97 file." Clicking OK or the red X to close the dialog will bring you to a blank, unsaved "Untitled 1" OpenOffice Writer document.
I observed the same file size loss as indicated by the original report. The original is roughly 117-120 KB. After saving, the file is only 52 KB. Microsoft Office 365 is actually able to read the corrupted file, but formatting is broken and certain characters don't appear.
I noticed that the file provided by the OP contained German text. I attempted to replicate this bug with my own 100 KB .doc file but did not get similar results. My own 32 KB .docx file also didn't show the issue. I tried with both English and German text.
As a last test, I selected all the text from the originally provided file and copied it into a new .doc file. Following the same steps listed above, the file was not corrupted.
I speculate that the actual text in these corrupted files is not the issue. Although a line or string of characters has to be changed before Writer will save the file, the actual issue likely lies in how Writer handles .doc/.docx files.
This could definitely be a severe issue for people using Writer on certain older documents.
Changed "Last Confirmed on" to 4.2.2-dev per comment 11, and "Hardware" and "OS" to All per multiple comments.
(In reply to tmarx from comment #0)
> Created attachment 76738
> It now is only 57k big and OpenOffice can't even open it. Error: not a
> Winword97 file.
Unfortunately, the "Not a Winword97 file" error appears whenever an exception is thrown by the input filter while reading a supposed Word 97 file. All that is known is that there was a problem reading the input. The inference reflected in the wording of the error message is misleading.
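To illustrate the point about the error message, here is a small sketch of the general pattern. It is purely hypothetical and is not AOO's filter code: `parse_word97`, `load_catch_all`, `load_with_cause`, and `ReadError` are invented names. It shows how a catch-all handler turns unrelated failures into the same "not a WinWord97 file" text, and how carrying the cause along would make the report less misleading.

```python
class ReadError(Exception):
    """Hypothetical error shown to the user by an import filter."""


def parse_word97(data):
    """Stand-in parser that fails in different ways for different problems."""
    if not data.startswith(bytes.fromhex("D0CF11E0A1B11AE1")):
        raise ValueError("missing OLE2 signature")
    if len(data) < 512:
        raise EOFError("file truncated inside header")
    return "document"


def load_catch_all(data):
    """Every failure collapses into one message, hiding the real cause."""
    try:
        return parse_word97(data)
    except Exception:
        raise ReadError("This is not a WinWord97 file.") from None


def load_with_cause(data):
    """Same handler, but the underlying problem is kept in the message."""
    try:
        return parse_word97(data)
    except Exception as exc:
        raise ReadError(f"Could not read the file as Word 97: {exc}") from exc
```

With `load_catch_all`, a truncated file and a file of a different type produce identical messages; with `load_with_cause`, the two cases can be told apart from the error text alone.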
I was able to reproduce the bug, which consists of modifying an old Word file (.doc) in Writer and then saving it. My current setup:
OpenOffice: 4.1.2
OS: Windows 10
CPU: i7-5500U
RAM: 8 GB
Storage: Samsung 850 EVO SSD
Reproduction steps:
1. Download the original file provided by tmarx@digitalproject.com (Dienstreiseantrag_Cebit.doc).
2. Open the file in Writer.
3. Change the contents of the file by changing a line.
4. When saving, keep the file format.
5. Try to open the file.
Result: The user gets the "Not a Winword97 file" error.
Being able to open and edit .doc files matters to users who need to work with their older documents in OpenOffice Writer.
I observed the corruption bug to be reproducible ONLY when using the provided file and keeping the current format in Writer.
Environment:
Version: Apache OpenOffice Writer 4.1.2
OS: Windows 10 Pro (64-bit)
CPU: Intel i5 3570k @ 4.2 GHz
GPU: NVIDIA GeForce GTX 660 Ti
RAM: 16 GB
Reproduction:
1. Download the "original file" attachment.
2. Open "Dienstreiseantrag_Cebit.doc" with Writer.
3. Edit the document by adding/removing characters.
4. Save the document.
5. A dialog should appear; select "Keep Current Format".
6. Close the document.
7. Open "Dienstreiseantrag_Cebit.doc" with Writer.
Observation: A dialog displaying "Read-Error. This is not a WinWord97 file." appeared. The file size was reduced from 117 KB to 52.5 KB, and file corruption is visible when opening the file with other text editors such as Word.
Further exploration:
Attempt 1: I created a Microsoft Word 97 – 2003 Document (the same file format as the file in question) to replicate the issue using the phrase: "Die Reisekosten aus Haushaltsmitteln werden ausschließlich nach den Bestimmungen des Landesreisekostengesetzes NRW (LRKG) bzw. den hausinternen Vereinbarungen der BUW erstattet. Der Erstattungsanspruch ist verjährt, wenn die Reisekostenabrechnung nicht spätestens innerhalb von 6 Monaten nach Reiserückkehr Dezernat 4.1 vorgelegt wurde. Die Reisekosten aus Projektmitteln werden nach den jeweiligen Bestimmungen des Drittmittelgebers erstatte" directly copy-pasted from the provided document. I then performed the same process of modifying text, saving with "Keep Current Format", and re-opening the file. This attempt at reproduction did not produce the error observed with the user's document.
Attempt 2: Next, I copied the entire contents of "Dienstreiseantrag_Cebit.doc" and pasted them into a new Microsoft Word 97 – 2003 Document. Repeating the same procedure outlined in the reproduction steps did not yield any errors.
Because this bug can corrupt the contents of older documents, it must be examined.
I am using Windows 10 and OpenOffice 4.1.2. I tried many different scenarios:
Attempt 1: I used the original file: opened it, edited it, and closed it. I received the same error as everyone else.
Attempt 2: I created my own very large test file with a lot of formatting and large pictures in Microsoft Word 2013 and saved it as a "Word 97-2003 document". Then I did the open, edit, and close. No errors. I then did a few more variations of that. I also opened the original file in Word and noticed it was in a protected, read-only format. This is when I realized that Windows was listing the original file as a "Word 97-2003 document"; however, when it was opened in OpenOffice, OpenOffice said it was a "Word 6.0" document.
Attempt 3: I downloaded a "Word 6.0" file from the internet and tested it with the same open, edit, save, open sequence. No errors.
Attempt 4: I copied and pasted a few lines of the unusually formatted text from the original file into my new "Word 6.0" file. I did Save As, edit, save, close, open, and I got the original error.
Attempt 5: I made a new file, copied in just a few lines of the original, and saved it with the "Word 6.0" file type. Then I did the exit and re-open test. At only 12 KB, this file still had the same error.
Confirmed with the following configuration:
System Configuration:
Processor: Intel Core i5 CPU M560 @ 2.67 GHz
Installed Memory: 2.00 GB (1.6 usable)
Operating System: Windows 7 Home Premium 64-bit
Apache OpenOffice: AOO413m1 (Build:9783) - Rev. 1761381 2016-09-29 02:39:19
AOO413m3 (Build:9782) - Rev. 1709696
Language: en_US
Additional Language Packs: None