Issue 118379 - FILESAVE saving old Word Files corrupts them
Summary: FILESAVE saving old Word Files corrupts them
Status: CONFIRMED
Alias: None
Product: Writer
Classification: Application
Component: save-export
Version: OOo 3.3
Hardware: All, OS: All
Importance: P3 Normal
Target Milestone: ---
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2011-08-04 08:50 UTC by tmarx
Modified: 2016-11-03 19:48 UTC
CC List: 13 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: 4.1.3
Developer Difficulty: ---


Attachments
original file (117.00 KB, application/msword)
2011-08-04 08:50 UTC, tmarx
corrupted file (52.00 KB, application/msword)
2011-08-04 08:51 UTC, tmarx

Description tmarx 2011-08-04 08:50:44 UTC
Created attachment 76738 [details]
original file

Opening an old Word file (123 KB), changing a line, and saving it again (keeping the
file format) damages the file.

It is now only 57 KB, and OpenOffice cannot even open it. Error: "not a WinWord97 file".

I've attached both the original file and the corrupted one.
Comment 1 tmarx 2011-08-04 08:51:16 UTC
Created attachment 76739 [details]
corrupted file
Comment 2 Oliver-Rainer Wittmann 2012-06-13 12:15:46 UTC
Getting rid of the value "enhancement" for the field "severity".
For enhancements, the field "issue type" should be used.
Comment 3 Dhanish Mehta 2013-06-30 04:31:24 UTC
I am using OpenOffice 3.4.1 (build 9593) on Mac OS X 10.8.4.

On using the test file and following the steps to reproduce the bug, I also encountered the same problem with the exact file size numbers specified.

I also created a .docx file with MS Office 2011 for Mac and repeated the same steps in OpenOffice.org, with the same result: the file became corrupted and would not open in OpenOffice.org.
Comment 4 Rob Weir 2013-07-30 02:13:29 UTC
Reset assignee on issues not touched by assignee in more than 2000 days.
Comment 5 Aral Tasher 2013-09-24 21:22:13 UTC
I was able to replicate the bug on version 4.0 with Mac OS X 10.8.5.

I tested the attachment from tmarx@digitaprojects.com, and also created a .docx file myself.

1) I opened the attachment in Word to make sure that it was fine.
2) After making sure the file was okay, I opened the same file in OpenOffice (Writer) and modified a random word and hit the save icon.
3) I clicked the prompt "keep the file format" and overwrote the file.
4) Opened the same file in Mac Office 2011, and it was corrupted.

I followed similar steps for the .docx file I created; however, there was no prompt asking whether I wanted to keep the file format, so I saved it as .odt and tried to open it in Mac Office 2011 again. Word would not even open it.
Comment 6 Cem Kaner 2013-09-28 15:22:08 UTC
I'm reviewing this report with Aral Tasher. He advises me of two other relevant items:

(1) He replicated this on both 4.0 (build 9702) and version 3.4.1 (both Mac).

(2) When he attempted to use OOo to reopen the odt file he had saved, OOo could not open it either. The file was unreadable to both OOo and MS-Word.
Comment 7 spikeynam 2013-11-14 23:59:11 UTC
I was also able to replicate this bug on version 4.0.1, but on Windows 7 Ultimate 64-bit. Everything else matches Tasher's results, except that I opened the modified file with Microsoft Word 2013, which also found it corrupted.
Comment 8 Todd 2014-06-08 01:26:16 UTC
I was able to replicate this bug using AOO Version 4.0.1 Build 9714 on Windows 7 OS SP1 so this appears to not be isolated to a single version or OS.

To replicate this bug:
I created two separate copies of the test file (Microsoft Word 97 – 2003, 117k) from tmarx@digitalprojects.com, to compare AOO behavior vs that of Word (v2010).

I followed the steps to replicate the bug on both copies with the respective applications. Note that Word would only open the file in ‘Protected View’ and would not allow any editing.

When closing the file in AOO, a dialog indicated that the document may contain formatting or content that cannot be saved in Word 6.0, and offered the choice of keeping the current format or saving as ODF (native AOO). I kept the current format to replicate the bug.

When I attempted to reopen the file, a message reported a Read-Error and stated that the file is not a WinWord97 file. The file size had decreased to 53 KB.

This also appears to be related to previously reported bugs, 124811 (Crash when saving as .doc) and 125017 (Crash, corrupted Word file).

Its relationship to 125017 is that a similar process produces both bug instances: Save, Close, Re-open, then a crash with a Read-Error stating that the file is not a WinWord97 file.

Its relationship to 124811 also appears to involve similar reproduction steps.
I was not able to replicate this bug with a new document created in Word, saved as Word 97 – 2003, re-opened in AOO, and resaved several times to increase the file size; there were no crashes.
Comment 9 Brian Dumont 2014-09-24 00:11:53 UTC
I was able to replicate these results on my 64 bit Windows 8 machine. 
I tested both the file from tmarx@digitaprojects.com and my own docx file I created.
My steps for testing the downloaded file were as follows:

1) Downloaded the document and opened it in OpenOffice Writer.
2) I then edited a single line by either adding or deleting content (I tested both)
3) I then saved the file and chose to keep the same format.
4) When I tried to open the file in OpenOffice I received an error saying that it is not a WinWord97 file.

However, I was unable to replicate this bug with my own .docx file by following the same steps. I am using OpenOffice 4.1.1.
Comment 10 Steven Lott 2015-05-28 10:30:00 UTC
Mac OS X 10.10.3, OpenOffice 4.1.1. I have two corrupted files created from Word .docx files. One has duplicated attributes in its content XML; the other is simply unreadable.

This is a devastating bug.
Comment 11 b 2015-09-22 20:42:09 UTC
This bug still exists in 4.2.0. 
I attempted to replicate the bug on 9/22/2015.

System environment:
Apache OpenOffice Writer 4.2.0 - AOO420m1, Build: 9800, Rev.1692551
OS: Windows 7 Home Premium - SP 1 (64bit)
CPU: AMD FX-8350 Eight-Core Processor
GPU: NVIDIA GeForce GTX 970

Steps taken to replicate the issue:
1. Open file “Dienstreiseantrag_Cebit.doc” provided by original report.
2. Highlight “5. Voraussichtliche / geschätzte Reisekosten” (“5. Expected / estimated travel expenses”) with the cursor.
3. Delete line, type “Lorem ipsum dolor sit amet” on the same field.
4. Click on the save button.
5. A prompt will appear asking whether to keep the file’s current format or save in ODF format. Leave “Ask when not saving in ODF format” checkbox checked and select “Keep Current Format”.
6. Close file.
7. Re-open file.
8. An error message will appear with the text: “Read-Error. This is not a WinWord97 file.” Clicking OK or the red X to close the dialogue will bring you to a blank “Untitled 1” unsaved OpenOffice Writer document.

I observed the same file size loss as indicated by the original report. Original is roughly 117-120 KB. After saving, the file is only 52 KB in size. Microsoft Office 365 is actually able to read the corrupted file, but formatting is broken and certain characters don’t appear.

I noticed that the file provided by the OP contained German text. I attempted to replicate this bug with my own created 100 KB .doc file but did not experience similar results. My own 32 KB .docx file also didn’t experience the same issue. I tried with both English and German text.

As a last test, I selected all the text from the originally provided file and copied it into a new .doc file. Following the same steps listed above, the file was not corrupted. I therefore suspect that the text itself is not the cause. Although a line or string of characters must be changed before Writer will save the file, the underlying issue more likely lies in how Writer handles these particular .doc/.docx files. This could be a severe issue for people using Writer with certain older documents.
Comment 12 Keith N. McKenna 2015-09-22 21:07:44 UTC
Changed Last Confirmed On to 4.2.2-dev per comment 11, and Hardware and OS to All per multiple comments.
Comment 13 orcmid 2016-03-12 20:12:55 UTC
(In reply to tmarx from comment #0)
> Created attachment 76738 [details]
> It now is only 57k big and OpenOffice can't even open it. Error: not a
> Winword97 file.

Unfortunately, the "Not a WinWord97 file" message appears whenever an exception is thrown by the input filter while reading a supposed Word 97 file.

All that is known is that there was a problem reading the input. The inference reflected in the wording of the error message is misleading.
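Because the message only says that the input filter failed, a quick first check when triaging a saved file is whether it still begins with the 8-byte OLE2 (Compound File Binary) signature that Word 97-2003 .doc files start with, and that Word 6.0/95 files stored as compound documents are assumed to share. A minimal Python sketch; `has_ole2_signature` and `check_file` are hypothetical helper names. A file that passes can still have a damaged WordDocument stream, so this only rules out gross corruption of the container header.

```python
# Hypothetical triage helper: checks only the compound-file header,
# not the Word streams stored inside it.
OLE2_SIGNATURE = bytes.fromhex("D0CF11E0A1B11AE1")


def has_ole2_signature(data: bytes) -> bool:
    """Return True if data starts with the OLE2 / CFB magic number."""
    return data[:8] == OLE2_SIGNATURE


def check_file(path: str) -> bool:
    """Read the first 8 bytes of a file and test them against the signature."""
    with open(path, "rb") as f:
        return has_ole2_signature(f.read(8))


if __name__ == "__main__":
    import sys

    # Usage: python check_ole2.py original.doc saved.doc
    for name in sys.argv[1:]:
        status = "OLE2 header present" if check_file(name) else "no OLE2 header"
        print(f"{name}: {status}")
```

Comparing the original and the saved attachment this way would show whether Writer wrote a structurally different container or a valid container with broken contents.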
Comment 14 Jaime Romero 2016-09-26 20:28:54 UTC
I was able to reproduce the bug, which occurs when an old Word file (.doc) is modified in Writer and then saved.

My configuration:
OpenOffice: 4.1.2
OS: Windows 10
CPU:i7-5500U
RAM:8GB
STORAGE: SAMSUNG 850 EVO SSD

Reproduction steps:
1. Download original file provided by tmarx@digitalproject.com (Dienstreiseantrag_Cebit.doc)
2. Open the file in Writer
3. Change the contents of the file by changing a line
4. When saving, keep the file format
5. Try to open the file

Result:
User gets the "Not a Winword97 file" error.

Opening and editing .doc files is important so that users can continue to work with their older documents in OpenOffice Writer.
Comment 15 rzerbe 2016-09-30 02:20:41 UTC
I observed the corruption bug to be reproducible ONLY when using the provided file and maintaining the current format in Writer.

Environment:
Version		Apache OpenOffice Writer 4.1.2
OS		Windows 10 Pro (64-bit)
CPU		Intel i5 3570k @ 4.2 GHz
GPU		NVIDIA GeForce GTX 660 Ti
RAM		16 GB

Reproduction:
1. Download “original file” attachment
2. Open “Dienstreiseantrag_Cebit.doc” with Writer
3. Edit the document by adding/removing characters
4. Save the document
5. A dialog should appear, select “Keep Current Format”
6. Close the document
7. Open “Dienstreiseantrag_Cebit.doc” with Writer

Observation:
A dialog displaying “Read-Error. This is not a WinWord97 file.” appeared. The file size was reduced from 117 KB to 52.5 KB, and the corruption is visible when the file is opened in other applications such as Word.

Further Exploration:

Attempt 1:
I created a Microsoft Word 97 – 2003 Document (the same file format as the file in question) to replicate the issue using the phrase:

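“Die Reisekosten aus Haushaltsmitteln werden ausschließlich nach den Bestimmungen des Landesreisekostengesetzes NRW (LRKG) bzw. den hausinternen Vereinbarungen der BUW erstattet. Der Erstattungsanspruch ist verjährt, wenn die Reisekostenabrechnung nicht spätestens innerhalb von 6 Monaten nach Reiserückkehr Dezernat 4.1 vorgelegt wurde. Die Reisekosten aus Projektmitteln werden nach den jeweiligen Bestimmungen des Drittmittelgebers erstatte” (English: “Travel expenses from budget funds are reimbursed exclusively according to the provisions of the NRW State Travel Expenses Act (LRKG) or the internal agreements of the BUW. The reimbursement claim lapses if the travel expense claim is not submitted to Department 4.1 within 6 months of returning from the trip. Travel expenses from project funds are reimbursed according to the respective provisions of the third-party funder”)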

directly copy-pasted from the provided document. I then performed the same process of modifying text, saving with “Keep Current Format”, and re-opening the file. This attempt at reproduction did not produce the error observed with the user’s document.

Attempt 2:
Next, I copied the entire contents of “Dienstreiseantrag_Cebit.doc” and pasted them into a new Microsoft Word 97 – 2003 Document. Repeating the same procedure outlined in the reproduction steps did not yield any errors.

The nature of this bug could corrupt the contents of older documents and must be examined.
Comment 16 Caroline Newman 2016-10-04 22:24:58 UTC
I am using Windows 10 and OpenOffice 4.1.2.

I tried many different scenarios:

Attempt 1: use the original file, open it, edit, and close it.
I received the same error as everyone else.

Attempt 2: I created my own very large test file with a lot of formatting and large pictures in Microsoft Word 2013 and saved it as a "Word 97-2003 document". Then I did the open, edit, and close test. No errors.

I then tried a few more variations of that. I also opened the original file in Word and noticed that it opened in a protected, read-only mode.

This is when I realized that Windows listed the original file as a "Word 97-2003 document"; however, when the file was opened in OpenOffice, OpenOffice reported it as a "Word 6.0" document.

Attempt 3: I downloaded a "Word 6.0" file from online and repeated the open, edit, save, reopen test. No errors.

Attempt 4: I copied and pasted a few lines of the unusually formatted text from the original file into my new "Word 6.0" file. I did Save As, edit, save, close, open... and I got the original error.

Attempt 5: I made a new file, copied in just a few lines of the original, and saved it with the "Word 6.0" file type. Then I did the exit and re-open test. At only 12 KB, this file still produced the same error.
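The observation that Windows labels the attachment Word 97-2003 while Writer treats it as Word 6.0 can be checked directly: the first two fields of the FIB at the start of the WordDocument stream are `wIdent` and `nFib`. A minimal sketch with hypothetical names; the Word 97 values (`wIdent` 0xA5EC, `nFib` 0x00C1) follow the [MS-DOC] specification, while the Word 6.0/95 values (`wIdent` 0xA5DC, `nFib` 101 and 104) are assumptions to verify against older format documentation.

```python
import struct

# FIB magic numbers. 0xA5EC is the Word 97+ value from [MS-DOC];
# 0xA5DC for Word 6.0/95 is an assumption to verify.
WIDENT_WORD97 = 0xA5EC
WIDENT_WORD6 = 0xA5DC

# nFib values: 193 (0x00C1) per [MS-DOC];
# 101 and 104 are assumed Word 6.0 / Word 95 values.
NFIB_VERSIONS = {
    101: "Word 6.0",
    104: "Word 95",
    193: "Word 97 or later",
}


def classify_fib(fib_header: bytes) -> str:
    """Classify a Word binary file from the first 4 bytes of its WordDocument stream."""
    if len(fib_header) < 4:
        raise ValueError("need at least 4 bytes of the FIB")
    # Both fields are little-endian unsigned 16-bit integers.
    w_ident, n_fib = struct.unpack_from("<HH", fib_header, 0)
    if w_ident not in (WIDENT_WORD97, WIDENT_WORD6):
        return f"not a Word FIB (wIdent=0x{w_ident:04X})"
    return NFIB_VERSIONS.get(n_fib, f"unknown nFib {n_fib}")
```

Extracting the stream needs a compound-file reader; for example, with the third-party `olefile` package, `classify_fib(olefile.OleFileIO("Dienstreiseantrag_Cebit.doc").openstream("WordDocument").read(4))` would report which version the original attachment's FIB declares, and the same call on a Writer-saved copy would show what Writer wrote.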
Comment 17 Keith N. McKenna 2016-11-03 19:48:59 UTC
Confirmed with following configuration:

System Configuration:
Processor: Intel Core i5 CPU M560 @2.67GHz
Installed Memory: 2.00 GB (1.60 GB usable)
Operating System: Windows 7 Home Premium 64 bit

Apache Open Office:
AOO413m1(Build:9783)  -  Rev. 1761381
2016-09-29 02:39:19
AOO413m3(Build:9782)  -  Rev. 1709696
Language: en_US
Additional Language Packs: None