55650 – Conversion wizard is too slow if log enabled

Summary: Conversion wizard is too slow if log enabled

Status:	CONFIRMED

Alias:	None

Product:	General
Classification:	Code
Component:	code
Version:	OOO 2.0 Beta2
Hardware:	All All

Importance:	P3 Trivial
Target Milestone:	AOO Later
Assignee:	AOO issues mailing list
QA Contact:

URL:
Keywords:

Depends on:
Blocks:

Reported:	2005-10-08 13:23 UTC by mmenaz
Modified:	2013-07-30 02:20 UTC
CC List:	2 users

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Description mmenaz 2005-10-08 13:23:58 UTC

I have started converting all of my office's documents from MS Office format to
OpenDocument, using the wizard.
I started the conversion on a directory tree with about 3,500 files.
With the "log" option enabled, the conversion is 10x slower than without the log
and, even worse, becomes slower and slower. After 7 hours (sigh!) only around
1,600 files had been converted, and I had to abort the process.
I restarted with the log disabled and the run completed, as I said, at a
documents-per-minute rate 10x higher, from start to end.
Even when converting 680 Excel files, the "log" flag made it a painful
experience (but I did not retry without it, so I don't know exactly what
improvement you would get in a "pure" Excel conversion).
I expect a lot of people to use the wizard on their archives, so improvement in
this regard is much more important than before.
Please improve the "log" option algorithm, or remove it.
I've written a Python script that I run after the wizard; it checks whether each
MS Office file has a converted counterpart and, if not, prints its path. This
works only if the "source" files and the converted ones are in the same
location, of course. This is very beta software (I don't assume responsibility
for it), but it seems to work fine, so you could include it in the wizard and
run it at the end of the process; it's probably more useful than the report
(btw, it does not yet consider the templates).

#!/usr/bin/env python

""" Shows the names of files that have not been converted by the OpenOffice
    conversion wizard
    (original and converted files are supposed to be in the same directory)

    Copyright (c) by Marco Menardi    mmenaz@mail.com
    with the help of Paolo Veronelli  paolo.veronelli@gmail.com


    This program is Free Software; you can redistribute it and/or
    modify it under the terms of the GNU General Public License
    as published by the Free Software Foundation; either version 2
    of the License, or any later version.

    This program is distributed in the hope that it will be useful,
    but WITHOUT ANY WARRANTY; without even the implied warranty of
    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
    GNU General Public License for more details.

    You should have received a copy of the GNU General Public License
    along with this program; if not, write to the Free Software
    Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301, USA.

"""
__version__="0.1"
##### edit this dict to add support for more formats
exts={'.doc':'.odt', # Word,  Writer
      '.xls':'.ods', # Excel, Calc
      '.ppt':'.odp', # PowerPoint, Impress
      '.pps':'.odp', # PowerPoint, Impress (different extension)
      '.sxw':'.odt', # Writer 1.0
      '.sxc':'.ods', # Calc 1.0
      '.sxi':'.odp', # Impress 1.0 (.sxd is the Draw 1.0 extension)
      '.sdw':'.odt', # StarWriter
      '.sdc':'.ods', # StarCalc
      '.sdd':'.odp', # StarImpress
      '.sdp':'.odp', # StarImpress (different extension)
     }


import os, sys, optparse

usage='''
python notooo.py <directory>

\tLists Microsoft Office files that have no OpenDocument equivalent in the same
\tdirectory.
\tDefault is current directory.
'''

parser = optparse.OptionParser(usage=usage)
# -n clears options.recurse, so only the target directory itself is scanned
parser.add_option("-n", "--nonrecurse", dest="recurse",
                  action="store_false", default=True,
                  help="don't recurse down target directory")

options, args = parser.parse_args()
if not args:  # default to '.'
    basedir = '.'
elif len(args) == 1:  # the target directory
    basedir = args[0]
else:
    parser.print_help()
    sys.exit(1)


for root, dirs, files in os.walk(basedir):
    for name in files:
        base, ext = os.path.splitext(name)
        if ext in exts:
            # the converted file is expected next to the source file
            target = os.path.join(root, base + exts[ext])
            if not os.path.exists(target):
                print(os.path.join(root, name))
    if not options.recurse:
        break  # the first os.walk() result is basedir itself
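
For running the same check at the end of a conversion, as suggested above, the script can be sketched as a function that returns the missing files instead of printing them. This is only a sketch, not part of the original script: the name `find_unconverted`, the `EXTS` table, the case-insensitive extension match and the three template rows (`.dot`, `.xlt`, `.pot`) are all assumptions added for illustration.

```python
import os

# Source extension -> extension the converted file is expected to have.
# The three template rows are an assumption; the script above skips templates.
EXTS = {
    ".doc": ".odt", ".xls": ".ods", ".ppt": ".odp", ".pps": ".odp",
    ".dot": ".ott", ".xlt": ".ots", ".pot": ".otp",
}


def find_unconverted(basedir=".", recurse=True, exts=EXTS):
    """Return the sorted paths of source files with no converted sibling."""
    missing = []
    for root, dirs, files in os.walk(basedir):
        for name in files:
            base, ext = os.path.splitext(name)
            # lower() so that REPORT.DOC is checked as well (assumption:
            # the converted file keeps the base name and gets a lowercase
            # OpenDocument extension)
            target_ext = exts.get(ext.lower())
            if target_ext is None:
                continue
            if not os.path.exists(os.path.join(root, base + target_ext)):
                missing.append(os.path.join(root, name))
        if not recurse:
            del dirs[:]  # emptying dirs stops os.walk from descending
    return sorted(missing)
```

Calling `find_unconverted("/path/to/archive")` gives a list that could be shown to the user or written to a file in one go once the whole run has finished, so nothing has to be logged per file while the conversion is in progress.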

Comment 1 Olaf Felka 2005-10-10 09:02:15 UTC

of @ oc: Please have a look.

Comment 2 frank 2005-10-14 11:30:25 UTC

Hi Ilko,

this seems to be your area.

178 files in some subfolders converted without log function 25 % faster.

Frank

Comment 3 mmenaz 2005-10-18 14:06:33 UTC

Frank:
"178 files in some subfolders converted without log function 25 % faster"
As I clearly stated, the log makes things go slower and slower. With 178 files
that may be so. Try with 1780, or 6478 as I did, then let me know. The average
office has thousands of files to convert, in hundreds of folders; you can't ask
them to convert in batches of 100 files. Do we want to ease their migration
as much as possible or not?
Thanks
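
To illustrate why a 178-file test can hide the problem, here is a rough sketch of how a per-file log cost that grows with the log would scale. Everything in it is a hypothetical model, not a measurement of the wizard: it assumes converting a file costs one time unit, that logging the i-th file costs an amount proportional to i, and that "25 % faster without log" means the run with the log takes 1.25 times as long at 178 files. The function name `slowdown` and its parameters are made up for the example.

```python
def slowdown(n_files, ref_files=178, ref_slowdown=1.25):
    """Estimate (time with log) / (time without log) for n_files.

    Hypothetical model: each conversion costs 1 unit and logging the
    i-th file costs k * i units, because the log grows with every entry.
    k is fitted so that ref_files files give exactly ref_slowdown.
    """
    # Total log cost for n files is k * n * (n - 1) / 2, conversion cost is n,
    # so the ratio is 1 + k * (n - 1) / 2.
    k = 2.0 * (ref_slowdown - 1.0) / (ref_files - 1)
    return 1.0 + k * (n_files - 1) / 2.0


for n in (178, 1780, 3500, 6478):
    print("%5d files: about %.1fx the time with the log" % (n, slowdown(n)))
```

Under these assumptions the modest overhead seen at 178 files grows to roughly 3.5x at 1,780 files and roughly 10x at 6,478 files, so both observations in this issue could be true at the same time. Whether the wizard's log really behaves this way would have to be checked in its code.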

Comment 4 Rob Weir 2013-07-30 02:20:11 UTC

Reset assignee on issues not touched by assignee in more than 2000 days.