Issue 57580 - Optimize 'images.zip'
Summary: Optimize 'images.zip'
Status: CONFIRMED
Alias: None
Product: General
Classification: Code
Component: code
Version: 680m138
Hardware: All All
Importance: P3 Trivial
Target Milestone: AOO Later
Assignee: AOO issues mailing list
QA Contact:
URL:
Keywords: oooqa
Depends on:
Blocks:
 
Reported: 2005-11-09 14:20 UTC by pavel
Modified: 2014-02-27 07:51 UTC
CC List: 8 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: 4.1.0-dev
Developer Difficulty: ---


Attachments
count results (221.51 KB, application/vnd.oasis.opendocument.spreadsheet)
2014-02-27 06:49 UTC, Rainer Bielefeld

Description pavel 2005-11-09 14:20:31 UTC
Hi,

while investigating other issues, I found that images.zip contains zillions of
duplicated images. The following lists duplicated images (number of occurrences,
MD5 sum):

    159 807dae78a1ec98d3ea5b3e32c98ffe90
     97 80d815d13cda9b63ebd372fec588cfd5
     39 36d1d1d46cdd975db0539b82ffe5ac42
     38 50f29638fa0077563917079d253bc6ad
     21 825cdbac91b67e546779ab0b044d34c1
     18 2f23396a4fdea21793118cd3654bffff
     17 10a5e31f459eef1605af760b6b4f1086
     16 ac16f51de1eb909a039de78ac652e29d
     11 85e62fc7c318733a04dd393828c7e3a8
     10 c72e90bd6cea0321ab79071713d9e61b
     10 bb189c5fadb9728d143f134df8c6ac72
     10 2b33af19fd1a73b4ecf6b9914ee8e4eb
     10 17b8d50c159558bc02b6ba63f01bdabb

generated as:

unzip images.zip
find . -type f -exec md5sum '{}' \;|awk '{print $1}'|sort|uniq -c | sort -rn

It seems unnecessary to include one image 159 times...
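
The same count can be taken without unpacking the archive. A minimal sketch, assuming Python 3 is available; the function name count_duplicate_images is made up for illustration and is not part of any OOo build tool:

```python
import hashlib
import zipfile
from collections import Counter


def count_duplicate_images(zip_path):
    """Return (occurrences, md5) pairs for contents stored more than once."""
    counts = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no image data
            digest = hashlib.md5(archive.read(info)).hexdigest()
            counts[digest] += 1
    # most_common() sorts by descending count, like `sort -rn` above
    return [(n, digest) for digest, n in counts.most_common() if n > 1]
```

Calling count_duplicate_images("images.zip") and printing each pair should give a list in the same shape as the one above, limited to contents that occur at least twice.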
Comment 1 thorsten.martens 2005-11-18 11:17:25 UTC
TM->requirements: please have a look.
Comment 2 ace_dent 2005-11-30 00:13:08 UTC
I would suggest that the 159x & 97x repeated images are the 16x16 and 32x32
versions of the 'green cross on blue background' icon.
Look at: ui/default_images/sfx2/res/ command.png & ln06.png.
I assume these icons are place holders? It is possible that the other repeated
icons are 'valid', since there are multiple copies of the standard tick, cross,
etc. in different project folders within 'images.zip'... it would be nice to cut
down on some of the duplication by assembling the 'core' icons. Might also
relate to issue: 36518?
Cheers, Andrew.
Comment 3 ace_dent 2006-02-17 18:17:46 UTC
To work towards optimizing OOo images, I have started an audit. It is currently
only an early draft, but I will post updates fairly regularly here:
http://students.bath.ac.uk/ea2aced/OOo/OOoIconCat.odt

Hopefully this work is of interest to developers of OOo. With 2.0.2rc1, the
'images.zip' archive now contributes *18MB* of the application size!!

Regards,
Andrew
Comment 4 ace_dent 2006-08-10 02:22:12 UTC
CC sts
Comment 5 mci 2006-08-27 14:03:33 UTC
duplicate of issue 66690 ??
Comment 6 pavel 2006-08-27 14:28:09 UTC
This issue is about reducing the redundancy, not the quality of images, so no, this
is not a duplicate...
Comment 7 pavel 2006-12-03 13:16:14 UTC
Kai: are you interested in working on this?
Comment 8 ace_dent 2006-12-03 18:30:20 UTC
Although I have not updated the audit document in a while, the current link is:
http://people.bath.ac.uk/ea2aced/OOo/OOoIconCat.odt
If this work is useful I will continue adding to it.

I think to make this really work, it would be great to have a combined effort
with freedesktop.org, so that a naming / icon standard can be formed for general
office / productivity suites. I'm happy to collaborate on this- with anyone!
Apparently the IconTool maintained by Sun's ArtTeam (Stella) stores contextual
information for the images- so this might allow redundancy to be reduced.

Maybe Stella could comment on the current state of the placeholder icons (green
cross)- are they needed? Will icons be generated? If not, we should remove them
or at least substitute minimum-size png images (67 bytes for 1 pixel). With
current builds, one placeholder icon will now occur *8* times (default norm &
HC, industrial norm & HC, crystal norm & HC, and HiContrast norm & HC!!).
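
For reference, the 67-byte figure can be checked by assembling such a file by hand. This is only a sketch of a 1x1, 1-bit greyscale PNG built from the chunk layout in the PNG specification; the helper names chunk and minimal_png are hypothetical, and the exact size depends on the deflate implementation producing a 10-byte stream:

```python
import struct
import zlib


def chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def minimal_png():
    """Return the bytes of a 1x1, 1-bit greyscale PNG."""
    signature = b"\x89PNG\r\n\x1a\n"
    # width=1, height=1, bit depth=1, colour type=0 (greyscale),
    # compression=0, filter=0, interlace=0
    ihdr = struct.pack(">IIBBBBB", 1, 1, 1, 0, 0, 0, 0)
    # One scanline: a filter-type byte (0) plus one byte holding the pixel.
    idat = zlib.compress(b"\x00\x00", 9)
    return signature + chunk(b"IHDR", ihdr) + chunk(b"IDAT", idat) + chunk(b"IEND", b"")
```

The fixed parts add up to 57 bytes (8 signature, 25 IHDR, 12 IDAT overhead, 12 IEND), so len(minimal_png()) should come to 67 when the compressed scanline is 10 bytes long.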

Look forward to feedback or signs of interest :-)
Andrew
Comment 9 pavel 2006-12-04 08:21:35 UTC
An idea:

1. remove duplicates
    - either via preprocessing the images list or via postprocessing the images.zip file

2. if an image doesn't exist in images.zip, consult a duplicates.txt file inside images.zip. This file
would contain something like:

    path/existing/file/icon.png
    <TAB>path/to/not/existing/duplicated/icon/which/was/removed/icon.png
    <TAB>path/to/another/not/existing/duplicated/icon/which/was/removed/icon.png
    ...

Do you think it could work?
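
A rough sketch of how such a lookup could behave, assuming the layout above (kept path on its own line, each removed duplicate on a following TAB-indented line). All function names here are hypothetical; nothing like this is known to exist in the OOo code:

```python
def write_duplicates_index(groups):
    """groups maps a kept path to the list of removed duplicate paths."""
    lines = []
    for kept, removed in groups.items():
        lines.append(kept)
        lines.extend("\t" + path for path in removed)
    return "\n".join(lines) + "\n"


def parse_duplicates_index(text):
    """Map each removed path to the path that still exists in the archive."""
    aliases = {}
    kept = None
    for line in text.splitlines():
        if not line.strip():
            continue
        if line.startswith("\t"):
            if kept is None:
                raise ValueError("duplicate entry before any kept path")
            aliases[line[1:]] = kept
        else:
            kept = line
    return aliases


def resolve(path, members, aliases):
    """Return the archive member to load for path, or None if unknown."""
    if path in members:
        return path  # image is stored directly, no lookup needed
    target = aliases.get(path)
    return target if target in members else None
```

With this shape the index is consulted only on a miss, so images that are still present in the archive cost no extra lookup.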
Comment 10 carsten.driesner 2006-12-04 08:37:29 UTC
cd: Added myself on CC.
Comment 11 amesates 2010-10-23 15:32:11 UTC
Created attachment 72528
Comment 12 Rainer Bielefeld 2014-02-27 06:47:27 UTC
I wonder whether a cleanup has already been done? I did some tests with very imperfect tools and did not find "zillions" of identical files. Nevertheless, I found some possible duplicates in "images.zip":

(a) many imagename.png exist more than 1 time in the extracted file system.
    Examples: names_as_addressing.png 23 times, sc_underline.png 25 times
(b) A quick test with "sc_underline.png" shows that most of them have 
    different sizes, so they can't be identical. But I found several with
    identical size, let's take the ones with 408 bytes:
    ....\images\res\commandimagelist\sv\
    ....\images\res\commandimagelist\en-GB
    ....\images\res\commandimagelist\de
    ....\images\res\commandimagelist
    really look identical to me, show a capital underlined U
(c) so due to (b) there still might be some necessity for a cleanup?

(d) My test (WIN 7):
(d1) extract  "images.zip" (from "\share\config") to a new folder 
(d2) from a console, in the folder where "images.zip"
     has been extracted, run 'dir /s /w > dir.csv'
(d3) open "dir.csv" with Calc, copy / paste all rows containing a .png
     to a new sheet
(d4) In a new column with function like "=COUNTIF(D$1:D$10000;D4745)" 
     in row 4745 list number of found files with that 
     particular name in D4745
(d5) search extracted folders for a name with duplicates, sort hits by
     file size and compare files with the same name and size
(e) maybe there are more powerful tools to find duplicates?
    My tests are rather "expensive".
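
Steps (d2) to (d5) could be automated along these lines. This is a sketch only, assuming Python 3 and an already extracted folder; the function name same_name_duplicates is made up for illustration:

```python
import filecmp
import os
from collections import defaultdict


def same_name_duplicates(root, suffix=".png"):
    """Return groups of byte-identical files that share a file name."""
    by_name_and_size = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(suffix):
                path = os.path.join(folder, name)
                by_name_and_size[(name, os.path.getsize(path))].append(path)

    result = []
    for paths in by_name_and_size.values():
        # Same name and size is only a hint; confirm with a byte comparison.
        groups = []
        for path in sorted(paths):
            for group in groups:
                if filecmp.cmp(group[0], path, shallow=False):
                    group.append(path)
                    break
            else:
                groups.append([path])
        result.extend(group for group in groups if len(group) > 1)
    return result
```

Each returned group lists paths whose name, size and content all match, which is the check done by eye in (b) for the 408-byte copies of "sc_underline.png".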
Comment 13 Rainer Bielefeld 2014-02-27 06:49:59 UTC
Created attachment 82728
count results

My file counts for "images.zip" with a server installation of "AOO 4.1.0-Dev – English UI / German locale - [AOO410m1(Build:9750) - Rev. 1566800 - 2014-02-12]" on German WIN7 Home Premium (64bit), own separate user profile. (I am not 100% sure about the revision; it might be a few days earlier or later.)