Apache OpenOffice (AOO) Bugzilla – Issue 57580
Optimize 'images.zip'
Last modified: 2014-02-27 07:51:15 UTC
Hi,

while investigating other issues, I found that images.zip contains zillions of duplicated images. The following lists the duplicated images (number of occurrences, MD5 sum):

159 807dae78a1ec98d3ea5b3e32c98ffe90
97 80d815d13cda9b63ebd372fec588cfd5
39 36d1d1d46cdd975db0539b82ffe5ac42
38 50f29638fa0077563917079d253bc6ad
21 825cdbac91b67e546779ab0b044d34c1
18 2f23396a4fdea21793118cd3654bffff
17 10a5e31f459eef1605af760b6b4f1086
16 ac16f51de1eb909a039de78ac652e29d
11 85e62fc7c318733a04dd393828c7e3a8
10 c72e90bd6cea0321ab79071713d9e61b
10 bb189c5fadb9728d143f134df8c6ac72
10 2b33af19fd1a73b4ecf6b9914ee8e4eb
10 17b8d50c159558bc02b6ba63f01bdabb

Generated with:

  unzip images.zip
  find . -type f -exec md5sum '{}' \; | awk '{print $1}' | sort | uniq -c | sort -rn

It seems unnecessary to include one image 159 times...
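The shell pipeline above prints only counts and checksums. As a rough sketch, the same grouping could be done in Python without unpacking the archive, which also shows which paths share a checksum. The function name duplicate_groups is made up for illustration and is not part of any OOo build tool.

```python
import hashlib
import zipfile
from collections import defaultdict


def duplicate_groups(zip_path):
    """Group archive members by the MD5 of their contents.

    Returns a list of (digest, [paths]) for digests shared by more than
    one member, most frequent first.
    """
    by_digest = defaultdict(list)
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no image data
            digest = hashlib.md5(archive.read(info)).hexdigest()
            by_digest[digest].append(info.filename)
    groups = [(d, names) for d, names in by_digest.items() if len(names) > 1]
    groups.sort(key=lambda item: len(item[1]), reverse=True)
    return groups
```

Calling duplicate_groups("images.zip") and printing len(names), digest and names[0] for each group should give a listing comparable to the one above, with one example path per checksum.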
TM->requirements: please have a look.
I would suggest that the images repeated 159x and 97x are the 16x16 and 32x32 versions of the 'green cross on blue background' icon. Look at command.png and ln06.png in ui/default_images/sfx2/res/. I assume these icons are placeholders? It is possible that the other repeated icons are 'valid', since there are multiple copies of the standard tick, cross, etc. in different project folders within 'images.zip'. Still, it would be nice to cut down on some of the duplication by assembling the 'core' icons in one place. Might also relate to issue 36518? Cheers, Andrew.
To work towards optimizing the OOo images, I have started an audit. It is currently only an early draft, but I will post updates fairly regularly here: http://students.bath.ac.uk/ea2aced/OOo/OOoIconCat.odt Hopefully this work is of interest to developers of OOo. With 2.0.2rc1, the 'images.zip' archive now contributes *18MB* of the application size!! Regards, Andrew
CC sts
duplicate of issue 66690 ??
This issue is about reducing the redundancy, not the quality of the images, so no, this is not a duplicate...
Kai: are you interested in working on this?
Although I have not updated the audit document in a while, the current link is: http://people.bath.ac.uk/ea2aced/OOo/OOoIconCat.odt If this work is useful I will continue adding to it.

I think that to make this really work, it would be great to have a combined effort with freedesktop.org, so that a naming / icon standard can be formed for general office / productivity suites. I'm happy to collaborate on this with anyone!

Apparently the IconTool maintained by Sun's ArtTeam (Stella) stores contextual information for the images, so this might allow redundancy to be reduced. Maybe Stella could comment on the current state of the placeholder icons (green cross): are they needed? Will icons be generated? If not, we should remove them or at least substitute them with minimum-size PNG images (67 bytes for 1 pixel). With current builds, one placeholder icon will now occur *8* times (default norm & HC, industrial norm & HC, crystal norm & HC, and HiContrast norm & HC!!).

Look forward to feedback or signs of interest :-)

Andrew
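For reference, the "67 bytes for 1 pixel" figure can be checked by assembling a 1x1 PNG by hand. This is only a sketch: the helper names png_chunk and one_pixel_png are invented here, and the exact size depends on how the local zlib encodes the two bytes of pixel data, though it should come out at or near 67 bytes.

```python
import struct
import zlib


def png_chunk(kind, data):
    """Build one PNG chunk: length, type, data, CRC over type + data."""
    crc = zlib.crc32(kind + data) & 0xFFFFFFFF
    return struct.pack(">I", len(data)) + kind + data + struct.pack(">I", crc)


def one_pixel_png():
    """Return the bytes of a 1x1, 1-bit greyscale PNG (one black pixel)."""
    signature = b"\x89PNG\r\n\x1a\n"
    # width=1, height=1, bit depth=1, colour type=0 (greyscale),
    # compression=0, filter=0, interlace=0
    ihdr = struct.pack(">IIBBBBB", 1, 1, 1, 0, 0, 0, 0)
    # One scanline: filter-type byte 0, then one byte holding the pixel.
    idat = zlib.compress(b"\x00\x00", 9)
    return (signature + png_chunk(b"IHDR", ihdr)
            + png_chunk(b"IDAT", idat) + png_chunk(b"IEND", b""))
```

The fixed parts add up to 57 bytes (8 for the signature, 25 for IHDR, 12 for IEND and 12 of IDAT framing), so the total is 57 plus the length of the compressed scanline.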
An idea:

1. Remove duplicates, either by preprocessing the image list or by postprocessing the images.zip file.
2. If an image doesn't exist in images.zip, consult a duplicates.txt file in images.zip. This file will contain something like:

path/existing/file/icon.png <TAB>path/to/not/existing/duplicated/icon/which/was/removed/icon.png <TAB>path/to/another/not/existing/duplicated/icon/which/was/removed/icon.png
...

Do you think it could work?
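A minimal sketch of this idea in Python, taking the postprocessing route. It assumes that duplicates.txt holds one line per surviving image, followed by its removed duplicates separated by tabs; the exact layout is not spelled out above. The function names are hypothetical, and the real lookup would live in the OOo image loading code rather than in a script.

```python
import hashlib
import zipfile

DUPLICATES_NAME = "duplicates.txt"  # file name taken from the proposal


def deduplicate(src_zip, dst_zip):
    """Copy src_zip to dst_zip, storing each distinct image only once."""
    kept = {}     # digest -> path of the copy that stays in the archive
    removed = {}  # kept path -> list of removed duplicate paths
    with zipfile.ZipFile(src_zip) as src, \
            zipfile.ZipFile(dst_zip, "w", zipfile.ZIP_DEFLATED) as dst:
        for info in src.infolist():
            if info.is_dir():
                continue
            data = src.read(info)
            digest = hashlib.md5(data).hexdigest()
            if digest in kept:
                removed.setdefault(kept[digest], []).append(info.filename)
            else:
                kept[digest] = info.filename
                dst.writestr(info.filename, data)
        # Assumed layout: kept path, then its removed duplicates, tab-separated.
        lines = ["\t".join([keep] + dups) for keep, dups in removed.items()]
        dst.writestr(DUPLICATES_NAME, "\n".join(lines) + "\n")


def load_aliases(archive):
    """Map every removed path to the surviving path from duplicates.txt."""
    aliases = {}
    text = archive.read(DUPLICATES_NAME).decode("utf-8")
    for line in text.splitlines():
        keep, *dups = line.split("\t")
        for dup in dups:
            aliases[dup] = keep
    return aliases


def read_image(archive, path, aliases):
    """Read path, falling back to its surviving duplicate if it was removed."""
    try:
        return archive.read(path)
    except KeyError:  # zipfile raises KeyError for a missing member
        return archive.read(aliases[path])
```

This sketch treats two files as identical when their MD5 sums match; a more cautious version would also compare the bytes before dropping a file.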
cd: Added myself on CC.
Created attachment 72528
I wonder whether a cleanup has already been done? I did some tests with very imperfect tools and did not find "zillions" of identical files. Nevertheless, I found some possible duplicates in "images.zip":

(a) Many imagename.png files exist more than once in the extracted file system. Examples: names_as_addressing.png 23 times, sc_underline.png 25 times.
(b) A quick test with "sc_underline.png" shows that most of the copies have different sizes, so they can't be identical. But I found several with identical size. Take the ones with 408 bytes:
....\images\res\commandimagelist\sv\
....\images\res\commandimagelist\en-GB
....\images\res\commandimagelist\de
....\images\res\commandimagelist
These really look identical to me; they show a capital underlined U.
(c) So, due to (b), there still might be some need for a cleanup?
(d) My test (WIN 7):
(d1) Extract "images.zip" (from "\share\config") to a new folder.
(d2) In a console, from the folder to which "images.zip" has been extracted, run 'dir /s /w > dir.csv'.
(d3) Open "dir.csv" with Calc, and copy / paste all rows containing a .png to a new sheet.
(d4) In a new column, use a function like "=COUNTIF(D$1:D$10000;D4745)" in row 4745 to list the number of files found with the name in D4745.
(d5) Search the extracted folders for a name with duplicates, sort the hits by file size, and compare files with the same name and size.
(e) Maybe there are more powerful tools to find duplicates? My tests are rather "expensive".
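Regarding (e): steps (d2) to (d5) could be automated with a short script. The following is only a sketch, run against the folder from step (d1). It groups .png files by name and size, as in the manual test, and then confirms candidates with a byte-by-byte comparison. The function names are invented for this example.

```python
import filecmp
import os
from collections import defaultdict


def same_name_and_size(root):
    """Group .png files under root by (file name, size in bytes)."""
    groups = defaultdict(list)
    for folder, _dirs, files in os.walk(root):
        for name in files:
            if name.lower().endswith(".png"):
                path = os.path.join(folder, name)
                groups[(name, os.path.getsize(path))].append(path)
    # Only names that occur more than once with the same size are candidates.
    return {key: paths for key, paths in groups.items() if len(paths) > 1}


def confirmed_duplicates(root):
    """Return lists of paths whose contents are byte-for-byte identical."""
    result = []
    for paths in same_name_and_size(root).values():
        remaining = list(paths)
        while remaining:
            first = remaining.pop(0)
            # shallow=False forces a comparison of the file contents.
            same = [p for p in remaining
                    if filecmp.cmp(first, p, shallow=False)]
            remaining = [p for p in remaining if p not in same]
            if same:
                result.append([first] + same)
    return result
```

Like the manual test, this only finds duplicates that share a file name. Identical images stored under different names would need a checksum over all files instead.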
Created attachment 82728: count results. My file counts for "images.zip" with a server installation of "AOO 4.1.0-Dev – English UI / German locale - [AOO410m1(Build:9750) - Rev. 1566800 - 2014-02-12]" on German WIN7 Home Premium (64bit), own separate user profile. (I am not 100% sure about the rev; it might be a few days earlier or later.)