Apache OpenOffice (AOO) Bugzilla – Issue 40213
title and first link use the same string -> optimize translation
Last modified: 2017-05-20 11:31:03 UTC
Hi, we use PO files for translation of help. There are too many pieces like this:

    #: 08020000.xhp#tit.help.text
    msgid "Current Size"
    msgstr "Aktuální velikost"

    #: 08020000.xhp#hd_id3154011.1.help.text
    msgid "\\<link href=\\\"text/simpress/02/08020000.xhp\\\" name=\\\"Current Size\\\"\\>Current Size\\</link\\>"
    msgstr "\\<link href=\\\"text/simpress/02/08020000.xhp\\\" name=\\\"Aktuální velikost\\\"\\>Aktuální velikost\\</link\\>"

These strings come directly from the GSI file (1:1 mapping between GSI and PO file). Can we define some mechanism or automated process to make translation easier and more consistent? If the string in the title of the page is already translated, why do we have to translate it again? Can't we re-use it somehow?

What about e.g. (meta-diff):

     <title id="tit" xml-lang="en-US">Current Size</title>
    -<link href="text/simpress/02/08020000.xhp" name="Current Size">Current Size</link></paragraph>
    +<link href="text/simpress/02/08020000.xhp" name="$title">$title</link></paragraph>

As you know, I like clear evidence, so:

    pavel@linux:~/.ooo/cs> grep helpcontent en-US.sdf | awk -F' ' '{if ($2 != PREVHELPFILE) { print $11; getline; print $11 ; PREVHELPFILE=$2} }'|grep -B1 "link href"|grep "link href"|wc -l
    829
    pavel@linux:~/.ooo/cs>

In English, we have 829 occurrences of this (the title and the string directly following it, a link, contain the same string).

    pavel@linux:~/.ooo/cs> grep helpcontent en-US.sdf | awk -F' ' '{if ($2 != PREVHELPFILE) { print $11; getline; print $11 ; PREVHELPFILE=$2} }'|grep -B1 "link href"|grep "link href"|grep name|wc -l
    739
    pavel@linux:~/.ooo/cs>

And as the same string is also used 739 times in the "name" attribute of the link, we can save 1568 strings from translation. Right now, we have 69559 strings in the en-US GSI file. This means that by implementing this, we can save up to 2.25% of the translations.

The above applies to the <link ...> being directly after the <title ...> in the GSI.
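For readers who want to reproduce the count without awk, here is a minimal Python sketch of the same idea. It is not the pipeline used above: the `(helpfile, text)` row layout is an assumption standing in for GSI columns $2 and $11, the function name and sample rows are hypothetical, and the markup is assumed to be already unescaped. Unlike the grep-based count, it checks for strict equality between the title and the link text.

```python
import re

# Matches an unescaped help link and captures its name attribute and link text.
LINK_RE = re.compile(r'<link href="[^"]*" name="([^"]*)">(.*?)</link>')


def count_title_link_duplicates(rows, offset=1):
    """Count help files whose title is repeated in a following link.

    rows   -- iterable of (helpfile, text) tuples in file order
              (hypothetical layout, standing in for GSI columns 2 and 11)
    offset -- how many strings after the title to look for the link

    Returns (text_hits, name_hits): files where the link text equals the
    title, and how many of those also repeat the title in name="...".
    """
    by_file = {}
    for helpfile, text in rows:
        by_file.setdefault(helpfile, []).append(text)

    text_hits = name_hits = 0
    for texts in by_file.values():
        if len(texts) <= offset:
            continue  # file has no string at that position
        title = texts[0]
        match = LINK_RE.search(texts[offset])
        if match is None:
            continue  # string at that position is not a link
        if match.group(2) == title:
            text_hits += 1
            if match.group(1) == title:
                name_hits += 1
    return text_hits, name_hits


# Made-up sample data, for illustration only.
sample_rows = [
    ("a.xhp", "Current Size"),
    ("a.xhp", '<link href="text/simpress/02/08020000.xhp" name="Current Size">Current Size</link>'),
    ("b.xhp", "Foo"),
    ("b.xhp", "Some paragraph"),
    ("c.xhp", "Zoom"),
    ("c.xhp", '<link href="x.xhp" name="Zoom In">Zoom</link>'),
]
```

Each hit in `text_hits` or `name_hits` is one occurrence of the title that a translator would otherwise have to type again.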
Sometimes the link is the third line, so we have to add those too:

    pavel@linux:~/.ooo/cs> grep helpcontent en-US.sdf | awk -F' ' '{if ($2 != PREVHELPFILE) { print $11; getline; getline; print $11 ; PREVHELPFILE=$2} }'|grep -B1 "link href"|grep "link href"|grep name|wc -l
    991
    pavel@linux:~/.ooo/cs> grep helpcontent en-US.sdf | awk -F' ' '{if ($2 != PREVHELPFILE) { print $11; getline; getline; print $11 ; PREVHELPFILE=$2} }'|grep -B1 "link href"|grep "link href"|wc -l
    1118
    pavel@linux:~/.ooo/cs>

-> another 2109 strings saved.

So the total numbers for this improvement would be: 3677 strings, 5.28% of the complete translations (8.5% of the complete help). This is a huge improvement!

Of course I can implement some mechanism in our tools for generating PO files, but I'd like to solve this directly in the source, also for teams who do not use PO files. How does this affect Sun's translation mechanisms?

Giving this Prio 1 because it could save *a lot of time/money* for translators.
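The tool-side mechanism mentioned above could look roughly like the following. This is only a sketch under stated assumptions, not the actual PO generation tooling: the function name `pretranslate_link` is hypothetical, and the strings are assumed to be unescaped (the real PO entries carry backslash escapes that would have to be removed and restored around this step).

```python
import re

# Splits a link string into: opening part, name value, '">', link text, closing tag.
LINK_RE = re.compile(r'^(<link href="[^"]*" name=")([^"]*)(">)(.*)(</link>)$')


def pretranslate_link(title_src, title_tgt, link_src):
    """Build the translated link string from an already translated title.

    Returns None when link_src is not a link or its text differs from the
    title, so the caller can leave such entries to the translator.
    """
    match = LINK_RE.match(link_src)
    if match is None or match.group(4) != title_src:
        return None
    # Translate the name attribute only if it also repeats the title.
    name = title_tgt if match.group(2) == title_src else match.group(2)
    return f"{match.group(1)}{name}{match.group(3)}{title_tgt}{match.group(5)}"


# Example taken from the PO entries quoted in this issue (unescaped).
example = pretranslate_link(
    "Current Size",
    "Aktuální velikost",
    '<link href="text/simpress/02/08020000.xhp" name="Current Size">Current Size</link>',
)
```

Returning None instead of guessing keeps the step conservative: only links that reuse the title verbatim are filled in automatically.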
First of all, this is by no means a P1. Secondly, although this happens most of the time, title and first heading do not *need* to use the same string. We can discuss changing the help DTD in a way that resolves this. Thirdly, this is yet another localization issue that could be resolved by using TM or pretranslation. It may be theoretically possible to get rid of all redundancy, but you would end up with files full of cross references. Believe me, we were at that stage before with StarOffice help, where we did the weirdest things to avoid duplication. I leave this to the localization tools because they can handle it most efficiently without forcing the tech writer to find out whether a phrase was used before in the help or not. I am open to suggestions for this particular case, though, on how we can optimize this for >2.0.
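To illustrate what "using TM" means here, a minimal exact-match translation memory can be sketched as below. This is an assumption-laden toy, not how any particular localization tool works; `build_tm` and `apply_tm` are hypothetical names.

```python
def build_tm(translated_pairs):
    """Map each source string to its first recorded non-empty translation."""
    tm = {}
    for source, target in translated_pairs:
        if target:
            tm.setdefault(source, target)
    return tm


def apply_tm(tm, sources):
    """Return (source, translation-or-None) for each source string."""
    return [(source, tm.get(source)) for source in sources]


# Illustrative data only; the second pair shows that the first entry wins.
tm = build_tm([
    ("Current Size", "Aktuální velikost"),
    ("Current Size", "Jiná velikost"),
])
result = apply_tm(tm, ["Current Size", "Zoom"])
```

Note that an exact-match lookup like this only helps with the link string if the tool matches the text inside the markup separately, or if the whole link string has been translated once before.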
Reset assignee to the default "issues@openoffice.apache.org".