Issue 15428 - thencode: th_en_US hardcoded
Summary: thencode: th_en_US hardcoded
Status: CLOSED FIXED
Alias: None
Product: Infrastructure
Classification: Infrastructure
Component: Website general issues
Version: current
Hardware: All All
Importance: P3 Trivial
Target Milestone: ---
Assignee: khendricks
QA Contact: issues@lingucomponent
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2003-06-09 10:40 UTC by pavel
Modified: 2013-02-24 20:34 UTC
CC List: 1 user

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Description pavel 2003-06-09 10:40:54 UTC
Kevin,

we cannot use thencode directly to create the Czech thesaurus database, because
it has en_US hardcoded in it:

pavel@pavel:~/Tmp/ooo_11beta2_src/lingucomponent/source/thesaurus/parser> grep th_en_US.dat *
thencode.cxx:      dpath = aoUrl + OUString(RTL_CONSTASCII_USTRINGPARAM("/th_en_US.dat" ));
thencode.cxx:      dpath = aoUrl + OUString(RTL_CONSTASCII_USTRINGPARAM("th_en_US.dat"));

We have to work around it with:

        mv $(BIN)/th_en_US.dat $(BIN)/th_cs_CZ.dat
        mv $(BIN)/th_en_US.idx $(BIN)/th_cs_CZ.idx

It is OK for now, but I think it should be fixed by adding an additional
command-line argument to thencode (en_US, or cs_CZ for us).

What do you think?
Comment 1 khendricks 2003-06-09 12:50:49 UTC
Hi, 
 
Yes, your solution is best. But I thought you were using a much simpler awk script, 
and I was hoping you would "officially" contribute it so that thencode could go away 
for good. 
 
Also Giuseppe is completely rewriting thencode for the it_IT project because he had 
trouble with awk and "phrases with spaces in them". 
 
So I have high hopes that it_IT will contribute a much improved thencode for all of us 
to use. 
 
Until then I can add a parameter for which file name to use. 
 
Thanks, 
 
Kevin 
 
Comment 2 khendricks 2003-06-09 15:48:54 UTC
Hi, 
 
Fixed in the latest version of cws_srx644_ooo11beta2. 
 
The new syntax is: 
 
thencode input_directory output_directory output_file_name 
 
So the makefile in the newest dictionaries/en_US uses the following: 
 
./thencode $(MISC) $(BIN) th_en_US 
 
which creates th_en_US.idx and th_en_US.dat. 
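For illustration, the three-argument syntax can be sketched in standard C++ as follows. This is not the actual thencode.cxx, which builds its paths as OUString URLs; ThencodeArgs, parseArgs, datPath and idxPath are hypothetical names, and the arguments are taken as a std::vector<std::string> only to keep the sketch easy to test. The point is that both output names are derived from the third argument instead of the hardcoded th_en_US.dat.

```cpp
#include <filesystem>
#include <optional>
#include <string>
#include <vector>

namespace fs = std::filesystem;

// Hypothetical holder for the three positional arguments of
//   thencode input_directory output_directory output_file_name
struct ThencodeArgs {
    fs::path inputDir;       // e.g. $(MISC)
    fs::path outputDir;      // e.g. $(BIN)
    std::string outputName;  // e.g. "th_en_US" or "th_cs_CZ"
};

// Returns std::nullopt unless exactly three arguments follow the program name.
std::optional<ThencodeArgs> parseArgs(const std::vector<std::string>& args) {
    if (args.size() != 4) {
        return std::nullopt;
    }
    return ThencodeArgs{args[1], args[2], args[3]};
}

// Output file names come from the third argument, not a hardcoded "th_en_US".
fs::path datPath(const ThencodeArgs& a) { return a.outputDir / (a.outputName + ".dat"); }
fs::path idxPath(const ThencodeArgs& a) { return a.outputDir / (a.outputName + ".idx"); }
```

Under this sketch, calling thencode $(MISC) $(BIN) th_cs_CZ would yield th_cs_CZ.dat and th_cs_CZ.idx in $(BIN), so the mv workaround is no longer needed.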
 
So I am resolving this as fixed. 
 
Please verify in the next OOo 1.1 and close. 
 
Thanks, 
 
Kevin 
 
Comment 3 pavel 2003-06-09 16:10:46 UTC
I see another possible problem:

thencode has other hardcoded values:

wordlist.txt and trimthes.txt

I added a cs_CZ directory to dictionaries, and after it is built those files
already exist, so the $(MISC)$/wordlist.txt and $(MISC)$/trimthes.txt targets
are not rebuilt and the English thesaurus effectively contains Czech
words :-(

I solved it by removing those files right after creating the th_cs_CZ* files.


Comment 4 khendricks 2003-06-09 16:25:57 UTC
Hi Pavel, 
 
Good point! But removing them at the end is not workable under parallel builds, is it? 
(Try building en_US and cs_CZ in parallel!) 
 
Since I don't want to pass in yet more names, perhaps we should change all of this 
yet again to look something like: 
 
thencode input_directory output_directory root_name 
 
So I would pass in something like  
 
./thencode $(MISC) $(BIN) th_en_US 
 
and then thencode would read in  
 
th_en_US_words.txt 
th_en_US_thes.txt 
 
and produce th_en_US.idx and th_en_US.dat 
 
You would then pass in 
 
./thencode $(MISC) $(BIN) th_cs_CZ 

and your input files would be 

th_cs_CZ_words.txt 
th_cs_CZ_thes.txt 

and your output files would be 

th_cs_CZ.idx 
th_cs_CZ.dat 
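A minimal sketch of the proposed root-name scheme, assuming plain std::string file names; ThesaurusFiles and filesForRoot are hypothetical helpers, not part of thencode. All four names are derived from the root, so two locales built in parallel never read or write the same file in $(MISC).

```cpp
#include <cassert>
#include <string>

// Hypothetical bundle of every file name thencode would touch for one root.
struct ThesaurusFiles {
    std::string wordsInput;   // <root>_words.txt
    std::string thesInput;    // <root>_thes.txt
    std::string indexOutput;  // <root>.idx
    std::string dataOutput;   // <root>.dat
};

// Derive all input and output names from a single root such as "th_en_US".
ThesaurusFiles filesForRoot(const std::string& root) {
    assert(!root.empty() && "root name must not be empty");
    return {root + "_words.txt", root + "_thes.txt", root + ".idx", root + ".dat"};
}
```

For example, filesForRoot("th_cs_CZ") yields th_cs_CZ_words.txt, th_cs_CZ_thes.txt, th_cs_CZ.idx and th_cs_CZ.dat, none of which overlaps with the th_en_US set.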
 
Would that be better? 
 
Kevin 
 
Comment 5 pavel 2003-06-09 16:35:30 UTC
This is one approach.

BTW, I work on Unix systems, so names with 13 or more characters
are OK, but are they OK in general?

What about creating an en_US or cs_CZ directory in $(MISC) and working
with the files wordlist.txt and trimthes.txt there?
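Pavel's alternative keeps the fixed input names and separates locales by directory instead. A sketch under the same assumptions (std::filesystem paths; LocaleInputs and inputsForLocale are hypothetical names):

```cpp
#include <filesystem>
#include <string>

namespace fs = std::filesystem;

// Fixed input names, but one subdirectory of $(MISC) per locale.
struct LocaleInputs {
    fs::path wordList;  // <misc>/<locale>/wordlist.txt
    fs::path trimThes;  // <misc>/<locale>/trimthes.txt
};

LocaleInputs inputsForLocale(const fs::path& miscDir, const std::string& locale) {
    const fs::path localeDir = miscDir / locale;  // e.g. misc/en_US
    return {localeDir / "wordlist.txt", localeDir / "trimthes.txt"};
}
```

This keeps every file name within 8.3 limits, at the cost of the build having to create the per-locale directory first; Comment 6 opts for unique file names instead.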
Comment 6 khendricks 2003-06-09 16:48:34 UTC
Hi, 
 
I would rather go with unique names like th_en_US_words.txt and 
th_en_US_thes.txt. 
 
I have already made the lingucomponent and dictionary changes to implement 
this in my own tree, tested them, and they work fine (and will not interfere with parallel 
builds). 
 
Since dictionary.lst caused no problems and it is a 10.3 name, I would assume that 14.3 
would be fine as well. 
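The N.3 reasoning can be checked mechanically. The sketch assumes the limit applies separately to the part before the last dot (the stem) and to the extension; fitsNameLimit is a hypothetical helper. By this count dictionary.lst is 10.3, th_en_US_words.txt is 14.3, and the original th_en_US.dat is 8.3.

```cpp
#include <cstddef>
#include <string>

// True if the stem (text before the last '.') has at most maxStem characters
// and the extension (text after it) has at most maxExt characters.
bool fitsNameLimit(const std::string& name, std::size_t maxStem, std::size_t maxExt) {
    const std::size_t dot = name.rfind('.');
    const std::size_t stemLen = (dot == std::string::npos) ? name.size() : dot;
    const std::size_t extLen = (dot == std::string::npos) ? 0 : name.size() - dot - 1;
    return stemLen <= maxStem && extLen <= maxExt;
}
```

So a platform that accepts dictionary.lst has already shown it handles more than 8.3, but only a 14.3 (or larger) limit admits th_en_US_words.txt, which is why the question went to dev@porting.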
 
But just to make sure, I have asked on dev@porting. If 14.3 comes back as okay, 
I will commit these changes so I can close this once again! 
 
Thanks, 
 
Kevin 
 
Comment 7 pavel 2003-06-12 10:38:28 UTC
Kevin,

I think we can go with long names.
Comment 8 pavel 2003-06-20 13:00:03 UTC
I see it in 11RC :-)

I have modified the cs_CZ thesaurus parsing and it seems to work :-)
I'm not able to test it in a running RC, because the RC does not start; it
only offers to repair or uninstall.
Comment 9 pavel 2003-06-22 17:48:28 UTC
Verified as fixed in 11rc. Thanks, Kevin.
Comment 10 pavel 2003-06-22 20:49:29 UTC
-