Apache OpenOffice (AOO) Bugzilla – Issue 11993
XBreakIterator::getCurrentWord with DICTIONARY_WORD flag
Last modified: 2013-02-24 21:07:28 UTC
When using the breakiterator in DICTIONARY_WORD mode on a text like "abcd ef  ghi??? KLM" (two spaces between "ef" and "ghi"), the results should be:

a) if the cursor (index) is placed right after the "d":
   isBeginWord: false
   isEndWord: true
   getCurrentWord: 0, 4
   but returned is: 5, 7

b) if it is placed between the two spaces between "f" and "g":
   isBeginWord: false
   isEndWord: false
   getCurrentWord: 8, 8
   but returned is: 9, 12

c) if it is placed right after the "i":
   isBeginWord: false
   isEndWord: true
   getCurrentWord: 9, 12
   but returned is: 16, 9

d) if it is placed right before the "K":
   isBeginWord: true
   isEndWord: false
   getCurrentWord: 16, 9
   This case is OK.

Note: getWordBoundary is always being called with the last argument set to TRUE.

To put it in other words, compared with isBeginWord and isEndWord the results should be:

isBeginWord  isEndWord  result
false        false      x, x where x is the current cursor/index position
false        true       the boundary of the word before the cursor
true         false      the boundary of the word following the cursor
true         true       the word following the cursor if the last function argument is TRUE, the word before the cursor if it is FALSE

The fix of #i3117# depends on this being fixed.
TL->Karl: Seems to be your issue.
The case

isBeginWord  isEndWord  result
false        false      x, x where x is the current cursor/index position

should have been:

false        false      x, x where x is the current position if the cursor is not within a dictionary word, otherwise the boundaries of the word the cursor is located in.
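The expected results from the report, including the corrected false/false case, can be sketched as a small model. This is not the UNO XBreakIterator API: `words` and `expected_current_word` are hypothetical helpers, and the tokenizer simply assumes that every run of ASCII letters or digits is one dictionary word.

```python
import re


def words(text):
    """Hypothetical tokenizer: every run of ASCII letters/digits is one word."""
    return [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]


def expected_current_word(text, pos, forward=True):
    """Return the (start, end) pair the report expects for a cursor at pos."""
    ws = words(text)
    following = next((w for w in ws if w[0] == pos), None)  # isBeginWord is true
    preceding = next((w for w in ws if w[1] == pos), None)  # isEndWord is true
    if following and preceding:
        # true/true: the last function argument decides which word is returned
        return following if forward else preceding
    if following or preceding:
        # exactly one of the two flags is true: return that word
        return following or preceding
    # false/false: the enclosing word if the cursor is inside one, else x, x
    enclosing = next((w for w in ws if w[0] < pos < w[1]), None)
    return enclosing or (pos, pos)


text = "abcd ef  ghi??? KLM"  # two spaces between "ef" and "ghi"
print(expected_current_word(text, 4))   # case a) -> (0, 4)
print(expected_current_word(text, 8))   # case b) -> (8, 8)
print(expected_current_word(text, 12))  # case c) -> (9, 12)
```

The true/true branch cannot be reached with this simple tokenizer, because two adjacent runs would merge into one word. It is kept only to mirror the last row of the table.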
I have fixed cases a), c) and d). For b), according to the internal bug 106385, DICTIONARY_WORD should skip spaces. getWordBoundary will find the previous or next word boundary according to the last function argument. It is impossible to have isBeginWord and isEndWord both TRUE. If the whole text does not contain a word, e.g. it only contains spaces and punctuation, getWordBoundary will return nStartPos as both startPos and endPos, and isBeginWord and isEndWord will return FALSE.
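A rough model of the space-skipping behaviour described in this comment might look like the following. It is only a sketch: `words` and `word_boundary_skipping_spaces` are hypothetical names, the tokenizer assumes that runs of ASCII letters or digits are the only words, and the fallback to (pos, pos) when no word exists in the requested direction is an assumption beyond the "no word in the whole text" case mentioned above.

```python
import re


def words(text):
    """Hypothetical tokenizer: every run of ASCII letters/digits is one word."""
    return [(m.start(), m.end()) for m in re.finditer(r"[A-Za-z0-9]+", text)]


def word_boundary_skipping_spaces(text, pos, forward=True):
    """Model of getWordBoundary that skips spaces toward the next/previous word."""
    ws = words(text)
    # Cursor inside or touching a word: return that word.
    for start, end in ws:
        if start <= pos <= end:
            return (start, end)
    if forward:
        # Skip ahead to the first word that starts after the cursor.
        following = [w for w in ws if w[0] > pos]
        if following:
            return following[0]
    else:
        # Skip back to the last word that ends before the cursor.
        preceding = [w for w in ws if w[1] < pos]
        if preceding:
            return preceding[-1]
    # Assumption: no word found in that direction, so return pos for both ends.
    return (pos, pos)


text = "abcd ef  ghi??? KLM"  # two spaces between "ef" and "ghi"
print(word_boundary_skipping_spaces(text, 8, forward=True))   # (9, 12)
print(word_boundary_skipping_spaces(text, 8, forward=False))  # (5, 7)
print(word_boundary_skipping_spaces("  ?? ", 1))              # (1, 1)
```

Under this model, case b) yields 9, 12 when the last argument is TRUE, which matches the value the original report observed.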
Hi Michael, please set this one to verified if 3117 is found fixed. Thanks Frank
Set to "Verified" in agreement with FME
The fix caused problems while importing HTML tables (see internal task #109082), so it has been taken back from OO 1.1 beta2.
Reassigned to Karl, to newly fix it in OO 1.1 final.
SBA: I talked to FME and the fix for the core issue 3117 (Ctrl+F7 does not call Thesaurus if the cursor is right behind a word) will be done within Writer and not by the breakiterator. Reassigned to Thomas.
TL: As discussed with MI I will change the target to OO 2.0 in order to have time to discuss this and its impact in more detail.
*** Issue 14904 has been marked as a duplicate of this issue. ***
Well, if it is really going to remain unfixed until 2.0 (and I don't understand the earlier reasoning, but did file the issue just marked as a duplicate), could someone please explain under what circumstances this bug is triggered? I'll need to write a workaround into my case-changing macro if this is not going to be fixed until 2.0.
TL-Karl: Could you answer the question?
Karl: In the current implementation, the breakiterator only ignores spaces when it tries to find a word boundary. Punctuation is counted as a word. When you have something like "word", the first word is '"' with boundary 0,1, and the second is 'word' with boundary 1,5. You can see that the first word's end is the second word's start, so the two overlap. When you put the cursor at any position in 'word', the first call, goToStartOfWord, moves the cursor to the word's start. The second call, goToEndOfWord, calls getWordBoundary with the direction set to backwards, meaning you want the previous word, so it returns the first word's boundary. Now the selection's boundary is the second word's start and the first word's end; in this case both are 1, so you select nothing and insert your new word at position 1.
Correction: in the second call, goToEndOfWord, it is not because it calls getWordBoundary with the backwards direction; it is because it calls isEndWord first. Since the two words overlap (the second word's start is the first word's end), isEndWord returns true, and both the start and end of the selection are 1.
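The empty selection can be reproduced with a small model of the behaviour described here. It is a sketch under assumptions, not the Writer or breakiterator code: the Python functions are hypothetical stand-ins for isEndWord, goToStartOfWord and goToEndOfWord, and the tokenizer assumes that a run of punctuation counts as one word and that only whitespace is skipped.

```python
import re


def tokens(text):
    """Assumed tokenizer: word-character runs and punctuation runs are both words."""
    return [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]+", text)]


def is_end_word(text, pos):
    """True if any token ends at pos, even if another token starts there."""
    return any(end == pos for _, end in tokens(text))


def go_to_start_of_word(text, pos):
    """Move to the start of the token that contains pos."""
    for start, end in tokens(text):
        if start <= pos < end:
            return start
    return pos


def go_to_end_of_word(text, pos):
    """Check is_end_word first, as the correction above describes."""
    if is_end_word(text, pos):
        return pos  # already "at the end of a word", so the cursor stays put
    for start, end in tokens(text):
        if start <= pos < end:
            return end
    return pos


def select_word(text, cursor):
    start = go_to_start_of_word(text, cursor)
    end = go_to_end_of_word(text, start)
    return (start, end)


print(select_word('"word"', 3))  # (1, 1): empty selection
print(select_word('word', 2))    # (0, 4): whole word selected
```

Without the leading quotation mark no token ends at the word's start, so the same sequence of calls selects the whole word.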
Fixed in CWS i18n08
Verified in CWS i18n08.
Adjusting owner
Adjusting resolution
SBA->SW: Please have a look.
SW->SG: please verify this in i18n08, which can be found on cwsserv03
The fix works with getCurrentWord, but it changed the behaviour of the isBeginWord and isEndWord functions. This bug is verified nevertheless; isBeginWord and isEndWord are handled in #i21907.
set to verified.
What is happening about this? It is still broken in 680_m32.
andrewb is indeed right: this does not work in src680_m32; the behaviour is again as described in the bug. So the bug goes back to Karl.
cleared resolution.
This regression was caused by the bug fix for 112021. When the cursor is at the end of a word, but not at the beginning of another word, getWordBoundary should return the boundary of that word, no matter whether it is searching forwards or backwards. Refixed in cws i18n13.
reopen the issue for reassigning to QA.
Checked on Solaris and Windows, works -> verified.
Checked on Solaris and Windows again, worked -> closed.