Apache OpenOffice (AOO) Bugzilla – Issue 17964
Fix word count feature for Asian text
Last modified: 2013-08-07 14:38:26 UTC
This patch adds a Tools->Word Count entry, since this seems to be a frequent end-user request. The end result is not beautiful [ it being still a TabDialog ] but it does have the merit of being functional ;-) HTH.
Created attachment 8306 [details] ugly word count hack.
HI-AMA: Please analyse the attachment.
What do you think? I'm not sure that we should implement it as a hack. If the user needs such a feature we should think about a proper solution.
os->cj: Do we want to support a "statistic-only" dialog? I don't know if this is really often requested. If yes the implementation should only add a new slot id to the sfx2 and the SfxDocumentInfoDialog should be stripped in SfxObjectShell::DocInfoDlg_Impl()
An accessible word count is loved by journalists; journalists write the reviews of OpenOffice.org that say "it sucks because I couldn't find word count" => it's worth fixing IMHO.
cj-ft: could you please take a look at this and add this to your word count spec. Thanks.
Re-assigned from DEFECT to FEATURE
Title changed
*** Issue 1793 has been marked as a duplicate of this issue. ***
Copied from 1793 word count should also count words in selections: The word count cannot count a selection of text, but rather the entire document (eg, while File -> Statistics gives words for the entire document, users should also be able to highlight a selection of text and run a wordcount on it). Many students and almost all professional writers need to be able to wordcount a selection of text, as they must produce documents which consists of different sections with different word limits.
Well, if this isn't a Potemkin's village... First an issue is closed as duplicate of this one, but the votes for it /20!/ are not added to this one. Then, the closed issue had priority 3, this one has priority 4! Please immediately change priority of this one to 3 or again open the old issue with its higher priority and add attachments from this one to it and close this issue! Next, this is an issue on all platforms and all OSs, as was the case with the closed issue! Please change the Platform and OS data of this issue! When another intermediate milestone before 2.0 is proposed, please change the target milestone to it (i.e. 1.2 or whatever) - so we don't have to wait indefinitely for this.
This should not be forgotten: A superb word count feature is MANDATORY! :o) No bullsh*t, it really is. Academic writers, journalists and writers in general use this simple tool more than you think. I too was about to leave OOo Writer because this functionality is absent. It is no coincidence that Microsoft Word XP enhanced its word count feature; that's simply because people needed it and asked for more! We are not asking for it because Word has it and OOo then must have it too. It should be quite simple to add, it is a totally harmless feature - and at my university we use it a lot! What we need is:
a) That it is in the tools menu - accessible!
b) That it can count the entire document
c) That it can count the entire document, except footnotes
e) Perhaps that it can count the entire document, except indexes and other fields
f) That it can count on a selection of text
As of now Writer includes text in footnotes too when counting. If it did not, I would have been forced to switch back to Word. I know that word counting seems like a strange feature to those who do not need it, but it really is important!!! And forcing everybody to use a macro is totally insane. If you want OOo to be used by EVERYBODY - then give them a choice they can handle! :)
g) And with and without spaces! :)
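To make requests b), c), f) and g) above concrete, here is a minimal Python sketch. It is not OOo code: the function name `count_stats` and the idea of passing body text and footnotes as separate strings are assumptions made purely for illustration. A "selection" is simply whatever text is passed in as `body`.

```python
def count_stats(body, footnotes=(), include_footnotes=True):
    """Return word and character counts for a piece of text.

    `body` is the document (or selection) text; `footnotes` is a
    hypothetical list of footnote strings belonging to that text.
    Words are naively defined as whitespace-separated runs.
    """
    parts = [body]
    if include_footnotes:
        parts.extend(footnotes)
    text = " ".join(parts)

    return {
        "words": len(text.split()),
        # Every character, including spaces.
        "chars_with_spaces": len(text),
        # Skip anything Python treats as whitespace.
        "chars_without_spaces": sum(1 for ch in text if not ch.isspace()),
    }


if __name__ == "__main__":
    body = "The quick brown fox"
    notes = ["See Aesop"]
    print(count_stats(body, notes, include_footnotes=True))
    print(count_stats(body, notes, include_footnotes=False))
```

Splitting on whitespace is only adequate for space-delimited languages; it says nothing useful about Asian text.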
After hundreds of votes for this and duplicate enhancement requests, it seems this is a major issue (it is with me -- my publisher was NOT pleased that my word count included spaces). I would suggest that under Tools > Options, there be a place to set word count to include/exclude spaces, footnotes, bibliography and (for some reason -- I don't know my publisher's reasoning for it) TOC entries. That is for simplicity for users. I suspect this is not an easy programming issue, but it is certainly important for professional writers. Definitely a priority 3 (for some of us, it is worse). Why should we save as a Word document to get correct word counts?
Here is another way to enhance the word count feature: live word count. Microsoft Word for Mac has it, but the Windows version lacks it. http://www.microsoft.com/mac/otherproducts/office2001/using.aspx?pid=usingoffice2001&type=tips&article=/mac/products/office/2001/word/wordcount.xml
I have written several scripts implementing the "live word count" feature. It should be trivial to add a loop to the main program that does the same thing (I hope). I'll attach the basic code for my scripts.
Created attachment 13347 [details] Looping wordcounter scripts (live word count)
Created attachment 13348 [details] A wordcounting script I found online that also counts selections
The macro posted above is a very old one. There is a greatly improved version, which counts selections with and without footnotes and generally does all you can do in Basic, available from http://www.darwinwars.com/lunatic/bugs/oo_macros.html I know it is not a proper solution, but I really think it's the best hack available at the moment.
Hey all. I do Japanese to English translation, and use word counts -- and Japanese character counts -- extensively for estimates, invoicing, and work pacing. In addition to the features already mentioned, something else important is the way the word count tool deals with Asian characters. Many Asian languages are not counted in words, but in characters. In order for the word count tool to be useful with Asian languages, it needs to be able to distinguish between Asian and non-Asian characters and produce independent counts for Asian characters and non-Asian words. It's also important that the "Asian character count" not include Asian/double-byte spaces, or, that it show both Asian character total with spaces and Asian character total without spaces.
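As a rough illustration of the separate counts described above, here is a minimal Python sketch. It is not how OOo or Word counts; it assumes that "Asian character" can be approximated by the Unicode East Asian Width property (Wide or Fullwidth), which also catches CJK punctuation and some symbols. The function names are made up for this example.

```python
import unicodedata

IDEOGRAPHIC_SPACE = "\u3000"  # the Asian/double-byte space


def is_wide(ch):
    """Approximate 'Asian character' as East Asian Wide or Fullwidth."""
    return unicodedata.east_asian_width(ch) in ("W", "F")


def count_mixed(text):
    """Count Asian characters and non-Asian words independently."""
    with_spaces = sum(1 for ch in text if is_wide(ch))
    without_spaces = sum(
        1 for ch in text if is_wide(ch) and ch != IDEOGRAPHIC_SPACE
    )
    # Blank out wide characters so only non-Asian runs remain,
    # then count whitespace-separated words in what is left.
    western_only = "".join(" " if is_wide(ch) else ch for ch in text)
    return {
        "asian_chars_with_spaces": with_spaces,
        "asian_chars_without_spaces": without_spaces,
        "western_words": len(western_only.split()),
    }


if __name__ == "__main__":
    sample = "OpenOffice は\u3000日本語 and English を数える"
    print(count_mixed(sample))
```

A real implementation would more likely classify characters by script or Unicode block than by display width, and would need a policy for punctuation and for Indic or other scripts that this sketch does not address.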
FT: This is postponed to office later _but_ the main features are specified in issue 27302 and will be targeted in OO.o 2.0
*** Issue 27422 has been marked as a duplicate of this issue. ***
OOo Later? You must be kidding. These are features of word processors anno 1980! People around me struggle to get macros working, and when they fail they go back to MS Word, not to return to OOo. First impressions last!
As another Japanese -> English translator, I would like to second miller_dscott's comments. A word/character count function that distinguishes between Asian and non-Asian text is vital. As mentioned in a post over on the OpenOffice.org Forum (http://www.oooforum.org/forum/viewtopic.php?p=23214#23214), I currently get rather silly results with OOo. A sample paragraph just pasted in from the front page of http://www.nikkei.com shows 135 Asian characters in Word, and 78 'words' in Writer. Again, the very concept of 'word' is irrelevant for counting Japanese, as everyone goes by character count, not including spaces. I understand it's similar for Chinese. Cheers, Erik
miller_dscott, can you please add a sample document with Asian and normal characters and include the proper count statistics (so that those of us who do not know how to count Asian chars can test with something)? :)
Do votes and user requests and comments actually make any difference? If so, it would seem that priority of this bug would be right at the top. But it's not, it's set for "later," whenever that is.
Actually, edsuom, it will be significantly improved by OOo 2.0 (due out at the first of the year). The 680 developer snapshot already has it installed and running. It is a feature, on the Tools bar, that does total document and selected text word count and character count. So, yes, the votes and comments do matter. This will be fixed, and if you want to get the latest and greatest (albeit potentially buggy) version, download the latest 680 snapshot and you can use the improved word count today. :-)
chadley78, the current implementation in the developer release is nothing but a quick fix. The only new feature is the ability to count words in a text selection. Look at THIS bug... Target milestone is "OOo later". I still don't understand why something as simple as counting words is not implemented right away; as you can see from the votes and comments this is a major issue to many people. At least it would please a lot of users and not require too much work from the developers. OOo later... when is that? I don't know and I therefore still use Word. The word count enhancement already present in the developer release is insufficient for my needs.
This feature is essential. It's the main reason I needed to switch back to Word when I was finalising a dissertation, in order to get word counts for selected text.
*** Issue 33888 has been marked as a duplicate of this issue. ***
There still seem to be duplicates floating around, eg Issue 4568. I'm only going to echo what's been said before: this feature is as important a part of a word processor as the ability to display characters on screen. For OOo to stand against competitors, this needs to be implemented as soon as possible.
As sajer said:
> c) That it can count the entire document, except footnotes
> f) That it can count on a selection of text
> g) And with and without spaces! :)
It can matter whether a count of 142 characters is with or without spaces. It also matters in some study homework (as Word handles it!) whether there are 100 words (or chars) in the selected body text alone, or 100 words in body + footnotes (the footnotes which belong to the selected body text, not all footnotes)!
Created attachment 23112 [details] another macro with some features
*** Issue 46339 has been marked as a duplicate of this issue. ***
I recently installed 2.0 Beta 1, and while there is much to be very happy about, the word count feature is **STILL** next to useless for anyone using Asian (double-byte) text. I find it very difficult to believe that counting chars and checking for double- or single-byte-ness is so hard that this hasn't been implemented yet. OOo is *clearly* aware of double-byte-ness, given the presence of Asian language support options and formatting capabilities. So why oh why has this been neglected yet again? Why is double-byte Asian text counted as _words_, when the very concept doesn't really exist (at least for Japanese)? I'll attach a short sample presently.
Created attachment 29609 [details] Sample mixed Asian-Latin text, with screenshots of Word's comprehensive and accurate word count, and OOo's inaccurate and inadequate word count (sorry, just telling it like it is)
Confirmed that this issue remains the same in 2.0 RC.
It seems this issue turned into something different from the initial patch submission. I understand that the patch is no longer applicable. I changed the issue type to enhancement. We can then go from here and see who is going to help and improve word count for Asian languages...
set target to OOo Later
Confirmed that mixed CJK + Western documents still have useless word / character counts, as of version 2.2.1. Over four years and counting.
Confirmed that mixed CJK + Western documents still have useless word / character counts, as of version 2.3. Target still vaguely set to "OOo Later" -- does this have any hope of resolution in 3.0?
Downloaded IBM's Lotus Symphony Beta, and this does indeed break down CJK vs Western counts, and produces the same results as MS Word for the sample text included at the top of my upload from 16 Sep 2005. I had read somewhere that Lotus Symphony was supposed to be based somehow on OOo, but briefly looking over the app makes it look like it's built on Eclipse; perhaps it just borrows code from OOo for .odt support, and thence the confusing media articles? As I am wholly ignorant of the internals, this may be a foolish question, but is there any possibility of using Symphony's more accurate word-count breakdown code in OOo?
Just tried with the 3.0 Beta, and the worse-than-useless count of mixed Asian + alphabetic text remains the same as back in 2005 when I created the "Asian Count Sample.odt" file. Would someone on the dev team be kind enough to chime in and indicate what "OOo Later" might mean? Are we talking the next minor upgrade, the next major upgrade, or even further down the line? ... As an aside, I got ambitious and thought I'd try to help out by looking into fixing the source myself, since the word / character counting functionality should theoretically be relatively simple, but the amazingly byzantine API docs would require far and away much more time than I can afford to set aside as a sole-prop... :(
mmp no longer working at Sun, reassigning to mba
Confirmed that this issue remains the same in 3.0 RC 3 (Build 9357).
To avoid further confusion I adjusted the summary to reflect the direction the discussion in this issue has taken. Andreas, can we make a plan if and when we can work on this?
Hi! I'm looking for a fix for Indic scripts (Indian languages). Word count is almost double in OO Writer compared to MS Word. When can we get this fixed?
I've been grappling with the non-existing count of Asian characters for weeks and must use MS Word whenever I really want to count them. I admire the patience of Erikanderson3 who has kept the torch on this matter for five years and not even lost his manners yet. For me personally, I most often do not need the exact count of characters so I have made a rule of thumb for me “count – 15 %” that is close to correct or just count the number of pages and get about as accurate an answer. Linux carries almost an endless variety of Chinese type fonts and not a single way of counting them except for MS Word virtualizations.
@olhat -- If you can get IBM's Lotus Symphony to install (somewhat complicated on Fedora 9, for instance), you *can* count Asian chars + Western chars in a single doc from within Linux. LS is based on the OOo 1.x code branch, and so has issues that OOo no longer does, but I still find LS very useful just for this counting functionality. I suppose it's possible that Abiword or one of the other FOSS packages that also handle .odt files might likewise be able to count mixed CJK + Western texts -- time to experiment!
Erroneous counts of mixed Asian-Western texts identical in OOo 3.1. Almost six years for this bug in total, and over five years since the Asian issues were raised. And we have seen essentially zero progress in all that time. I can only hope that the Oracle acquisition might mean changes for this project.
Just for grins, I just tried installing the Japanese localized version on the theory that maybe some extra Asian counting functionality might be included. But, no dice -- apparently the only difference is in the localized UI. I still get the same laughably nonsensical counts. I think I'll go see if Lotus Symphony has made any headway...
Commit rev#1241345. Please try and reply if this is fixed.
Created attachment 77190 [details] Asian Language Word Count Fix (for Chinese/Japanese/Korean) Asian language word count fix (for Chinese/Japanese/Korean). Please reply if this fixed the problem.
Looks like this problem has been modified.
Marking fixed as discussed in http://s.apache.org/f6