Issue 17964 - Fix word count feature for asian text
Summary: Fix word count feature for asian text
Status: RESOLVED FIXED
Alias: None
Product: Writer
Classification: Application
Component: code
Version: OOo 1.1 RC2
Hardware: All All
Importance: P3 Trivial with 40 votes
Target Milestone: ---
Assignee: andreas.martens
QA Contact: issues@sw
URL:
Keywords:
Duplicates: 1793 27422 33888 46339
Depends on: 27302
Blocks: 4568
 
Reported: 2003-08-07 15:42 UTC by mmeeks
Modified: 2013-08-07 14:38 UTC
CC List: 11 users

See Also:
Issue Type: ENHANCEMENT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
ugly word count hack. (5.26 KB, patch)
2003-08-07 15:43 UTC, mmeeks
Looping wordcounter scripts (live word count) (3.39 KB, text/plain)
2004-02-22 11:55 UTC, kliment
A wordcounting script I found online that also counts selections (1.97 KB, text/plain)
2004-02-22 12:08 UTC, kliment
another macro with some features (3.47 KB, application/vnd.oasis.opendocument.text)
2005-02-28 12:34 UTC, stma
Sample mixed Asian-Latin text, with screenshots of Word's comprehensive and accurate word count, and OOo's inaccurate and inadequate word count (sorry, just telling it like it is) (27.48 KB, application/vnd.sun.xml.writer)
2005-09-16 18:00 UTC, erikanderson3
Asian Language Word Count Fix (for Chinese/Japanese/Korean) (4.41 KB, patch)
2012-02-07 04:39 UTC, imacat@mail.imacat.idv.tw

Description mmeeks 2003-08-07 15:42:58 UTC
This patch adds a Tools->Word Count entry; it seems this is a frequent end-user
request. The end result is not beautiful [ it is still a TabDialog ] but it
does have the merit of being functional ;-)

HTH.
Comment 1 mmeeks 2003-08-07 15:43:39 UTC
Created attachment 8306 [details]
ugly word count hack.
Comment 2 h.ilter 2003-08-11 16:32:09 UTC
HI-AMA: Please analyse the attachment.
Comment 3 andreas.martens 2003-08-11 16:49:29 UTC
What do you think? I'm not sure that we should implement it as a hack.
If the user needs such feature we should think about a proper solution.
Comment 4 Oliver Specht 2003-08-20 08:11:34 UTC
os->cj: Do we want to support a "statistic-only" dialog?
I don't know if this is really often requested. 

If yes the implementation should only add a new slot id to the sfx2
and the SfxDocumentInfoDialog should be stripped in
SfxObjectShell::DocInfoDlg_Impl() 
Comment 5 mmeeks 2003-08-20 10:43:13 UTC
An accessible word count is loved by journalists, and journalists write
the reviews of OpenOffice.org that say "it sucks because I couldn't
find word count" => it's worth fixing IMHO.
Comment 6 christian.jansen 2003-09-23 12:16:55 UTC
cj-ft: could you please take a look at this and add this to your word
count spec. Thanks.
Comment 7 falko.tesch 2003-09-26 15:21:13 UTC
Re-assigned from DEFECT to FEATURE
Comment 8 falko.tesch 2003-11-04 16:33:29 UTC
Title changed
Comment 9 falko.tesch 2003-11-04 16:33:44 UTC
.
Comment 10 bettina.haberer 2004-01-19 08:35:35 UTC
*** Issue 1793 has been marked as a duplicate of this issue. ***
Comment 11 bettina.haberer 2004-01-19 08:39:37 UTC
Copied from 1793 word count should also count words in selections:
The word count cannot count a selection of text, but rather the entire document
(eg, while File -> Statistics gives words for the entire document, users should
also be able to highlight a selection of text and run a wordcount on it).  

Many students and almost all professional writers need to be able to wordcount
a selection of text, as they must produce documents which consist of different
sections with different word limits.
Comment 12 miles 2004-01-19 16:34:13 UTC
Well, if this isn't a Potemkin village...
First an issue is closed as duplicate of this one, but the votes for it /20!/
are not added to this one. Then, the closed issue had priority 3, this one has
priority 4! Please immediately change priority of this one to 3 or again open
the old issue with its higher priority and add attachments from this one to it
and close this issue!
Next, this is an issue on all platforms and all OSs, as was the case with the
closed issue! Please change the Platform and OS data of this issue!
When another intermediate milestone before 2.0 will be proposed, please change
the target milestone to it (i.e. 1.2 or whatever) - so we don't have to wait
indefinitely for this.
Comment 13 sajer 2004-01-31 12:00:59 UTC
This should not be forgotten:

A superb word count feature is MANDATORY! :o)
No bullsh*t, it really is. Academic writers, journalists and writers 
in general use this simple tool more than you think.
I too was about to leave OOo Writer because this functionality is
absent. It is no coincidence that Microsoft Word XP enhanced its
word count feature; that's simply because people needed it and asked
for more!

We are not asking for it because Word has it and OOo then must have 
it too. It should be quite simple to add; it is a totally harmless
feature - and at my university we use it a lot!

What we need is

a) That it is in the Tools menu - accessible!
b) That it can count the entire document
c) That it can count the entire document, except footnotes
e) Perhaps that it can count the entire document, except indexes and 
other fields
f) That it can count on a selection of text

As of now Writer includes text in footnotes too when counting. If it
did not, I would have been forced to switch back to Word. I know that word
counting seems like a strange feature to those who do not need it,
but it really is important!!!

And forcing everybody to use a macro is totally insane. If you want 
OOo to be used by EVERYBODY - then give them a choice they can 
handle! :)

g) And with and without spaces! :)
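A minimal sketch of requirements (b), (f) and (g) above, in Python: count words plus characters with and without spaces, where counting a selection simply means passing the selected substring. `word_stats` is a hypothetical helper, not Writer's implementation, and splitting words on whitespace is an assumption that only suits Latin-script text.

```python
def word_stats(text: str) -> dict:
    """Count words and characters (with and without spaces) in a piece of text.

    Words are whitespace-separated tokens, a simplifying assumption that
    works for Latin-script text only.
    """
    words = text.split()
    chars_with_spaces = len(text)
    chars_without_spaces = sum(1 for ch in text if not ch.isspace())
    return {
        "words": len(words),
        "chars_with_spaces": chars_with_spaces,
        "chars_without_spaces": chars_without_spaces,
    }


document = "The quick brown fox jumps over the lazy dog."
selection = document[4:19]  # "quick brown fox", standing in for a user selection
print(word_stats(document))
print(word_stats(selection))
```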
Comment 14 rblackeagle 2004-01-31 15:41:10 UTC
After hundreds of votes for this and duplicate enhancement requests, it seems
this is a major issue (it is with me -- my publisher was NOT pleased that my
word count included spaces).  I would suggest that under Tools > Options, there
be a place to set word count to include/exclude spaces, footnotes, bibliography
and (for some reason -- I don't know my publisher's reasoning for it) TOC entries.

That is for simplicity for users.  I suspect this is not an easy programming
issue, but it is certainly important for professional writers.  Definitely a
priority 3 (for some of us, it is worse).  Why should we save as a Word document
to get correct word counts?
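Such an option set could look roughly like the sketch below. It assumes a hypothetical document model in which each text segment is tagged with its kind ("body", "footnote", "bibliography", "toc"); `CountOptions` and `count_document` are illustrative names, not an OOo API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CountOptions:
    include_spaces: bool = False
    # Segment kinds left out of the count (hypothetical labels).
    excluded_kinds: frozenset = frozenset({"footnote", "bibliography", "toc"})


def count_document(segments, options: CountOptions) -> dict:
    """Count words and characters over (kind, text) segments, honouring the options."""
    kept = [text for kind, text in segments if kind not in options.excluded_kinds]
    words = sum(len(text.split()) for text in kept)
    if options.include_spaces:
        chars = sum(len(text) for text in kept)
    else:
        chars = sum(1 for text in kept for ch in text if not ch.isspace())
    return {"words": words, "characters": chars}


segments = [
    ("toc", "1 Introduction"),
    ("body", "Word counts matter to publishers."),
    ("footnote", "See the style guide."),
    ("bibliography", "Smith, J. Counting Words."),
]
print(count_document(segments, CountOptions()))
print(count_document(segments, CountOptions(excluded_kinds=frozenset())))
```

Keeping the exclusions as a set of segment kinds means a new category (say, captions) only needs a new label, not a new checkbox-specific code path.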
Comment 15 jmm 2004-01-31 20:11:10 UTC
Here is another way to enhance the word count feature, live word count. 
Microsoft Word for Mac has it, but the Windows version lacks it.

http://www.microsoft.com/mac/otherproducts/office2001/using.aspx?pid=usingoffice2001&type=tips&article=/mac/products/office/2001/word/wordcount.xml
Comment 16 kliment 2004-02-22 11:53:05 UTC
I have written several scripts implementing the "live word count" feature. It
should be trivial to add a loop to the main program that does the same thing (I
hope). I'll attach the basic code for my scripts.
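Such a loop might be sketched as follows. This is Python rather than the attached Basic, and `get_text` is a hypothetical callable standing in for whatever reads the current document text; it is not a UNO call. The generator polls at a fixed interval and recounts only when the text has changed since the previous poll.

```python
import time
from typing import Callable, Iterator, Optional


def live_word_count(get_text: Callable[[], str],
                    interval: float = 0.5,
                    max_polls: Optional[int] = None) -> Iterator[int]:
    """Poll the document text and yield a fresh word count whenever it changes."""
    last_text: Optional[str] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        text = get_text()
        if text != last_text:          # recount only when something changed
            last_text = text
            yield len(text.split())    # whitespace-separated words (assumption)
        polls += 1
        time.sleep(interval)           # wait before polling again


# Simulated document snapshots stand in for a real editor.
snapshots = iter(["Hello", "Hello", "Hello world", "Hello world again"])
counts = list(live_word_count(lambda: next(snapshots), interval=0, max_polls=4))
print(counts)  # [1, 2, 3]
```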
Comment 17 kliment 2004-02-22 11:55:32 UTC
Created attachment 13347 [details]
Looping wordcounter scripts (live word count)
Comment 18 kliment 2004-02-22 12:08:54 UTC
Created attachment 13348 [details]
A wordcounting script I found online that also counts selections
Comment 19 ingenstans 2004-02-23 19:28:42 UTC
The macro posted above is a very old one. There is a greatly improved version,
which counts selections with and without footnotes and generally does all you
can do in Basic, available from

http://www.darwinwars.com/lunatic/bugs/oo_macros.html

I know it is not a proper solution, but I really think it's the best hack 
available at the moment.
Comment 20 miller_dscott 2004-03-25 14:56:35 UTC
Hey all. I do Japanese to English translation, and use word counts -- and
Japanese character counts -- extensively for estimates, invoicing, and work pacing.

In addition to the features already mentioned, something else important is the
way the word count tool deals with Asian characters.
Many Asian languages are not counted in words, but in characters. In order for
the word count tool to be useful with Asian languages, it needs to be able to
distinguish between Asian and non-Asian characters and produce independent
counts for Asian characters and non-Asian words.
It's also important that the "Asian character count" not include
Asian/double-byte spaces, or that it show both the Asian character total with
spaces and the Asian character total without spaces.
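A sketch of the independent counts described above, assuming that "Asian characters" means code points in the Unicode blocks listed in `CJK_RANGES` (CJK punctuation, Hiragana, Katakana, CJK ideographs, Hangul syllables and fullwidth forms). The block list and the counting of punctuation are assumptions; Word's exact rules are not reproduced here. Whitespace, including the ideographic space U+3000, is never counted, and a run of non-Asian, non-space characters counts as one word.

```python
# Unicode blocks treated as "Asian" (an assumption for this sketch).
CJK_RANGES = [
    (0x3000, 0x303F),  # CJK symbols and punctuation (U+3000 is a space, skipped first)
    (0x3040, 0x309F),  # Hiragana
    (0x30A0, 0x30FF),  # Katakana
    (0x3400, 0x4DBF),  # CJK Unified Ideographs Extension A
    (0x4E00, 0x9FFF),  # CJK Unified Ideographs
    (0xAC00, 0xD7AF),  # Hangul syllables
    (0xFF00, 0xFFEF),  # Halfwidth and fullwidth forms
]


def is_cjk(ch: str) -> bool:
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in CJK_RANGES)


def mixed_count(text: str) -> tuple:
    """Return (asian_characters, non_asian_words), ignoring all whitespace."""
    asian_chars = 0
    other_words = 0
    in_word = False
    for ch in text:
        if ch.isspace():          # ASCII spaces and U+3000 alike end a word
            in_word = False
        elif is_cjk(ch):          # each Asian character counts on its own
            asian_chars += 1
            in_word = False
        elif not in_word:         # first character of a new non-Asian word
            other_words += 1
            in_word = True
    return asian_chars, other_words


print(mixed_count("Word count 日本語\u3000テキスト"))
```

For that sample string the result is `(7, 2)`: seven Asian characters (three kanji, four katakana) and two Latin words, with the ideographic space excluded.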
Comment 21 falko.tesch 2004-04-01 14:13:32 UTC
FT: This is postponed to OOo Later, _but_ the main features are specified
in issue 27302 and will be targeted in OO.o 2.0
Comment 22 lohmaier 2004-04-04 12:51:02 UTC
*** Issue 27422 has been marked as a duplicate of this issue. ***
Comment 23 sajer 2004-05-05 22:22:43 UTC
OOo Later? You must be kidding. These are features of word processors circa
1980! People around me struggle to get macros working, and when they fail they
go back to MS Word, not to return to OOo. First impressions last!
Comment 24 erikanderson3 2004-05-06 09:59:33 UTC
As another Japanese -> English translator, I would like to second
miller_dscott's comments.  A word/character count function that distinguishes
between Asian and non-Asian text is vital.  As mentioned in a post over on the
OpenOffice.org Forum
(http://www.oooforum.org/forum/viewtopic.php?p=23214#23214), I currently get
rather silly results with OOo.  A sample paragraph just pasted in from the front
page of http://www.nikkei.com shows 135 Asian characters in Word, and 78 'words'
in Writer.  Again, the very concept of 'word' is irrelevant for counting
Japanese, as everyone goes by character count, not including spaces.  I
understand it's similar for Chinese.  

Cheers,

Erik
Comment 25 con.hennessy 2004-06-02 01:21:51 UTC
miller_dscott can you please add a sample document with Asian and normal 
characters and include the proper count statistics (so that those of us who do not 
know how to count Asian chars can test with something :) 
Comment 26 edsuom 2004-07-12 19:48:00 UTC
Do votes and user requests and comments actually make any difference? 

If so, it would seem that the priority of this bug would be right at the top. But
it's not, it's set for "later," whenever that is.
Comment 27 chadley78 2004-07-12 20:01:41 UTC
Actually, edsuom, it will be significantly improved by OOo 2.0 (due out at the
first of the year).  The 680 developer snapshot already has it installed and
running.  It is a feature, on the Tools bar, that does total document and
selected text word count and character count.  So, yes the votes and comments do
matter.  This will be fixed, and if you want to get the latest and greatest
(albeit potentially buggy) version, download the latest 680 snapshot and you can
use the improved word count today.  :-)
Comment 28 sajer 2004-08-05 00:37:37 UTC
chadley78, the current implementation in the developer release is nothing but a 
quick fix. The only new feature is the ability to count words in a text 
selection. Look at THIS bug... Target milestone is "OOo later".

I still don't understand why something as simple as counting words is not
implemented right away, as you can see from the votes and comments this is a 
major issue to many people.

At least it would please a lot of users and not require too much work from the 
developers. OOo later... when is that? I don't know and I therefore still use 
Word.

The word count enhancement already present in the developer release is 
insufficient for my needs. 
Comment 29 jhonan 2004-09-07 16:12:37 UTC
This feature is essential. It's the main reason I needed to switch back to Word
when I was finalising a dissertation, in order to get word counts for selected text.
Comment 30 stefan.baltzer 2004-09-09 15:35:00 UTC
*** Issue 33888 has been marked as a duplicate of this issue. ***
Comment 31 jnoreiko 2004-11-19 15:31:56 UTC
There still seem to be duplicates floating around, eg Issue 4568.

I'm only going to echo what's been said before: this feature is as important a
part of a word processor as the ability to display characters on screen. For OOo
to stand against competitors, this needs to be implemented as soon as possible.
Comment 32 stma 2005-02-28 12:25:50 UTC
as sajer said:

> c) That it can count the entire document, except footnotes
> f) That it can count on a selection of text
> g) And with and without spaces! :)

It can matter whether there are 142 characters with or without spaces. Just as
Word does, it is important for some student assignments to know whether there are
100 words (or characters) in the selected body text, or 100 words in body +
footnotes (only the footnotes which belong to the selected body text, not all footnotes)!
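A sketch of that rule, assuming a hypothetical model in which each footnote records the character offset of its anchor in the body text. Body words inside the selection are counted, plus, optionally, the words of footnotes whose anchor lies inside the selection. `Footnote` and `count_selection` are illustrative names only.

```python
from dataclasses import dataclass


@dataclass
class Footnote:
    anchor: int  # character offset of the footnote mark in the body text
    text: str


def count_selection(body: str, footnotes: list, start: int, end: int,
                    include_footnotes: bool = False) -> int:
    """Count words in body[start:end], optionally adding footnotes anchored there."""
    words = len(body[start:end].split())
    if include_footnotes:
        words += sum(len(fn.text.split()) for fn in footnotes
                     if start <= fn.anchor < end)  # only footnotes in the selection
    return words


body = "First paragraph here. Second paragraph follows."
footnotes = [Footnote(anchor=20, text="A note on the first."),
             Footnote(anchor=46, text="Another note.")]
print(count_selection(body, footnotes, 0, 21))                          # 3
print(count_selection(body, footnotes, 0, 21, include_footnotes=True))  # 8
```

Selecting only the first sentence picks up its own five-word footnote but not the footnote attached to the second sentence.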
Comment 33 stma 2005-02-28 12:34:47 UTC
Created attachment 23112 [details]
another macro with some features
Comment 34 stefan.baltzer 2005-04-26 13:41:09 UTC
*** Issue 46339 has been marked as a duplicate of this issue. ***
Comment 35 erikanderson3 2005-09-16 17:33:06 UTC
I recently installed 2.0 Beta 1, and while there is much to be very happy about,
the word count feature is **STILL** next to useless for anyone using Asian
(double-byte) text.  I find it very difficult to believe that counting chars and
checking for double- or single-byte-ness is so hard that this hasn't been
implemented yet.  OOo is *clearly* aware of double-byte-ness, given the presence
of Asian language support options and formatting capabilities.  So why oh why
has this been neglected yet again?  Why is double-byte Asian text counted as
_words_, when the very concept doesn't really exist (at least for Japanese)? 
I'll attach a short sample presently.  
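For illustration: "double-byte" is a property of legacy encodings such as Shift_JIS, but in Unicode the closest equivalent is the East Asian Width property (Unicode Standard Annex #11), which Python exposes as `unicodedata.east_asian_width`. This sketch treats Wide ("W") and Fullwidth ("F") characters as "double-byte" and everything else, including Ambiguous ("A"), as single-byte; that cut-off is an assumption, and `split_by_width` is a hypothetical helper.

```python
import unicodedata


def split_by_width(text: str) -> tuple:
    """Return (wide, narrow) counts of non-space characters."""
    wide = narrow = 0
    for ch in text:
        if ch.isspace():
            continue  # skip ASCII spaces and the ideographic space U+3000 alike
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            wide += 1     # roughly what the comment calls "double-byte"
        else:
            narrow += 1   # includes Ambiguous ("A") characters, by assumption
    return wide, narrow


print(split_by_width("OOo 2.0 日本語"))  # (3, 6)
```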
Comment 36 erikanderson3 2005-09-16 18:00:34 UTC
Created attachment 29609 [details]
Sample mixed Asian-Latin text, with screenshots of Word's comprehensive and accurate word count, and OOo's inaccurate and inadequate word count (sorry, just telling it like it is)
Comment 37 erikanderson3 2005-10-04 17:25:44 UTC
Confirmed that this issue remains the same in 2.0 RC.
Comment 38 stx123 2006-03-02 22:40:42 UTC
It seems this issue turned into something different than the initial patch
submission. I understand that the patch is no longer applicable. I changed the
issue type to enhancement. We can then go from here and see who is going to help
and improve word count for asian languages...
Comment 39 matthias.mueller-prove 2006-03-06 11:12:39 UTC
set target to OOo Later
Comment 40 erikanderson3 2007-09-11 18:12:58 UTC
Confirmed that mixed CJK + Western documents still have useless word / character
counts, as of version 2.2.1.  Over four years and counting.  
Comment 41 erikanderson3 2007-09-21 07:17:32 UTC
Confirmed that mixed CJK + Western documents still have useless word / character
counts, as of version 2.3.  Target still vaguely set to "OOo Later" -- does this
have any hope of resolution in 3.0?
Comment 42 erikanderson3 2007-09-21 07:31:22 UTC
Downloaded IBM's Lotus Symphony Beta, and this does indeed break down CJK vs
Western counts, and produces the same results as MS Word for the sample text
included at the top of my upload from 16 Sep 2005.  I had read somewhere that
Lotus Symphony was supposed to be based somehow on OOo, but briefly looking over
the app makes it look like it's built on Eclipse; perhaps it just borrows code
from OOo for .odt support, and thence the confusing media articles?  As I am
wholly ignorant of the internals, this may be a foolish question, but is there
any possibility of using Symphony's more accurate word-count breakdown code in OOo?
Comment 43 erikanderson3 2008-05-07 22:28:16 UTC
Just tried with the 3.0 Beta, and the worse-than-useless count of mixed Asian +
alphabetic text remains the same as back in 2005 when I created the "Asian Count
Sample.odt" file.  

Would someone on the dev team be kind enough to chime in and indicate what "OOo
Later" might mean?  Are we talking the next minor upgrade, the next major
upgrade, or even further down the line?


... As an aside, I got ambitious and thought I'd try to help out by looking into
fixing the source myself, since the word / character counting functionality
should theoretically be relatively simple, but the amazingly byzantine API docs
would require far and away much more time than I can afford to set aside as a
sole-prop...  :(
Comment 44 max.odendahl 2008-05-07 23:20:38 UTC
mmp no longer working at Sun, reassigning to mba
Comment 45 erikanderson3 2008-10-06 18:35:27 UTC
Confirmed that this issue remains the same in 3.0 RC 3 (Build 9357).
Comment 46 Mathias_Bauer 2008-11-05 09:37:02 UTC
To avoid further confusion I adjusted the summary to reflect the direction the
discussion in this issue has taken.

Andreas, can we make a plan if and when we can work on this?
Comment 47 haroonrasheedjava 2008-11-25 07:39:30 UTC
Hi!

I'm looking for a fix for Indic scripts (Indian languages). Word count is almost
double in OO Writer compared to MS Word. When can we get this fixed?
Comment 48 olhat 2008-11-25 10:29:01 UTC
I've been grappling with the non-existing count of Asian characters for weeks
and must use MS Word whenever I really want to count them.  I admire the
patience of Erikanderson3, who has kept the torch burning on this matter for five years
and not even lost his manners yet.

For me personally, I most often do not need the exact count of characters so I
have made a rule of thumb for me, “count – 15 %”, that is close to correct, or I just
count the number of pages and get about as accurate an answer.

Linux carries almost an endless variety of Chinese type fonts and not a single
way of counting them except for MS Word virtualizations.
Comment 49 erikanderson3 2008-11-25 16:54:50 UTC
@olhat --

If you can get IBM's Lotus Symphony to install (somewhat complicated on Fedora
9, for instance), you *can* count Asian chars + Western chars in a single doc
from within Linux.  LS is based on the OOo 1.x code branch, and so has issues
that OOo no longer does, but I still find LS very useful just for this counting
functionality.  

I suppose it's possible that Abiword or one of the other FOSS packages that also
handle .odt files might likewise be able to count mixed CJK + Western texts --
time to experiment!
Comment 50 erikanderson3 2009-05-12 06:45:49 UTC
Erroneous counts of mixed Asian-Western texts identical in OOo 3.1.  Almost six
years for this bug in total, and over five years since the Asian issues were
raised.  

And we have seen essentially zero progress in all that time.  

I can only hope that the Oracle acquisition might mean changes for this project.  
Comment 51 erikanderson3 2009-05-12 17:31:54 UTC
Just for grins, I just tried installing the Japanese localized version on the
theory that maybe some extra Asian counting functionality might be included. 
But, no dice -- apparently the only difference is in the localized UI.  I still
get the same laughably nonsensical counts.  

I think I'll go see if Lotus Symphony has made any headway...
Comment 52 imacat@mail.imacat.idv.tw 2012-02-07 04:33:59 UTC
Commit rev#1241345.  Please try and reply if this is fixed.
Comment 53 imacat@mail.imacat.idv.tw 2012-02-07 04:39:22 UTC
Created attachment 77190 [details]
Asian Language Word Count Fix (for Chinese/Japanese/Korean)

Asian language word count fix (for Chinese/Japanese/Korean).  Please reply if this fixed the problem.
Comment 54 Mouette Yang 2012-02-11 06:43:06 UTC
Looks like this problem has been fixed.
Comment 55 Andrea Pescetti 2012-02-19 20:47:44 UTC
Marking fixed as discussed in http://s.apache.org/f6