Apache OpenOffice (AOO) Bugzilla – Issue 68098
Bug machinery mails get a 3.2 spamassassin score
Last modified: 2017-05-20 10:27:52 UTC
Hi,
The mails that the bug machinery sends have a SpamAssassin score of 3.2:
    score=3.2 required=10.0 tests=BAYES_50,FORGED_RCVD_HELO,NO_REAL_NAME,SUBJECT_ENCODED_TWICE,SUBJECT_EXCESS_BASE64
Except for the Bayesian test (which I'm trying to teach), all of these can be eliminated by fixing the corresponding mail headers.
Please help us understand what the problems are. NO_REAL_NAME is obvious. Could you explain SUBJECT_ENCODED_TWICE, SUBJECT_EXCESS_BASE64 and FORGED_RCVD_HELO?
Hi,
FORGED_RCVD_HELO is actually a problem on my side, do not worry about it.
SUBJECT_EXCESS_BASE64 is raised because mail composers usually encode the subject in a way that minimizes the size and maximizes readability. For instance, an English subject (which can hence be encoded in ASCII) shouldn't be encoded with the =?utf-8?blabla?= quirk at all. Latin languages (which have a few non-ASCII characters) should have their non-ASCII parts encoded with =?utf-8?q?blabla?=, i.e. the quoted-printable form, so that the ASCII characters of those parts can still be easily read. Currently, SpamAssassin considers that OOo's bug machinery always using the base64 (=?utf-8?B?hex?=) encoding is excessive:
- if the subject is plain ASCII, it shouldn't get encoded at all.
- if the subject contains only a few non-ASCII characters, these parts should be encoded with =?utf-8?q?blabla?=
- else, =?utf-8?b?hex?= is indeed the preferred way (and SpamAssassin shouldn't frown in such a case)
SUBJECT_ENCODED_TWICE is actually a consequence of the previous one. The problem is with long subjects, which need to be split over several header lines. Since the bug machinery currently always encodes the whole subject, it has to split this encoding too, resulting in:
    Subject: =?utf-8?b?hexhexhexhexhehxehxehexhehexhehexhehxehehxehhexh?=
     =?utf-8?b?hexhexhexhexhexhehehxehehxehhxehexheh?=
This is what SpamAssassin calls "encoding the subject twice". Avoiding excessive encoding should prevent this in most cases, but not all. That's why I've asked SpamAssassin to stop tagging such subjects (since there is no other way to encode them), but they preferred to just reduce the associated score. The bug machinery should hence just try to avoid it as much as possible.
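The three rules in the comment above can be sketched with Python's standard `email` package. This is only an illustration under stated assumptions, not the bug machinery's actual code: the function name `encode_subject` and the `max_non_ascii_ratio` cut-off are hypothetical, and for simplicity the sketch applies the Q form to the whole subject rather than only to its non-ASCII parts.

```python
from email.charset import BASE64, QP, Charset
from email.header import Header, decode_header, make_header


def encode_subject(subject: str, max_non_ascii_ratio: float = 0.3) -> str:
    """Return a Subject value using the lightest suitable RFC 2047 encoding.

    max_non_ascii_ratio is a hypothetical cut-off chosen for this sketch;
    it is not a value taken from SpamAssassin.
    """
    non_ascii = sum(1 for ch in subject if ord(ch) > 127)
    if non_ascii == 0:
        # Plain ASCII: emit the subject as-is, with no encoded-word at all.
        return subject

    charset = Charset("utf-8")
    if non_ascii / len(subject) <= max_non_ascii_ratio:
        # Only a few non-ASCII characters: quoted-printable ("Q") form,
        # which keeps the ASCII characters readable.
        charset.header_encoding = QP
    else:
        # Mostly non-ASCII: base64 ("B") form is the compact choice.
        charset.header_encoding = BASE64
    return Header(subject, charset).encode()


if __name__ == "__main__":
    for s in ("Plain subject", "Café menu", "日本語の件名"):
        encoded = encode_subject(s)
        # Decode again to show that the round trip preserves the text.
        print(encoded, "->", str(make_header(decode_header(encoded))))
```

Long non-ASCII subjects are still folded into several encoded-words by `Header.encode()`, so this reduces SUBJECT_ENCODED_TWICE hits without removing them entirely.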
Thanks "sthibaul" for the explanation. Support, could you please take care of these problems?
Started working on this issue.
I don't think this is something we can do much about. I have been reading the RFC documents and feel that what we are doing is correct and exactly as stated in RFC 2047:
<snip>
The following are examples of message headers containing 'encoded-word's:
    From: =?US-ASCII?Q?Keith_Moore?= <moore@cs.utk.edu>
    To: =?ISO-8859-1?Q?Keld_J=F8rn_Simonsen?= <keld@dkuug.dk>
    CC: =?ISO-8859-1?Q?Andr=E9?= Pirard <PIRARD@vm1.ulg.ac.be>
    Subject: =?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=
     =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?=
Note: In the first 'encoded-word' of the Subject field above, the last "=" at the end of the 'encoded-text' is necessary because each 'encoded-word' must be self-contained (the "=" character completes a group of 4 base64 characters representing 2 octets). An additional octet could have been encoded in the first 'encoded-word' (so that the encoded-word would contain an exact multiple of 3 encoded octets), except that the second 'encoded-word' uses a different 'charset' than the first one.
</snip>
Please have a look at this link for more details in this respect:
http://aspn.activestate.com/ASPN/Mail/Message/spamassassin-users/3107435
Here are the details of an issue reported on the Apache site for this kind of problem, and the workaround or suggestion provided:
http://www.mail-archive.com/dev@spamassassin.apache.org/msg15778.html
<snip>
http://issues.apache.org/SpamAssassin/show_bug.cgi?id=5026
[EMAIL PROTECTED] changed:
           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |WORKSFORME
------- Additional Comments From [EMAIL PROTECTED] 2006-08-04 17:34 -------
Hi, Thanks for the ticket. What you're reporting is commonly referred to as a "false positive" (aka: FP). The rule is actually doing the right thing -- the subject does have two encodings in it, and so the rule is triggered.
It appears that this is more common now than it was before:
    old: 1.047 1.4619 0.0792 0.949 0.58 0.89 SUBJECT_ENCODED_TWICE
    new: 0.597 0.6926 0.1444 0.827 0.65 0.89 SUBJECT_ENCODED_TWICE
which basically means that the spam hits have decreased by ~50% while the ham hits increased by ~50%. So the next time the scores are generated, I would expect this rule's score to drop a bit. In the mean time, you can lower the score on your installation as you see fit. Hope this helps. :) As for the ticket, since the rule is doing the right thing, I'm closing as WFM.
</snip>
I would like to close this issue as WONTFIX, so I am going ahead and closing it as such. Please reopen if you feel otherwise.
Thanks,
Jobin.
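The RFC 2047 Subject example quoted in the comment above can be checked with Python's standard `email.header` module. This is a minimal sketch for illustration: it only shows that two adjacent encoded-words decode to a single subject, which is why splitting a long subject is valid per the RFC even though SpamAssassin flags it.

```python
from email.header import decode_header, make_header

# The Subject value from the RFC 2047 example, folded over two lines.
raw_subject = (
    "=?ISO-8859-1?B?SWYgeW91IGNhbiByZWFkIHRoaXMgeW8=?=\n"
    " =?ISO-8859-2?B?dSB1bmRlcnN0YW5kIHRoZSBleGFtcGxlLg==?="
)

# decode_header returns one (bytes, charset) pair per encoded-word here,
# because the two words use different charsets.
parts = decode_header(raw_subject)

# make_header joins the parts; whitespace between adjacent encoded-words
# is not part of the decoded text.
decoded = str(make_header(parts))

if __name__ == "__main__":
    print(len(parts), "encoded-words")
    print(decoded)
```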
Hi, Could you please at least fix NO_REAL_NAME? (This is really easy). Samuel
I understand; however, there is a workaround for this. Mail generated from IZ might not have a legitimate mail ID like jobin@abc, since it is internally generated. Normal AOL users face the same problem. Given below is an example, with a workaround that sets up a rule which could be used to allow mails from IZ.
<snip>
AOL has no real name. The NO_REAL_NAME test of SA will add points when email is not in the format Joe Smith <jsmith@anyisp.com>. However, AOL email software does not append a "real name" in the "from" field - so this recipe counteracts the effect of the NO_REAL_NAME test to avoid false positives from AOL users. (Contributed by A. Marshall, 7/26/03)
    header   MAIL_FROM_AOL       From =~ /aol\.com/i
    meta     AM_AOL_HAS_NO_NAME  MAIL_FROM_AOL && NO_REAL_NAME
    describe AM_AOL_HAS_NO_NAME  Counteracts NO_REAL_NAME test for AOL email
    score    AM_AOL_HAS_NO_NAME  -1.1
</snip>
Yeah, that's easy to do. But the problem is that not everybody will know how to do that, so most of them will have BTS mails go to their spam folder. Of course, SpamAssassin itself could add a rule to its default config, but that seems pretty ugly to me: should they have to add every bot that doesn't add a real name?!
I have the same problems, especially with IZ automatic notifications:
    0.6 NO_REAL_NAME           The From: field does not contain the full name of ...
    1.5 SUBJECT_ENCODED_TWICE  Subject: MIME encoded twice
    0.0 SUBJECT_EXCESS_BASE64  Subject: base64 encoded encoded unnecessarily
Updating whiteboard.
We were able to identify a workaround which would allow us to eliminate the NO_REAL_NAME hit. With it, any IZ mail notification would have something like Name <emailid> in the From field. More updates to follow.
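For illustration, a From value of the form Name <emailid> can be built with Python's standard `email.utils.formataddr`. This is a sketch only: the helper name `notification_from`, the display name and the address are hypothetical, and it says nothing about how IZ itself builds the header.

```python
from email.message import EmailMessage
from email.utils import formataddr, parseaddr


def notification_from(real_name: str, address: str) -> str:
    """Build a From value that carries a display name plus the address."""
    # formataddr quotes or encodes the name when needed and wraps the
    # address in angle brackets.
    return formataddr((real_name, address))


if __name__ == "__main__":
    msg = EmailMessage()
    # Hypothetical sender; a real deployment would use the acting
    # user's name and the tracker's own address.
    msg["From"] = notification_from("Jane Doe", "jane.doe@example.org")
    msg["Subject"] = "Issue 68098 updated"
    print(msg["From"])
    # parseaddr splits the value back into (name, address).
    print(parseaddr(msg["From"]))
```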
The engineers have added a facility which resolves the NO_REAL_NAME problem for mails generated from IZ, hence resolving this issue for the future.
Plans are in place for resolving this issue in the next patch release of CEE 4.5.2. Setting the target milestone to reflect the same. Marking this issue as Resolved Later. Support will continue to track this issue internally and review the fix once the patch has been applied on the site.
The option of adding %currentuserrealname% to the From field by default via the IZ Email Notification template is present for Add/Modify issue. Stefan, please make the changes in the IZ template to verify that this issue has been fixed.
Actually marking this issue as Fixed. Please verify and close this issue.