Issue 24914 - MX servers are not responding
Status: CLOSED FIXED
Alias: None
Product: Infrastructure
Classification: Infrastructure
Component: Mailing lists
Version: current
Hardware: All All
Importance: P1 (highest) Trivial
Target Milestone: ---
Assignee: Unknown
QA Contact: issues@www
URL:
Keywords:
Duplicates: 24945 25375
Depends on:
Blocks: 24851
Reported: 2004-01-29 16:14 UTC by grsingleton
Modified: 2004-08-31 17:54 UTC
CC List: 7 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
DNS analysis for OOo (8.85 KB, text/plain)
2004-02-02 14:58 UTC, grsingleton

Description grsingleton 2004-01-29 16:14:04 UTC
Tested website.openoffice.org, qa.openoffice.org, openoffice.org,
marketing.openoffice.org, and user-faq.openoffice.org. None of the MX servers for
these hosts are responding. I suggest replacing or updating the MTA software.
Comment 1 stx123 2004-01-29 17:11:12 UTC
I see for openoffice.org
openoffice.org.         5M IN MX        20 openoffice.org.
openoffice.org.         5M IN MX        5 asmx1.sfo.collab.net.
and no MX record for the subprojects.

asmx1.sfo.collab.net is responsive for me.
The SMTP server at openoffice.org is not.

My impression is that messages addressed to xxx@openoffice.org are delivered
after some time.
Messages addressed to xxx@<project>.openoffice.org can't be delivered as the
SMTP Server at openoffice.org (64.125.133.202) is not responding.
Comment 2 stx123 2004-01-29 17:16:26 UTC
adding to top5
Comment 3 Unknown 2004-01-29 17:28:05 UTC
I've filed an internal issue for the Ops and engineering team to review. I will
update this issue shortly.
Comment 4 Unknown 2004-01-29 19:08:18 UTC
ops has replied

The subprojects (aka subdomains of "openoffice.org") automatically get routed to
the MX for the domain. Mail is being delivered properly using MX records, this
is *exactly* the way internet mail and DNS is supposed to work. This is a non-issue.

(We closed off openoffice.org from receiving email directly from the internet to
force all email to go through the filtering MX, since we're getting such a high
volume of worm/virus traffic)

closing/invalid
Comment 5 grsingleton 2004-01-29 19:23:07 UTC
Thanks for the update. It is sad that things still do not work. I have mail
queued for the past 48 hours that cannot be transmitted because there is NO MTA
to pick it up. So much for your claim.
Comment 6 pavel 2004-01-29 20:03:17 UTC
E.g. porting:

pavel@pavel:~> dig -t mx porting.openoffice.org

; <<>> DiG 9.2.2 <<>> -t mx porting.openoffice.org
;; global options:  printcmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 213
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 0

;; QUESTION SECTION:
;porting.openoffice.org.                IN      MX

;; AUTHORITY SECTION:
openoffice.org.         284     IN      SOA     ns1.collab.net.
hostmaster.collab.net. 2004010900 3600 1800 2419200 300

;; Query time: 12 msec
;; SERVER: 10.20.0.1#53(10.20.0.1)
;; WHEN: Thu Jan 29 21:05:32 2004
;; MSG SIZE  rcvd: 101

It returns nothing, thus mails are delivered directly to the IP, and
64.125.133.202 does not listen on port 25 :-(

MX records are invalid, IMHO. Please show the part of the ooo zone.
Comment 7 pavel 2004-01-29 20:43:35 UTC
-bash-2.05b$ host -t mx openoffice.org
openoffice.org mail is handled by 5 asmx1.sfo.collab.net.
openoffice.org mail is handled by 20 openoffice.org.

Why does openoffice.org not listen on port 25?

This is the reason for

E9185AA7C7      635 Thu Jan 29 21:14:29  Pavel@Janik.cz
     (connect to council.openoffice.org[64.125.133.202]: Connection timed out)
                                         doesnotexist@council.openoffice.org

and similar.
Comment 8 stx123 2004-01-29 20:59:20 UTC
You might want to take an MX wildcard record for *.openoffice.org into
consideration. IIRC we had something like that earlier.
Comment 9 pavel 2004-01-29 21:40:26 UTC
Kenneths:

> The subprojects (aka subdomains of "openoffice.org") automatically get routed to
> the MX for the domain.

What do you mean by automatically? Could you please name the mechanism that is
used? Do you have domain-wide MX? *.openoffice.org? As I stated in previous
comments, this is not the case. Thus there is only MX pointing to
openoffice.org. The rest is routed *directly* to IP address!

> Mail is being delivered properly using MX records, this
> is *exactly* the way internet mail and DNS is supposed to work. This is a
> non-issue.

No - as you can see, mails to project.openoffice.org are not delivered to MX
with the lowest prio. Why? Because there is no MX thus all mails are delivered
to A of openoffice.org:

pavel@pavel:~> host -t A openoffice.org
openoffice.org has address 64.125.133.202

Which is not listening/it seems to firewall port 25 -> no mails at all!

When I manually (via telnet ... 25) sent mail via the primary MX
(asmx1.sfo.collab.net.) to announce@cs.openoffice.org, it came back to me almost
instantly (I'm the moderator of it)!
Comment 10 pavel 2004-01-29 21:46:27 UTC
To be exact, all mails are delivered to project.openoffice.org which is A to the
same IP as openoffice.org, ie. 64.125.133.202.
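
A minimal sketch of the lookup rule discussed in comments 6 to 10, written in Python purely for illustration. It assumes the usual behaviour of a sending MTA: use the MX records of the exact name (lowest preference first), and if that name has no MX records at all, connect to the name's own address record. The function name `delivery_targets` and the in-memory tables are made up for the example; the record values are the ones quoted in this issue, and a real MTA would of course query DNS instead.

```python
# Illustrative only: how a sending MTA picks where to connect.
# The tables hold the records quoted in this issue; a real MTA queries DNS.
MX = {
    "openoffice.org": [(20, "openoffice.org"), (5, "asmx1.sfo.collab.net")],
    # No entry for porting.openoffice.org: the dig query returned ANSWER: 0.
}
A = {
    "openoffice.org": "64.125.133.202",
    # Per comment 10, project names resolve to the same address.
    "porting.openoffice.org": "64.125.133.202",
}


def delivery_targets(domain, mx=MX, a=A):
    """Return the hosts to try, in order, for mail addressed to @domain."""
    records = mx.get(domain, [])
    if records:
        # MX records exist: lowest preference value is tried first.
        return [host for _pref, host in sorted(records)]
    if domain in a:
        # No MX for this exact name. The parent's MX is not inherited,
        # so the MTA connects to the name's own address record.
        return [domain]
    return []


print(delivery_targets("openoffice.org"))
print(delivery_targets("porting.openoffice.org"))
```

With the records above, mail for openoffice.org goes to asmx1 first, while mail for porting.openoffice.org goes straight to the host behind 64.125.133.202, which is the address that is not accepting connections on port 25.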
Comment 11 grsingleton 2004-01-29 22:15:42 UTC
One of ten:

    **********************************************
    **      THIS IS A WARNING MESSAGE ONLY      **
    **  YOU DO NOT NEED TO RESEND YOUR MESSAGE  **
    **********************************************

The original message was received at Thu, 29 Jan 2004 12:53:59 -0500
from www.pathtech.org [205.189.41.25]

   ----- Transcript of session follows -----
<dev@marketing.openoffice.org>... Deferred: Connection timed out with marketing.openoffice.org.
<authors@user-faq.openoffice.org>... Deferred: Connection timed out with user-faq.openoffice.org.
<dev@website.openoffice.org>... Deferred: Connection timed out with website.openoffice.org.
Warning: message still undelivered after 4 hours
Will keep trying until message is 5 days old
Comment 12 lsuarezpotts 2004-01-30 01:33:05 UTC
as is obvious, project list mail is not being delivered.
I have updated the PCN issue to reflect that.
louis
Comment 13 lsuarezpotts 2004-01-30 08:00:29 UTC
update:
I have entered pavel's point in the PCN issue and further updated the issue. This is going on the 
3rd day of no project mail.
louis
Comment 14 lsuarezpotts 2004-01-30 08:56:44 UTC
this probably more closely corresponds to PCN 25658
adding dthomas to cc list and kerry, too.
louis
Comment 15 stx123 2004-01-30 14:28:56 UTC
*** Issue 24945 has been marked as a duplicate of this issue. ***
Comment 16 drewethomas 2004-01-30 18:57:46 UTC
We continue to work on this. This morning we added a wildcard MX record 
pointing to asmx1.sfo.collab.net.

We continue to monitor the situation closely.
Comment 17 lsuarezpotts 2004-02-02 07:57:48 UTC
Mail is working its way through OOo but there is a lot of it.  As the engineer on duty reported, for 
Saturday, 
"Lots of mail is getting through. Between the hours of 7am and 8am today, for
example, the inbound mail exchanger for the Sun sites processed 41390 incoming
messages. Of these:
31214 (75%) were Novarg worms
5264 (13%) were spam
4912 (12%) were delivered on to one of the sun sites."

Saturday AM (-0800) is not a busy time.  The resulting load is probably producing some erratic
behavior. Please report any such behavior in this issue.
louis
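
As a quick check of the figures quoted above, the following Python sketch recomputes the shares from the raw counts. The variable names are made up for the example; the numbers are exactly those reported for the 7am to 8am hour.

```python
# Figures quoted by the engineer on duty for 7am-8am on Saturday.
total = 41390
counts = {"Novarg worms": 31214, "spam": 5264, "delivered": 4912}

# The three categories account for every incoming message.
assert sum(counts.values()) == total

# Share of each category, rounded to a whole percent as in the quote.
shares = {name: round(100 * n / total) for name, n in counts.items()}
for name, pct in shares.items():
    print(f"{name}: {pct}%")
```

The rounded shares agree with the quoted 75%, 13% and 12%, so roughly one message in eight during that hour was legitimate list mail.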
Comment 18 grsingleton 2004-02-02 14:58:51 UTC
Created attachment 12840
DNS analysis for OOo
Comment 19 grsingleton 2004-02-02 14:59:25 UTC
The statistics are interesting but there is more to the problem. DNS for OOo has
errors that are preventing messages from being received. Please see the attachment.
Comment 20 lsuarezpotts 2004-02-02 19:22:48 UTC
updating the PCN issue accordingly; thanks Ger.
Louis
Comment 21 stx123 2004-02-03 18:18:42 UTC
You are certainly aware that asmx1 has not been available for 4 hours...
Comment 22 Unknown 2004-02-03 18:36:03 UTC
Could you please elaborate a bit more on your statement "asmx1 is not available"?
Comment 23 stx123 2004-02-03 18:46:41 UTC
$ telnet asmx1.sfo.collab.net smtp
Trying 64.125.133.81...
telnet: connect to address 64.125.133.81: Connection refused
$
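
The same kind of check as the telnet test above can be scripted. This is only a sketch in Python using the standard `socket` module; the function name `smtp_port_state` and the 10 second timeout are assumptions made for the example. It separates "connection refused" (as seen here for asmx1) from "timed out" (as seen earlier for 64.125.133.202), since the two point to different causes.

```python
import socket


def smtp_port_state(host, port=25, timeout=10.0):
    """Classify a TCP connection attempt the way the telnet test does."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        # The host answered, but nothing is accepting connections on the port.
        return "refused"
    except socket.timeout:
        # No answer at all within the timeout, e.g. the port is filtered.
        return "timed out"
    except OSError as exc:
        # Anything else, such as a name that does not resolve.
        return f"error: {exc}"
```

For example, `smtp_port_state("asmx1.sfo.collab.net")` would report the state of the mail gateway at the moment it is called.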
Comment 24 Unknown 2004-02-03 18:55:52 UTC
per our internal issue Ops has restated that the mail gateway is available and
is processing:

<snip>It's up and working fine, in fact it's processing mail at a rapid rate.
Unfortunately for periods of time it gets too busy and rejects new connections.
So, MTA's will just need to keep retrying (which they will do!) and eventually
the messages will be delivered. <snip>
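
What "keep retrying" means for a sender can be sketched as follows. The two limits are taken from the warning message quoted in comment 11 (warn after 4 hours, give up after 5 days); they are the settings of that one sending MTA, not a universal rule. The 30 minute retry interval and the function names are assumptions for illustration only.

```python
from datetime import timedelta

WARN_AFTER = timedelta(hours=4)      # "still undelivered after 4 hours"
GIVE_UP_AFTER = timedelta(days=5)    # "until message is 5 days old"
RETRY_EVERY = timedelta(minutes=30)  # assumed; not stated in this issue


def queue_action(age):
    """What the sending MTA does with a deferred message of the given age."""
    if age >= GIVE_UP_AFTER:
        return "bounce"
    if age >= WARN_AFTER:
        return "retry, sender already warned"
    return "retry"


def attempts_before_bounce(retry_every=RETRY_EVERY):
    """Number of delivery attempts made before the message is bounced."""
    return GIVE_UP_AFTER // retry_every


print(queue_action(timedelta(hours=48)))
print(attempts_before_bounce())
```

Under these assumptions a message that has been queued for 48 hours is still being retried, and it is only returned to the sender if every attempt over the full five days fails.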
Comment 25 lsuarezpotts 2004-02-03 19:20:19 UTC
Ger,
I presented your point to Ops at CollabNet. They appreciated the insight and data and systematically
went through all the points. Mail is, as it happens, working fine--it's just that we are way overloaded,
on the order of 3 million mail messages last week, 2.2 million of which were clearly spam/viral. The errors you
point to are not actually relevant.  Thus:

[quoting:]
ERROR: no SOA record for openoffice.org. from tld1.ultradns.net.
ERROR: no SOA record for openoffice.org. from tld2.ultradns.net.

These "errors" are meaningless. The toplevel server doesn't need an SOA record
for openoffice.org.  All the toplevel server needs is
the ns records for the openoffice domain. Our nameservers, ns{1,2,3}.collab.net
 have SOA records in place. A missing SOA record for the TLD server could
not have any possible effect on email delivery. For example, kernel.org also
"fails" this test.

The last error message looks valid, but is in fact wrong:
ERROR: NS list from openoffice.org. authoritative servers does not
  === match NS list from parent (org.) servers

If this were indeed true, it would be a misconfiguration, but it is
untrue. Doing the following queries easily proves this:
dig @TLD1.ULTRADNS.NET openoffice.org ns
dig @TLD2.ULTRADNS.NET openoffice.org ns
dig @ns1.collab.net. openoffice.org ns
dig @ns2.collab.net. openoffice.org ns
dig @ns3.collab.net. openoffice.org ns

End quote.
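
Once the answers to the five dig queries above are collected, comparing them is a set comparison. The sketch below is in Python; the function names and the sample lists are hypothetical (the actual dig answers are not included in this issue), and only the ns{1,2,3}.collab.net names come from the quote.

```python
def normalize(names):
    """Lower-case host names and drop the trailing dot so sets compare cleanly."""
    return {n.strip().lower().rstrip(".") for n in names}


def ns_lists_match(parent_ns, authoritative_ns):
    """True if the parent zone and the domain's own servers list the same NS set."""
    return normalize(parent_ns) == normalize(authoritative_ns)


# Sample input only: what matching answers would look like.
from_parent = ["NS1.COLLAB.NET.", "NS2.COLLAB.NET.", "NS3.COLLAB.NET."]
from_authoritative = ["ns1.collab.net.", "ns2.collab.net.", "ns3.collab.net."]
print(ns_lists_match(from_parent, from_authoritative))
```

Differences in letter case or a trailing dot are not real mismatches, which is one way a checking tool can report this error when the two lists in fact agree.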

So, the fact of the matter is that things are working--but b/c of the tsunami of wormy crap coming 
from out there, the system is just slow in delivering mail.  
thanks for collaborating!
louis 
Comment 26 lsuarezpotts 2004-02-03 20:20:19 UTC
To give insight into how heavily the system is being taxed:

Quote:
FYI, on the outgoing side, here's stats through 10:30AM 
from openoffice. As you can see, plenty of mail going out!

Total delivery attempts: 277806
Accepted by destination: 176531 (63.5%)
Deferred by destination: 67223 (24.2%)
Failures: 23404 (8.4%)
Double bounces: 10646 (3.8%)
Triple bounces: 2 (0.0%)

Meanwhile, on the incoming side, we've been hit by several million messages per week, most of which
(>95%) are spam/viruses.  Expect delays. Mail, however, is going through and the system is correctly configured.
louis
Comment 27 grsingleton 2004-02-03 20:38:51 UTC
Outgoing is NOT my concern. Inbound is. I have posted messages to user-faq
starting Saturday that have yet to arrive and be posted. This concerns me a lot.
That is why I started to look at DNS and the MX records and how they were
handled. When I saw errors I added them to the issue.

I received your personal email with the update of this issue and replied. I'll
bet that because out-going is working well you will see this first.
Comment 28 lsuarezpotts 2004-02-03 20:54:15 UTC
hi
actually, I received both about the same time. Ger, please check your own ISP. The blockage could be 
there, esp. for mail sent Saturday.  Mail is getting through, though, as I emphasized before, not as 
smoothly or swiftly as desired.  
louis
Comment 29 grsingleton 2004-02-03 21:53:15 UTC
For the record, I act as my own ISP. I have my own portable Class "C" that my
upline routes for me. Other than paying once a month, that is the only involvement
of an outside service. My MTA connects directly and thus I know that the OOo
mail server accepted the messages, but that's as far as I could go with
troubleshooting my own message submissions. They still haven't shown up in user-faq.
Maybe time to try again ;-)
Comment 30 Unknown 2004-02-11 16:21:11 UTC
*** Issue 25375 has been marked as a duplicate of this issue. ***
Comment 31 sander_traveling 2004-02-11 16:26:42 UTC
Now that 25375 is closed as a duplicate of this - what's the status of having the
problems fixed so mailing lists actually work? And why didn't the hardware
upgrade give any noticeable performance improvements?

Comment 32 Unknown 2004-02-11 16:34:06 UTC
The servers are still under a heavy load related to the very high traffic
related to the worm. I'll update this issue with some statistics shortly.
Comment 33 sander_traveling 2004-02-11 16:36:32 UTC
But load statistics will only tell me why things are broken, not how they are
going to get better 8-(
Comment 34 grsingleton 2004-02-11 16:45:55 UTC
Since unwanted traffic seems to be a major problem, is there any solid technical
reason that http://qmail-scanner.sourceforge.net/ cannot be implemented?
Comment 35 sander_traveling 2004-02-11 17:40:45 UTC
Maybe. But one would need to know where the load is coming from
Comment 36 sander_traveling 2004-02-11 19:36:26 UTC
if there is a high load due to worm activity, how come the announce@moderation
queue - the only part of mail system that appears to work as expected - only
receives 20 mails per hour?
Comment 37 simonbr 2004-02-11 20:25:44 UTC
I tried to send an e-mail yesterday and today to discussie@nl.openoffice.org, 
but it's not coming through (also not in the archives). 
Are e-mails getting lost, or are they just held up in some queue?
Comment 38 simonbr 2004-02-11 21:36:46 UTC
> I tried to send an e-mail yesterday and today to discussie@nl.openoffice.org,
The second mail arrived, more than an hour late. I suppose the first one has 
been lost...
Comment 39 Unknown 2004-02-11 21:58:36 UTC
simonbr: what time did you send the mail, what was the subject line and what
host was the message sent from? I'll relay that information to the Ops Engineers
to investigate.
Comment 40 simonbr 2004-02-11 22:30:24 UTC
@kenneth, 
Both mails, Subject: "Open Office en OpenOffice.org", sent to 
discussie@nl.openoffice.org via smtp.xs4all.nl, from: simon.oo.o@xs4all.nl
First mail (lost) sent at 10 feb 2004, 22:42 +0100
Second mail (received) sent at 11 feb 2004, 19:52 +0100

Comment 41 grsingleton 2004-02-11 23:08:01 UTC
More evidence:

To: 	OOo_marketing list <dev@marketing.openoffice.org>
Subject: 	Re: [Marketing] Re: [website-dev] Accessibility of Openoffice
Date: 	Wed, 11 Feb 2004 13:08:32 -0500

To: 	OOo_marketing list <dev@marketing.openoffice.org>
Cc: 	Louis Suarez-Potts <louis@openoffice.org>, Jacqueline McNally
<openoffice.org@decisions-and-designs.com.au>
Subject: 	[Fwd: Accessibility of Openoffice]
Date: 	Tue, 10 Feb 2004 08:59:43 -0500

To: 	OOo_marketing list <dev@marketing.openoffice.org>
Cc: 	OOo_website mailing list <dev@website.openoffice.org>
Subject: 	Accessibility of Openoffice
Date: 	Tue, 10 Feb 2004 00:37:24 -0500

Comment 42 sander_traveling 2004-02-11 23:24:10 UTC
as a way of conducting project communications, the mailing lists simply don't
work right now. I have 10 mails now which have received 'cannot deliver for 4
hours' notices today and have not made it to the lists.
Comment 43 sander_traveling 2004-02-11 23:33:52 UTC
Some of the subjects / times:

subject: accessibility of openoffice.org 
time: 13:54 +0000

subject: Re: [discuss] Triple O Reader
time: 14:39 +0000

subject: ping 
time: 15:39 +0000

subject: insert random topic here 
time: 15:51 +0000

subject: 1.1.1 and marketing 
time: 16:14 +0000

Comment 44 Unknown 2004-02-11 23:37:49 UTC
I needed the host information as well please.
Comment 45 sander_traveling 2004-02-12 00:11:35 UTC
the host would be one of nwkea-mail-{1,2,3,4}.sun.com or brmea-mail-{1,2,3,4}.sun.com

for the listed mails, in that order:
brmea-mail-3.sun.com
nwkea-mail-1.sun.com
brmea-mail-3.sun.com
brmea-mail-4.sun.com
brmea-mail-4.sun.com

In general, sun management and ITops tend to frown upon publishing names of
internal machines.
Comment 46 pavel 2004-02-12 08:44:28 UTC
The status right now is:

openoffice.org.         300     IN      MX      5 asmx1.sfo.collab.net.
openoffice.org.         300     IN      MX      20 openoffice.org.

Two MX records. One of them (the preferred one) is asmx1, the other is
openoffice.org which is the same as www.openoffice.org.

I propose another solution:

- remove openoffice.org from MX records
- put more machines as MX with the same priority to get more rotation
- those machines, together with current asmx1 will only do prefiltering for
viruses, bounces and such and the rest will be relayed to openoffice.org which
will accept incoming SMTP only from those systems.

This scales much better - you can add more and more machines and can
overcome SC's badly designed system that has to (according to available
information) run on only one single machine. Really?
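
The "rotation" in the proposal above relies on how senders treat MX records with equal preference. A rough Python sketch, assuming the common behaviour of trying lower preference values first and picking randomly among hosts that share a preference. The function name `mx_try_order` and the `mx-a.example` style host names are hypothetical; they are not machines that exist at CollabNet.

```python
import random
from itertools import groupby


def mx_try_order(records, rng=random):
    """Order MX hosts by preference, shuffling hosts that share a preference."""
    ordered = []
    for _pref, group in groupby(sorted(records), key=lambda r: r[0]):
        hosts = [host for _p, host in group]
        rng.shuffle(hosts)  # equal preference: spread senders across hosts
        ordered.extend(hosts)
    return ordered


# Hypothetical zone: three filtering gateways with the same preference.
records = [(5, "mx-a.example"), (5, "mx-b.example"), (5, "mx-c.example")]
print(mx_try_order(records))
```

Because each sender shuffles independently, incoming connections are spread roughly evenly over the gateways, and adding a fourth record with the same preference adds capacity without touching the list server behind them.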
Comment 47 Unknown 2004-02-17 21:35:57 UTC
As stated elsewhere, the extraordinary load of email is being filtered and is
making it through, albeit slowly. Any bounced messages can be resent and sit in
the queue.

I'm going to close out this issue as it's morphed from its original problem,
which is being dealt with, followed and described in detail in another issue.
Any suggestions on replacing hardware or the MTA should be sent to st & mh who
will forward their recommendations to our Sun contacts. Other failed message
delivery notices can go in the previously filed issues as well.
Comment 48 grsingleton 2004-02-17 21:43:49 UTC
This issue is not about replacing anything. That was a suggestion as existing
software was failing badly. Closing this as invalid is silly. If you want to close
it, close it as resolved if, indeed, you feel it is resolved.
Comment 49 Unknown 2004-02-24 20:22:43 UTC
closing as resolved
Comment 50 grsingleton 2004-02-24 20:59:13 UTC
Good. And thanks
Comment 51 Unknown 2004-08-31 17:54:27 UTC
Closing this issue.