postfix-users October 2010 archive

Re: Postfix locking up, not accepting connections / smtp not sending emails out

From: Wietse Venema <wietse_at_nospam>
Date: Fri Oct 29 2010 - 20:35:21 GMT
To: Postfix users <postfix-users@postfix.org>

Christian Rohmann:
> Hey again,
>
> On 10/29/2010 07:23 PM, Wietse Venema wrote:
> > The main loop in the master is as follows:
> >
> > forever {
> >     set an alarm for 1000s
> >     do an EPOLL_WAIT for up to 500s and handle any child process
> >         events, or short-term timer requests that are implemented
> >         around the EPOLL_WAIT timer.
> >     respond to sighup (the sighup flag is set by a signal handler)
> >     respond to sigchld (the sigchld flag is set by a signal handler)
> > }
>
> Just now one machine had the issue again. I checked and saw that we
> were down to just two smtpd processes and even though master was still
> bound to port 25 no new connections were accepted. I did telnet to it,
> but the connection was not accepted and ran into a timeout.

This means that the smtpd processes are hanging, the master is
hanging, or both.

At this point I will not speculate further until you report the
result of following the instructions in
http://www.postfix.org/DEBUG_README.html#logging

If I don't see a credible report about warnings etc. in Postfix
logfiles, then that means that you are flying blind, and that needs
to be addressed first.

The following is for background information only.

The master daemon watches the SMTP port only when all existing
smtpd processes have reported that they are busy (i.e. talking to
an SMTP client). Otherwise, some idle smtpd process will watch
the port.

When all smtpd processes have reported that they are busy, the
master starts a new smtpd process in response to a new connection,
provided that the per-service process limit is not reached (otherwise
the master logs a warning that all ports are busy).
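
The two paragraphs above can be read as a small decision rule. The
following is a toy model only, not Postfix source code: the class
name ServiceModel, its fields, and the returned strings are made up
for illustration, and it assumes that a newly started smtpd takes
the pending connection and reports busy right away.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceModel:
    """Toy model of one service as seen by the master (hypothetical names)."""
    process_limit: int
    # One flag per child process: True if that child last reported "busy".
    busy: list = field(default_factory=list)

    def master_watches_port(self) -> bool:
        # The master watches the port only when no child is reported idle.
        # With no children at all, all([]) is True, so the master listens.
        return all(self.busy)

    def on_new_connection(self) -> str:
        if not self.master_watches_port():
            # Some idle child is watching the port and takes the connection.
            idle_index = self.busy.index(False)
            self.busy[idle_index] = True
            return "idle child accepts"
        if len(self.busy) < self.process_limit:
            # Assumption: the new child accepts and reports busy immediately.
            self.busy.append(True)
            return "master spawns child"
        return "warning: all ports busy"
```

With process_limit=2, the first two connections each cause a spawn,
and a third connection arriving while both children are busy yields
the warning branch.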

In your case, the two smtpd processes got stuck before sending the
"I am busy" message to the master daemon, so the master daemon
still believes that the two processes are idle. I don't know if
this has anything to do with broken virtual timers.
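
To see why a connection then runs into a timeout, it helps to keep
what the master believes apart from what the children are actually
doing. This is a hedged sketch of that reasoning, not Postfix code;
the function can_accept and its two arguments are hypothetical.

```python
def can_accept(reported_busy, actually_stuck):
    """Return who would accept the next connection in this toy model.

    reported_busy[i]  -- True if child i told the master "I am busy"
    actually_stuck[i] -- True if child i is hung and cannot do anything
    """
    # The master watches the port only if every child is reported busy.
    if all(reported_busy):
        return "master"
    # Otherwise the port is left to children the master believes are idle.
    for busy, stuck in zip(reported_busy, actually_stuck):
        if not busy and not stuck:
            return "idle child"
    # Children look idle to the master, but all of them are hung.
    return "nobody"
```

Two children that hang before reporting busy correspond to
can_accept([False, False], [True, True]): the master does not watch
the port, and no child is able to, which matches the symptom
described.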

Regardless of why a process hangs, if it hangs then you should see
watchdog errors in the Postfix logs. If you don't see those then
either your virtual timer is busted, or your logging is busted.

        Wietse