postfix-users October 2010 archive
Main Archive Page > Month Archives  > postfix-users archives
postfix-users: Re: Postfix locking up, not accepting connections

Re: Postfix locking up, not accepting connections / smtp not sending emails out

From: Ravindra Gupta // Viva <ravindra_at_nospam>
Date: Sat Oct 30 2010 - 09:03:00 GMT
To: Christian Rohmann <crohmann@netcologne.de>

Hiii,

How to improve postfix server sending speed, any parameters required in
main.cf file so confirm me.
My requirement is we will send minimum 500 to 600 email in less then 1
minutes.

Please give the suggestion

On Fri, Oct 29, 2010 at 9:05 PM, Christian Rohmann
<crohmann@netcologne.de>wrote:

> Hey postfix-users,
>
> we are currently analyzing very strange postfix behavior which I can
> only describe as lockup or freeze.
> Honestly we reached our abilities to gather more info and to find the
> root cause of this issue.
>
> You are my last hope obi wan ... eh Wietse ....
>
>
> --- Setup / Configuration ---
> We are running three VMware ESX 4.1 guests (multiple VMware hosts) with
> Debian Lenny amd64.
> Each machine has 8 virtual CPUs and 4GB of RAM. We are using the Debian
> lenny postfix package (2.5.5) and now moved up to 2.7.1 which we
> backported (problems remained unchanged).
> Each instance of postfix recieves about 200-300 mails/minute. The mail
> addresses of the incoming emails are checked using a relay_recipient_map
> (proxymap + mysql) are rewritten (domain is changed) and then relayed
> using a transport map towards a multi-A record containing two IPs.
>
>
> --- Lockup Issues ---
> The setup and configuration works like a charm for hours at a time and
> all of a sudden it stops working leading to two issues (not at the same
> time):
>
> 1) First issue was that suddenly smtp stopped delivering email to that
> mutli-A record. We noticed a few thousand emails in the active queue (I
> guess all emails where in the active queue by that time). We rule out
> problems with the destination servers since the remaining postfix
> instances still delivered mail during that time. Even the last
> submission of email from the now locked up postfix finished without
> issue. There are just no more tries to reach the destination. Postfix
> also stops to deliver to both destination IPs at the same time. There
> was no logging anymore, but from anvil giving us some statistics about
> connection rates.
>
> 2) The second issue, occurring more often is that smtpd stopped
> responding or doing anything acutally. Sometimes the number of processes
> went to max (500) at other occurences it just stayed at like 30-50 but
> symptoms where still the same.
>
> Looking at netstat all tcp connections to the smtpd processes went away
> after some time.We believe that clients trying to deliver email to us
> did disconnect due to reaching their timeout.
>
> Here is a sniplet of log from before and after the issue:
> --- connect before and after lockup ---
> Oct 29 12:57:35 mailserver postfix/smtpd[12457]: connect from
> newsletter.xxx.de[xxx.162.xxx.28]
> Oct 29 13:29:02 mailserver postfix/smtpd[12457]: 826C611FA8:
> client=newsletter.xxx.de[xxx.162.xxx.28]
> Oct 29 13:29:03 mailserver postfix/smtpd[12457]: lost connection after
> RCPT from newsletter.xxx.de[xxx.162.xxx.28]
> Oct 29 13:29:03 mailserver postfix/smtpd[12457]: disconnect from
> newsletter.xxx.de[xxx.162.xxx.28]
> --- / connect before and after lockup ---
> There is not a single line of log from smtpd processes while being in
> the locked up state. At 12:57:35 there is a last connect from a client,
> no more log happens until we run "postfix reload" (! a reload is
> sufficient) about half an hour later. Then the last few lines show that
> the smtpd process is "alive" again talking to the now already
> disconnected client.
>
> Also the master process is still there but not accepting anymore
> connections until we issue "postfix reload". Here is an strace of the
> master process:
> --- strace on master-pid while locked up ---
> 10648 alarm(333) = 307
> 10648 epoll_wait(13,
> --- / strace on master-pid while locked up ---
>
>
> Another strange thing is, that some of the last log entries contain
> "UNKNOWN" instead of the IP:
> --- smtpd connects towards UNKNOWN ---
> Oct 29 13:05:02 mailserver postfix/smtpd[14196]: connect from
> unknown[unknown]
> Oct 29 13:05:02 mailserver postfix/smtpd[14196]: NOQUEUE: reject:
> CONNECT from unknown[unknown]: 550 5.7.1 Client host rejected: cannot
> find your reverse hostname, [unknown]; proto=SMTP
> Oct 29 13:05:02 mailserver postfix/smtpd[14196]: too many errors after
> CONNECT from unknown[unknown]
> Oct 29 13:05:02 mailserver postfix/smtpd[14196]: disconnect from
> unknown[unknown]
> --- / smtpd connects towards UNKNOWN ---
>
> Anvil also gives statistics about this obscure client-IP "UNKNOWN":
> --- anvil statistics for UNKNOWN ---
> Oct 29 13:14:57 mailserver postfix/anvil[9312]: statistics: max
> connection rate 92/60s for (smtp:unknown) at Oct 29 13:05:02
> Oct 29 13:14:57 mailserver postfix/anvil[9312]: statistics: max
> connection count 1 for (smtp:unknown) at Oct 29 13:05:02
> Oct 29 13:14:57 mailserver postfix/anvil[9312]: statistics: max cache
> size 413 at Oct 29 13:05:02
> --- / anvil statistics for UNKNOWN ---
>
> During the lock up we still see the *:25 listening socket of the master
> process. When we tried to connect ourselves we just waited forever for a
> connection or even a SMTP prompt.
>
> New incoming connections are in state "SYN_RECV":
> --- netstat while locked up---
> Proto Recv-Q Send-Q Local Address Foreign Address State
> tcp 0 0 x.x.x.143:25 xx.0.xx.217:58556 SYN_RECV
> tcp 0 0 x.x.x.143:25 xx.115.xx.166:46355 SYN_RECV
> tcp 0 0 x.x.x.143:25 xx.63.xx.197:37216 SYN_RECV
> .
> .
> .
> ---/ netstat while locked up---
>
>
> Again a "postfix reload" does fix the problem (for a few hours) and
> postfix delivers the > 10,000 emails in the active queue in a few
> seconds towards the two IPs from the DNS round robin and everything goes
> back to normal for a few hours.
>
>
>
> I'm hoping for suggestions on what to look for or check,
> TIA
>
> Christian
>

-- Incase of any further queries, Please feel free to mail me or contact me on the numbers provided below. Thanks & Regards, Ravindra Gupta Asst Manager - Tech Support Viva Infomedia Pvt. Ltd. 242, Oshiwara Industrial Centre, New Link Road, Opp. Oshiwara Bus Depot, Goregaon West, Mumbai 400104. Direct: +91.22.40310353 Board: +91.22.40310310 Email: ravindra@vivaconnect.in Viva Infomedia: Awarded as Best SME (E-Commerce) at CNBC Emerging India Awards 2009