spamassassin-users April 2010 archive

Re: FROM_STARTS_WITH_NUMS matches on text-to-email

From: Jason Bertoch <jason_at_nospam>
Date: Mon Apr 12 2010 - 23:30:08 GMT
To: users@spamassassin.apache.org

On 4/12/2010 4:58 PM, Martin Gregorie wrote:
> I had quite a bit to do with phone numbers en masse a while back. My
> initial reaction is that it's not easy: not only do phone numbers vary in
> length between locales, but even such things as the 'international
> dialing' and non-local-call prefix vary from country to country.
That is certainly true with all phone numbers, but I suspect it's not
for cell phone numbers using text-to-email. I don't have any non-US
examples to verify against, but it really wouldn't make sense for
providers to use international dialing codes in this case...at least not
a huge variety at any rate. I'm hoping that those in the non-US
community can contribute opinions. Maybe this problem isn't as complex
as it initially sounds.
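To make the idea concrete, here is a minimal sketch of how a text-to-email sender might be recognized, assuming US-only (NANP) numbers: a local part of exactly 10 digits, optionally preceded by a leading "1", with area code and exchange each starting with 2-9, sent from a known carrier gateway domain. The function name `looks_like_text_to_email` and the `SMS_GATEWAY_DOMAINS` set are hypothetical and illustrative only; the domain list is not authoritative and would need checking against current carrier documentation. Non-US formats are deliberately not handled.

```python
import re

# Hypothetical, illustrative set of US carrier text-to-email gateway domains.
# NOT an authoritative list; verify against current carrier documentation.
SMS_GATEWAY_DOMAINS = {
    "vtext.com",                # Verizon
    "txt.att.net",              # AT&T
    "tmomail.net",              # T-Mobile
    "messaging.sprintpcs.com",  # Sprint
}

# Assumed NANP shape: optional leading 1, then a 3-digit area code and a
# 3-digit exchange that each start with 2-9, then a 4-digit line number.
NANP_LOCAL_PART = re.compile(r"1?[2-9]\d{2}[2-9]\d{6}")


def looks_like_text_to_email(address: str) -> bool:
    """Return True if the address looks like a US cell-phone text-to-email sender."""
    local, sep, domain = address.strip().lower().rpartition("@")
    if not sep:
        # No "@" at all, so this is not an address we can classify.
        return False
    return NANP_LOCAL_PART.fullmatch(local) is not None and domain in SMS_GATEWAY_DOMAINS
```

Requiring both the number shape and the gateway domain keeps the exception narrow. An all-digit local part from an arbitrary domain would still be left to FROM_STARTS_WITH_NUMS.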

On 4/12/2010 5:57 PM, Ted Mittelstaedt wrote:
> The fundamental flaw
> here is in the assumption that an all-number mailbox user ID is
> virtually certain to be spam. It is not. Clearly, the default score
> assignment to that rule is too high.

That could certainly be true, and it may prove that the proposed
tests just aren't worth the CPU cycles. Only a test against the corpus
will say with any degree of certainty. Sadly, I don't have the Perl
skills to make that judgment, hence my appeal to the community for
ideas, opinions, and possible code to test the theory.
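As a rough starting point for that corpus test, the sketch below counts how often a rule fires on messages already labeled spam or ham, then reports both rates. A useful rule should fire often on spam and rarely on ham. Everything here is an assumption: the corpus is modeled as `(from_address, is_spam)` pairs, and `starts_with_nums` is a hypothetical stand-in that fires when the address begins with three or more digits. It is not the actual FROM_STARTS_WITH_NUMS definition, and this is not SpamAssassin's own mass-check tooling. A refined rule, such as one that exempts text-to-email senders, could be passed to `evaluate_rule` in the same way and its ham rate compared against the original.

```python
import re
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class RuleStats:
    spam_hits: int = 0
    spam_total: int = 0
    ham_hits: int = 0
    ham_total: int = 0

    def spam_rate(self) -> float:
        """Fraction of spam messages the rule fires on."""
        return self.spam_hits / self.spam_total if self.spam_total else 0.0

    def ham_rate(self) -> float:
        """Fraction of ham messages the rule fires on (false positives)."""
        return self.ham_hits / self.ham_total if self.ham_total else 0.0


def starts_with_nums(address: str) -> bool:
    """Hypothetical stand-in: fires if the address begins with 3+ digits.
    This is NOT the real SpamAssassin FROM_STARTS_WITH_NUMS pattern."""
    return re.match(r"\d{3,}", address.strip()) is not None


def evaluate_rule(corpus: Iterable[Tuple[str, bool]],
                  rule: Callable[[str], bool]) -> RuleStats:
    """Tally rule hits over (from_address, is_spam) pairs."""
    stats = RuleStats()
    for address, is_spam in corpus:
        hit = int(rule(address))
        if is_spam:
            stats.spam_total += 1
            stats.spam_hits += hit
        else:
            stats.ham_total += 1
            stats.ham_hits += hit
    return stats
```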

/Jason