Adam Katz wrote:
> % grep html_text_match..comment 20_html_tests.cf
I hadn't known about that function until I saw Henrik's replies last
week, so it would have been hard to search for it.
> Any more than 512 chars isn't going to be helpful but will end up being
> computationally expensive (I've played with this idea). Also, I'd say
> this is more of a ham indicator than a spam indicator.
*shrug* I happen to be getting a wave of ~400K spams that consist of
about 1K of real HTML tags, loading the spam content via image from a
remote server, with the remainder of that 400K message consisting of
maybe four *very* long HTML comments (50K+) with nothing but gibberish
(groups of ~4-8 words, separated by /, ;, # and occasionally some other
symbol).
I've also seen gobs of mail with ~5K of CSS in an HTML comment - mostly
from Outlook. *eyeroll*
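
As a rough illustration of the pattern described above (this is not SpamAssassin code; the helper name comment_stats and the 50000-character figure in the usage note are assumptions made for the sketch), a few lines of Python can measure how long the longest HTML comment is and how much of the body the comments take up:

```python
import re

# Matches one HTML comment; DOTALL lets a comment span multiple lines.
COMMENT_RE = re.compile(r"<!--(.*?)-->", re.DOTALL)


def comment_stats(html):
    """Return (longest_comment_length, fraction_of_body_inside_comments).

    Hypothetical helper for illustration only. Lengths count the text
    between the <!-- and --> delimiters, in characters.
    """
    lengths = [len(m.group(1)) for m in COMMENT_RE.finditer(html)]
    if not html or not lengths:
        return 0, 0.0
    return max(lengths), sum(lengths) / len(html)


if __name__ == "__main__":
    # Synthetic body: a little real markup plus one very long comment.
    body = "<html><img src='x'><!--" + "word / word ; " * 4000 + "--></html>"
    longest, fraction = comment_stats(body)
    print(longest, round(fraction, 3))
```

Under these assumptions, a message whose longest comment runs past something like 50000 characters would stand out clearly from the ~5K Outlook CSS comments, though any real cutoff would need testing against actual mail.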
These are most of what's still getting through to *my* inbox, but with
~50K users I'd assume they're hitting other people as well.
Unfortunately, as an ISP sysadmin, my ability to get useful, timely
feedback from a high proportion of the userbase is... limited.
-kgd