On Apr 27, 2010, at 7:19 AM, Török Edwin wrote:
> On 04/26/2010 10:20 PM, Mohammed Al-Saleh wrote:
>> Hi Edwin,
>> Thanks for your reply.
>> I need to know the cases where ClamAV has performance bottlenecks or issues.
> The best way to do that is by measuring it.
> Read the last part of this reply:
>> What kinds of text could make ClamAV take more time than usual?
> That question is hard to answer, since the signatures change each day,
> thus the AC trie changes, the prefiltering patterns change ...
>> Aho-Corasick and Boyer-Moore might have some situations that cause performance issues.
> There is also a prefiltering step now.
> You can search bugzilla on why it was introduced.
>> I might consider making improvements or studying the performance impact.
> Don't expect it to be easy to make improvements.
> I spent quite a lot of time on the prefiltering step, and the problem is
> that some signatures falsely match a lot of times (like 'PE' from the PE
> signature), but the entire signature usually doesn't.
> So ClamAV has to stop the trie lookup, test the match, continue the trie
> lookup lots of times.
My understanding (please correct me if I am wrong) is that the first step in matching (ignoring filetype recognition and the like) is the prefiltering step.
If the filter matches, then further matching (using either AC or BM) is needed to make sure that it is not a false positive. The filter can accept more patterns than it should, and it covers at most 8 characters of the original signature, so the remaining parts might not match.
I am not sure I understand your point here, and I would really like to:
"So ClamAV has to stop the trie lookup, test the match, continue the trie lookup lots of times."
Could you please explain this in more detail?
If the filter matches but AC or BM does not, do we return to the filter and continue from the point where it matched?
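To make my question concrete, here is a minimal sketch of how I picture the "stop, test, continue" loop. It is a toy model of my own and not ClamAV's actual code: the names build_prefix_index and scan are hypothetical, a plain lookup on the first 2 bytes of each signature stands in for the real prefilter/AC trie, and signatures are assumed to be plain byte strings (no wildcards) at least prefix_len bytes long.

```python
from collections import defaultdict


def build_prefix_index(signatures, prefix_len=2):
    """Map the first prefix_len bytes of each signature to the signatures sharing them."""
    index = defaultdict(list)
    for sig in signatures:
        index[sig[:prefix_len]].append(sig)
    return index


def scan(data, signatures, prefix_len=2):
    """Return (matches, verifications).

    matches is a list of confirmed (offset, signature) pairs; verifications
    counts how often the cheap lookup had to stop for a full-signature test.
    """
    index = build_prefix_index(signatures, prefix_len)
    matches = []
    verifications = 0
    for offset in range(len(data) - prefix_len + 1):
        candidates = index.get(data[offset:offset + prefix_len])
        if not candidates:
            continue  # cheap path: this prefix belongs to no signature
        for sig in candidates:
            verifications += 1  # stop and test the whole signature here
            if data.startswith(sig, offset):
                matches.append((offset, sig))
        # whether or not the test succeeded, resume the lookup at offset + 1
    return matches, verifications


# A short prefix such as b"PE" that occurs often in the input forces many
# full tests, although only the last occurrence is a real match.
signature = b"PE\x00\x00virus"
data = b"PE" * 1000 + signature
matches, verifications = scan(data, [signature])
print(len(matches), verifications)  # prints "1 1001"
```

In this model a failed test simply resumes the scan at the next offset, and each individual test is cheap, but the input above triggers 1001 of them for a single real match. Is that roughly the situation you describe with the 'PE' signature?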
> Although the actual test is "fast enough", if it happens a million times
> it does slow things down.
> Also the AC and BM are not "textbook" versions, they contain extensions
> (like wildcards).
> It is important that you study the performance with the actual
> signatures from main/daily.cvd, and on real files (both clean and infected).
>> Do you think that this could be a realistic problem to study?
> That depends if you have some specific ideas on how to improve AC/BM, or
> you just want to try improving it, and give up if it's not possible.
> Best regards,
> Please submit your patches to our Bugzilla: http://bugs.clamav.net