full-disclosure-uk May 2007 archive

Re: [Full-disclosure] [WEB SECURITY] Re: noise about full-width encoding bypass?

From: Arian J. Evans <arian.evans_at_nospam>
Date: Tue May 22 2007 - 20:54:14 GMT
To: "Brian Eaton" <eaton.lists@gmail.com>, "Web Security" <websecurity@webappsec.org>, Full-Disclosure <full-disclosure@lists.grok.org.uk>

comments inline

On 5/22/07, Brian Eaton <eaton.lists@gmail.com> wrote:
> What surprises me is that not all codepage conversion libraries are
> doing the same thing with this data. I've tested a few, and some of
> them are canonicalizing full-width unicode to ASCII equivalents, and
> others are not. Where we run into trouble is where one component
> doing input validation uses one technique for canonicalization, and
> another component trying to do the actual work is using a different
> technique. Figuring out exactly what different application platforms
> are doing would help to figure out how much of a problem this poses in
> the real world.
> Somebody ought to put together a test suite for this, just to see what
> different vendors have done.

Funny you should say that. :) That's exactly one of the things we are working on, but specifically from a "software defect with security implications" perspective. What you probably need is a set of unit-test-style suites that ram a huge character set, in many different encodings, through the software and observe what happens. I am focusing on testing a small subset of that (primarily metacharacter transforms) across a lot of software, as efficiently as possible. Anyway...
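A minimal sketch of that kind of unit test, assuming Unicode NFKC normalization stands in for whatever canonicalization step the platform under test applies (the helper names are made up for illustration):

```python
import unicodedata

# Full-width ASCII variants occupy U+FF01..U+FF5E; NFKC normalization
# maps each one back to its plain ASCII counterpart (U+0021..U+007E).
def fullwidth(s: str) -> str:
    """Re-encode printable ASCII as its full-width Unicode variant."""
    return "".join(
        chr(ord(c) + 0xFEE0) if 0x21 <= ord(c) <= 0x7E else c
        for c in s
    )

def canonicalize(s: str) -> str:
    """One candidate canonicalization step: Unicode NFKC."""
    return unicodedata.normalize("NFKC", s)

# Ram a metacharacter set through the transform and record what survives.
metachars = "<>'\";&|%()"
for c in metachars:
    wide = fullwidth(c)
    print(f"{c!r} -> {wide!r} -> {canonicalize(wide)!r}")
```

Running a real suite means swapping `canonicalize` for each codepage conversion library in turn and diffing the outputs.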

This subject gets really confusing because people mean many different things when they say "encoding attack" or "encoding bypass". The two common meanings are:

  1. Obfuscation of the attack through encoding, i.e., slipping past the hall monitor (e.g., K2's polymorphic shellcode in the IDS/AV static string-match days),
  2. Evading very specific input filters: taking an attack that will get interpreted or executed by a specific parser and encoding it in a way that evades specific input filters, but still reaches the parser in an interpretable state.
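Case 2 can be sketched in a few lines, assuming a validator that runs on the raw input before a later component canonicalizes it (NFKC again standing in for whatever transform the platform applies; function names are illustrative):

```python
import unicodedata

BLOCKED = set("<>'\"")

def validate(raw: str) -> bool:
    """Naive input filter: rejects raw ASCII metacharacters only."""
    return not any(c in BLOCKED for c in raw)

def downstream(raw: str) -> str:
    """Later component canonicalizes before handing data to a parser."""
    return unicodedata.normalize("NFKC", raw)

payload = "\uff1cscript\uff1e"   # full-width '<' and '>'
assert validate(payload)         # filter sees no ASCII '<', lets it pass
print(downstream(payload))       # the target parser receives "<script>"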

#2 gets confusing because people myopically focus on the parser/interpreter that is the *target* of the attack, and debate that parser's ability to execute a given input encoding type... which may have nothing to do with the intermediary functions & transforms performed on the attack data on the way to the target parser. Canonicalization and normalizing data for full-text searching are examples of key intermediary transforms performed upon one's data.

This leaves the sub-points:

  2.1 What parser are you targeting?
  2.2 What encoding types will that parser interpret/execute?
  2.3 What intermediary decoding/canonicalization steps will *all* software & hardware involved in the transaction, prior to the target parser, take?
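Sub-point 2.3 is where the classic double-decode defects live (the IIS issue mentioned below is the canonical example). A hedged sketch of the pattern, using two successive URL-decode passes to model two hops that each decode once:

```python
from urllib.parse import unquote

# Each intermediary may apply its own decode pass.  A double-encoded
# payload survives any filter that checks after the first decode only.
payload = "%253Cscript%253E"            # '%25' is a percent-encoded '%'

after_server = unquote(payload)         # hop 1 decodes once: "%3Cscript%3E"
assert "<" not in after_server          # a filter at this hop sees nothing

after_framework = unquote(after_server) # hop 2 decodes again
print(after_framework)                  # "<script>"
```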

Sub-point 2.3 is a real bear, eh? People underestimate this one. I've gotten several direct inquiries from folks that usually ask some form of the question:

:: Which of these is responsible for the issue? ::

+ Is it the client? (Accept-Charset: ISO-8859-1,utf-8;q=0.7,*;q=0.7)

+ Is it the protocol? (Content-Type: text/html; charset=utf-8; charset=iso-8859-1, etc)

+ Is it the web server? (IIS Hex URL & Unicode decode/double-decode issues)

+ Is it the framework? (.NET's Hex canonicalization issue from 2005)

+ Is it the language? (glued-together open source PHP crap; huge monolithic J2EE projects)

+ Is it our custom code? (insert random canonicalization library; add a random canonicalization step to your software for a situational normalization issue you run into... but make it global for all data passing through those functions)

Answer: yes, yes, yes, etc.

One part of this, then, is clearly defining your target.

The other part is evaluating the transforms performed on your data, and the transforms & canonicalization your software is *capable* of. We can directly deduce this in some situations, I believe, given a valid data type and the ability to correlate output; but in cases where we are targeting a parser internal to the system (e.g., a SQL interpreter), this will have to be inferred from some state change or context change, which is going to be very difficult to do in an automated fashion with any reliability.

But, definitely, that problem is being worked on.

I think this is a classic case where run-time black-box analysis is essential.

There is simply no way a source code audit, controls audit, or binary analysis is going to find the majority of issues in this case (when evaluating real-world production software deployments), because they are usually the result of emergent behaviors of complex, glued-together systems with many different components (including even things like firewalls/IPS that may "fix" or "re-code" protocols in transit, etc., assuming they even understand the protocol).

--
Arian Evans
software security stuff

"Diplomacy is the art of saying 'Nice doggie' until you can find a rock." -- Will Rogers

_______________________________________________
Full-Disclosure - We believe in it.
Charter: http://lists.grok.org.uk/full-disclosure-charter.html
Hosted and sponsored by Secunia - http://secunia.com/