Using Unicode Encoding to Bypass Validation Logic
Attack Pattern ID: 71 (Detailed Attack Pattern Completeness: Complete)
Typical Severity: High
Status: Draft
+ Description

Summary

An attacker may provide a Unicode string to a system component that is not Unicode aware and use it to circumvent a filter or cause the classifying mechanism to misinterpret the request. This may allow the attacker to slip malicious data past the content filter and/or cause the application to route the request incorrectly.

Attack Execution Flow

Explore
  1. Survey the application for user-controllable inputs:

    Using a browser or an automated tool, an attacker follows all public links and actions on a web site. He records all the links, the forms, the resources accessed and all other potential entry-points for the web application.

    Attack Step Techniques

    ID | Attack Step Technique Description | Environments
    1 | Use a spidering tool to follow and record all links and analyze the web pages to find entry points. Make special note of any links that include parameters in the URL. | env-Web
    2 | Use a proxy tool to record all user input entry points visited during a manual traversal of the web application. | env-Web
    3 | Use a browser to manually explore the website and analyze how it is constructed. Many browser plugins are available to facilitate the analysis or automate the discovery. | env-Web

    Indicators

    ID | Type | Indicator Description | Environments
    1 | Positive | Inputs are used by the application or the browser (DOM). | env-Web
    2 | Inconclusive | Using URL rewriting, parameters may be part of the URL path. | env-Web
    3 | Inconclusive | No parameters appear to be used on the current page. Even though none appear, the web application may still use them if they are provided. | env-Web
    4 | Negative | Applications that have only static pages or that simply present information without accepting input are unlikely to be susceptible. | env-Web

    Outcomes

    ID | Type | Outcome Description
    1 | Success | A list of URLs, with their corresponding parameters (POST, GET, COOKIE, etc.) is created by the attacker.
    2 | Success | A list of application user interface entry fields is created by the attacker.
    3 | Success | A list of resources accessed by the application is created by the attacker.

    Security Controls

    ID | Type | Security Control Description
    1 | Detective | Monitor velocity of page fetching in web logs. Humans who view a page and select a link from it will click far slower and far less regularly than tools. Tools make requests very quickly and the requests are typically spaced apart regularly (e.g. 0.8 seconds between them).
    2 | Detective | Create links on some pages that are visually hidden from web browsers. Using IFRAMES, images, or other HTML techniques, the links can be hidden from web browsing humans, but visible to spiders and programs. A request for the page, then, becomes a good predictor of an automated tool probing the application.
    3 | Preventative | Use CAPTCHA to prevent the use of the application by an automated tool.
    4 | Preventative | Actively monitor the application and either deny or redirect requests from origins that appear to be automated.
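
The first detective control above (monitoring the velocity and regularity of page fetches) can be sketched as follows. This is a minimal illustration, not a production detector: the function name `looks_automated` and the thresholds `max_mean_gap` and `max_jitter` are assumptions chosen for the example and would need tuning against real web logs.

```python
from statistics import mean, pstdev


def looks_automated(timestamps, min_requests=5, max_mean_gap=1.5, max_jitter=0.1):
    """Return True when request times (in seconds) are both fast and evenly spaced.

    All thresholds are illustrative assumptions, not recommended values.
    """
    if len(timestamps) < min_requests:
        return False  # too little data to judge
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return True  # several requests at the same instant
    # Coefficient of variation: a low value means very regular spacing.
    jitter = pstdev(gaps) / average_gap
    return average_gap <= max_mean_gap and jitter <= max_jitter
```

A client fetching a page every 0.8 seconds would be flagged, while a person whose clicks are seconds to minutes apart would not. Regular spacing alone is weak evidence, so a check like this is better used to raise an alert than to block traffic outright.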
Experiment
  1. Probe entry points to locate vulnerabilities:

    The attacker uses the entry points gathered in the "Explore" phase as a target list and injects various Unicode encoded payloads to determine if an entry point actually represents a vulnerability with insufficient validation logic and to characterize the extent to which the vulnerability can be exploited.

    Attack Step Techniques

    ID | Attack Step Technique Description | Environments
    1 | Try to use Unicode encoding of content in Scripts in order to bypass validation routines. | env-Web
    2 | Try to use Unicode encoding of content in HTML in order to bypass validation routines. | env-Web
    3 | Try to use Unicode encoding of content in CSS in order to bypass validation routines. | env-Web

    Indicators

    ID | Type | Indicator Description | Environments
    1 | Positive | The application accepts user-controllable input. | env-Web

    Outcomes

    ID | Type | Outcome Description
    1 | Success | The attacker's Unicode encoded payload is processed and acted on by the application without filtering or transcoding.
    2 | Failure | The application decodes the charset and filters the inputs.

    Security Controls

    ID | Type | Security Control Description
    1 | Preventative | Implement input validation routines that filter or transcode for Unicode content.
    2 | Preventative | Specify the charset of the HTTP transaction/content.
    3 | Detective | Monitor inputs to web servers. Alert on unusual charset and/or characters.
    4 | Preventative | Actively monitor the application and either deny or redirect requests from origins that appear to be attack attempts.
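
The detective control above (alerting on unusual charsets or characters) might look like the following sketch. It assumes the monitored value arrives percent-encoded, that the declared charset is UTF-8, and that the field is expected to hold printable ASCII. The function name `flag_suspicious_input` and the wording of the alert reasons are hypothetical.

```python
from urllib.parse import unquote_to_bytes


def flag_suspicious_input(raw_value: str) -> list:
    """Return the reasons a percent-encoded value deserves an alert (empty if none)."""
    reasons = []
    data = unquote_to_bytes(raw_value)  # undo one layer of percent-encoding
    try:
        # Python's strict UTF-8 decoder rejects overlong and truncated sequences.
        text = data.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        reasons.append("not well-formed UTF-8 (possible overlong form or other charset)")
        return reasons
    if any(ord(ch) > 0x7F for ch in text):
        reasons.append("non-ASCII characters in a field expected to be ASCII")
    if any(ord(ch) < 0x20 or ord(ch) == 0x7F for ch in text):
        reasons.append("control characters present")
    return reasons
```

An empty list means nothing unusual was seen. Legitimate international input will trigger the non-ASCII reason, so the result is meant to feed an alerting pipeline rather than to reject requests by itself.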
+ Attack Prerequisites

Filtering is performed on data that has not been properly canonicalized.

+ Typical Likelihood of Exploit

Likelihood: Medium

+ Methods of Attack
  • Modification of Resources
  • API Abuse
  • Injection
+ Examples-Instances

Description

A very common technique for a Unicode attack involves traversing directories in search of interesting files. An example of this idea applied to the Web is

http://target.server/some_directory/../../../winnt

In this case, the attacker is attempting to traverse to a directory that is not supposed to be part of standard Web services. The trick is fairly obvious, so many Web servers and scripts prevent it. However, using alternate encoding tricks, an attacker may be able to get around badly implemented request filters.

In October 2000, a hacker publicly revealed that Microsoft's IIS server suffered from a variation of this problem. In the case of IIS, all the attacker had to do was provide alternate encodings for the dots and/or slashes found in a classic attack. The alternate encodings, which are overlong two-byte UTF-8 forms of the ASCII characters, are

. yields C0 AE
/ yields C0 AF
\ yields C1 9C

Using this conversion, the previously displayed URL can be encoded as

http://target.server/some_directory/%C0%AE%C0%AE/%C0%AE%C0%AE/%C0%AE%C0%AE/winnt
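
The byte pairs listed above can be reproduced with a few lines of code. The sketch below is illustrative only and its helper names are hypothetical. It builds the overlong two-byte UTF-8 form of an ASCII character, percent-encodes it, and contrasts a strict decoder (which rejects the bytes) with a lenient one that skips the shortest-form check and therefore recovers the original character.

```python
def overlong_two_byte(ch: str) -> bytes:
    """Build the (invalid) overlong 2-byte UTF-8 form of an ASCII character."""
    code_point = ord(ch)
    if code_point > 0x7F:
        raise ValueError("this sketch only handles ASCII characters")
    # 110xxxxx 10xxxxxx, with the 7 significant bits padded out to 11 bits.
    return bytes([0xC0 | (code_point >> 6), 0x80 | (code_point & 0x3F)])


def percent_encode(data: bytes) -> str:
    """Percent-encode every byte, e.g. b'\\xc0\\xae' becomes '%C0%AE'."""
    return "".join(f"%{byte:02X}" for byte in data)


def lenient_decode_two_byte(data: bytes) -> str:
    """Decode a 2-byte sequence the way a decoder without a shortest-form check would."""
    return chr(((data[0] & 0x1F) << 6) | (data[1] & 0x3F))


def strict_decoder_rejects(data: bytes) -> bool:
    """Return True if Python's standard UTF-8 decoder refuses the bytes."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return True
    return False
```

For example, `percent_encode(overlong_two_byte("."))` returns `%C0%AE`. A filter that searches the raw request for `..` never sees those characters, while a lenient decoder further down the processing chain turns `C0 AE` back into a dot.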

Related Vulnerabilities

CVE-2000-0884

+ Attacker Skills or Knowledge Required

Skill or Knowledge Level: Medium

An attacker needs to understand Unicode encodings and have an idea of (or be able to find out) which system components may not be Unicode aware.

+ Indicators-Warnings of Attack

Unicode encoded data is passed to APIs where it is not expected.

+ Solutions and Mitigations

Ensure that the system is Unicode aware and can properly process Unicode data. Do not make an assumption that data will be in ASCII.

Ensure that filtering or input validation is applied to canonical data.

Assume all input is malicious. Create a white list that defines all valid input to the software system based on the requirements specifications. Input that does not match against the white list should not be permitted to enter into the system.
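
The mitigations above amount to canonicalizing first and validating second. A minimal sketch of that ordering for a URL path follows. It assumes UTF-8 input, a single layer of percent-encoding, and a hypothetical permitted root of `/some_directory`. The allow-list pattern `ALLOWED_PATH` and both function names are illustrative, not taken from any particular server.

```python
import posixpath
import re
import unicodedata
from urllib.parse import unquote_to_bytes

# Hypothetical allow list: ASCII letters, digits, and a few path characters.
ALLOWED_PATH = re.compile(r"[A-Za-z0-9_./-]+")


def canonicalize_path(raw: str) -> str:
    """Reduce a percent-encoded path to one canonical form, or raise an error."""
    data = unquote_to_bytes(raw)
    # Strict decoding raises UnicodeDecodeError on overlong forms such as C0 AE.
    text = data.decode("utf-8", errors="strict")
    if "%" in text:
        raise ValueError("nested percent-encoding is not accepted")
    # Fold compatibility characters (e.g. fullwidth dots) into their plain forms.
    text = unicodedata.normalize("NFKC", text)
    text = text.replace("\\", "/")
    # Resolve "." and ".." segments against a single leading slash.
    return posixpath.normpath("/" + text.lstrip("/"))


def is_allowed(raw: str, root: str = "/some_directory") -> bool:
    """Validate the canonical form, never the raw input."""
    try:
        path = canonicalize_path(raw)
    except (UnicodeDecodeError, ValueError):
        return False
    if not ALLOWED_PATH.fullmatch(path):
        return False
    return path == root or path.startswith(root + "/")
```

With these assumptions, `is_allowed("some_directory/index.html")` returns `True`, while the plain, percent-encoded, and overlong forms of the traversal path from the example are all rejected. The design point is that the allow list only ever sees the canonical string, so alternate encodings cannot hide characters from it.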

+ Attack Motivation-Consequences
  • Privilege Escalation
  • Run Arbitrary Code
  • Data Modification
  • Denial of Service
+ Related Weaknesses
CWE-ID | Weakness Name | Weakness Relationship Type
176 | Failure to Handle Unicode Encoding | Targeted
171 | Cleansing, Canonicalization, and Comparison Errors | Targeted
179 | Incorrect Behavior Order: Early Validation | Targeted
180 | Incorrect Behavior Order: Validate Before Canonicalize | Targeted
173 | Failure to Handle Alternate Encoding | Targeted
172 | Encoding Error | Targeted
184 | Incomplete Blacklist | Targeted
183 | Permissive Whitelist | Targeted
74 | Failure to Sanitize Data into a Different Plane ('Injection') | Targeted
20 | Improper Input Validation | Targeted
697 | Insufficient Comparison | Targeted
692 | Incomplete Blacklist to Cross-Site Scripting | Targeted
+ Related Attack Patterns
Nature | Type | ID | Name | Description | View(s) this relationship pertains to
PeerOf | Attack Pattern | 43 | Exploiting Multiple Input Interpretation Layers | | Mechanism of Attack 1000
PeerOf | Attack Pattern | 64 | Using Slashes and URL Encoding Combined to Bypass Validation Logic | | Mechanism of Attack 1000
PeerOf | Attack Pattern | 72 | URL Encoding | | Mechanism of Attack 1000
PeerOf | Attack Pattern | 79 | Using Slashes in Alternate Encoding | | Mechanism of Attack 1000
ChildOf | Attack Pattern | 267 | Leverage Alternate Encoding | | Mechanism of Attack (primary) 1000
PeerOf | Attack Pattern | 78 | Using Escaped Slashes in Alternate Encoding | | Mechanism of Attack 1000
PeerOf | Attack Pattern | 80 | Using UTF-8 Encoding to Bypass Validation Logic | | Mechanism of Attack 1000
+ Relevant Security Requirements

Canonicalize data prior to performing any validation or filtering on it. Be aware of alternate encodings.

+ Purposes
  • Penetration
+ CIA Impact
Confidentiality Impact: Medium
Integrity Impact: High
Availability Impact: Medium
+ Technical Context
Architectural Paradigms
All
Frameworks
All
Platforms
All
Languages
All
+ References
G. Hoglund and G. McGraw. "Exploiting Software: How to Break Code". Addison-Wesley. February 2004.

CWE - Input Validation

+ Content History
Submissions
Submitter | Organization | Date
G. Hoglund and G. McGraw. Exploiting Software: How to Break Code. Addison-Wesley, February 2004. | Cigital, Inc | 2007-03-01
Modifications
Modifier | Organization | Date | Comments
Eugene Lebanidze | Cigital, Inc | 2007-02-26 | Fleshed out content to CAPEC schema from the original descriptions in "Exploiting Software"
Sean Barnum | Cigital, Inc | 2007-03-05 | Review and revise
Richard Struse | VOXEM, Inc | 2007-03-26 | Review and feedback leading to changes in Name, Related Attack Patterns
Sean Barnum | Cigital, Inc | 2007-04-13 | Modified pattern content according to review and feedback
Romain Gaucher | Cigital, Inc | 2009-02-10 | Created draft content for detailed description
Sean Barnum | Cigital Federal, Inc | 2009-04-13 | Reviewed and revised content for detailed description