Improper Encoding or Escaping of Output
Weakness ID: 116 (Weakness Class)Status: Draft
+ Description

Description Summary

The software prepares a structured message for communication with another component, but encoding or escaping of the data is either missing or done incorrectly. As a result, the intended structure of the message is not preserved.

Extended Description

Improper encoding or escaping can allow attackers to change the commands that are sent to another component, inserting malicious commands instead.

Most software follows a certain protocol that uses structured messages for communication between components, such as queries or commands. These structured messages can contain raw data interspersed with metadata or control information. For example, "GET /index.html HTTP/1.1" is a structured message containing a command ("GET") with a single argument ("/index.html") and metadata about which protocol version is being used ("HTTP/1.1").

If an application uses attacker-supplied inputs to construct a structured message without properly encoding or escaping, then the attacker could insert special characters that will cause the data to be interpreted as control information or metadata. Consequently, the component that receives the output will perform the wrong operations, or otherwise interpret the data incorrectly.

+ Alternate Terms
Output Sanitization
Output Validation
Output Encoding
+ Terminology Notes

The usage of the "encoding" and "escaping" terms varies widely. For example, in some programming languages, the terms are used interchangeably, while other languages provide APIs that use both terms for different tasks. This overlapping usage extends to the Web, such as the "escape" JavaScript function whose purpose is stated to be encoding. Of course, the concepts of encoding and escaping predate the Web by decades. Given such a context, it is difficult for CWE to adopt a consistent vocabulary that will not be misinterpreted by some constituency.

+ Time of Introduction
  • Architecture and Design
  • Implementation
  • Operation
+ Applicable Platforms

Languages

All

Technology Classes

Database-Server: (Often)

Web-Server: (Often)

+ Common Consequences
ScopeEffect
Integrity
Confidentiality
Authorization

The communications between components can be modified in unexpected ways. Unexpected commands can be executed, bypassing other security mechanisms. Incoming data can be misinterpreted

+ Likelihood of Exploit

Very High

+ Detection Methods

Automated Static Analysis

This weakness can often be detected using automated static analysis tools. Many modern tools use data flow analysis or constraint-based techniques to minimize the number of false positives.

Effectiveness: Moderate

This is not a perfect solution, since 100% accuracy and coverage are not feasible.

Automated Dynamic Analysis

This weakness can be detected using dynamic tools and techniques that interact with the software using large test suites with many diverse inputs, such as fuzz testing (fuzzing), robustness testing, and fault injection. The software's operation may slow down, but it should not become unstable, crash, or generate incorrect results.

+ Demonstrative Examples

Example 1

Here a value read from an HTML form parameter is reflected back to the client browser without having been encoded prior to output.

(Bad Code)
Example Language: JSP 
<% String email = request.getParameter("email"); %>
...
Email Address: <%= email %>

Example 2

Consider a chat application in which a front-end web application communicates with a back-end server. The back-end is legacy code that does not perform authentication or authorization, so the front-end must implement it. The chat protocol supports two commands, SAY and BAN, although only administrators can use the BAN command. Each argument must be separated by a single space. The raw inputs are URL-encoded. The messaging protocol allows multiple commands to be specified on the same line if they are separated by a "|" character.

Back End: Command Processor Code

(Bad Code)
Example Language: Perl 
$inputString = readLineFromFileHandle($serverFH);
# generate an array of strings separated by the "|" character.
@commands = split(/\|/, $inputString);
foreach $cmd (@commands) {
# separate the operator from its arguments based on a single whitespace
($operator, $args) = split(/ /, $cmd, 2);
$args = UrlDecode($args);
if ($operator eq "BAN") {
ExecuteBan($args);
}
elsif ($operator eq "SAY") {
ExecuteSay($args);
}
}

Front End: Web Application

In this code, the web application receives a command, encodes it for sending to the server, performs the authorization check, and sends the command to the server.

(Bad Code)
Example Language: Perl 
$inputString = GetUntrustedArgument("command");
($cmd, $argstr) = split(/\s+/, $inputString, 2);
# removes extra whitespace and also changes CRLF's to spaces
$argstr =~ s/\s+/ /gs;
$argstr = UrlEncode($argstr);
if (($cmd eq "BAN") && (! IsAdministrator($username))) {
die "Error: you are not the admin.\n";
}
# communicate with file server using a file handle
$fh = GetServerFileHandle("myserver");
print $fh "$cmd $argstr\n";

Diagnosis

It is clear that, while the protocol and back-end allow multiple commands to be sent in a single request, the front end only intends to send a single command. However, the UrlEncode function could leave the "|" character intact. If an attacker provides:

(Attack)
 
SAY hello world|BAN user12

then the front end will see this is a "SAY" command, and the $argstr will look like "hello world | BAN user12". Since the command is "SAY", the check for the "BAN" command will fail, and the front end will send the URL-encoded command to the back end:

(Result)
 
SAY hello%20world|BAN%20user12

The back end, however, will treat these as two separate commands:

(Result)
 
SAY hello world
BAN user12

Notice, however, that if the front end properly encodes the "|" with "%7C", then the back end will only process a single command.

Example 3

This example takes user input, passes it through an encoding scheme and then creates a directory specified by the user.

(Bad Code)
Example Language: Perl 
sub GetUntrustedInput {
return($ARGV[0]);
}

sub encode {
my($str) = @_;
$str =~ s/\&/\&amp;/gs;
$str =~ s/\"/\&quot;/gs;
$str =~ s/\'/\&apos;/gs;
$str =~ s/\</\&lt;/gs;
$str =~ s/\>/\&gt;/gs;
return($str);
}

sub doit {
my $uname = encode(GetUntrustedInput("username"));
print "<b>Welcome, $uname!</b><p>\n";
system("cd /home/$uname; /bin/ls -l");
}

The programmer attempts to encode dangerous characters, however the blacklist for encoding is incomplete (CWE-184) and an attacker can still pass a semicolon, resulting in a chain with command injection (CWE-77).

Additionally, the encoding routine is used inappropriately with command execution. An attacker doesn't even need to insert their own semicolon. The attacker can instead leverage the encoding routine to provide the semicolon to separate the commands. If an attacker supplies a string of the form:

(Attack)
 
' pwd

then the program will encode the apostrophe and insert the semicolon, which functions as a command separator when passed to the system function. This allows the attacker to complete the command injection.

+ Observed Examples
ReferenceDescription
CVE-2008-4636OS command injection in backup software using shell metacharacters in a filename; correct behavior would require that this filename could not be changed.
CVE-2008-0769Web application does not set the charset when sending a page to a browser, allowing for XSS exploitation when a browser chooses an unexpected encoding.
CVE-2008-0005Program does not set the charset when sending a page to a browser, allowing for XSS exploitation when a browser chooses an unexpected encoding.
CVE-2008-5573SQL injection via password parameter; a strong password might contain "&"
CVE-2008-3773Cross-site scripting in chat application via a message subject, which normally might contain "&" and other XSS-related characters.
CVE-2008-0757Cross-site scripting in chat application via a message, which normally might be allowed to contain arbitrary content.
+ Potential Mitigations

Phase: Architecture and Design

Use languages, libraries, or frameworks that make it easier to generate properly encoded output.

Examples include the ESAPI Encoding control.

Alternately, use built-in functions, but consider using wrappers in case those functions are discovered to have a vulnerability.

Phase: Architecture and Design

Strategy: Parameterization

If available, use structured mechanisms that automatically enforce the separation between data and code. These mechanisms may be able to provide the relevant quoting, encoding, and validation automatically, instead of relying on the developer to provide this capability at every point where output is generated.

For example, stored procedures can enforce database query structure and reduce the likelihood of SQL injection.

Phases: Architecture and Design; Implementation

Understand the context in which your data will be used and the encoding that will be expected. This is especially important when transmitting data between different components, or when generating outputs that can contain multiple encodings at the same time, such as web pages or multi-part mail messages. Study all expected communication protocols and data representations to determine the required encoding strategies.

Phase: Architecture and Design

In some cases, input validation may be an important strategy when output encoding is not a complete solution. For example, you may be providing the same output that will be processed by multiple consumers that use different encodings or representations. In other cases, you may be required to allow user-supplied input to contain control information, such as limited HTML tags that support formatting in a wiki or bulletin board. When this type of requirement must be met, use an extremely strict whitelist to limit which control sequences can be used. Verify that the resulting syntactic structure is what you expect. Use your normal encoding methods for the remainder of the input.

Phase: Architecture and Design

Use input validation as a defense-in-depth measure to reduce the likelihood of output encoding errors (see CWE-20).

Phase: Requirements

Fully specify which encodings are required by components that will be communicating with each other.

Phase: Implementation

When exchanging data between components, ensure that both components are using the same character encoding. Ensure that the proper encoding is applied at each interface. Explicitly set the encoding you are using whenever the protocol allows you to do so.

+ Relationships
NatureTypeIDNameView(s) this relationship pertains toView(s)
ChildOfCategoryCategory19Data Handling
Development Concepts (primary)699
ChildOfWeakness ClassWeakness Class707Improper Enforcement of Message or Data Structure
Research Concepts (primary)1000
ChildOfCategoryCategory7512009 Top 25 - Insecure Interaction Between Components
Weaknesses in the 2009 CWE/SANS Top 25 Most Dangerous Programming Errors (primary)750
CanPrecedeWeakness ClassWeakness Class74Failure to Sanitize Data into a Different Plane ('Injection')
Research Concepts1000
ParentOfWeakness BaseWeakness Base117Improper Output Sanitization for Logs
Development Concepts (primary)699
Research Concepts (primary)1000
ParentOfWeakness VariantWeakness Variant644Improper Sanitization of HTTP Headers for Scripting Syntax
Development Concepts (primary)699
Research Concepts (primary)1000
+ Relationship Notes

This weakness is primary to all weaknesses related to injection (CWE-74) since the inherent nature of injection involves the violation of structured messages.

CWE-116 and CWE-20 have a close association because, depending on the nature of the structured message, proper input validation can indirectly prevent special characters from changing the meaning of a structured message. For example, by validating that a numeric ID field should only contain the 0-9 characters, the programmer effectively prevents injection attacks.

However, input validation is not always sufficient, especially when less stringent data types must be supported, such as free-form text. Consider a SQL injection scenario in which a last name is inserted into a query. The name "O'Reilly" would likely pass the validation step since it is a common last name in the English language. However, it cannot be directly inserted into the database because it contains the "'" apostrophe character, which would need to be escaped or otherwise handled. In this case, stripping the apostrophe might reduce the risk of SQL injection, but it would produce incorrect behavior because the wrong name would be recorded.

+ Research Gaps

While many published vulnerabilities are related to insufficient output encoding, there is such an emphasis on input validation as a protection mechanism that the underlying causes are rarely described. Within CVE, the focus is primarily on well-understood issues like cross-site scripting and SQL injection. It is likely that this weakness frequently occurs in custom protocols that support multiple encodings, which are not necessarily detectable with automated techniques.

+ Theoretical Notes

This is a data/directive boundary error in which data boundaries are not sufficiently enforced before it is sent to a different control sphere.

+ Taxonomy Mappings
Mapped Taxonomy NameNode IDFitMapped Node Name
WASC22Improper Output Handling
+ Related Attack Patterns
CAPEC-IDAttack Pattern Name
(CAPEC Version: 1.4)
73User-Controlled Filename
85Client Network Footprinting (using AJAX/XSS)
86Embedding Script (XSS ) in HTTP Headers
18Embedding Scripts in Nonscript Elements
63Simple Script Injection
81Web Logs Tampering
104Cross Zone Scripting
+ References
"OWASP Enterprise Security API (ESAPI) Project". <http://www.owasp.org/index.php/ESAPI>.
Jeremiah Grossman. "Input validation or output filtering, which is better?". <http://jeremiahgrossman.blogspot.com/2007/01/input-validation-or-output-filtering.html>.
Joshbw. "Output Sanitization". 2008-09-18. <http://www.analyticalengine.net/archives/58>.
Niyaz PK. "Sanitizing user data: How and where to do it". 2008-09-11. <http://www.diovo.com/2008/09/sanitizing-user-data-how-and-where-to-do-it/>.
Jeremiah Grossman. "Input validation or output filtering, which is better?". 2007-01-30. <http://jeremiahgrossman.blogspot.com/2007/01/input-validation-or-output-filtering.html>.
Jim Manico. "Input Validation - Not That Important". 2008-08-10. <http://manicode.blogspot.com/2008/08/input-validation-not-that-important.html>.
Michael Eddington. "Preventing XSS with Correct Output Encoding". <http://phed.org/2008/05/19/preventing-xss-with-correct-output-encoding/>.
[REF-11] M. Howard and D. LeBlanc. "Writing Secure Code". Chapter 11, "Canonical Representation Issues" Page 363. 2nd Edition. Microsoft. 2002.
+ Content History
Modifications
Modification DateModifierOrganizationSource
2008-07-01Sean EidemillerCigitalExternal
added/updated demonstrative examples
2008-07-01Eric DalciCigitalExternal
updated Time of Introduction
2008-09-08CWE Content TeamMITREInternal
updated Name, Relationships
2009-01-12CWE Content TeamMITREInternal
updated Alternate Terms, Applicable Platforms, Common Consequences, Demonstrative Examples, Description, Likelihood of Exploit, Name, Observed Examples, Potential Mitigations, References, Relationship Notes, Relationships, Research Gaps, Terminology Notes, Theoretical Notes
2009-03-10CWE Content TeamMITREInternal
updated Description, Potential Mitigations
2009-05-27CWE Content TeamMITREInternal
updated Related Attack Patterns
2009-07-27CWE Content TeamMITREInternal
updated Demonstrative Examples
2009-10-29CWE Content TeamMITREInternal
updated Relationships
2009-12-28CWE Content TeamMITREInternal
updated Demonstrative Examples, Potential Mitigations
Previous Entry Names
Change DatePrevious Entry Name
2008-04-11Output Validation
2008-09-09Incorrect Output Sanitization
2009-01-12Insufficient Output Sanitization