Regular Expression without Anchors
Weakness ID: 777 (Weakness Variant)
Status: Incomplete
+ Description

Description Summary

The software uses a regular expression to perform sanitization, but the regular expression is not anchored and may allow malicious or malformed data to slip through.

Extended Description

When performing tasks such as whitelist validation, data is examined and possibly modified to ensure that it is well-formed and adheres to a list of safe values. If the regular expression is not anchored, malicious or malformed data may be included before or after any string matching the regular expression. The type of malicious data that is allowed will depend on the context of the application and which anchors are omitted from the regular expression.

+ Time of Introduction
  • Implementation
+ Common Consequences
Scope: Availability, Confidentiality, Integrity

Effect: An unanchored regular expression in the context of a whitelist will possibly result in a protection mechanism failure, allowing malicious or malformed data to enter trusted regions of the program. The specific consequences will depend on what functionality the whitelist was protecting.

+ Likelihood of Exploit

Low to Medium

+ Demonstrative Examples

Example 1

Consider a web application that supports multiple languages. It selects messages for an appropriate language by using the lang parameter.

(Bad Code)
Example Language: PHP 
$dir = "/home/cwe/languages";
$lang = $_GET['lang'];
if (preg_match("/[A-Za-z0-9]+/", $lang)) {
    include("$dir/$lang");
}
else {
    echo "You shall not pass!\n";
}

The previous code attempts to accept only alphanumeric values, so that language values such as "english" and "french" are valid, while also protecting against path traversal (CWE-22). However, the regular expression has no anchors, so preg_match succeeds as soon as any substring of the input is alphanumeric, and any text containing at least one alphanumeric character passes the validation step. For example, the attack string below matches the regular expression.

(Attack)
 
../../etc/passwd

If the attacker can inject code sequences into a file, such as the web server's HTTP request log, then the attacker may be able to point the lang parameter at that file and have the include statement execute the injected code.
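For comparison, an anchored variant of the same check might look like the sketch below. The helper name language_file() is hypothetical and not part of the original example, and the nullable return type assumes PHP 7.1 or later. The sketch only shows the effect of anchoring; it is not a complete defense against file inclusion.

```php
<?php
// language_file() is a hypothetical helper introduced for illustration.
// It returns the path to include, or null when validation fails.
function language_file(string $dir, string $lang): ?string {
    // \A and \z pin the match to the start and end of the whole input,
    // so every character of $lang must be alphanumeric.
    if (preg_match('/\A[A-Za-z0-9]+\z/', $lang) === 1) {
        return "$dir/$lang";
    }
    return null;
}

$dir = "/home/cwe/languages";
var_dump(language_file($dir, "english"));          // accepted: a path is returned
var_dump(language_file($dir, "../../etc/passwd")); // rejected: NULL
```

With the anchors in place, the attack string no longer passes, because the characters "." and "/" fall outside the allowed class.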

+ Potential Mitigations

Phase: Implementation

Be sure to understand both what will be matched and what will not be matched by a regular expression. Anchoring both ends of the expression lets the programmer define a whitelist that is strictly limited to the text the regular expression describes. If you are using a package that only matches one line by default, ensure that you can match multi-line inputs if necessary, and check how its anchors behave when the input contains newlines.
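The newline behavior mentioned above can be illustrated for PHP's PCRE functions. This is a sketch under the assumption that the whitelist is the alphanumeric pattern from Example 1; is_valid_lang() is a hypothetical helper name.

```php
<?php
// is_valid_lang() is a hypothetical helper used only for illustration.
function is_valid_lang(string $lang): bool {
    // \A and \z always mean the start and end of the whole subject,
    // regardless of the m (multiline) modifier.
    return preg_match('/\A[A-Za-z0-9]+\z/', $lang) === 1;
}

$trailing = "english\n";
$twoLines = "english\n../../etc/passwd";

// By default, $ also matches just before a final newline.
var_dump(preg_match('/^[A-Za-z0-9]+$/', $trailing));  // int(1)
// With the m modifier, ^ and $ match at every line boundary.
var_dump(preg_match('/^[A-Za-z0-9]+$/m', $twoLines)); // int(1)
// The D modifier makes $ match only at the very end of the subject.
var_dump(preg_match('/^[A-Za-z0-9]+$/D', $trailing)); // int(0)

var_dump(is_valid_lang($trailing)); // bool(false)
var_dump(is_valid_lang($twoLines)); // bool(false)
var_dump(is_valid_lang("english")); // bool(true)
```

Other regular expression packages define their anchors differently, so the equivalent whole-input anchors should be confirmed in the documentation of the engine in use.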

+ Background Details

Regular expressions are typically used to match a pattern of text. Anchors are used in regular expressions to specify where the pattern should match: at the beginning, the end, or both (the whole input).
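As a rough illustration of how each anchor constrains a match, the sketch below runs one character class with no anchor, a start anchor, an end anchor, and both. The accepts() helper and the sample inputs are assumptions made for this illustration.

```php
<?php
// accepts() is a hypothetical helper: true when the pattern matches the input.
function accepts(string $pattern, string $input): bool {
    return preg_match($pattern, $input) === 1;
}

$none  = '/[a-z]+/';      // may match anywhere in the input
$start = '/\A[a-z]+/';    // must match at the beginning; trailing data is allowed
$end   = '/[a-z]+\z/';    // must match at the end; leading data is allowed
$both  = '/\A[a-z]+\z/';  // must match the whole input

var_dump(accepts($none,  '../x/..'));      // bool(true)
var_dump(accepts($start, 'english/../x')); // bool(true)
var_dump(accepts($start, '../english'));   // bool(false)
var_dump(accepts($end,   '../english'));   // bool(true)
var_dump(accepts($end,   'english/..'));   // bool(false)
var_dump(accepts($both,  'english'));      // bool(true)
var_dump(accepts($both,  '../english'));   // bool(false)
```

Omitting the start anchor lets unexpected data precede the matched text, and omitting the end anchor lets it follow.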

+ Relationships
Nature: ChildOf
Type: Weakness Base
ID: 625
Name: Permissive Regular Expression
View(s) this relationship pertains to: Development Concepts (primary) 699; Research Concepts (primary) 1000
+ Content History
Submissions
Submission Date   Submitter / Organization / Source
2009-06-30        Internal CWE Team