Executive Summary

Title Compilers permit Unicode control and homoglyph characters
Name VU#999008 First vendor Publication 2021-11-09
Vendor VU-CERT Last vendor Modification 2021-11-09
Severity (Vendor) N/A Revision M

Security-Database Scoring CVSS v3

Cvss vector : CVSS:3.1/AV:N/AC:H/PR:N/UI:R/S:C/C:H/I:H/A:H
Overall CVSS Score 8.3
Base Score 8.3 Environmental Score 8.3
impact SubScore 6 Temporal Score 8.3
Exploitabality Sub Score 1.6
Attack Vector Network Attack Complexity High
Privileges Required None User Interaction Required
Scope Changed Confidentiality Impact High
Integrity Impact High Availability Impact High
Calculate full CVSS 3.0 Vectors scores

Security-Database Scoring CVSS v2

Cvss vector : (AV:N/AC:H/Au:N/C:P/I:P/A:P)
Cvss Base Score 5.1 Attack Range Network
Cvss Impact Score 6.4 Attack Complexity High
Cvss Expoit Score 4.9 Authentication None Required
Calculate full CVSS 2.0 Vectors scores



Attacks that allow for unintended control of Unicode and homoglyphic characters, described by the researchers in this report leverage text encoding that may cause source code to be interpreted differently by a compiler than it appears visually to a human reviewer. Source code compilers, interpreters, and other development tools may permit Unicode control and homoglyph characters, changing the visually apparent meaning of source code.


Internationalized text encodings require support for both left-to-right languages and also right-to-left languages. Unicode has built-in functions to allow for encoding of characters to account for bi-directional, or Bidi ordering. Included in these functions are characters that represent non-visual functions. These characters, as well as characters from other human language sets (i.e., English vs. Cyrillic) can also introduce ambiguities into the code base if improperly used.

This type of attack could potentially be used to compromise a code base by capitalizing on a gap in visually rendered source code as a human reviewer would see and the raw code that the compiler would evaluate.


The use of attacks that incorporate maliciously encoded source code may go undetected by human developers and by many automated coding tools. These attacks also work against many of the compilers currently in use. An attacker with the ability to influence source code could introduce undetected ambiguity into source code using this type of attack.


The simplest defense is to ban the use of text directionality control characters both in language specifications and in compilers implementing these languages.

Two CVEs were assigned to address the two types of attacks described in this report.

CVE-2021-42574 was created for tracking the Bidi attack. CVE-2021-42694 was created for tracking the homoglyph attack.


Thanks to the reporters, Nicholas Boucher and Ross Anderson of The University of Cambridge (UK).

This document was written by Chuck Yarbrough.

Original Source

Url : https://kb.cert.org/vuls/id/999008

CWE : Common Weakness Enumeration

% Id Name
100 % CWE-94 Failure to Control Generation of Code ('Code Injection')

CPE : Common Platform Enumeration

Application 1
Application 1
Os 3

Alert History

If you want to see full details history, please login or register.
Date Informations
2022-01-08 00:17:45
  • Multiple Updates
2021-11-09 21:19:34
  • First insertion