Insufficient Visual Distinction of Homoglyphs Presented to User
Description
This weakness (CWE-1007) occurs when a product displays visually similar or identical characters (homoglyphs) to users without clearly distinguishing them. Distinct Unicode characters can render nearly or exactly alike in many fonts: for example, Latin "a" (U+0061) and Cyrillic "а" (U+0430), or Latin "o" (U+006F) and Greek "ο" (U+03BF). Attackers exploit this by registering domain names, creating usernames, or crafting content with characters that look like the legitimate ones but have different code points, which enables phishing, impersonation, and spoofing.
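The difference between such lookalikes is invisible on screen but easy to show programmatically. The following is a minimal sketch using Python's standard `unicodedata` module; the `LOOKALIKES` list and the `describe` helper are illustrative names, not part of any library.

```python
import unicodedata

# Pairs of characters that many fonts render identically.
LOOKALIKES = [("a", "\u0430"), ("o", "\u03bf")]


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for latin, other in LOOKALIKES:
    # The glyphs look the same, yet the characters never compare equal.
    print(f"{describe(latin)}  vs  {describe(other)}  equal={latin == other}")
```

Printing the code point and name next to the glyph is one simple way to give a reviewer the distinction that the font does not.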
Risk
This weakness enables several kinds of deception. Internationalized Domain Name (IDN) homograph attacks use lookalike domains for phishing (e.g., "аpple.com" with a Cyrillic "а"). Username spoofing allows impersonation in chat systems, forums, and social media. Log forgery becomes possible when an attacker's entries appear to come from a legitimate user. Source code can hide malicious logic behind identifiers or string literals that look like existing ones. Email addresses can be spoofed to appear legitimate. The risk is particularly high because many homoglyph pairs render identically, so even careful visual inspection cannot tell them apart.
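To make the IDN case concrete, the sketch below converts a genuine and a spoofed domain to their ASCII-compatible (Punycode) forms with Python's built-in `idna` codec. The helper name `to_ace` is hypothetical, and the built-in codec implements the older IDNA 2003 rules, so treat this as an illustration rather than a complete IDN implementation.

```python
def to_ace(domain: str) -> str:
    """Return the ASCII-compatible (Punycode) form of a domain name."""
    return domain.encode("idna").decode("ascii")


genuine = "apple.com"
spoofed = "\u0430pple.com"  # first letter is CYRILLIC SMALL LETTER A

# The two names look alike when rendered, but their encoded forms differ.
print(to_ace(genuine))
print(to_ace(spoofed))
```

The genuine name is returned unchanged, while the spoofed name comes back with an `xn--` prefix, which is what a user would see if the client displayed the encoded form.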
Solution
Make homoglyphs visible wherever text is displayed in a security-sensitive context:
- Use fonts that clearly distinguish homoglyphs where possible.
- For domain names, display the Punycode form (e.g., "xn--...") or show a warning for IDN domains.
- Detect confusable characters, for example by comparing UTS #39 skeletons. Unicode normalization (NFC or NFKC) alone is not enough, because it does not map Cyrillic "а" to Latin "a".
- Restrict the character sets allowed in usernames and identifiers so that scripts cannot be mixed.
- Display Unicode script information alongside text in security contexts.
- Use syntax highlighting that distinguishes character scripts.
- In browsers and other clients that display URLs, implement protections against IDN homograph attacks, such as falling back to Punycode for suspicious labels.
- Alert users when confusable characters are detected.
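Mixed-script detection, one of the measures above, can be sketched in a few lines. Python's standard library does not expose the Unicode Script property, so this example approximates it from the first word of each character's name; that heuristic, and the function names `script_of`, `scripts_used`, and `is_mixed_script`, are assumptions made for illustration. A production system would more likely use ICU or another library with real script data.

```python
import unicodedata
from typing import Set


def script_of(ch: str) -> str:
    """Approximate a character's script from the first word of its Unicode name.

    This is a heuristic, not the Script property defined in UAX #24.
    """
    if not ch.isalpha():
        return "COMMON"  # digits, punctuation, and similar shared characters
    return unicodedata.name(ch, "UNKNOWN").split()[0]


def scripts_used(text: str) -> Set[str]:
    """Return the set of (approximate) scripts used by the letters in text."""
    return {script_of(ch) for ch in text} - {"COMMON"}


def is_mixed_script(text: str) -> bool:
    """True if the letters in text come from more than one script."""
    return len(scripts_used(text)) > 1


print(is_mixed_script("admin"))        # all Latin letters
print(is_mixed_script("\u0430dmin"))   # Cyrillic first letter, Latin rest
```

Note that a name written entirely in one non-Latin script is not flagged by this check, so it complements confusable detection rather than replacing it.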
Common Consequences
| Scope | Impact | Details |
|---|---|---|
| Integrity | Modify Application Data | Attackers can forge log entries or inject malicious content that appears legitimate. |
| Confidentiality | Read Application Data | Phishing attacks using homograph domains can steal credentials and sensitive information. |
| Other | Other | Users may be deceived into trusting malicious content, URLs, or identities. |
Example Code
Vulnerable Code
```html
<!-- Vulnerable: Browser displaying IDN without warning -->
<!-- User sees: https://www.аpple.com (looks like apple.com) -->
<!-- Actual domain uses Cyrillic 'а' (U+0430) not Latin 'a' (U+0061) -->

<!-- Vulnerable: Link with homograph domain -->
<a href="https://www.аpple.com/login">Login to Apple</a>
<!-- User thinks they're going to Apple, but it's a phishing site -->
```
```python
# Vulnerable: Chat application without homoglyph detection
class VulnerableChat:
    def display_message(self, username, message):
        # Vulnerable: No homoglyph detection
        # Attacker uses "аdmin" (Cyrillic 'а') instead of "admin"
        return f"{username}: {message}"

    def display_user_list(self, users):
        # Users "admin" and "аdmin" appear identical
        # but are different accounts
        for user in users:
            print(user.name)


# Vulnerable: Log display without character distinction
class VulnerableLogViewer:
    def display_log_entry(self, entry):
        # Vulnerable: Can't distinguish homoglyphs in logs
        # Entry from "аdmin" looks like it's from "admin"
        print(f"[{entry.timestamp}] {entry.user}: {entry.action}")

# Attacker creates account "аdmin" (Cyrillic)
# Performs malicious actions that appear in logs as "admin"
```
```javascript
// Vulnerable: URL validation that doesn't detect homoglyphs
function vulnerableIsValidDomain(domain) {
  // Vulnerable: \p{L} accepts letters from any script and nothing checks
  // which scripts are mixed, so "аpple.com" (Cyrillic 'а') passes
  const pattern = /^[\p{L}\p{N}.-]+\.\p{L}{2,}$/u;
  return pattern.test(domain);
}
```
```c
// Vulnerable: Source code with hidden homoglyphs
// The two checks look identical on screen, but the first is a backdoor
#include <string.h>

int vulnerableCheckAdmin(const char *username) {
    // Looks like "admin" but starts with Cyrillic 'а' (U+0430);
    // the compiler treats it as a different string
    if (strcmp(username, "аdmin") == 0) {
        return 1; // Grant admin access
    }
    // Real admin check with Latin "admin"
    if (strcmp(username, "admin") == 0) {
        return 1;
    }
    return 0;
}
// Attacker registers the "аdmin" account and is granted admin access
```
Fixed Code
```javascript
// Fixed: Browser extension for IDN homograph detection
class HomoglyphDetector {
  constructor() {
    // Confusable character mappings
    this.confusables = {
      '\u0430': 'a', // Cyrillic а -> Latin a
      '\u0435': 'e', // Cyrillic е -> Latin e
      '\u043E': 'o', // Cyrillic о -> Latin o
      '\u0440': 'p', // Cyrillic р -> Latin p
      '\u0441': 'c', // Cyrillic с -> Latin c
      '\u0445': 'x', // Cyrillic х -> Latin x
      '\u0391': 'A', // Greek Alpha -> Latin A
      '\u0392': 'B', // Greek Beta -> Latin B
      '\u0395': 'E', // Greek Epsilon -> Latin E
      // ... extensive mapping
    };
  }

  detectHomoglyphs(text) {
    const detected = [];
    for (let i = 0; i < text.length; i++) {
      const char = text[i];
      if (this.confusables[char]) {
        detected.push({
          position: i,
          char: char,
          looksLike: this.confusables[char],
          codePoint: char.codePointAt(0).toString(16)
        });
      }
    }
    return detected;
  }

  // Simplified script lookup using Unicode property escapes (ES2018);
  // scripts not listed here are reported as 'Unknown'
  getCharacterScript(char) {
    if (/\p{Script=Latin}/u.test(char)) return 'Latin';
    if (/\p{Script=Cyrillic}/u.test(char)) return 'Cyrillic';
    if (/\p{Script=Greek}/u.test(char)) return 'Greek';
    if (/\p{Script=Common}/u.test(char)) return 'Common';
    return 'Unknown';
  }

  getScriptMixWarning(domain) {
    const scripts = new Set();
    for (const char of domain) {
      const script = this.getCharacterScript(char);
      if (script !== 'Common' && script !== 'Unknown') {
        scripts.add(script);
      }
    }
    if (scripts.size > 1) {
      return `Warning: Domain uses mixed scripts (${Array.from(scripts).join(', ')})`;
    }
    return null;
  }

  // The URL parser serializes hostnames in their ASCII (Punycode) form
  toPunycode(domain) {
    return new URL(`http://${domain}`).hostname;
  }

  displayDomainSafely(domain) {
    const homoglyphs = this.detectHomoglyphs(domain);
    const mixedScript = this.getScriptMixWarning(domain);
    if (homoglyphs.length > 0 || mixedScript) {
      // Show Punycode version instead
      return {
        display: this.toPunycode(domain),
        warning: mixedScript || 'Domain contains confusable characters'
      };
    }
    return { display: domain, warning: null };
  }
}
```
```python
# Fixed: Chat application with homoglyph detection
import unicodedata


class FixedChat:
    def __init__(self):
        # Mapping of confusable characters to their ASCII equivalents
        self.confusables = self._load_confusables()

    def _load_confusables(self):
        # Small illustrative subset; production code should load the full
        # confusables data published with UTS #39
        return {
            '\u0430': 'a', '\u0435': 'e', '\u043e': 'o',
            '\u0440': 'p', '\u0441': 'c', '\u0445': 'x',
        }

    def get_script(self, char):
        """Approximate the Unicode script from the character name (simplified)"""
        if not char.isalpha():
            return 'Common'
        return unicodedata.name(char, 'Unknown').split()[0].capitalize()

    def normalize_for_comparison(self, text):
        """Normalize text to detect homoglyphs"""
        # NFKC normalization converts compatibility characters
        normalized = unicodedata.normalize('NFKC', text)
        # Replace known confusables
        result = []
        for char in normalized:
            if char in self.confusables:
                result.append(self.confusables[char])
            else:
                result.append(char)
        return ''.join(result).lower()

    def check_username_collision(self, new_username, existing_users):
        """Check if username is confusable with existing users"""
        normalized_new = self.normalize_for_comparison(new_username)
        for existing in existing_users:
            normalized_existing = self.normalize_for_comparison(existing)
            if normalized_new == normalized_existing:
                return True, existing
        return False, None

    def display_message(self, username, message):
        # Fixed: Detect and flag suspicious usernames
        if self.contains_homoglyphs(username):
            # Display with script indicators
            annotated = self.annotate_scripts(username)
            return f"{annotated} [Mixed Scripts]: {message}"
        return f"{username}: {message}"

    def contains_homoglyphs(self, text):
        """Check if text contains mixed Unicode scripts"""
        scripts = set()
        for char in text:
            if char.isalpha():
                script = self.get_script(char)
                if script not in ('Common', 'Inherited'):
                    scripts.add(script)
        return len(scripts) > 1

    def annotate_scripts(self, text):
        """Add script annotations to non-Latin characters"""
        result = []
        for char in text:
            script = self.get_script(char)
            if script not in ('Latin', 'Common', 'Inherited'):
                result.append(f'{char}[{script[0]}]')
            else:
                result.append(char)
        return ''.join(result)
```
```python
# Fixed: Log viewer with homoglyph highlighting
import unicodedata

from colorama import Fore, Style


class HomoglyphDetector:
    """Minimal Python detector; the mapping is a small illustrative subset"""
    CONFUSABLES = {'\u0430': 'a', '\u0435': 'e', '\u043e': 'o',
                   '\u0440': 'p', '\u0441': 'c', '\u0445': 'x'}

    def is_confusable(self, char):
        return char in self.CONFUSABLES

    def normalize(self, text):
        folded = unicodedata.normalize('NFKC', text)
        return ''.join(self.CONFUSABLES.get(c, c) for c in folded).lower()


class FixedLogViewer:
    def __init__(self):
        self.known_admins = {'admin', 'root', 'system'}
        self.detector = HomoglyphDetector()

    def display_log_entry(self, entry):
        user = entry.user
        # Fixed: Detect if username is confusable with known admins
        normalized = self.detector.normalize(user)
        if normalized in self.known_admins and user not in self.known_admins:
            # Suspicious: looks like admin but isn't
            highlighted = self._highlight_suspicious(user)
            print(f"{Fore.RED}[SUSPICIOUS]{Style.RESET_ALL} "
                  f"[{entry.timestamp}] {highlighted}: {entry.action}")
            print(f"  Note: Username '{user}' appears similar to '{normalized}' "
                  f"but uses different characters")
            # Show character codes
            self._show_character_details(user)
        else:
            print(f"[{entry.timestamp}] {user}: {entry.action}")

    def _highlight_suspicious(self, text):
        result = []
        for char in text:
            if not char.isascii() or self.detector.is_confusable(char):
                result.append(f"{Fore.RED}{char}{Style.RESET_ALL}")
            else:
                result.append(char)
        return ''.join(result)

    def _show_character_details(self, text):
        for i, char in enumerate(text):
            code = ord(char)
            name = unicodedata.name(char, 'UNKNOWN')
            print(f"    Position {i}: '{char}' U+{code:04X} ({name})")
```
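Color highlighting is lost when logs are piped to a file or viewed in a plain-text tool. As a complementary sketch, the helper below renders every non-ASCII character as a backslash escape so that a spoofed name can never be mistaken for its ASCII lookalike. The function names `visible_form` and `format_log_user` are hypothetical.

```python
def visible_form(text: str) -> str:
    """Render non-ASCII characters as backslash escapes (e.g. \\u0430)."""
    return text.encode("ascii", "backslashreplace").decode("ascii")


def format_log_user(user: str) -> str:
    """Return a log-safe username, tagged when it contained non-ASCII text."""
    shown = visible_form(user)
    if shown != user:
        return f"{shown} [non-ASCII]"
    return user


print(format_log_user("admin"))
print(format_log_user("\u0430dmin"))  # Cyrillic first letter
```

One limitation to keep in mind: a username that literally contains the ASCII text `\u0430` produces the same escaped string, which is why the sketch also appends an explicit tag.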
```javascript
// Fixed: Domain validation with homoglyph detection
// Userland punycode package; the trailing slash avoids Node's deprecated core module
const punycode = require('punycode/');

class SafeDomainValidator {
  constructor() {
    this.confusableMap = this.loadConfusables();
    this.trustedDomains = new Set(['apple.com', 'google.com', 'microsoft.com']);
  }

  // Small illustrative subset; load the full UTS #39 confusables data in production
  loadConfusables() {
    return {
      '\u0430': 'a', '\u0435': 'e', '\u043E': 'o',
      '\u0440': 'p', '\u0441': 'c', '\u0445': 'x'
    };
  }

  validateDomain(domain) {
    const result = {
      isValid: true,
      warnings: [],
      safeDisplay: domain
    };
    // Check if IDN (internationalized domain name)
    if (this.hasNonASCII(domain)) {
      result.warnings.push('Domain contains non-ASCII characters');
      // Check for mixed scripts
      if (this.hasMixedScripts(domain)) {
        result.warnings.push('Domain uses mixed Unicode scripts');
        result.safeDisplay = punycode.toASCII(domain);
      }
      // Check if confusable with trusted domain
      const normalized = this.normalizeForComparison(domain);
      if (this.trustedDomains.has(normalized)) {
        result.isValid = false;
        result.warnings.push(
          `Domain is confusable with trusted domain: ${normalized}`
        );
        result.safeDisplay = `SUSPICIOUS: ${punycode.toASCII(domain)}`;
      }
    }
    return result;
  }

  hasNonASCII(str) {
    return /[^\x00-\x7F]/.test(str);
  }

  // Simplified script lookup; letters from unlisted scripts are grouped as 'Other'
  getScript(char) {
    for (const script of ['Latin', 'Cyrillic', 'Greek']) {
      if (new RegExp(`\\p{Script=${script}}`, 'u').test(char)) return script;
    }
    return /\p{L}/u.test(char) ? 'Other' : 'Common';
  }

  hasMixedScripts(str) {
    const scripts = new Set();
    for (const char of str) {
      const script = this.getScript(char);
      if (script !== 'Common') {
        scripts.add(script);
      }
    }
    return scripts.size > 1;
  }

  normalizeForComparison(domain) {
    // Map each character through the confusable table, leaving others unchanged
    return Array.from(domain.toLowerCase())
      .map((char) => this.confusableMap[char] ?? char)
      .join('');
  }
}

// Usage in browser (showWarning and showInfo are UI hooks supplied by the host page)
function checkLinkSafety(url) {
  const validator = new SafeDomainValidator();
  // URL.hostname is already Punycode-encoded, so decode it before inspecting characters
  const domain = punycode.toUnicode(new URL(url).hostname);
  const result = validator.validateDomain(domain);
  if (!result.isValid) {
    showWarning(`Suspicious domain detected: ${result.safeDisplay}`);
    return false;
  }
  if (result.warnings.length > 0) {
    showInfo(result.warnings.join('\n'));
  }
  return true;
}
```
```python
# Fixed: Username registration with homoglyph prevention
import unicodedata
import re


class SafeUsernameValidator:
    def __init__(self):
        # Only allow basic Latin characters
        self.allowed_pattern = re.compile(r'^[a-zA-Z0-9_-]+$')
        # Or define allowed Unicode categories/scripts
        self.allowed_categories = {'Ll', 'Lu', 'Nd'}  # lowercase, uppercase, digits
        # Small illustrative subset of the UTS #39 confusables data
        self.confusables = {'\u0430': 'a', '\u0435': 'e', '\u043e': 'o',
                            '\u0440': 'p', '\u0441': 'c', '\u0445': 'x'}

    def validate_username(self, username):
        """Validate username doesn't contain homoglyphs"""
        errors = []
        # Option 1: Restrict to ASCII (fullmatch also rejects a trailing newline)
        if not self.allowed_pattern.fullmatch(username):
            errors.append("Username must contain only letters, numbers, underscores, and hyphens")
        # Option 2: Check for mixed scripts (relevant if Option 1 is relaxed)
        if self.has_mixed_scripts(username):
            errors.append("Username cannot mix different writing systems")
        # Option 3: Check for known confusable characters
        confusables = self.find_confusables(username)
        if confusables:
            chars = ', '.join(f"'{c}'" for c in confusables)
            errors.append(f"Username contains confusable characters: {chars}")
        # Check collision with existing usernames
        normalized = self.skeleton(username)
        # Query database for collision...
        return len(errors) == 0, errors

    def find_confusables(self, text):
        """Return the characters in text that have a known ASCII lookalike"""
        return [char for char in text if char in self.confusables]

    def has_mixed_scripts(self, text):
        scripts = set()
        for char in text:
            if char.isalpha():
                script = self.get_script(char)
                if script not in ('Common', 'Inherited'):
                    scripts.add(script)
        return len(scripts) > 1

    def skeleton(self, text):
        """Generate a skeleton for confusable detection (simplified from UTS #39)"""
        # NFD normalize, map confusables, NFD again; lowercasing is an extra
        # step added here so that the comparison ignores case
        decomposed = unicodedata.normalize('NFD', text)
        mapped = ''.join(self.confusables.get(c, c) for c in decomposed)
        return unicodedata.normalize('NFD', mapped).lower()

    def get_script(self, char):
        """Get Unicode script for character"""
        # Simplified - use ICU or a script database for a full implementation
        code = ord(char)
        if 0x0400 <= code <= 0x04FF:
            return 'Cyrillic'
        if 0x0370 <= code <= 0x03FF:
            return 'Greek'
        if 0x0041 <= code <= 0x005A or 0x0061 <= code <= 0x007A:
            return 'Latin'
        return 'Unknown'
```
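The collision check that the validator leaves as a database query can be sketched in memory. The example below follows the general shape of the UTS #39 skeleton (NFD, confusable mapping, NFD) and adds case folding, which is an extra assumption suited to case-insensitive usernames. The `CONFUSABLES` table is a tiny illustrative subset, and `skeleton` and `find_collision` are hypothetical names.

```python
import unicodedata
from typing import Iterable, Optional

# Illustrative subset only; the full data is published with UTS #39.
CONFUSABLES = {
    "\u0430": "a", "\u0435": "e", "\u043e": "o",
    "\u0440": "p", "\u0441": "c", "\u0445": "x",
}


def skeleton(text: str) -> str:
    """Fold case, decompose, map confusables to prototypes, decompose again."""
    decomposed = unicodedata.normalize("NFD", text.casefold())
    mapped = "".join(CONFUSABLES.get(ch, ch) for ch in decomposed)
    return unicodedata.normalize("NFD", mapped)


def find_collision(candidate: str, existing: Iterable[str]) -> Optional[str]:
    """Return the first existing name whose skeleton matches the candidate's."""
    target = skeleton(candidate)
    for name in existing:
        if skeleton(name) == target:
            return name
    return None


users = ["alice", "admin"]
print(find_collision("\u0430dmin", users))  # Cyrillic first letter
print(find_collision("bob", users))
```

In a real system the skeleton would typically be stored in an indexed column next to each username, so the check becomes a single lookup instead of a scan.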
CVE Examples
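- CVE-2013-7236: Web forum allows impersonation of users through homoglyphs in account names.
- CVE-2012-0584: Browser IDN support does not properly restrict characters in URLs, allowing domain spoofing via homoglyphs.
- CVE-2005-0233: Browser IDN support allows domain spoofing through Punycode-encoded names that display with homograph characters in URLs and certificates.
- CVE-2017-5015: Chrome address bar (Omnibox) mishandles Unicode glyphs, allowing domain spoofing via IDN homographs.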
Related CWEs
- CWE-451: User Interface (UI) Misrepresentation of Critical Information (parent)
- CWE-1021: Improper Restriction of Rendered UI Layers or Frames (related)
- CWE-346: Origin Validation Error (can lead to)
References
- MITRE Corporation. "CWE-1007: Insufficient Visual Distinction of Homoglyphs Presented to User." https://cwe.mitre.org/data/definitions/1007.html
- Unicode Technical Report #36. "Unicode Security Considerations."
- Unicode Technical Standard #39. "Unicode Security Mechanisms."