Insufficient Visual Distinction of Homoglyphs Presented to User

Description

Insufficient Visual Distinction of Homoglyphs Presented to User occurs when visually similar or identical characters (homoglyphs) are not clearly distinguished when displayed to users. Different Unicode characters can appear nearly identical or completely identical in many fonts—for example, the Latin letter "a" and Cyrillic "а", or the Latin "o" and Greek "ο". Attackers exploit this by registering domain names, creating usernames, or crafting content using characters that look identical to legitimate ones but are semantically different, enabling phishing, impersonation, and spoofing attacks.
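The distinction described above can be made visible by printing code points instead of glyphs. The following is a minimal sketch using only the Python standard library; the helper name `describe` is an assumption introduced for illustration.

```python
import unicodedata

# Latin "a" (U+0061) and Cyrillic "а" (U+0430), written as escapes so the
# difference survives copy and paste.
latin_a = "\u0061"
cyrillic_a = "\u0430"

def describe(char):
    """Return the code point and official Unicode name of a character."""
    return f"U+{ord(char):04X} {unicodedata.name(char)}"

print(describe(latin_a))      # U+0061 LATIN SMALL LETTER A
print(describe(cyrillic_a))   # U+0430 CYRILLIC SMALL LETTER A
print(latin_a == cyrillic_a)  # False: the strings differ even if the glyphs match
```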

Risk

This weakness enables several deceptive attacks. Internationalized Domain Name (IDN) homograph attacks use lookalike domains for phishing (e.g., "аpple.com" with a Cyrillic "а" in place of the Latin "a"). Username spoofing allows impersonation in chat systems, forums, or social media. Log forgery becomes possible when malicious entries appear to come from legitimate users. Source code can hide malicious logic behind identifiers or string literals that look like legitimate ones. Email addresses can be spoofed to appear legitimate. The risk is particularly high because many homoglyph pairs render identically, so even careful visual inspection cannot tell them apart.
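To make the IDN case concrete, the sketch below encodes the lookalike domain from the paragraph above into its ASCII (Punycode) form. It relies on Python's built-in `idna` codec, which implements the older IDNA 2003 rules; the helper name `to_ascii` is an assumption for illustration, and a browser may apply different IDNA processing.

```python
# The lookalike domain from the text, with the Cyrillic letter written as an
# escape so it cannot be mistaken for Latin "a".
spoofed = "\u0430pple.com"
genuine = "apple.com"

def to_ascii(domain):
    """Encode a domain name with Python's built-in IDNA (2003) codec."""
    return domain.encode("idna").decode("ascii")

print(to_ascii(genuine))  # pure ASCII input is returned unchanged
print(to_ascii(spoofed))  # the first label becomes an "xn--" label
```

The two domains render alike but encode to different ASCII names, which is why showing the encoded form exposes the spoof.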

Solution

Implement visual disambiguation for displayed content in security-sensitive contexts. Use fonts that clearly distinguish homoglyphs where possible. For domain names, use Punycode display (e.g., "xn--...") or show a warning for IDN domains. Implement normalization and confusable character detection. Restrict character sets in usernames and identifiers to prevent mixing scripts. Display Unicode script information alongside text in security contexts. Use syntax highlighting that distinguishes character scripts. Implement browser-level protections for IDN homograph attacks. Alert users when confusable characters are detected.
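Two of the measures above, confusable detection and mixed-script detection, can be sketched in a few lines. This is a simplified illustration, not the UTS #39 algorithm: the `CONFUSABLES` table is a tiny hypothetical sample (a real system would load the full confusables.txt data), and the script is approximated from the first word of the Unicode character name because Python's `unicodedata` module does not expose the Script property.

```python
import unicodedata

# Hypothetical, deliberately small confusable table for illustration only.
CONFUSABLES = {
    "\u0430": "a",  # Cyrillic а
    "\u0435": "e",  # Cyrillic е
    "\u043e": "o",  # Cyrillic о
    "\u0440": "p",  # Cyrillic р
    "\u0441": "c",  # Cyrillic с
    "\u0445": "x",  # Cyrillic х
    "\u03bf": "o",  # Greek ο
}

def script_of(char):
    """Approximate a character's script from the first word of its name."""
    name = unicodedata.name(char, "")
    return name.split(" ")[0] if name else "UNKNOWN"

def scripts_used(text):
    """Return the set of approximate scripts used by the letters in text."""
    return {script_of(ch) for ch in text if ch.isalpha()}

def skeleton(text):
    """Normalize, case-fold, then map known confusables to Latin lookalikes."""
    folded = unicodedata.normalize("NFKC", text).casefold()
    return "".join(CONFUSABLES.get(ch, ch) for ch in folded)

def is_confusable(candidate, existing):
    """True if two different strings collapse to the same skeleton."""
    return candidate != existing and skeleton(candidate) == skeleton(existing)

spoof = "\u0430dmin"  # Cyrillic а followed by Latin "dmin"
print(is_confusable(spoof, "admin"))  # True
print(sorted(scripts_used(spoof)))    # ['CYRILLIC', 'LATIN']
```

Comparing skeletons catches lookalikes even when a name uses a single script, while the script check flags suspicious mixtures that the sample table does not cover.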

Common Consequences

Scope: Integrity
Impact: Modify Application Data
Details: Attackers can forge log entries or inject malicious content that appears legitimate.

Scope: Confidentiality
Impact: Read Application Data
Details: Phishing attacks using homograph domains can steal credentials and sensitive information.

Scope: Other
Impact: Other
Details: Users may be deceived into trusting malicious content, URLs, or identities.

Example Code

Vulnerable Code

<!-- Vulnerable: Browser displaying IDN without warning -->
<!-- User sees: https://www.аpple.com (looks like apple.com) -->
<!-- Actual domain uses Cyrillic 'а' (U+0430) not Latin 'a' (U+0061) -->

<!-- Vulnerable: Link with homograph domain -->
<a href="https://www.аpple.com/login">Login to Apple</a>
<!-- User thinks they're going to Apple, but it's a phishing site -->

# Vulnerable: Chat application without homoglyph detection
class VulnerableChat:

    def display_message(self, username, message):
        # Vulnerable: No homoglyph detection
        # Attacker uses "аdmin" (Cyrillic 'а') instead of "admin"
        return f"{username}: {message}"

    def display_user_list(self, users):
        # Users "admin" and "аdmin" appear identical
        # but are different accounts
        for user in users:
            print(user.name)

# Vulnerable: Log display without character distinction
class VulnerableLogViewer:

    def display_log_entry(self, entry):
        # Vulnerable: Can't distinguish homoglyphs in logs
        # Entry from "аdmin" looks like it's from "admin"
        print(f"[{entry.timestamp}] {entry.user}: {entry.action}")

    # Attacker creates account "аdmin" (Cyrillic)
    # Performs malicious actions that appear in logs as "admin"

// Vulnerable: URL validation that doesn't detect homoglyphs
function vulnerableIsValidDomain(domain) {
    // Vulnerable: accepts letters from any script and never checks
    // whether scripts are mixed. An ASCII-only class such as [a-zA-Z]
    // would reject Cyrillic characters; \p{L} does not.
    // "аpple.com" passes because Cyrillic 'а' counts as a letter.
    const pattern = /^[\p{L}\p{N}.-]+\.\p{L}{2,}$/u;
    return pattern.test(domain);
}

// Vulnerable: Source code with hidden homoglyphs
// The first comparison appears to check for "admin", but it matches a different string
#include <string.h>

int vulnerableCheckAdmin(const char *username) {
    // This looks like "admin" but uses Cyrillic characters
    // Compiler treats them as different strings
    if (strcmp(username, "аdmin") == 0) {  // Cyrillic 'а'
        return 1;  // Grant admin access
    }

    // Real admin check with Latin "admin"
    if (strcmp(username, "admin") == 0) {
        return 1;
    }

    return 0;
}
// Attacker registers "аdmin" account and bypasses checks

Fixed Code

// Fixed: Browser extension for IDN homograph detection
class HomoglyphDetector {

    constructor() {
        // Confusable character mappings
        this.confusables = {
            '\u0430': 'a',  // Cyrillic а -> Latin a
            '\u0435': 'e',  // Cyrillic е -> Latin e
            '\u043E': 'o',  // Cyrillic о -> Latin o
            '\u0440': 'p',  // Cyrillic р -> Latin p
            '\u0441': 'c',  // Cyrillic с -> Latin c
            '\u0445': 'x',  // Cyrillic х -> Latin x
            '\u0391': 'A',  // Greek Alpha -> Latin A
            '\u0392': 'B',  // Greek Beta -> Latin B
            '\u0395': 'E',  // Greek Epsilon -> Latin E
            // ... extensive mapping
        };
    }

    detectHomoglyphs(text) {
        const detected = [];
        for (let i = 0; i < text.length; i++) {
            const char = text[i];
            if (this.confusables[char]) {
                detected.push({
                    position: i,
                    char: char,
                    looksLike: this.confusables[char],
                    codePoint: char.codePointAt(0).toString(16)
                });
            }
        }
        return detected;
    }

    getScriptMixWarning(domain) {
        const scripts = new Set();
        for (const char of domain) {
            const script = this.getCharacterScript(char);
            if (script !== 'Common' && script !== 'Unknown') {
                scripts.add(script);
            }
        }

        if (scripts.size > 1) {
            return `Warning: Domain uses mixed scripts (${Array.from(scripts).join(', ')})`;
        }
        return null;
    }

    displayDomainSafely(domain) {
        const homoglyphs = this.detectHomoglyphs(domain);
        const mixedScript = this.getScriptMixWarning(domain);

        if (homoglyphs.length > 0 || mixedScript) {
            // Show Punycode version instead
            return {
                display: this.toPunycode(domain),
                warning: mixedScript || 'Domain contains confusable characters'
            };
        }

        return { display: domain, warning: null };
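    }

    getCharacterScript(char) {
        // Simplified: covers only the scripts this example cares about.
        // Unicode property escapes require ES2018 or later.
        if (/\p{Script=Latin}/u.test(char)) return 'Latin';
        if (/\p{Script=Cyrillic}/u.test(char)) return 'Cyrillic';
        if (/\p{Script=Greek}/u.test(char)) return 'Greek';
        if (/[\p{Script=Common}\p{Script=Inherited}]/u.test(char)) return 'Common';
        return 'Unknown';
    }

    toPunycode(domain) {
        // The URL parser serializes hostnames in their ASCII (Punycode) form
        return new URL(`https://${domain}`).hostname;
    }
}

# Fixed: Chat application with homoglyph detection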
import unicodedata

class FixedChat:

    def __init__(self):
        # Mapping of confusable characters to their ASCII equivalents
        self.confusables = self._load_confusables()

    def normalize_for_comparison(self, text):
        """Normalize text to detect homoglyphs"""
        # NFKC normalization converts compatible characters
        normalized = unicodedata.normalize('NFKC', text)

        # Replace known confusables
        result = []
        for char in normalized:
            if char in self.confusables:
                result.append(self.confusables[char])
            else:
                result.append(char)

        return ''.join(result).lower()

    def check_username_collision(self, new_username, existing_users):
        """Check if username is confusable with existing users"""
        normalized_new = self.normalize_for_comparison(new_username)

        for existing in existing_users:
            normalized_existing = self.normalize_for_comparison(existing)
            if normalized_new == normalized_existing:
                return True, existing

        return False, None

    def display_message(self, username, message):
        # Fixed: Detect and flag suspicious usernames
        if self.contains_homoglyphs(username):
            # Display with script indicators
            annotated = self.annotate_scripts(username)
            return f"{annotated} [Mixed Scripts]: {message}"

        return f"{username}: {message}"

    def contains_homoglyphs(self, text):
        """Check if text contains mixed Unicode scripts"""
        scripts = set()
        for char in text:
            if char.isalpha():
                script = self.get_script(char)
                if script not in ('Common', 'Inherited'):
                    scripts.add(script)

        return len(scripts) > 1

    def annotate_scripts(self, text):
        """Add script annotations to characters"""
        result = []
        for char in text:
            script = self.get_script(char)
            if script not in ('Latin', 'Common', 'Inherited'):
                result.append(f'{char}[{script[0]}]')
            else:
                result.append(char)
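        return ''.join(result)

    def get_script(self, char):
        """Approximate the Unicode script from the character name"""
        # Simplified: unicodedata has no script lookup, so use the first
        # word of the name ('LATIN', 'CYRILLIC', 'GREEK', ...)
        if not char.isalpha():
            return 'Common'
        name = unicodedata.name(char, '')
        return name.split(' ')[0].capitalize() if name else 'Unknown'

    def _load_confusables(self):
        """Return a small sample mapping; load UTS #39 confusables.txt in production"""
        return {
            '\u0430': 'a',  # Cyrillic а -> Latin a
            '\u0435': 'e',  # Cyrillic е -> Latin e
            '\u043e': 'o',  # Cyrillic о -> Latin o
            '\u0440': 'p',  # Cyrillic р -> Latin p
            '\u0441': 'c',  # Cyrillic с -> Latin c
        }

# Fixed: Log viewer with homoglyph highlighting
# Assumes a Python HomoglyphDetector helper (not shown) that provides
# normalize() and is_confusable()
import unicodedata
from colorama import Fore, Style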

class FixedLogViewer:

    def __init__(self):
        self.known_admins = {'admin', 'root', 'system'}
        self.detector = HomoglyphDetector()

    def display_log_entry(self, entry):
        user = entry.user

        # Fixed: Detect if username is confusable with known admins
        normalized = self.detector.normalize(user)

        if normalized in self.known_admins and user not in self.known_admins:
            # Suspicious: looks like admin but isn't
            highlighted = self._highlight_suspicious(user)
            print(f"{Fore.RED}[SUSPICIOUS]{Style.RESET_ALL} "
                  f"[{entry.timestamp}] {highlighted}: {entry.action}")
            print(f"    Note: Username '{user}' appears similar to '{normalized}' "
                  f"but uses different characters")

            # Show character codes
            self._show_character_details(user)
        else:
            print(f"[{entry.timestamp}] {user}: {entry.action}")

    def _highlight_suspicious(self, text):
        result = []
        for char in text:
            if not char.isascii() or self.detector.is_confusable(char):
                result.append(f"{Fore.RED}{char}{Style.RESET_ALL}")
            else:
                result.append(char)
        return ''.join(result)

    def _show_character_details(self, text):
        for i, char in enumerate(text):
            code = ord(char)
            name = unicodedata.name(char, 'UNKNOWN')
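            print(f"    Position {i}: '{char}' U+{code:04X} ({name})")

// Fixed: Domain validation with homoglyph detection
// Uses the userland "punycode" npm package (note the trailing slash);
// Node's built-in module of the same name is deprecated.
// loadConfusables() and getScript() are application-specific helpers not shown here.
const punycode = require('punycode/');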

class SafeDomainValidator {

    constructor() {
        this.confusableMap = this.loadConfusables();
        this.trustedDomains = new Set(['apple.com', 'google.com', 'microsoft.com']);
    }

    validateDomain(domain) {
        const result = {
            isValid: true,
            warnings: [],
            safeDisplay: domain
        };

        // Check if IDN (internationalized domain name)
        if (this.hasNonASCII(domain)) {
            result.warnings.push('Domain contains non-ASCII characters');

            // Check for mixed scripts
            if (this.hasMixedScripts(domain)) {
                result.warnings.push('Domain uses mixed Unicode scripts');
                result.safeDisplay = punycode.toASCII(domain);
            }

            // Check if confusable with trusted domain
            const normalized = this.normalizeForComparison(domain);
            if (this.trustedDomains.has(normalized)) {
                result.isValid = false;
                result.warnings.push(
                    `Domain is confusable with trusted domain: ${normalized}`
                );
                result.safeDisplay = `SUSPICIOUS: ${punycode.toASCII(domain)}`;
            }
        }

        return result;
    }

    hasNonASCII(str) {
        return /[^\x00-\x7F]/.test(str);
    }

    hasMixedScripts(str) {
        const scripts = new Set();
        for (const char of str) {
            const script = this.getScript(char);
            if (script !== 'Common') {
                scripts.add(script);
            }
        }
        return scripts.size > 1;
    }

    normalizeForComparison(domain) {
        let normalized = domain.toLowerCase();
        for (const [confusable, ascii] of Object.entries(this.confusableMap)) {
            normalized = normalized.replace(new RegExp(confusable, 'g'), ascii);
        }
        return normalized;
    }
}

// Usage in browser
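function checkLinkSafety(url) {
    const validator = new SafeDomainValidator();
    // URL.hostname is already ASCII (Punycode) encoded, so decode it back to
    // Unicode first; otherwise the non-ASCII checks would never trigger
    const domain = punycode.toUnicode(new URL(url).hostname);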
    const result = validator.validateDomain(domain);

    if (!result.isValid) {
        showWarning(`Suspicious domain detected: ${result.safeDisplay}`);
        return false;
    }

    if (result.warnings.length > 0) {
        showInfo(result.warnings.join('\n'));
    }

    return true;
}

# Fixed: Username registration with homoglyph prevention
import unicodedata
import re

class SafeUsernameValidator:

    def __init__(self):
        # Only allow basic Latin characters
        self.allowed_pattern = re.compile(r'^[a-zA-Z0-9_-]+$')

        # Or define allowed Unicode categories/scripts
        self.allowed_categories = {'Ll', 'Lu', 'Nd'}  # lowercase, uppercase, digits

    def validate_username(self, username):
        """Validate username doesn't contain homoglyphs"""

        errors = []

        # Option 1: Restrict to ASCII
        if not self.allowed_pattern.match(username):
            errors.append("Username must contain only letters, numbers, underscores, and hyphens")

        # Option 2: Check for mixed scripts
        if self.has_mixed_scripts(username):
            errors.append("Username cannot mix different writing systems")

        # Option 3: Check for known confusable characters
        confusables = self.find_confusables(username)
        if confusables:
            chars = ', '.join(f"'{c}'" for c in confusables)
            errors.append(f"Username contains confusable characters: {chars}")

        # Check collision with existing usernames
        normalized = self.skeleton(username)
        # Query database for collision...

        return len(errors) == 0, errors

    def has_mixed_scripts(self, text):
        scripts = set()
        for char in text:
            if char.isalpha():
                script = self.get_script(char)
                if script not in ('Common', 'Inherited'):
                    scripts.add(script)
        return len(scripts) > 1

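    def find_confusables(self, text):
        """Return non-ASCII letters, which may imitate ASCII ones"""
        # Simplified: a full implementation would consult UTS #39 confusables.txt
        return [char for char in text if char.isalpha() and not char.isascii()]

    def skeleton(self, text):
        """Generate a simplified skeleton for confusable detection.

        UTS #39 defines the skeleton using NFD normalization and the
        confusables.txt mapping; this sketch uses NFKD and lowercasing.
        """
        normalized = unicodedata.normalize('NFKD', text)
        # Apply confusable mapping...
        return normalized.lower()

    def get_script(self, char):
        """Get an approximate Unicode script for a character"""
        # Simplified: Python's unicodedata module does not expose the Script
        # property, so match on the character name. Use ICU for a full
        # implementation.
        name = unicodedata.name(char, '')
        for script in ('Latin', 'Cyrillic', 'Greek'):
            if name.startswith(script.upper()):
                return script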
        return 'Unknown'
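The vulnerable source-code example has no counterpart among the fixes above. One possible mitigation, sketched here under the assumption that the codebase is expected to be ASCII-only, is a review or pre-commit check that reports every non-ASCII letter with its position and Unicode name. The function name `find_non_ascii_letters` is hypothetical.

```python
# Fixed: Source scan that reports non-ASCII letters (hypothetical helper)
import unicodedata

def find_non_ascii_letters(source):
    """Return (line, column, char, name) for each non-ASCII letter in source."""
    findings = []
    for line_no, line in enumerate(source.splitlines(), start=1):
        for col, char in enumerate(line, start=1):
            if char.isalpha() and not char.isascii():
                name = unicodedata.name(char, "UNKNOWN")
                findings.append((line_no, col, char, name))
    return findings

# The comparison from the vulnerable C example, with the Cyrillic letter
# written as an escape.
sample = 'if (strcmp(username, "\u0430dmin") == 0) {\n    return 1;\n}\n'

for line_no, col, char, name in find_non_ascii_letters(sample):
    print(f"line {line_no}, column {col}: U+{ord(char):04X} {name}")
```

A check like this also flags legitimate non-ASCII text such as localized strings, so in practice it would likely be paired with an allowlist of files or scripts.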

CVE Examples

  • CVE-2013-7236: Web forum allows impersonation of users through homoglyphs in account names.
  • CVE-2012-0584: Browser vulnerable to URL spoofing via IDN.
  • CVE-2005-0233: Browser displaying IDN domains without warning.
  • CVE-2017-5015: Chrome domain spoofing via IDN homographs caused by incorrect handling of Unicode glyphs.

Related CWEs

  • CWE-451: User Interface (UI) Misrepresentation of Critical Information (parent)
  • CWE-1021: Improper Restriction of Rendered UI Layers or Frames (related)
  • CWE-346: Origin Validation Error (can lead to)

References

  1. MITRE Corporation. "CWE-1007: Insufficient Visual Distinction of Homoglyphs Presented to User." https://cwe.mitre.org/data/definitions/1007.html
  2. Unicode Technical Report #36. "Unicode Security Considerations."
  3. Unicode Technical Standard #39. "Unicode Security Mechanisms."