Improper Validation of Unsafe Equivalence in Input

Description

Improper Validation of Unsafe Equivalence in Input occurs when a product accepts an input that is used as a resource identifier or other reference, but does not validate, or incorrectly validates, whether that input is equivalent to a potentially unsafe value. Attackers can sometimes bypass input validation schemes by finding inputs that appear safe to the validator but are interpreted as dangerous at a lower layer or by a downstream component. A practical example: an XSS filter that matches <script> tags case-sensitively can be bypassed with <ScrIpT>, because HTML tag names are case-insensitive.
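To make the idea of "equivalent" inputs concrete, the following minimal Python sketch shows four spellings that a literal comparison treats as different strings but that collapse to the same value once decoded and folded. The helper name canonical and the choice of transformations are illustrative assumptions; which variants a real downstream component treats as equivalent depends on the decoding and folding that component applies.

```python
import unicodedata
import urllib.parse

def canonical(value: str) -> str:
    """Hypothetical helper: collapse some common equivalent spellings to one form."""
    decoded = urllib.parse.unquote(value)             # "%3C" -> "<"
    folded = unicodedata.normalize("NFKC", decoded)   # fullwidth letters -> ASCII letters
    return folded.casefold()                          # "ScRiPt" -> "script"

variants = ["<script>", "<ScRiPt>", "%3Cscript%3E", "<ｓｃｒｉｐｔ>"]

# A literal comparison sees four different strings ...
print(len(set(variants)))
# ... but all of them collapse to a single canonical value.
print({canonical(v) for v in variants})
```

A validator that compares only the raw strings has to anticipate every such spelling, whereas one that compares canonical forms has to recognize only one.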

Risk

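An unsafe-equivalence bypass undermines every control that recognizes a dangerous value by its exact spelling. Input filters can be circumvented, which lets XSS, path traversal, and injection payloads through. Access controls and security policies keyed on names, paths, or hostnames can be sidestepped. Denylists in particular lose their value, because an attacker needs only one equivalent spelling that the list does not contain. The payload then reaches a downstream component that interprets it in its dangerous form.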

Solution

Use "accept known good" validation: maintain strict allowlists of acceptable inputs that conform to specifications, and reject non-conforming data. Consider length, type, value ranges, syntax, and business logic. Canonicalize inputs (decode, Unicode-normalize, case-fold) before validation where appropriate, and pass the canonical form, not the raw input, to downstream code. Use case-insensitive comparison for case-insensitive protocols. Denylists alone are insufficient because they cannot cover all equivalences.
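The steps above can be sketched in Python as a single validation function. This is a minimal sketch under stated assumptions: the policy (3 to 32 characters of lowercase ASCII letters, digits, "_" or "-") and the names IDENTIFIER_RE and validate_identifier are hypothetical and would need to match the real specification of the identifier being validated.

```python
import re
import unicodedata

# Hypothetical policy: 3-32 characters of lowercase ASCII letters, digits, "_" or "-".
IDENTIFIER_RE = re.compile(r"[a-z0-9_-]{3,32}")

def validate_identifier(raw):
    """Return the canonical identifier, or None if the input is rejected."""
    # Type check: anything that is not a string is rejected outright.
    if not isinstance(raw, str):
        return None
    # Canonicalize first: fold compatibility forms and case.
    canonical = unicodedata.normalize("NFKC", raw).casefold()
    # Allowlist check covers syntax and length in one step.
    if IDENTIFIER_RE.fullmatch(canonical) is None:
        return None
    # Hand back the canonical form so callers never use the raw spelling.
    return canonical

print(validate_identifier("Report_2024"))   # accepted, returned in canonical form
print(validate_identifier("../etc"))        # rejected: "." and "/" are not allowed
print(validate_identifier("аdmin"))         # rejected: first letter is Cyrillic
```

Returning the canonical value rather than a boolean is a deliberate design choice in this sketch: it keeps the validated form and the form used downstream from drifting apart.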

Common Consequences

Impact            Scope             Details
Other             Other             Security bypass depending on exploitation method.
Access Control    Access Control    Filters and restrictions can be bypassed.
Integrity         Integrity         Malicious input may reach downstream components.

Example Code

Vulnerable Code

# Vulnerable: Case-sensitive XSS filter

import re

# VULNERABLE: Case-sensitive denylist
DANGEROUS_TAGS = ['<script>', '</script>', '<iframe>', '<object>']

def vulnerable_xss_filter(input_html):
    """Filter XSS - VULNERABLE to case variation bypass."""
    result = input_html

    # VULNERABLE: Case-sensitive matching
    for tag in DANGEROUS_TAGS:
        result = result.replace(tag, '')

    return result

    # Attacker uses: <ScRiPt>alert(1)</sCrIpT>
    # Filter doesn't catch mixed case
    # Browser still executes it (HTML is case-insensitive)

# VULNERABLE: Case-sensitive file extension check
def vulnerable_file_upload(filename):
    """Check file extension - VULNERABLE to case bypass."""
    BLOCKED_EXTENSIONS = ['.exe', '.bat', '.cmd', '.ps1']

    # VULNERABLE: Case-sensitive comparison
    for ext in BLOCKED_EXTENSIONS:
        if filename.endswith(ext):
            return False

    return True  # Allow upload

    # Attacker uploads: malware.EXE or malware.Exe
    # Filter doesn't catch uppercase extensions

# VULNERABLE: URL encoding not considered
def vulnerable_path_check(path):
    """Check for directory traversal - VULNERABLE to encoding bypass."""
    # VULNERABLE: Only checks literal "../"
    if '../' in path:
        return False

    return True

    # Attacker uses: %2e%2e%2f (URL-encoded ../)
    # Or: ..%2f (partial encoding)
    # Or: ..\ (backslash on Windows)

// Vulnerable: JavaScript with equivalence issues

// VULNERABLE: Case-sensitive hostname check
function vulnerableCheckRedirect(url) {
    // VULNERABLE: Case-sensitive comparison
    if (url.includes('malicious.com')) {
        return false;
    }
    return true;

    // Attacker uses: MALICIOUS.COM or Malicious.Com
    // DNS is case-insensitive, redirect succeeds
}

// VULNERABLE: Whitespace equivalence not considered
function vulnerableValidateUrl(url) {
    const allowedHosts = ['example.com', 'trusted.com'];

    // Parse URL
    const parsed = new URL(url);

    // VULNERABLE: Doesn't consider whitespace variants
    if (!allowedHosts.includes(parsed.hostname)) {
        return false;
    }

    return true;

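    // Attacker probes with: "https://evil.com%20.example.com"
    // or other encoded whitespace. A parser or downstream consumer that
    // treats such a host differently than this check does can be abused.
    // Note: new URL() throws on input it cannot parse, which is not handled here.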
}

// VULNERABLE: Unicode equivalence
function vulnerableUsernameCheck(username) {
    const blockedUsers = ['admin', 'root', 'administrator'];

    // VULNERABLE: Doesn't normalize Unicode
    if (blockedUsers.includes(username.toLowerCase())) {
        return false;
    }

    return true;

    // Attacker uses: "ɑdmin" (Latin alpha instead of 'a')
    // Or: "аdmin" (Cyrillic 'а' instead of Latin 'a')
}

// Vulnerable: C with equivalence bypass issues

#include <string.h>
#include <ctype.h>

// VULNERABLE: Case-sensitive command check
int vulnerable_check_command(const char* cmd) {
    // VULNERABLE: Only blocks exact lowercase match
    if (strcmp(cmd, "shutdown") == 0 ||
        strcmp(cmd, "reboot") == 0 ||
        strcmp(cmd, "format") == 0) {
        return 0;  // Blocked
    }
    return 1;  // Allowed

    // Attacker uses: "SHUTDOWN", "Shutdown", "sHuTdOwN"
}

// VULNERABLE: Path check without normalization
int vulnerable_check_path(const char* path) {
    // VULNERABLE: Only checks literal pattern
    if (strstr(path, "..") != NULL) {
        return 0;  // Blocked
    }
    if (strstr(path, "/etc/") != NULL) {
        return 0;  // Blocked
    }
    return 1;

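    // Attacker uses spellings that the literal checks miss:
    // - "etc/passwd" (relative path, resolved against a working directory of "/")
    // - "%2e%2e/secret" (percent-encoded dots, if a later layer decodes them)
    // - "/tmp/link/passwd" (where "link" is a symlink to /etc)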
}

// VULNERABLE: Extension check without case normalization
int vulnerable_check_extension(const char* filename) {
    const char* ext = strrchr(filename, '.');
    if (ext == NULL) return 0;

    // VULNERABLE: Case-sensitive comparison
    if (strcmp(ext, ".php") == 0 ||
        strcmp(ext, ".asp") == 0 ||
        strcmp(ext, ".jsp") == 0) {
        return 0;  // Blocked
    }
    return 1;

    // Attacker uploads: shell.PHP, shell.Php, shell.pHp
}

Fixed Code

# Fixed: Proper equivalence handling

import re
import html
import urllib.parse
import unicodedata

# FIXED: Escape output instead of enumerating dangerous tags
def secure_xss_filter(input_html):
    """Neutralize HTML so that no spelling of a tag is interpreted as markup."""
    # FIXED: No denylist to bypass; "<", ">", "&" and quotes are all escaped.
    # If some markup must be kept, allow only specific safe tags (Approach 2).

    # Approach 1: Escape all HTML
    escaped = html.escape(input_html)
    return escaped

    # Approach 2: If HTML needed, use proper sanitizer
    # import bleach
    # return bleach.clean(input_html, tags=['p', 'b', 'i', 'em', 'strong'])

# FIXED: Case-insensitive file extension check
def secure_file_upload(filename):
    """Check file extension with case normalization."""
    ALLOWED_EXTENSIONS = {'.jpg', '.jpeg', '.png', '.gif', '.pdf', '.txt'}

    # FIXED: Normalize to lowercase before checking
    normalized = filename.lower()

    # FIXED: Extract extension properly
    if '.' not in normalized:
        return False

    ext = '.' + normalized.rsplit('.', 1)[-1]

    # FIXED: Use allowlist, not denylist
    if ext not in ALLOWED_EXTENSIONS:
        return False

    # FIXED: Also check for double extensions
    if normalized.count('.') > 1:
        # Additional scrutiny for files like "image.jpg.php"
        parts = normalized.split('.')
        for part in parts[:-1]:
            if part in ['php', 'asp', 'jsp', 'exe', 'bat']:
                return False

    return True

# FIXED: Path check with normalization
import os

def secure_path_check(path, base_dir):
    """Check path with proper normalization."""
    # FIXED: URL decode first
    decoded = urllib.parse.unquote(path)

    # FIXED: Normalize path (resolves .., ., multiple slashes)
    normalized = os.path.normpath(decoded)

    # FIXED: Resolve to a canonical absolute path (realpath also follows symlinks)
    full_path = os.path.realpath(os.path.join(base_dir, normalized))
    base_path = os.path.realpath(base_dir)

    # FIXED: Ensure path is under base directory
    if not full_path.startswith(base_path + os.sep):
        return False

    return True

# FIXED: Unicode normalization for username
def secure_username_check(username):
    """Check username with Unicode normalization."""
    # FIXED: Normalize Unicode (NFKC folds compatibility forms such as fullwidth letters)
    normalized = unicodedata.normalize('NFKC', username)

    # FIXED: Case-insensitive comparison
    normalized = normalized.lower()

    # FIXED: Check against blocked users
    blocked_users = {'admin', 'root', 'administrator', 'system'}

    if normalized in blocked_users:
        return False

    # FIXED: NFKC does not map confusables (e.g. Cyrillic "а", Latin alpha "ɑ")
    # to ASCII letters, and dropping non-ASCII characters would turn "ɑdmin"
    # into "dmin", not "admin". Allow only an explicit character set instead.
    if not re.fullmatch(r'[a-z0-9_-]+', normalized):
        return False

    return True

// Fixed: JavaScript with proper equivalence handling

// FIXED: Case-insensitive hostname check
function secureCheckRedirect(url) {
    try {
        const parsed = new URL(url);

        // FIXED: Normalize hostname to lowercase
        const hostname = parsed.hostname.toLowerCase();

        // FIXED: Use allowlist instead of denylist
        const allowedHosts = ['example.com', 'trusted.com', 'api.example.com'];

        if (!allowedHosts.includes(hostname)) {
            return false;
        }

        // FIXED: Also check protocol
        if (parsed.protocol !== 'https:') {
            return false;
        }

        return true;
    } catch (e) {
        // Invalid URL
        return false;
    }
}

// FIXED: URL validation with normalization
function secureValidateUrl(url) {
    try {
        // FIXED: Normalize the URL
        const parsed = new URL(url);

        // FIXED: Check for whitespace in hostname (suspicious)
        if (/\s/.test(parsed.hostname)) {
            return false;
        }

        // FIXED: Normalize and check hostname
        const hostname = parsed.hostname.toLowerCase().trim();

        const allowedHosts = ['example.com', 'trusted.com'];

        // FIXED: Exact match or subdomain match
        for (const allowed of allowedHosts) {
            if (hostname === allowed ||
                hostname.endsWith('.' + allowed)) {
                return true;
            }
        }

        return false;
    } catch (e) {
        return false;
    }
}

// FIXED: Username check with normalization
function secureUsernameCheck(username) {
    // FIXED: Normalize Unicode using String.normalize()
    const normalized = username.normalize('NFKC').toLowerCase();

    const blockedUsers = new Set(['admin', 'root', 'administrator', 'system']);

    if (blockedUsers.has(normalized)) {
        return false;
    }

    // FIXED: Check if contains only allowed characters
    if (!/^[a-z0-9_-]+$/.test(normalized)) {
        // Contains characters outside basic ASCII
        // Could be Unicode confusables
        return false;
    }

    return true;
}

// Fixed: C with proper equivalence handling

#include <string.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

// FIXED: Case-insensitive command check
bool secure_check_command(const char* cmd) {
    // FIXED: Convert to lowercase for comparison
    char* lower = strdup(cmd);
    if (lower == NULL) return false;

    for (char* p = lower; *p; p++) {
        // Cast to unsigned char: passing a negative char to tolower is undefined
        *p = (char)tolower((unsigned char)*p);
    }

    // FIXED: Use allowlist approach
    const char* allowed_commands[] = {"list", "status", "help", "info"};
    bool allowed = false;

    for (size_t i = 0; i < sizeof(allowed_commands)/sizeof(allowed_commands[0]); i++) {
        if (strcmp(lower, allowed_commands[i]) == 0) {
            allowed = true;
            break;
        }
    }

    free(lower);
    return allowed;
}

// FIXED: Path check with normalization
bool secure_check_path(const char* path, const char* base_dir,
                        char* result, size_t result_size) {
    // FIXED: Canonicalize the path (resolves "..", ".", and symlinks);
    // realpath (POSIX) allocates the result, which must be freed
    char* normalized = realpath(path, NULL);
    if (normalized == NULL) {
        // Path doesn't exist or is invalid
        return false;
    }

    // FIXED: Get canonical base directory
    char* base_canonical = realpath(base_dir, NULL);
    if (base_canonical == NULL) {
        free(normalized);
        return false;
    }

    // FIXED: Check that normalized path starts with base directory
    size_t base_len = strlen(base_canonical);
    bool safe = (strncmp(normalized, base_canonical, base_len) == 0) &&
                (normalized[base_len] == '/' || normalized[base_len] == '\0');

    if (safe && result != NULL) {
        // FIXED: Fail closed instead of returning a truncated path
        if (result_size == 0 || strlen(normalized) >= result_size) {
            safe = false;
        } else {
            strcpy(result, normalized);
        }
    }

    free(normalized);
    free(base_canonical);

    return safe;
}

// FIXED: Extension check with case normalization
bool secure_check_extension(const char* filename) {
    const char* ext = strrchr(filename, '.');
    if (ext == NULL) return false;

    // FIXED: Convert extension to lowercase
    char lower_ext[16];
    size_t i;
    for (i = 0; ext[i] && i < sizeof(lower_ext) - 1; i++) {
        lower_ext[i] = (char)tolower((unsigned char)ext[i]);
    }
    lower_ext[i] = '\0';

    // FIXED: Use allowlist of safe extensions
    const char* safe_extensions[] = {".txt", ".pdf", ".jpg", ".jpeg", ".png", ".gif"};

    for (size_t j = 0; j < sizeof(safe_extensions)/sizeof(safe_extensions[0]); j++) {
        if (strcmp(lower_ext, safe_extensions[j]) == 0) {
            return true;
        }
    }

    return false;
}

// FIXED: Helper function for case-insensitive string comparison
int strcasecmp_safe(const char* s1, const char* s2) {
    while (*s1 && *s2) {
        int c1 = tolower((unsigned char)*s1);
        int c2 = tolower((unsigned char)*s2);
        if (c1 != c2) return c1 - c2;
        s1++;
        s2++;
    }
    return tolower((unsigned char)*s1) - tolower((unsigned char)*s2);
}
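
One way to exercise validators like the ones above is to feed them several spellings of the same value and flag any sample for which the verdict changes. The following Python sketch illustrates this; the helper names, the set of variants, and the two sample validators are assumptions made for illustration, and a differing verdict is a prompt for review rather than proof of a flaw.

```python
import urllib.parse

def equivalent_variants(value):
    """Return spellings that some lower layers treat as equal to `value`."""
    return {
        value,
        value.upper(),
        value.swapcase(),
        urllib.parse.quote(value, safe=""),   # percent-encoded form
        value.replace("/", "\\"),             # backslash as path separator
    }

def inconsistent_samples(validator, samples):
    """Return the samples whose verdict differs across their variants."""
    flagged = []
    for sample in samples:
        verdicts = {validator(v) for v in equivalent_variants(sample)}
        if len(verdicts) > 1:
            flagged.append(sample)
    return flagged

def denylist_check(name):
    # Case-sensitive denylist, as in the vulnerable examples
    return not name.endswith(".exe")

def allowlist_check(name):
    # Case-normalized allowlist, as in the fixed examples
    return name.lower().endswith((".jpg", ".png"))

samples = ["malware.exe", "photo.jpg"]
print(inconsistent_samples(denylist_check, samples))
print(inconsistent_samples(allowlist_check, samples))
```

The denylist validator is flagged for "malware.exe" because its uppercase spelling is accepted, while the allowlist validator gives one verdict per sample.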

CVE Examples

  • CVE-2021-39155: Hostname comparison using case-sensitive matching bypassed authorization checks.
  • CVE-2020-11053: HTML-encoded whitespace bypassed redirect URL validation.
  • CVE-2005-0269: File upload filter checked only lowercase extensions, allowing bypass with uppercase.
  • CVE-2001-1238: Task manager refused to end processes whose names used uppercase spellings of protected process names, so malicious processes could not be stopped.
  • CVE-2004-2214: Mixed-case URIs bypassed access restrictions.
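
Several of the entries above come down to comparing a hostname or URI without first reducing it to one canonical spelling. As a hedged illustration (not a reconstruction of any of the affected products), the Python sketch below validates a redirect target against an allowlist after parsing. The ALLOWED_HOSTS values and the function name is_allowed_redirect are hypothetical.

```python
from urllib.parse import urlsplit

ALLOWED_HOSTS = {"example.com", "api.example.com"}  # hypothetical allowlist

def is_allowed_redirect(url):
    """Accept only https URLs whose canonical host is on the allowlist."""
    try:
        parts = urlsplit(url.strip())
    except ValueError:
        return False
    # urlsplit lowercases the scheme; .hostname is lowercased as well
    if parts.scheme != "https" or parts.hostname is None:
        return False
    # "example.com." and "example.com" name the same host in DNS
    host = parts.hostname.rstrip(".")
    return host in ALLOWED_HOSTS

print(is_allowed_redirect("https://EXAMPLE.com/home"))
print(is_allowed_redirect("https://example.com@evil.com/"))
```

Comparing the parsed, lowercased host against exact allowlist entries avoids substring tricks such as placing the trusted name in the query string or in the userinfo part of the URL.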

Related CWEs

  • CWE-20: Improper Input Validation (parent)
  • CWE-41: Improper Resolution of Path Equivalence (related)
  • CWE-178: Improper Handling of Case Sensitivity (related)
  • CWE-1215: Data Validation Issues (category)

References

  1. MITRE Corporation. "CWE-1289: Improper Validation of Unsafe Equivalence in Input." https://cwe.mitre.org/data/definitions/1289.html
  2. OWASP. "Input Validation Cheat Sheet"
  3. Unicode Consortium. "Unicode Security Considerations"