Incomplete Denylist to Cross-Site Scripting

Description

Incomplete Denylist to Cross-Site Scripting is a compound vulnerability where software uses a denylist-based protection mechanism to prevent XSS (Cross-Site Scripting) attacks, but the denylist fails to cover all possible XSS attack vectors. Denylists attempt to filter known dangerous patterns like <script> tags or javascript: URLs, but browsers parse web content in highly variable ways with numerous encoding options and tag variations. Since no denylist can anticipate all attack variations, this approach is fundamentally flawed for XSS prevention, allowing attackers to craft novel payloads that bypass the incomplete filter.
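
As a minimal sketch of the gap described above, the following Python snippet applies a small denylist and shows payloads that never mention a listed pattern passing through unchanged. The `DENYLIST` patterns, the `denylist_filter` name, and the payload list are illustrative assumptions, not taken from any real product.

```python
import re

# Hypothetical denylist for illustration: <script> tags and javascript: URLs only.
DENYLIST = [
    re.compile(r"<\s*/?\s*script[^>]*>", re.IGNORECASE),
    re.compile(r"javascript:", re.IGNORECASE),
]


def denylist_filter(text: str) -> str:
    """Remove every denylisted pattern from text (one pass per pattern)."""
    for pattern in DENYLIST:
        text = pattern.sub("", text)
    return text


# Payloads that avoid the listed patterns entirely.
UNLISTED_PAYLOADS = [
    '<img src=x onerror="alert(1)">',
    '<svg onload="alert(1)">',
    '<details open ontoggle="alert(1)">',
]

if __name__ == "__main__":
    for payload in UNLISTED_PAYLOADS:
        survived = denylist_filter(payload) == payload
        print(f"{payload!r} unchanged: {survived}")
```

The filter does what it was written to do against `<script>` tags, yet every payload in the list survives because nothing on the denylist describes it.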

Risk

Denylist-based XSS prevention creates a false sense of security: the application appears protected while remaining exploitable. Attackers actively research bypass techniques, including character encoding variations, HTML parsing quirks, event handlers, CSS-based attacks, and browser-specific behaviors, and each new browser feature can introduce vectors that an existing denylist does not cover. The approach fails particularly against mixed-case tags (<ScRiPt>) when matching is case-sensitive, alternate encodings (HTML entities, and UTF-7 in legacy browsers), event handlers missing from the list (onmouseover, ontoggle, onfocus), keywords nested so that a single removal pass reassembles them (<scr<script>ipt>), and newer HTML5 elements and attributes. Defenders are left updating the denylist after each newly published bypass.
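
The sketch below reproduces two bypass classes of this kind in Python: a keyword nested inside itself, and an HTML character reference that hides a URL scheme from the filter. The `KEYWORDS` list and `strip_keywords` function are hypothetical, and `html.unescape` is used only as a stand-in for the decoding an HTML parser performs on attribute values.

```python
import html
import re

KEYWORDS = ["<script", "javascript:", "onerror"]  # hypothetical denylist


def strip_keywords(text: str) -> str:
    """Delete each keyword case-insensitively; results are not rescanned."""
    for keyword in KEYWORDS:
        text = re.sub(re.escape(keyword), "", text, flags=re.IGNORECASE)
    return text


# 1. Mixed case is handled by this filter, but nesting is not: removing the
#    inner "onerror" joins the surrounding fragments into a new "onerror".
nested = '<img src=x ononerrorerror="alert(1)">'
after_nested = strip_keywords(nested)

# 2. A character reference hides the scheme from the filter...
encoded = '<a href="javas&#99;ript:alert(1)">x</a>'
after_encoded = strip_keywords(encoded)
# ...but an HTML parser decodes it inside attribute values.
decoded_by_browser = html.unescape(after_encoded)

if __name__ == "__main__":
    print(after_nested)
    print(decoded_by_browser)
```

Rescanning until nothing changes would close the first gap but not the second, which is why the filter cannot be patched one case at a time.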

Solution

Use allowlist-based input validation instead of denylists: define exactly which characters and patterns are permitted and reject everything else. Apply context-aware output encoding appropriate to where data is rendered (HTML body, attribute, JavaScript, CSS, URL). Use established security libraries such as OWASP Java Encoder (output encoding) or DOMPurify (HTML sanitization) rather than custom filtering. Implement Content Security Policy (CSP) as defense in depth. Use templating engines with automatic encoding. Never rely solely on input filtering; always encode output. Consider using structured data formats and APIs that don't interpret HTML.
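
To make "context-aware output encoding" concrete, here is a minimal Python sketch using only the standard library. The function names and the `render_comment` helper are assumptions made for illustration; a production system would normally rely on its template engine's auto-escaping or a vetted encoder library instead.

```python
import html
import json
from urllib.parse import quote


def encode_html(value: str) -> str:
    """Encode for HTML element content and quoted attribute values."""
    return html.escape(value, quote=True)


def encode_js_string(value: str) -> str:
    """Encode as a JavaScript string literal for use in an inline script."""
    literal = json.dumps(value)  # ensure_ascii=True escapes non-ASCII as \uXXXX
    # Also escape characters that could close the script element or open
    # an HTML comment inside it.
    return (
        literal.replace("<", "\\u003c")
        .replace(">", "\\u003e")
        .replace("&", "\\u0026")
    )


def encode_url_component(value: str) -> str:
    """Percent-encode a value used as a single query-string component."""
    return quote(value, safe="")


def render_comment(comment: str, author: str) -> str:
    """Hypothetical renderer: each value is encoded for the context it lands in."""
    return (
        f'<div class="comment" title="{encode_html(author)}">'
        f"{encode_html(comment)}</div>"
        f'<a href="/search?q={encode_url_component(author)}">more</a>'
        f"<script>var author = {encode_js_string(author)};</script>"
    )


if __name__ == "__main__":
    print(render_comment('<img src=x onerror="alert(1)">', "</script>"))
```

The same input is transformed three different ways because the characters that are dangerous differ per context; no single "sanitize" step covers all of them.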

Common Consequences

Scope	Impact	Details
Confidentiality	Read Application Data	XSS allows attackers to steal session cookies, credentials, and sensitive data.
Integrity	Modify Application Data	Attackers can modify page content, inject forms, or alter application behavior.
Access Control	Gain Privileges or Assume Identity	Session hijacking enables attackers to impersonate legitimate users.

Example Code

Vulnerable Code

<?php
// Vulnerable: Incomplete denylist only removes <script> tag
function vulnerable_sanitize_script_only($input) {
    // Only removes exact <script> tags
    $sanitized = preg_replace('/<script>/i', '', $input);
    $sanitized = preg_replace('/<\/script>/i', '', $sanitized);
    return $sanitized;
}

// Bypasses (the /i flag handles case, but nothing else):
// <script type="text/javascript">alert(1)</script > - attribute in the opening tag
// <script >alert(1)</script > - whitespace before ">" in both tags
// <scr<script>ipt>alert(1)</scr</script>ipt> - nested tags reassemble after one pass
// <img onerror="alert(1)" src=x> - event handlers
// <body onload="alert(1)"> - different tags

// Vulnerable: Removes common dangerous patterns but misses others
function vulnerable_sanitize_incomplete($input) {
    $denylist = array(
        '<script', '</script>',
        'javascript:',
        'onclick', 'onerror', 'onload'
    );

    $sanitized = str_ireplace($denylist, '', $input);
    return $sanitized;
}

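// Bypasses (str_ireplace is case-insensitive, so mixed case alone does not help):
// <img src=x ononerrorerror="alert(1)"> - nested keyword; one removal pass leaves onerror
// <svg onmouseover="alert(1)"> - unlisted event handler
// <a href="javas&#99;ript:alert(1)"> - HTML entities
// <scr<scriptipt>alert(1)</scr</script>ipt> - nested tags reassemble into <script>...</script>
// <input onfocus="alert(1)" autofocus> - unlisted handler that fires without interaction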
?>

<!-- Vulnerable: Output without proper encoding -->
<div class="user-content">
    <?php echo vulnerable_sanitize_incomplete($_GET['comment']); ?>
</div>

// Vulnerable: JavaScript denylist filter
function vulnerableSanitize(input) {
    // Vulnerable: Only checks for common patterns
    const denylist = [
        /<script[\s\S]*?>[\s\S]*?<\/script>/gi,
        /javascript:/gi,
        /on\w+=/gi  // Tries to catch event handlers
    ];

    let sanitized = input;
    for (const pattern of denylist) {
        sanitized = sanitized.replace(pattern, '');
    }
    return sanitized;
}

// Bypasses:
// <img src="x" onerror ="alert(1)"> - space before equals
// <a href="&#106;avascript:alert(1)">x</a> - entity-encoded scheme
// <a href="java&#10;script:alert(1)">x</a> - entity-encoded newline in URL
// <iframe srcdoc="&lt;script&gt;alert(1)&lt;/script&gt;"> - entity-encoded markup in srcdoc attribute

// Vulnerable: Using innerHTML with "sanitized" content
function displayComment(comment) {
    const sanitized = vulnerableSanitize(comment);
    document.getElementById('comments').innerHTML += sanitized;
}

# Vulnerable: Python denylist approach
import re

def vulnerable_sanitize(user_input):
    """Incomplete denylist for XSS prevention."""

    # Vulnerable: Pattern list is incomplete
    dangerous_patterns = [
        r'<script.*?>.*?</script>',
        r'javascript:',
        r'on\w+=',
        r'<iframe',
        r'<object',
        r'<embed'
    ]

    sanitized = user_input
    for pattern in dangerous_patterns:
        sanitized = re.sub(pattern, '', sanitized, flags=re.IGNORECASE)

    return sanitized

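# Bypasses include (on\w+= requires "=" directly after the handler name):
# <img src=x onerror =alert(1)> - space before "=" slips past on\w+=
# <svg/onload =alert(1)> - same gap on a different element
# <img src=x oonerror=nerror=alert(1)> - nested handler reassembles after one pass
# <script>\nalert(1)\n</script> - "." does not match newlines without re.DOTALL
# <a href="java&#9;script:alert(1)">x</a> - entity-encoded tab inside the scheme
# <input onfocus =alert(1) autofocus> - autofocus trick
# <details ontoggle =alert(1) open> - HTML5 details element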

from flask import Flask, request, render_template_string

app = Flask(__name__)

@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    sanitized = vulnerable_sanitize(comment)

    # Vulnerable: Rendering "sanitized" content. Interpolating user input
    # into the template source also exposes server-side template injection.
    return render_template_string(f'<div>{sanitized}</div>')

// Vulnerable: Java servlet with incomplete filtering
import javax.servlet.http.*;
import java.io.*;

public class VulnerableCommentServlet extends HttpServlet {

    private static final String[] DENYLIST = {
        "<script", "</script>",
        "javascript:",
        "onerror=", "onclick=", "onload="
    };

    private String vulnerableSanitize(String input) {
        String result = input;
        for (String pattern : DENYLIST) {
            result = result.replaceAll("(?i)" + pattern, "");
        }
        return result;
    }

    @Override
    protected void doGet(HttpServletRequest request,
                        HttpServletResponse response)
            throws IOException {

        String comment = request.getParameter("comment");
        String sanitized = vulnerableSanitize(comment);

        response.setContentType("text/html");
        PrintWriter out = response.getWriter();

        // Vulnerable: Writing "sanitized" content directly
        out.println("<html><body>");
        out.println("<div class='comment'>" + sanitized + "</div>");
        out.println("</body></html>");
    }
}

Fixed Code

<?php
// Fixed: Use proper output encoding, not denylist
function secure_output_html($input) {
    // Fixed: Encode for HTML context
    return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

function secure_output_attribute($input) {
    // Fixed: Encode for HTML attribute context
    return htmlspecialchars($input, ENT_QUOTES | ENT_HTML5, 'UTF-8');
}

function secure_output_javascript($input) {
    // Fixed: Encode for JavaScript string context
    return json_encode($input, JSON_HEX_TAG | JSON_HEX_APOS |
                               JSON_HEX_QUOT | JSON_HEX_AMP);
}

// For rich text, use established library
function secure_rich_text($input) {
    // Use HTMLPurifier library
    require_once 'HTMLPurifier.auto.php';
    $config = HTMLPurifier_Config::createDefault();
    $config->set('HTML.Allowed', 'p,b,i,u,a[href],ul,ol,li');
    $purifier = new HTMLPurifier($config);
    return $purifier->purify($input);
}
?>

<!-- Fixed: Proper encoding in template -->
<div class="user-content">
    <?php echo secure_output_html($_GET['comment']); ?>
</div>

<!-- Fixed: Attribute context -->
<input type="text" value="<?php echo secure_output_attribute($value); ?>">

<!-- Fixed: JavaScript context -->
<script>
var userInput = <?php echo secure_output_javascript($data); ?>;
</script>

// Fixed: Use proper encoding and safe DOM manipulation
function secureTextContent(input) {
    // Fixed: Use textContent which doesn't parse HTML
    const div = document.createElement('div');
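    div.textContent = input;  // Stored as literal text, never parsed as HTML
    // Serialization escapes &, < and > only; quotes are left as-is, so the
    // result is safe as element content but not inside attribute values
    return div.innerHTML;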
}

// Fixed: Use DOM methods instead of innerHTML
function displayCommentSecure(comment) {
    const container = document.getElementById('comments');
    const commentDiv = document.createElement('div');
    commentDiv.className = 'comment';
    commentDiv.textContent = comment;  // Safe: doesn't parse HTML
    container.appendChild(commentDiv);
}

// Fixed: For rich text, use DOMPurify library
function displayRichComment(htmlContent) {
    // Use DOMPurify for HTML sanitization
    const clean = DOMPurify.sanitize(htmlContent, {
        ALLOWED_TAGS: ['b', 'i', 'u', 'a', 'p', 'br'],
        ALLOWED_ATTR: ['href']
    });
    document.getElementById('comments').innerHTML = clean;
}

// Fixed: Context-aware encoding function
const encode = {
    forHTML: (str) => {
        return str
            .replace(/&/g, '&amp;')
            .replace(/</g, '&lt;')
            .replace(/>/g, '&gt;')
            .replace(/"/g, '&quot;')
            .replace(/'/g, '&#x27;');
    },
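    forAttribute: (str) => {
        // The "u" flag iterates by code point, so astral characters
        // (e.g. emoji) become one valid numeric reference, not two surrogates
        return str.replace(/[^a-zA-Z0-9]/gu, (char) => {
            return '&#' + char.codePointAt(0) + ';';
        });
    },
    forJavaScript: (str) => {
        // JSON.stringify alone leaves "</script>" intact, which would end an
        // inline script block; escape the HTML-significant characters too
        return JSON.stringify(str)
            .replace(/</g, '\\u003c')
            .replace(/>/g, '\\u003e')
            .replace(/&/g, '\\u0026');
    }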
};

# Fixed: Use proper encoding with Flask/Jinja2
from flask import Flask, request, render_template
from markupsafe import Markup  # Markup is no longer exported by flask in 3.x
import bleach

app = Flask(__name__)

# Fixed: Automatic escaping with Jinja2
@app.route('/comment')
def show_comment():
    comment = request.args.get('text', '')
    # render_template automatically escapes variables
    return render_template('comment.html', comment=comment)

# comment.html template:
# <div>{{ comment }}</div>  <!-- Automatically escaped -->

# Fixed: For rich text, use allowlist with bleach
ALLOWED_TAGS = ['p', 'b', 'i', 'u', 'a', 'ul', 'ol', 'li', 'br']
ALLOWED_ATTRS = {'a': ['href', 'title']}

def secure_rich_text(html_input):
    """Sanitize HTML using allowlist approach."""
    return bleach.clean(
        html_input,
        tags=ALLOWED_TAGS,
        attributes=ALLOWED_ATTRS,
        strip=True
    )

@app.route('/rich-comment')
def show_rich_comment():
    comment = request.args.get('text', '')
    # Clean with allowlist, then mark as safe for template
    clean_comment = secure_rich_text(comment)
    return render_template('comment.html',
                          comment=Markup(clean_comment))

# Fixed: CSP header for defense in depth
@app.after_request
def add_security_headers(response):
    response.headers['Content-Security-Policy'] = \
        "default-src 'self'; script-src 'self'"
    return response

// Fixed: Java with OWASP Encoder
import org.owasp.encoder.Encode;
import javax.servlet.http.*;
import java.io.*;

public class SecureCommentServlet extends HttpServlet {

    @Override
    protected void doGet(HttpServletRequest request,
                        HttpServletResponse response)
            throws IOException {

        String comment = request.getParameter("comment");

        response.setContentType("text/html; charset=UTF-8");
        response.setHeader("Content-Security-Policy",
                          "default-src 'self'");
        PrintWriter out = response.getWriter();

        out.println("<html><body>");

        // Fixed: Use OWASP Encoder for HTML context
        out.print("<div class='comment'>");
        out.print(Encode.forHtml(comment));
        out.println("</div>");

        // For attributes
        out.print("<input value='");
        out.print(Encode.forHtmlAttribute(comment));
        out.println("'>");

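        // For JavaScript string literals. Note: the CSP set above
        // ("default-src 'self'") blocks inline scripts, so in practice this
        // block needs a nonce or must move to an external script file.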
        out.print("<script>var data = '");
        out.print(Encode.forJavaScript(comment));
        out.println("';</script>");

        out.println("</body></html>");
    }
}
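
The rich-text examples above permit `href` on links and leave URL checking to the sanitizer library. Where an application builds links itself, the allowlist principle applies to the URL scheme as well. The following is a minimal Python sketch under stated assumptions: `ALLOWED_SCHEMES` and `is_allowed_url` are hypothetical names, the function expects an already-decoded URL value, and it deliberately rejects relative URLs to keep the rule strict.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are ever needed in user-supplied links.
ALLOWED_SCHEMES = {"http", "https", "mailto"}


def is_allowed_url(url: str) -> bool:
    """Return True only when the URL's scheme is explicitly allowlisted."""
    # Browsers strip tabs and newlines from URLs and ignore surrounding
    # whitespace, so any whitespace or control character is rejected here
    # rather than guessing how it will be normalized.
    if any(ch.isspace() or ord(ch) < 0x20 or ord(ch) == 0x7F for ch in url):
        return False
    try:
        scheme = urlsplit(url).scheme
    except ValueError:
        # urlsplit raises ValueError for some malformed inputs.
        return False
    return scheme.lower() in ALLOWED_SCHEMES


if __name__ == "__main__":
    for candidate in [
        "https://example.com/page",
        "javascript:alert(1)",
        "java\tscript:alert(1)",
        "data:text/html,x",
    ]:
        print(f"{candidate!r}: {is_allowed_url(candidate)}")
```

The check names the schemes that are acceptable instead of the ones that are dangerous, so `javascript:`, `data:`, `vbscript:` and any scheme introduced later are all rejected without being listed. The accepted URL still has to be HTML-encoded when written into the attribute.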

CVE Examples

CVE-2007-5727: XSS filter bypass—denylist only removed <SCRIPT> tags, allowing other vectors.
CVE-2006-3617: Web application XSS filter only blocked <SCRIPT> tags.
CVE-2006-4308: XSS filter only checked for "javascript:" pattern, allowing variations.

References

MITRE Corporation. "CWE-692: Incomplete Denylist to Cross-Site Scripting." https://cwe.mitre.org/data/definitions/692.html
OWASP. "XSS Prevention Cheat Sheet." https://cheatsheetseries.owasp.org/cheatsheets/Cross_Site_Scripting_Prevention_Cheat_Sheet.html
OWASP. "Java Encoder Project." https://owasp.org/owasp-java-encoder/