Excessive Data Query Operations in a Large Data Table

Description

Excessive Data Query Operations in a Large Data Table occurs when software performs data queries with a large number of joins and sub-queries on large data tables. CISQ defines thresholds as: data tables with 1 million or more rows qualify as "large," queries should have a minimum of 5 joins to be considered problematic, and a minimum of 3 sub-queries raises concern. Such queries consume excessive database resources (CPU, memory, I/O) and can cause significant performance degradation, potentially leading to denial-of-service conditions.

Risk

While primarily a performance concern, excessive query complexity creates security vulnerabilities. Attackers who can trigger such queries (even through legitimate application features) may cause denial-of-service by exhausting database resources. Long-running queries can cause connection pool exhaustion, affecting all users. Database servers under heavy load may expose sensitive data through error messages or timeout behaviors. Slow queries may also lead to race conditions if operations timeout inconsistently. Additionally, complex queries are harder to audit for SQL injection or data exposure issues.

Solution

Optimize queries by: (1) Adding appropriate indexes for frequently queried columns, (2) Breaking complex queries into multiple simpler queries, (3) Using database views or materialized views for common complex joins, (4) Implementing pagination for large result sets, (5) Using query caching where appropriate, (6) Denormalizing data for read-heavy operations, (7) Implementing query timeouts to prevent runaway queries, (8) Using database query analyzers to identify problematic queries. Monitor query performance and set alerts for slow queries. Consider using read replicas to distribute query load.

Common Consequences

ImpactDetails
AvailabilityScope: Availability

DoS: Resource Consumption - Complex queries on large tables consume excessive CPU, memory, and I/O resources.
AvailabilityScope: Availability

DoS: Amplification - Single query can impact overall database performance, affecting all users.
OtherScope: Other

Reduce Performance - General performance degradation that may enable denial-of-service attacks.

Example Code

Vulnerable Code

-- Vulnerable: Excessive joins and subqueries on large tables
-- This query has 6 joins and 4 subqueries

SELECT
    c.customer_id,
    c.name,
    c.email,
    o.order_id,
    o.order_date,
    p.product_name,
    p.category,
    s.supplier_name,
    sh.shipment_date,
    r.review_text
FROM customers c
-- Join 1
JOIN orders o ON c.customer_id = o.customer_id
-- Join 2
JOIN order_items oi ON o.order_id = oi.order_id
-- Join 3
JOIN products p ON oi.product_id = p.product_id
-- Join 4
JOIN suppliers s ON p.supplier_id = s.supplier_id
-- Join 5
LEFT JOIN shipments sh ON o.order_id = sh.order_id
-- Join 6
LEFT JOIN reviews r ON p.product_id = r.product_id AND c.customer_id = r.customer_id
WHERE c.region = 'North America'
-- Subquery 1
AND o.total_amount > (SELECT AVG(total_amount) FROM orders)
-- Subquery 2
AND p.product_id IN (
    SELECT product_id FROM order_items
    GROUP BY product_id
    HAVING COUNT(*) > 100
)
-- Subquery 3
AND c.customer_id NOT IN (
    SELECT customer_id FROM customer_blacklist
)
-- Subquery 4
AND EXISTS (
    SELECT 1 FROM loyalty_points lp
    WHERE lp.customer_id = c.customer_id
    AND lp.points > 1000
)
ORDER BY o.order_date DESC;

-- With millions of rows in orders and order_items,
-- this query can take minutes and consume significant resources
// Vulnerable: Building complex dynamic query
public class VulnerableReportGenerator {

    public List<ReportRow> generateComplexReport(ReportCriteria criteria) {
        StringBuilder sql = new StringBuilder();

        // Building query with many joins
        sql.append("SELECT * FROM transactions t ");
        sql.append("JOIN accounts a ON t.account_id = a.id ");
        sql.append("JOIN customers c ON a.customer_id = c.id ");
        sql.append("JOIN branches b ON a.branch_id = b.id ");
        sql.append("JOIN regions r ON b.region_id = r.id ");
        sql.append("JOIN account_types at ON a.type_id = at.id ");
        sql.append("LEFT JOIN transaction_details td ON t.id = td.transaction_id ");

        // Adding multiple subqueries based on criteria
        if (criteria.hasAmountFilter()) {
            sql.append("WHERE t.amount > (SELECT AVG(amount) FROM transactions) ");
        }

        if (criteria.hasVolumeFilter()) {
            sql.append("AND a.id IN (SELECT account_id FROM transactions ");
            sql.append("GROUP BY account_id HAVING COUNT(*) > ?) ");
        }

        // No pagination, returns all matching rows
        // With millions of transactions, this is dangerous

        return jdbcTemplate.query(sql.toString(), new ReportRowMapper());
    }
}
# Vulnerable: ORM query with excessive eager loading and filtering
def vulnerable_get_report_data(session, filters):
    # Eager loading all relationships causes massive joins
    query = session.query(Order)\
        .options(
            joinedload(Order.customer)
                .joinedload(Customer.addresses)
                .joinedload(Address.region),
            joinedload(Order.items)
                .joinedload(OrderItem.product)
                .joinedload(Product.category)
                .joinedload(Category.parent),
            joinedload(Order.shipment)
                .joinedload(Shipment.carrier),
            joinedload(Order.payment)
                .joinedload(Payment.method)
        )

    # Complex filtering with subqueries
    if filters.get('high_value'):
        avg_subquery = session.query(func.avg(Order.total)).scalar_subquery()
        query = query.filter(Order.total > avg_subquery)

    if filters.get('frequent_customers'):
        customer_subquery = session.query(Order.customer_id)\
            .group_by(Order.customer_id)\
            .having(func.count(Order.id) > 10)\
            .scalar_subquery()
        query = query.filter(Order.customer_id.in_(customer_subquery))

    # No limit - returns potentially millions of rows
    return query.all()

Fixed Code

-- Fixed: Optimized query with reduced complexity

-- Step 1: Use indexed views or materialized views for common aggregations
CREATE MATERIALIZED VIEW mv_order_stats AS
SELECT
    customer_id,
    COUNT(*) as order_count,
    AVG(total_amount) as avg_order_amount,
    SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id;

-- Step 2: Break into multiple focused queries

-- Query 1: Get qualifying customers (with index on region)
SELECT customer_id, name, email
INTO #qualifying_customers
FROM customers c
WHERE c.region = 'North America'
AND EXISTS (
    SELECT 1 FROM loyalty_points lp
    WHERE lp.customer_id = c.customer_id
    AND lp.points > 1000
)
AND c.customer_id NOT IN (SELECT customer_id FROM customer_blacklist);

-- Query 2: Get recent orders with pagination
SELECT
    c.customer_id,
    c.name,
    o.order_id,
    o.order_date,
    o.total_amount
FROM #qualifying_customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN mv_order_stats os ON c.customer_id = os.customer_id
WHERE o.total_amount > (SELECT overall_avg FROM global_stats)
AND os.order_count > 100
ORDER BY o.order_date DESC
OFFSET @page * @pageSize ROWS
FETCH NEXT @pageSize ROWS ONLY;

-- Query 3: Get order details for specific orders (separate call)
SELECT oi.*, p.product_name, p.category
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
WHERE oi.order_id IN (@orderIds);
// Fixed: Optimized report generation with pagination and caching
public class FixedReportGenerator {

    private final QueryOptimizer queryOptimizer;
    private final CacheManager cacheManager;

    @Cacheable(value = "reportCache", key = "#criteria.cacheKey()")
    public PagedResult<ReportRow> generateReport(ReportCriteria criteria, Pageable pageable) {

        // Use query timeout to prevent runaway queries
        Query query = entityManager.createQuery(buildOptimizedQuery(criteria));
        query.setHint("javax.persistence.query.timeout", 30000); // 30 second timeout

        // Always paginate results
        query.setFirstResult(pageable.getOffset());
        query.setMaxResults(pageable.getPageSize());

        List<ReportRow> results = query.getResultList();

        // Get count with separate optimized query
        long totalCount = getCountWithCache(criteria);

        return new PagedResult<>(results, totalCount, pageable);
    }

    private String buildOptimizedQuery(ReportCriteria criteria) {
        // Use query builder that limits complexity
        return queryOptimizer.build()
            .select("t.id", "t.amount", "a.accountNumber", "c.name")
            .from("Transaction t")
            .join("t.account a")
            .join("a.customer c")
            .where(criteria.getFilters())
            .maxJoins(3)  // Enforce join limit
            .build();
    }

    private long getCountWithCache(ReportCriteria criteria) {
        String cacheKey = "count:" + criteria.cacheKey();

        return cacheManager.get(cacheKey, () -> {
            // Simplified count query without all joins
            return jdbcTemplate.queryForObject(
                "SELECT COUNT(*) FROM transactions t " +
                "JOIN accounts a ON t.account_id = a.id " +
                "WHERE " + criteria.getCountWhereClause(),
                Long.class
            );
        });
    }
}
# Fixed: Optimized ORM queries with lazy loading and pagination
from sqlalchemy import func
from sqlalchemy.orm import lazyload, selectinload

def fixed_get_report_data(session, filters, page=1, page_size=50):
    """Optimized query with pagination and selective loading"""

    # Pre-calculate aggregates if needed (cached)
    stats = get_cached_stats(session)

    # Build base query with minimal joins
    query = session.query(Order)\
        .join(Order.customer)\
        .options(
            # Only load what's needed
            selectinload(Order.items).load_only('quantity', 'price'),
            lazyload(Order.shipment)  # Load only if accessed
        )

    # Simple, indexed filters
    if filters.get('high_value'):
        query = query.filter(Order.total > stats['avg_order_value'])

    if filters.get('frequent_customers'):
        # Use pre-computed view/table for frequent customers
        frequent_customers = get_frequent_customer_ids(session)
        query = query.filter(Order.customer_id.in_(frequent_customers))

    # Add date range filter (indexed)
    if filters.get('date_from'):
        query = query.filter(Order.created_at >= filters['date_from'])

    # Always paginate
    query = query.order_by(Order.created_at.desc())
    query = query.offset((page - 1) * page_size).limit(page_size)

    # Set query timeout
    query = query.execution_options(timeout=30)

    return {
        'data': query.all(),
        'page': page,
        'page_size': page_size,
        'total': get_filtered_count(session, filters)  # Cached count
    }


@lru_cache(maxsize=100, ttl=300)  # Cache for 5 minutes
def get_cached_stats(session):
    """Pre-compute and cache aggregate statistics"""
    return {
        'avg_order_value': session.query(func.avg(Order.total)).scalar(),
        'total_orders': session.query(func.count(Order.id)).scalar()
    }


def get_frequent_customer_ids(session):
    """Use materialized view or cached results for frequent customers"""
    # This could query a materialized view updated periodically
    return session.query(FrequentCustomerView.customer_id).all()
# Fixed: Query complexity limiter
class QueryComplexityLimiter:
    """Enforces query complexity limits"""

    MAX_JOINS = 4
    MAX_SUBQUERIES = 2
    MAX_RESULT_SIZE = 10000
    QUERY_TIMEOUT = 30  # seconds

    def validate_query(self, query_string):
        join_count = query_string.lower().count(' join ')
        subquery_count = query_string.lower().count('select') - 1

        if join_count > self.MAX_JOINS:
            raise QueryTooComplexError(
                f"Query has {join_count} joins, max allowed is {self.MAX_JOINS}"
            )

        if subquery_count > self.MAX_SUBQUERIES:
            raise QueryTooComplexError(
                f"Query has {subquery_count} subqueries, max is {self.MAX_SUBQUERIES}"
            )

    def execute_with_limits(self, session, query):
        """Execute query with safety limits"""
        # Add row limit
        query = query.limit(self.MAX_RESULT_SIZE)

        # Set timeout
        query = query.execution_options(timeout=self.QUERY_TIMEOUT)

        try:
            return query.all()
        except TimeoutError:
            raise QueryTimeoutError("Query exceeded time limit")

CVE Examples

This CWE describes a performance issue that can contribute to denial-of-service vulnerabilities. While not directly mapped to specific CVEs, database performance issues caused by complex queries have contributed to availability incidents.


  • CWE-1176: Inefficient CPU Computation (parent)
  • CWE-400: Uncontrolled Resource Consumption (can lead to)
  • CWE-1006: Bad Coding Practices (category member)

References

  1. MITRE Corporation. "CWE-1049: Excessive Data Query Operations in a Large Data Table." https://cwe.mitre.org/data/definitions/1049.html
  2. CISQ. "Automated Source Code Quality Measures."
  3. PostgreSQL Documentation. "Performance Tips."
  4. Oracle. "SQL Tuning Guide."