Excessive Data Query Operations in a Large Data Table
Description
Excessive Data Query Operations in a Large Data Table occurs when software performs data queries with a large number of joins and sub-queries on large data tables. CISQ defines thresholds as: data tables with 1 million or more rows qualify as "large," queries should have a minimum of 5 joins to be considered problematic, and a minimum of 3 sub-queries raises concern. Such queries consume excessive database resources (CPU, memory, I/O) and can cause significant performance degradation, potentially leading to denial-of-service conditions.
Risk
While primarily a performance concern, excessive query complexity creates security vulnerabilities. Attackers who can trigger such queries (even through legitimate application features) may cause denial-of-service by exhausting database resources. Long-running queries can cause connection pool exhaustion, affecting all users. Database servers under heavy load may expose sensitive data through error messages or timeout behaviors. Slow queries may also lead to race conditions if operations timeout inconsistently. Additionally, complex queries are harder to audit for SQL injection or data exposure issues.
Solution
Optimize queries by: (1) Adding appropriate indexes for frequently queried columns, (2) Breaking complex queries into multiple simpler queries, (3) Using database views or materialized views for common complex joins, (4) Implementing pagination for large result sets, (5) Using query caching where appropriate, (6) Denormalizing data for read-heavy operations, (7) Implementing query timeouts to prevent runaway queries, (8) Using database query analyzers to identify problematic queries. Monitor query performance and set alerts for slow queries. Consider using read replicas to distribute query load.
Common Consequences
| Impact | Details |
|---|---|
| Availability | Scope: Availability DoS: Resource Consumption - Complex queries on large tables consume excessive CPU, memory, and I/O resources. |
| Availability | Scope: Availability DoS: Amplification - Single query can impact overall database performance, affecting all users. |
| Other | Scope: Other Reduce Performance - General performance degradation that may enable denial-of-service attacks. |
Example Code
Vulnerable Code
-- Vulnerable: Excessive joins and subqueries on large tables
-- This query has 6 joins and 4 subqueries
SELECT
c.customer_id,
c.name,
c.email,
o.order_id,
o.order_date,
p.product_name,
p.category,
s.supplier_name,
sh.shipment_date,
r.review_text
FROM customers c
-- Join 1
JOIN orders o ON c.customer_id = o.customer_id
-- Join 2
JOIN order_items oi ON o.order_id = oi.order_id
-- Join 3
JOIN products p ON oi.product_id = p.product_id
-- Join 4
JOIN suppliers s ON p.supplier_id = s.supplier_id
-- Join 5
LEFT JOIN shipments sh ON o.order_id = sh.order_id
-- Join 6
LEFT JOIN reviews r ON p.product_id = r.product_id AND c.customer_id = r.customer_id
WHERE c.region = 'North America'
-- Subquery 1
AND o.total_amount > (SELECT AVG(total_amount) FROM orders)
-- Subquery 2
AND p.product_id IN (
SELECT product_id FROM order_items
GROUP BY product_id
HAVING COUNT(*) > 100
)
-- Subquery 3
AND c.customer_id NOT IN (
SELECT customer_id FROM customer_blacklist
)
-- Subquery 4
AND EXISTS (
SELECT 1 FROM loyalty_points lp
WHERE lp.customer_id = c.customer_id
AND lp.points > 1000
)
ORDER BY o.order_date DESC;
-- With millions of rows in orders and order_items,
-- this query can take minutes and consume significant resources
// Vulnerable: Building complex dynamic query
public class VulnerableReportGenerator {
public List<ReportRow> generateComplexReport(ReportCriteria criteria) {
StringBuilder sql = new StringBuilder();
// Building query with many joins
sql.append("SELECT * FROM transactions t ");
sql.append("JOIN accounts a ON t.account_id = a.id ");
sql.append("JOIN customers c ON a.customer_id = c.id ");
sql.append("JOIN branches b ON a.branch_id = b.id ");
sql.append("JOIN regions r ON b.region_id = r.id ");
sql.append("JOIN account_types at ON a.type_id = at.id ");
sql.append("LEFT JOIN transaction_details td ON t.id = td.transaction_id ");
// Adding multiple subqueries based on criteria
if (criteria.hasAmountFilter()) {
sql.append("WHERE t.amount > (SELECT AVG(amount) FROM transactions) ");
}
if (criteria.hasVolumeFilter()) {
sql.append("AND a.id IN (SELECT account_id FROM transactions ");
sql.append("GROUP BY account_id HAVING COUNT(*) > ?) ");
}
// No pagination, returns all matching rows
// With millions of transactions, this is dangerous
return jdbcTemplate.query(sql.toString(), new ReportRowMapper());
}
}
# Vulnerable: ORM query with excessive eager loading and filtering
def vulnerable_get_report_data(session, filters):
# Eager loading all relationships causes massive joins
query = session.query(Order)\
.options(
joinedload(Order.customer)
.joinedload(Customer.addresses)
.joinedload(Address.region),
joinedload(Order.items)
.joinedload(OrderItem.product)
.joinedload(Product.category)
.joinedload(Category.parent),
joinedload(Order.shipment)
.joinedload(Shipment.carrier),
joinedload(Order.payment)
.joinedload(Payment.method)
)
# Complex filtering with subqueries
if filters.get('high_value'):
avg_subquery = session.query(func.avg(Order.total)).scalar_subquery()
query = query.filter(Order.total > avg_subquery)
if filters.get('frequent_customers'):
customer_subquery = session.query(Order.customer_id)\
.group_by(Order.customer_id)\
.having(func.count(Order.id) > 10)\
.scalar_subquery()
query = query.filter(Order.customer_id.in_(customer_subquery))
# No limit - returns potentially millions of rows
return query.all()
Fixed Code
-- Fixed: Optimized query with reduced complexity
-- Step 1: Use indexed views or materialized views for common aggregations
CREATE MATERIALIZED VIEW mv_order_stats AS
SELECT
customer_id,
COUNT(*) as order_count,
AVG(total_amount) as avg_order_amount,
SUM(total_amount) as total_spent
FROM orders
GROUP BY customer_id;
-- Step 2: Break into multiple focused queries
-- Query 1: Get qualifying customers (with index on region)
SELECT customer_id, name, email
INTO #qualifying_customers
FROM customers c
WHERE c.region = 'North America'
AND EXISTS (
SELECT 1 FROM loyalty_points lp
WHERE lp.customer_id = c.customer_id
AND lp.points > 1000
)
AND c.customer_id NOT IN (SELECT customer_id FROM customer_blacklist);
-- Query 2: Get recent orders with pagination
SELECT
c.customer_id,
c.name,
o.order_id,
o.order_date,
o.total_amount
FROM #qualifying_customers c
JOIN orders o ON c.customer_id = o.customer_id
JOIN mv_order_stats os ON c.customer_id = os.customer_id
WHERE o.total_amount > (SELECT overall_avg FROM global_stats)
AND os.order_count > 100
ORDER BY o.order_date DESC
OFFSET @page * @pageSize ROWS
FETCH NEXT @pageSize ROWS ONLY;
-- Query 3: Get order details for specific orders (separate call)
SELECT oi.*, p.product_name, p.category
FROM order_items oi
JOIN products p ON oi.product_id = p.product_id
WHERE oi.order_id IN (@orderIds);
// Fixed: Optimized report generation with pagination and caching
public class FixedReportGenerator {
private final QueryOptimizer queryOptimizer;
private final CacheManager cacheManager;
@Cacheable(value = "reportCache", key = "#criteria.cacheKey()")
public PagedResult<ReportRow> generateReport(ReportCriteria criteria, Pageable pageable) {
// Use query timeout to prevent runaway queries
Query query = entityManager.createQuery(buildOptimizedQuery(criteria));
query.setHint("javax.persistence.query.timeout", 30000); // 30 second timeout
// Always paginate results
query.setFirstResult(pageable.getOffset());
query.setMaxResults(pageable.getPageSize());
List<ReportRow> results = query.getResultList();
// Get count with separate optimized query
long totalCount = getCountWithCache(criteria);
return new PagedResult<>(results, totalCount, pageable);
}
private String buildOptimizedQuery(ReportCriteria criteria) {
// Use query builder that limits complexity
return queryOptimizer.build()
.select("t.id", "t.amount", "a.accountNumber", "c.name")
.from("Transaction t")
.join("t.account a")
.join("a.customer c")
.where(criteria.getFilters())
.maxJoins(3) // Enforce join limit
.build();
}
private long getCountWithCache(ReportCriteria criteria) {
String cacheKey = "count:" + criteria.cacheKey();
return cacheManager.get(cacheKey, () -> {
// Simplified count query without all joins
return jdbcTemplate.queryForObject(
"SELECT COUNT(*) FROM transactions t " +
"JOIN accounts a ON t.account_id = a.id " +
"WHERE " + criteria.getCountWhereClause(),
Long.class
);
});
}
}
# Fixed: Optimized ORM queries with lazy loading and pagination
from sqlalchemy import func
from sqlalchemy.orm import lazyload, selectinload
def fixed_get_report_data(session, filters, page=1, page_size=50):
"""Optimized query with pagination and selective loading"""
# Pre-calculate aggregates if needed (cached)
stats = get_cached_stats(session)
# Build base query with minimal joins
query = session.query(Order)\
.join(Order.customer)\
.options(
# Only load what's needed
selectinload(Order.items).load_only('quantity', 'price'),
lazyload(Order.shipment) # Load only if accessed
)
# Simple, indexed filters
if filters.get('high_value'):
query = query.filter(Order.total > stats['avg_order_value'])
if filters.get('frequent_customers'):
# Use pre-computed view/table for frequent customers
frequent_customers = get_frequent_customer_ids(session)
query = query.filter(Order.customer_id.in_(frequent_customers))
# Add date range filter (indexed)
if filters.get('date_from'):
query = query.filter(Order.created_at >= filters['date_from'])
# Always paginate
query = query.order_by(Order.created_at.desc())
query = query.offset((page - 1) * page_size).limit(page_size)
# Set query timeout
query = query.execution_options(timeout=30)
return {
'data': query.all(),
'page': page,
'page_size': page_size,
'total': get_filtered_count(session, filters) # Cached count
}
@lru_cache(maxsize=100, ttl=300) # Cache for 5 minutes
def get_cached_stats(session):
"""Pre-compute and cache aggregate statistics"""
return {
'avg_order_value': session.query(func.avg(Order.total)).scalar(),
'total_orders': session.query(func.count(Order.id)).scalar()
}
def get_frequent_customer_ids(session):
"""Use materialized view or cached results for frequent customers"""
# This could query a materialized view updated periodically
return session.query(FrequentCustomerView.customer_id).all()
# Fixed: Query complexity limiter
class QueryComplexityLimiter:
"""Enforces query complexity limits"""
MAX_JOINS = 4
MAX_SUBQUERIES = 2
MAX_RESULT_SIZE = 10000
QUERY_TIMEOUT = 30 # seconds
def validate_query(self, query_string):
join_count = query_string.lower().count(' join ')
subquery_count = query_string.lower().count('select') - 1
if join_count > self.MAX_JOINS:
raise QueryTooComplexError(
f"Query has {join_count} joins, max allowed is {self.MAX_JOINS}"
)
if subquery_count > self.MAX_SUBQUERIES:
raise QueryTooComplexError(
f"Query has {subquery_count} subqueries, max is {self.MAX_SUBQUERIES}"
)
def execute_with_limits(self, session, query):
"""Execute query with safety limits"""
# Add row limit
query = query.limit(self.MAX_RESULT_SIZE)
# Set timeout
query = query.execution_options(timeout=self.QUERY_TIMEOUT)
try:
return query.all()
except TimeoutError:
raise QueryTimeoutError("Query exceeded time limit")
CVE Examples
This CWE describes a performance issue that can contribute to denial-of-service vulnerabilities. While not directly mapped to specific CVEs, database performance issues caused by complex queries have contributed to availability incidents.
Related CWEs
- CWE-1176: Inefficient CPU Computation (parent)
- CWE-400: Uncontrolled Resource Consumption (can lead to)
- CWE-1006: Bad Coding Practices (category member)
References
- MITRE Corporation. "CWE-1049: Excessive Data Query Operations in a Large Data Table." https://cwe.mitre.org/data/definitions/1049.html
- CISQ. "Automated Source Code Quality Measures."
- PostgreSQL Documentation. "Performance Tips."
- Oracle. "SQL Tuning Guide."