Identifying Table Skew on Teradata Vantage Using Normalised Standard Deviation
Skew remains one of the most common causes of performance and stability issues on Teradata Vantage. While classic metrics such as Max AMP / Avg AMP are still useful, they often fail to tell the full story—particularly on modern platforms with mixed workloads, columnar tables, compression, and varying table sizes.
In this post, we’ll look at a simple, robust, and scalable way to identify the most skewed tables using AMP-level space distribution and a normalised statistical measure.
Why look beyond Max/Avg?
The traditional skew check:
Max AMP space ÷ Average AMP spaceis effective at spotting single hot AMPs, but it has limitations:
It ignores the distribution between min and max
It over-penalises very large tables
It’s hard to compare skew across tables of very different sizes
On a modern Vantage system, we often want to answer a slightly different question:
Which tables are most unevenly distributed relative to their size?
Using AMP-level space from DBC.TableSizeV
Teradata already provides exactly what we need.
DBC.TableSizeV exposes:
One row per table per AMP
vproc→ AMP numberCurrentPerm→ space used by that table on that AMP
SELECT vproc AS
"AMP", TableName (FORMAT 'X(20)'), CurrentPerm
FROM DBC.TableSizeV
WHERE DatabaseName = 'database'
ORDER BY TableName, "AMP" ;Finding and Fixing Skewed Tables • Database Administration • Reader • Teradata Developers Portal
This allows us to treat AMP space usage as a distribution, which we can analyse statistically.
The skew metric: Coefficient of Variation
The metric we’ll use is:
STDDEV_POP(CurrentPerm) / AVG(CurrentPerm)
This is known as the coefficient of variation (CV).
In plain terms, it answers:
“On average, how far does an AMP deviate from its expected share of this table’s space?”
Why this works well:
STDDEV_POPmeasures spread across AMPsAVGnormalises for table sizeThe result is scale-independent
Tables of very different sizes become directly comparable
The query
Here’s the full query used to identify the top 5 most skewed tables on the system:
WITH t1 (databasename, tablename, "amp", currentperm) AS
(
SELECT
DatabaseName,
TableName (FORMAT 'X(20)'),
vproc AS "AMP",
CurrentPerm
FROM
DBC.TableSizeV
)
SELECT
databasename,
tablename,
STDDEV_POP(CurrentPerm) / NULLIFZERO(AVG(CurrentPerm)) AS norm_stddev,
DENSE_RANK() OVER (ORDER BY norm_stddev DESC) AS "rank"
FROM
t1
QUALIFY
"rank" <= 5
GROUP BY
1,2;
How the query works
Step 1: AMP-level data (CTE)
The CTE (t1) produces:
One row per database, table, and AMP
The space used by that table on each AMP
This is the correct grain for skew analysis.
Step 2: Aggregate to table level
In the outer query we:
Aggregate across AMPs
Calculate:
AVG(CurrentPerm)→ ideal per-AMP spaceSTDDEV_POP(CurrentPerm)→ distribution spread
Step 3: Normalise and rank
We then:
Divide stddev by average to produce
norm_stddevRank tables using
DENSE_RANK()
DENSE_RANK() is deliberate:
Tables with identical skew scores share the same rank
No gaps in ranking
Results are predictable and clean for reporting
Interpreting the results
Typical guidance for norm_stddev values:
norm_stddev Interpretation
0.10 - Very well distributed
0.10 – 0.25 Mild skew
0.25 – 0.49 Noticeable skew
0.50 - 0.99 Serious skew
1.00 - Extreme skew / broken PI
Exact thresholds depend on:
AMP count
Compression
Table design
Workload characteristics
What this does (and doesn’t) measure
This query measures:
Physical space skew
Which tables are likely to stress:
AMP memory
Spool
Redistribution
Joins and aggregations
It does not directly measure:
Logical row skew
Join cardinality skew
Data access skew from predicates
That said, space skew is often the most actionable early warning signal on a Vantage system.
Why this approach works well on modern Vantage
Independent of table size
Stable across platform expansions
Works with columnar and compressed tables
Ideal for:
Health checks
Automation
Dashboards
Tooling and extension demos
Final thoughts
This approach doesn’t replace classic Teradata skew metrics—it complements them. Max/Avg still has value, but a normalised standard deviation gives you a clearer, more scalable way to identify true problem tables across an enterprise system.
If you’re building operational tooling, performing regular health checks, or demonstrating Teradata Vantage capabilities, this is a strong, modern pattern to have in your toolkit.