Identifying Table Skew on Teradata Vantage Using Normalised Standard Deviation

Skew remains one of the most common causes of performance and stability issues on Teradata Vantage. While classic metrics such as Max AMP / Avg AMP are still useful, they often fail to tell the full story—particularly on modern platforms with mixed workloads, columnar tables, compression, and varying table sizes.

In this post, we’ll look at a simple, robust, and scalable way to identify the most skewed tables using AMP-level space distribution and a normalised statistical measure.

Why look beyond Max/Avg?

The traditional skew check:

Max AMP space ÷ Average AMP space

is effective at spotting single hot AMPs, but it has limitations:

  • It ignores the distribution between min and max

  • It over-penalises very large tables

  • It’s hard to compare skew across tables of very different sizes
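
For reference, the classic check itself is easy to compute from DBC.TableSizeV. The query below is a sketch of one common formulation (a ratio of exactly 1 means a perfectly even spread; values well above 1 indicate one or more hot AMPs):

SELECT
    DatabaseName,
    TableName,
    MAX(CurrentPerm) / NULLIFZERO(AVG(CurrentPerm)) AS max_over_avg
FROM DBC.TableSizeV
GROUP BY 1, 2
ORDER BY max_over_avg DESC;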

On a modern Vantage system, we often want to answer a slightly different question:

Which tables are most unevenly distributed relative to their size?

Using AMP-level space from DBC.TableSizeV

Teradata already provides exactly what we need.

DBC.TableSizeV exposes:

  • One row per table per AMP

  • vproc → AMP number

  • CurrentPerm → space used by that table on that AMP

SELECT
    vproc AS "AMP",
    TableName (FORMAT 'X(20)'),
    CurrentPerm
FROM DBC.TableSizeV
WHERE DatabaseName = 'database'
ORDER BY TableName, "AMP";


This allows us to treat AMP space usage as a distribution, which we can analyse statistically.

The skew metric: Coefficient of Variation

The metric we’ll use is:

STDDEV_POP(CurrentPerm) / AVG(CurrentPerm)

This is known as the coefficient of variation (CV).

In plain terms, it answers:

“On average, how far does an AMP deviate from its expected share of this table’s space?”

Why this works well:

  • STDDEV_POP measures spread across AMPs

  • AVG normalises for table size

  • The result is scale-independent

  • Tables of very different sizes become directly comparable (see the worked example below)
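
To illustrate that last point, here is a small self-contained example with made-up per-AMP sizes: a "large" table spread 8/10/12 GB across three AMPs and a "small" table spread 80/100/120 MB. The relative spread is identical, so both rows return the same value (≈ 0.16):

SELECT
    'large_table' AS tablename,
    STDDEV_POP(perm) / AVG(perm) AS norm_stddev        -- per-AMP GB: 8, 10, 12
FROM (SELECT 8.0 AS perm UNION ALL SELECT 10.0 UNION ALL SELECT 12.0) t_large
UNION ALL
SELECT
    'small_table',
    STDDEV_POP(perm) / AVG(perm)                       -- per-AMP MB: 80, 100, 120
FROM (SELECT 80.0 AS perm UNION ALL SELECT 100.0 UNION ALL SELECT 120.0) t_small;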

The query

Here’s the full query used to identify the top 5 most skewed tables on the system:

WITH t1 (databasename, tablename, "amp", currentperm) AS
(
    SELECT
        DatabaseName,
        TableName (FORMAT 'X(20)'),
        vproc AS "AMP",
        CurrentPerm
    FROM
        DBC.TableSizeV
)
SELECT
    databasename,
    tablename,
    STDDEV_POP(currentperm) / NULLIFZERO(AVG(currentperm)) AS norm_stddev,
    DENSE_RANK() OVER (ORDER BY norm_stddev DESC) AS "rank"
FROM
    t1
GROUP BY
    1, 2
QUALIFY
    "rank" <= 5;

How the query works

Step 1: AMP-level data (CTE)

The CTE (t1) produces:

  • One row per database, table, and AMP

  • The space used by that table on each AMP

This is the correct grain for skew analysis.

Step 2: Aggregate to table level

In the outer query we:

  • Aggregate across AMPs

  • Calculate:

    • AVG(CurrentPerm) → ideal per-AMP space

    • STDDEV_POP(CurrentPerm) → distribution spread

Step 3: Normalise and rank

We then:

  • Divide stddev by average to produce norm_stddev

  • Rank tables using DENSE_RANK()

DENSE_RANK() is deliberate (a small illustration follows this list):

  • Tables with identical skew scores share the same rank

  • No gaps in ranking

  • Results are predictable and clean for reporting
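
To make the tie behaviour concrete, here is a tiny self-contained illustration with made-up scores, where two tables tie at 0.90:

SELECT
    tablename,
    score,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,  -- 1, 1, 2 (no gap)
    RANK()       OVER (ORDER BY score DESC) AS plain_rnk   -- 1, 1, 3 (gap after the tie)
FROM (
    SELECT 'table_a' AS tablename, 0.90 AS score
    UNION ALL SELECT 'table_b', 0.90
    UNION ALL SELECT 'table_c', 0.40
) scores
ORDER BY score DESC, tablename;

Note that with a dense rank, a "top 5" filter can return more than five tables when scores tie, which is usually what you want for reporting.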

Interpreting the results

Typical guidance for norm_stddev values:

norm_stddev → Interpretation

  • < 0.10 → Very well distributed

  • 0.10 – 0.25 → Mild skew

  • 0.25 – 0.49 → Noticeable skew

  • 0.50 – 0.99 → Serious skew

  • ≥ 1.00 → Extreme skew / broken PI

Exact thresholds depend on:

  • AMP count

  • Compression

  • Table design

  • Workload characteristics
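
In practice, it also helps to trim the noise before ranking: the dictionary and other system databases, and very small tables, can score highly simply because a handful of data blocks cannot be spread across every AMP. Here is a hedged variant of the query above (the excluded databases and the roughly 1 GB total-size floor are arbitrary assumptions, tune them for your site):

WITH t1 (databasename, tablename, "amp", currentperm) AS
(
    SELECT
        DatabaseName,
        TableName (FORMAT 'X(20)'),
        vproc,
        CurrentPerm
    FROM
        DBC.TableSizeV
    WHERE
        DatabaseName NOT IN ('DBC', 'SYSLIB', 'TDStats', 'Sys_Calendar')  -- assumed exclusion list
)
SELECT
    databasename,
    tablename,
    STDDEV_POP(currentperm) / NULLIFZERO(AVG(currentperm)) AS norm_stddev,
    DENSE_RANK() OVER (ORDER BY norm_stddev DESC) AS "rank"
FROM
    t1
GROUP BY
    1, 2
HAVING
    SUM(currentperm) > 1E9   -- ignore tables smaller than roughly 1 GB in total
QUALIFY
    "rank" <= 5;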

What this does (and doesn’t) measure

This query measures:

  • Physical space skew

  • Which tables are likely to stress:

    • AMP memory

    • Spool

    • Redistribution

    • Joins and aggregations

It does not directly measure:

  • Logical row skew

  • Join cardinality skew

  • Data access skew from predicates

That said, space skew is often the most actionable early warning signal on a Vantage system.
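
When a table flagged by the space check needs a row-level follow-up, a common companion query counts rows per AMP using the hashing functions (the database, table, and primary index column names below are placeholders):

SELECT
    HASHAMP(HASHBUCKET(HASHROW(pi_column))) AS "AMP",  -- AMP owning each row's hash
    COUNT(*) AS row_count
FROM my_database.my_skewed_table
GROUP BY 1
ORDER BY row_count DESC;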

Why this approach works well on modern Vantage

  • Independent of table size

  • Stable across platform expansions

  • Works with columnar and compressed tables

  • Ideal for:

    • Health checks

    • Automation

    • Dashboards

    • Tooling and extension demos

Final thoughts

This approach doesn’t replace classic Teradata skew metrics—it complements them. Max/Avg still has value, but a normalised standard deviation gives you a clearer, more scalable way to identify true problem tables across an enterprise system.

If you’re building operational tooling, performing regular health checks, or demonstrating Teradata Vantage capabilities, this is a strong, modern pattern to have in your toolkit.
