Identifying Table Skew on Teradata Vantage Using Normalised Standard Deviation

Skew remains one of the most common causes of performance and stability issues on Teradata Vantage. While classic metrics such as Max AMP / Avg AMP are still useful, they often fail to tell the full story—particularly on modern platforms with mixed workloads, columnar tables, compression, and varying table sizes.

In this post, we’ll look at a simple, robust, and scalable way to identify the most skewed tables using AMP-level space distribution and a normalised statistical measure.

Why look beyond Max/Avg?

The traditional skew check:

Max AMP space ÷ Average AMP space

is effective at spotting single hot AMPs, but it has limitations:

  • It ignores the distribution between min and max

  • It over-penalises very large tables

  • It’s hard to compare skew across tables of very different sizes
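
For reference, the classic check itself is easy to compute from DBC.TableSizeV. The query below is a sketch of one common formulation (a ratio of exactly 1 means a perfectly even spread; values well above 1 indicate one or more hot AMPs):

SELECT
    DatabaseName,
    TableName,
    MAX(CurrentPerm) / NULLIFZERO(AVG(CurrentPerm)) AS max_over_avg
FROM DBC.TableSizeV
GROUP BY 1, 2
ORDER BY max_over_avg DESC;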

On a modern Vantage system, we often want to answer a slightly different question:

Which tables are most unevenly distributed relative to their size?

Using AMP-level space from DBC.TableSizeV

Teradata already provides exactly what we need.

DBC.TableSizeV exposes:

  • One row per table per AMP

  • vproc → AMP number

  • CurrentPerm → space used by that table on that AMP

SELECT
    vproc AS "AMP",
    TableName (FORMAT 'X(20)'),
    CurrentPerm
FROM DBC.TableSizeV
WHERE DatabaseName = 'database'
ORDER BY TableName, "AMP";


This allows us to treat AMP space usage as a distribution, which we can analyse statistically.

The skew metric: Coefficient of Variation

The metric we’ll use is:

STDDEV_POP(CurrentPerm) / AVG(CurrentPerm)

This is known as the coefficient of variation (CV).

In plain terms, it answers:

“On average, how far does an AMP deviate from its expected share of this table’s space?”

Why this works well:

  • STDDEV_POP measures spread across AMPs

  • AVG normalises for table size

  • The result is scale-independent

  • Tables of very different sizes become directly comparable (see the worked example below)
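
To illustrate that last point, here is a small self-contained example with made-up per-AMP sizes: a "large" table spread 8/10/12 GB across three AMPs and a "small" table spread 80/100/120 MB. The relative spread is identical, so both rows return the same value (≈ 0.16):

SELECT
    'large_table' AS tablename,
    STDDEV_POP(perm) / AVG(perm) AS norm_stddev        -- per-AMP GB: 8, 10, 12
FROM (SELECT 8.0 AS perm UNION ALL SELECT 10.0 UNION ALL SELECT 12.0) t_large
UNION ALL
SELECT
    'small_table',
    STDDEV_POP(perm) / AVG(perm)                       -- per-AMP MB: 80, 100, 120
FROM (SELECT 80.0 AS perm UNION ALL SELECT 100.0 UNION ALL SELECT 120.0) t_small;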

The query

Here’s the full query used to identify the top 5 most skewed tables on the system:

WITH t1 (databasename, tablename, "amp", currentperm) AS
(
    SELECT
        DatabaseName,
        TableName (FORMAT 'X(20)'),
        vproc AS "AMP",
        CurrentPerm
    FROM
        DBC.TableSizeV
)
SELECT
    databasename,
    tablename,
    STDDEV_POP(currentperm) / NULLIFZERO(AVG(currentperm)) AS norm_stddev,
    DENSE_RANK() OVER (ORDER BY norm_stddev DESC) AS "rank"
FROM
    t1
GROUP BY
    1, 2
QUALIFY
    "rank" <= 5;

How the query works

Step 1: AMP-level data (CTE)

The CTE (t1) produces:

  • One row per database, table, and AMP

  • The space used by that table on each AMP

This is the correct grain for skew analysis.

Step 2: Aggregate to table level

In the outer query we:

  • Aggregate across AMPs

  • Calculate:

    • AVG(CurrentPerm) → ideal per-AMP space

    • STDDEV_POP(CurrentPerm) → distribution spread

Step 3: Normalise and rank

We then:

  • Divide stddev by average to produce norm_stddev

  • Rank tables using DENSE_RANK()

DENSE_RANK() is deliberate (a small illustration follows this list):

  • Tables with identical skew scores share the same rank

  • No gaps in ranking

  • Results are predictable and clean for reporting
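
To make the tie behaviour concrete, here is a tiny self-contained illustration with made-up scores, where two tables tie at 0.90:

SELECT
    tablename,
    score,
    DENSE_RANK() OVER (ORDER BY score DESC) AS dense_rnk,  -- 1, 1, 2 (no gap)
    RANK()       OVER (ORDER BY score DESC) AS plain_rnk   -- 1, 1, 3 (gap after the tie)
FROM (
    SELECT 'table_a' AS tablename, 0.90 AS score
    UNION ALL SELECT 'table_b', 0.90
    UNION ALL SELECT 'table_c', 0.40
) scores
ORDER BY score DESC, tablename;

Note that with a dense rank, a "top 5" filter can return more than five tables when scores tie, which is usually what you want for reporting.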

Interpreting the results

Typical guidance for norm_stddev values:

norm_stddev → Interpretation

  • < 0.10 → Very well distributed

  • 0.10 – 0.25 → Mild skew

  • 0.25 – 0.49 → Noticeable skew

  • 0.50 – 0.99 → Serious skew

  • ≥ 1.00 → Extreme skew / broken PI

Exact thresholds depend on:

  • AMP count

  • Compression

  • Table design

  • Workload characteristics
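
In practice, it also helps to trim the noise before ranking: the dictionary and other system databases, and very small tables, can score highly simply because a handful of data blocks cannot be spread across every AMP. Here is a hedged variant of the query above (the excluded databases and the roughly 1 GB total-size floor are arbitrary assumptions, tune them for your site):

WITH t1 (databasename, tablename, "amp", currentperm) AS
(
    SELECT
        DatabaseName,
        TableName (FORMAT 'X(20)'),
        vproc,
        CurrentPerm
    FROM
        DBC.TableSizeV
    WHERE
        DatabaseName NOT IN ('DBC', 'SYSLIB', 'TDStats', 'Sys_Calendar')  -- assumed exclusion list
)
SELECT
    databasename,
    tablename,
    STDDEV_POP(currentperm) / NULLIFZERO(AVG(currentperm)) AS norm_stddev,
    DENSE_RANK() OVER (ORDER BY norm_stddev DESC) AS "rank"
FROM
    t1
GROUP BY
    1, 2
HAVING
    SUM(currentperm) > 1E9   -- ignore tables smaller than roughly 1 GB in total
QUALIFY
    "rank" <= 5;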

What this does (and doesn’t) measure

This query measures:

  • Physical space skew

  • Which tables are likely to stress:

    • AMP memory

    • Spool

    • Redistribution

    • Joins and aggregations

It does not directly measure:

  • Logical row skew

  • Join cardinality skew

  • Data access skew from predicates

That said, space skew is often the most actionable early warning signal on a Vantage system.
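
When a table flagged by the space check needs a row-level follow-up, a common companion query counts rows per AMP using the hashing functions (the database, table, and primary index column names below are placeholders):

SELECT
    HASHAMP(HASHBUCKET(HASHROW(pi_column))) AS "AMP",  -- AMP owning each row's hash
    COUNT(*) AS row_count
FROM my_database.my_skewed_table
GROUP BY 1
ORDER BY row_count DESC;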

Why this approach works well on modern Vantage

  • Independent of table size

  • Stable across platform expansions

  • Works with columnar and compressed tables

  • Ideal for:

    • Health checks

    • Automation

    • Dashboards

    • Tooling and extension demos

Final thoughts

This approach doesn’t replace classic Teradata skew metrics—it complements them. Max/Avg still has value, but a normalised standard deviation gives you a clearer, more scalable way to identify true problem tables across an enterprise system.

If you’re building operational tooling, performing regular health checks, or demonstrating Teradata Vantage capabilities, this is a strong, modern pattern to have in your toolkit.
