Performance Improvement on Teradata Vantage

Teradata Vantage (TDV) teradata.com is a modern cloud data platform, that let’s us build and process data with comparisons drawn to Redshift/Big query which are similar MPP architectures. There are few differences though, but we will stick to creation of tables in Teradata Vantage that gives performance gains.

Optimizing the runtime of a job is primarily based on few factors.

Was the data distributed uniformly across the cluster (Nodes are called AMP’s in TDV) Were the joins of specific tables happening locally? Let’s look at these from creation of a table point of view, define Primary Index (PI) in such a way that the key or combination of keys selected would provide uniformity in distribution of data across the cluster.

Also we shall keep in mind of the joins happening on this table, since the primary index defines distribution, we need to ensure the join keys are part of Primary index, so that same keys routed to the same AMP’s and the joins will happen locally without shuffle.

Please be reminded that if there was no PI defined, most likely the first column is considered as PI and data will be distributed accordingly.

If most of the data sits unevenly, it will result in data skew, which can be defined by a factor ranging from 0-100. The higher the factor, the worse the distribution, factor of 100 is heavily skewed and your job will take long time to complete.

Optimizing the data for faster querying for the front end dashboards is critical and there are few things to consider

How’s the data sorted on the AMP? Does TDV able to do “partition-elimination” while retrieving data? Define secondary index (SI) on the table, that is based on the access patterns, for ex: RPT_YR_MTH or CUST_ID, that ensures that all of the data with in AMP is sorted and indexed for faster search.

You may define partitioning to ensure query plan skips logical blocks on the AMP and directly jump to the required partition by deploying “partition-elimination”

Please be noted that defining too many columns in SI may in fact not provide desirable results, also ensure lower cardinality columns defined first, followed by higher cardinality in your key combination.

I hope this article provided some insights, any feedback is appreciated !