TiDB Statistics: Understanding the Initialization Process

Statistics collection is a crucial part of modern database systems, forming the backbone of query optimization. In TiDB, statistics are indispensable, serving as the sole source of information for estimating query costs and selecting the most efficient execution plan. TiDB collects several types of statistics for each table, including TopN values (the most frequent values, which reflect data skew), histograms (data distribution), the number of distinct values (NDV), and other statistical metrics. These statistics are stored in system tables, such as mysql....

February 5, 2025 · 11 min · Rustin liu

Batch Dumping Statistics Delta

Recently, we have been tackling the challenge of supporting 3 million tables within a single TiDB cluster. One of the most significant hurdles we’ve faced is optimizing the performance of statistics collection. In its current implementation, TiDB gathers basic table information from all servers and consolidates it into a single system table. While functional, this approach becomes highly inefficient when managing millions of tables, consuming excessive CPU and taking a considerable amount of time....

December 14, 2024 · 8 min · Rustin liu