The Development and Implementation of ExtSQL
by John Murtari
[ Original article from Jan 2007, based on the 4.1.x version of MySQL ]

Introduction

MySQL is a popular, high-performance Open Source database. My first exposure to it came in the web hosting business. It was easy to install, the documentation was excellent, and most importantly -- it ran like the Energizer bunny and never quit! A welcome attribute when you are on a pager and like to actually sleep at night...

My only complaint was a lack of detailed performance monitoring. Yes, we could look at runtime info via SHOW STATUS or SHOW PROCESSLIST and get counts on a lot of activity -- but only at the server level. In a hosting environment it was not unusual to have over 50 different databases on one server. We might see a spike in CPU usage and know it was MySQL -- but have a devil of a time tracking it down to a user. Some of you have been down this road as DBAs. When desperate, we would actually log all queries for a short while, generating volumes of output which took forever to analyze. Of course, by the time we got ready to take a hard look -- the unusual event had passed!

I did an extensive search for monitoring tools. A consistent answer came back: "MySQL just doesn't track to that level. The data is not available." Being a skeptic, I eventually took a look at the server source, C and C++, and realized they were right.

There was still hope. Newer versions of the server had a limited ability to set usage limits:

    GRANT USAGE ON *.* TO 'thebookdemo'@'%'
        WITH MAX_QUERIES_PER_HOUR 500
             MAX_CONNECTIONS_PER_HOUR 30
             MAX_UPDATES_PER_HOUR 50;

We considered using this until we realized there was no way to see how close people were to their limits. No log entry was generated when a user was blocked. Not very friendly if you want to provide happy service. Also, since we had no idea what their usage was -- how could we set an initial limit? The classic chicken-and-egg dilemma!

I began to explore the code to see if we could at least get a report on usage. This effort, described below, eventually produced a new addition known as MySQL Extended Statistics -- something to make every DBA happy! A new SHOW STATISTICS command gives a DBA the ability to monitor current and historical server usage to the minute, hour, or day and generate reports tracking activity to a specific database, originating user, or host platform. ALL the parameters available in SHOW STATUS are available!

The implementation of resource limits using GRANT

I first searched for the code which implemented resource limits using GRANT. Since tracking was obviously happening there, it might be easy to extend to include more variables. This code is found in the sql directory (all references are from the top of the source distribution), in sql/sql_parse.cc. The function check_mqh confirms the user has not exceeded the allowed number of queries or updates per hour. It is invoked from a couple of locations. The data structure used to manage this information is defined in sql/structs.h:

    typedef struct user_resources {
      uint questions, updates, connections, bits;
    } USER_RESOURCES;

    typedef struct user_conn {
      char *user, *host;
      uint len, connections, conn_per_hour, updates, questions, user_len;
      USER_RESOURCES user_resources;
      time_t intime;
    } USER_CONN;

Thread structures are created and destroyed based on active connections, so this data can't be stored there. Instead, the server records all unique user connections in hash_user_connections. This info is maintained even after a user thread terminates. For example, after a server restart you begin a mysql session; a thread is started to process your request and an entry is made in hash_user_connections. The number of queries you make is recorded there. When your connection ends the thread is destroyed, but your information is retained. If you come back, a match is looked for and those counts are incremented. Each hour the numbers are all reset.
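To give a feel for how those per-hour limits are enforced, here is a minimal sketch of the kind of check that check_mqh performs against a USER_CONN entry. This is an illustration, not the MySQL source: check_query_limit is a hypothetical name, and the exact reset and error handling in sql/sql_parse.cc differ.

    /* Minimal sketch (not the actual MySQL code) of an hourly limit check.
       The USER_CONN fields match the typedef from sql/structs.h above;
       the reset-and-compare logic is an assumption for illustration. */
    #include <time.h>

    static int check_query_limit(USER_CONN *uc)
    {
      time_t now = time(NULL);

      /* Roll the per-hour counters over, measured from 'intime'. */
      if (now - uc->intime >= 3600)
      {
        uc->questions = uc->updates = uc->conn_per_hour = 0;
        uc->intime = now;
      }

      /* A limit of 0 means "no limit"; otherwise block once it is reached. */
      if (uc->user_resources.questions &&
          ++uc->questions > uc->user_resources.questions)
        return 1;                     /* caller refuses the query */

      return 0;
    }

Note there is no counterpart that reports how close a user is to a limit -- which is exactly the gap described above.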
It looked like custom code had been added in different places to increment the number of connections, updates, and questions -- a long way from the coverage of the over 150 measurement points available in SHOW STATUS. It was a dead end. I decided to take a close look at how the statistics displayed in SHOW STATUS were stored.

How MySQL currently tracks SHOW STATUS variables

In a MySQL source distribution the files for mysqld (the server process) are also in the sql subdirectory. The file sql/mysqld.cc has a data structure which calls out all the performance variables in SHOW STATUS (the excerpts included in this article are just selected fragments; consult the source for a complete listing):

    struct show_var_st status_vars[]= {
      {"Bytes_received", (char*) &bytes_received, SHOW_LONG},
      {"Bytes_sent",     (char*) &bytes_sent,     SHOW_LONG},
      {"Com_insert",     (char*) (com_stat+(uint) SQLCOM_INSERT), SHOW_LONG},
      {"Com_select",     (char*) (com_stat+(uint) SQLCOM_SELECT), SHOW_LONG},
    };

It is basically a listing of names and addresses. You can see the function that produces the output for SHOW STATUS in sql/sql_show.cc. It is just a large loop that traverses the whole structure and outputs the data using the specified formatting.

At this point I should interject how pleased I was with the source code layout. It was separated into logical collections: the server in 'sql', common include files in 'include', and portability libraries in 'mysys'. Variable and function names were meaningful. I'm normally the first to gripe when looking at someone else's files -- but I had few complaints here. Kudos to the MySQL staff!

Digging a little farther I found the code that implements most of the statistical tracking, in include/my_pthread.h:

    #ifdef SAFE_STATISTICS
    #define statistic_increment(V,L) thread_safe_increment((V),(L));
    #define statistic_add(V,C,L)     thread_safe_add((V),(C),(L));
    #else
    #define statistic_increment(V,L) (V)++;
    #define statistic_add(V,C,L)     (V)+=(C);
    #endif /* SAFE_STATISTICS */

The difference is that the SAFE version does either a mutex lock before incrementing the value, or an 'atomic_add'. I believe this is left undefined in the default configuration for performance reasons, and because there is little harm in potentially having the count slightly off.
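For reference, the thread-safe variants expand to something along these lines. This is a simplified sketch of the idea rather than a copy of mysys/my_pthread.h; the exact macros, and the atomic_add path, vary by platform and version.

    /* Simplified sketch of what the SAFE_STATISTICS path amounts to:
       take the mutex passed in as L (LOCK_status at the call sites),
       bump the counter, release the mutex.  Platforms with an atomic
       add can skip the lock entirely. */
    #define thread_safe_increment(V,L) \
      do { pthread_mutex_lock((L)); (V)++; pthread_mutex_unlock((L)); } while (0)

    #define thread_safe_add(V,C,L) \
      do { pthread_mutex_lock((L)); (V)+=(C); pthread_mutex_unlock((L)); } while (0)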
The invoking code is scattered throughout the source and has a form of:

    statistic_increment(com_other, &LOCK_status);                    // variable directly
    statistic_increment(com_stat[SQLCOM_SET_OPTION], &LOCK_status);  // via an offset
    statistic_increment(com_stat[lex->sql_command], &LOCK_status);   // via command offset

The problem for detailed statistics

Although the code was already instrumented, I began to see the scope of the problem confronting anyone trying to add detailed statistics to the server. At first glance the natural choke point would be to expand statistic_increment to record more information. Look at the invocations above. Very little info is available -- you don't even have a variable name, just an address. A LOT of hand edits would be required to add that information, and I rejected the idea. It violated constraints I had established for my effort:

* Not too much work.
* Not too many changes to the MySQL source.

The first constraint is self-explanatory. The second was important if this new addition was to be accepted by the open source community. MySQL has a good reputation to maintain. I'm sure users would be skeptical of looking at hundreds and hundreds of diffs where tweaks had been made. All it takes is one errant pointer to take down the server.

A design that resulted in Extended Statistics

It was clear the simplest path would involve adding functionality to the existing statistic_increment, something like:

    #define statistic_increment(V,L) { (V)++; statistics_inc(&V,1); }

where the new function statistics_inc would encapsulate the additional processing. In this function several problems had to be overcome:

* How to know which user, host, or db is involved with the increment?
* How to map a variable address back to its name?
* What data structure to use for storing the data?

I can now describe how a solution was developed.

* Which user, host, or db is involved? The file sql/mysql_priv.h contains many of the data structures unique to the server. A #define there gives easy access to the current thread information:

    inline THD *_current_thd(void)
    {
      return my_pthread_getspecific_ptr(THD*,THR_THD);
    }
    #define current_thd _current_thd()

This in turn (defined in mysys/my_pthread.h) uses the POSIX pthread_getspecific() when supported. It is implemented as a macro and does not involve a trap into the kernel, so the overhead was acceptable. The thread class itself (THD), one of the main structures used in the server, is defined in sql/sql_class.h. It is a large class with many member functions and data items; of interest to us are:

    class THD {
      char *host, *user, *db;
    };

That was easy. We know the context in which the variable was being incremented -- the only remaining issue was how to identify and track it. Quickly!

* How to map a variable address back to its name, and what data structure to use? The large data structure status_vars, defined in sql/mysqld.cc, has the mapping between addresses and names. I could loop through over 150 entries looking for a match -- a terrible waste of time. At this point a simplifying assumption was made about the data to be recorded. We would not provide Extended Statistics for everything the server had available; the user would specify a list of variables they were interested in. Easy to implement via the /etc/my.cnf server configuration file. Since over 90% of DBAs would be interested in less than 10% of the variables, it captured the essential work-saving compromise!
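Because the interesting variables are known up front, the name-to-address mapping only has to be done once, at server start. The sketch below shows the idea: resolve the configured names against status_vars a single time, so the hot path never searches by name. This is an illustration written against the status_vars excerpt shown earlier; resolve_tracked_vars and the 'wanted' list are hypothetical names, and the show_var_st member names are assumed from the 4.1-era source.

    /* Sketch: resolve configured variable names against status_vars once
       at startup.  Assumes show_var_st has 'name' and 'value' members as
       suggested by the excerpt above; adjust to the actual definition. */
    struct tracked_var { const char *name; char *addr; };

    static int resolve_tracked_vars(const char **wanted, int n_wanted,
                                    tracked_var *out)
    {
      int found = 0;
      for (int w = 0; w < n_wanted; w++)
      {
        for (struct show_var_st *sv = status_vars; sv->name; sv++)
        {
          if (!strcmp(sv->name, wanted[w]))
          {
            out[found].name = sv->name;    /* keep the display name   */
            out[found].addr = sv->value;   /* and the counter address */
            found++;
            break;
          }
        }
      }
      return found;   /* number of names successfully resolved */
    }

From then on, deciding whether a given increment is interesting is just a handful of pointer comparisons against out[].addr.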
-- Terminology & Design

As I began to think about how this data would be recorded, I needed something more descriptive than just 'data'. Not to be accused of being an innovator, I decided to borrow from Object Oriented Methodology, which we know is always a 'good thing.' I wanted to make the system flexible and struggled for a while with how to look at the problem. This was the final description:

The system would record information on various 'Classes' of objects, e.g. which db, which host, or which user was involved with the server. Within a Class, individual members are referred to as 'Instances', e.g. in the user class we have Melanie and Domenic. For each Instance we track a predefined list of server variables, 'Vars', e.g. the items Bytes_sent and Com_select from SHOW STATUS.

--- Managing Time

In the first design cut my goal was just to duplicate the output of SHOW STATUS for each Instance of a Class, e.g. show the number of select statements issued by user Melanie. Given that datum it would be easy to design external tools that would periodically query the server, store prior results, and provide rate and historical information. But once the core worked, it was easy to add another counter in a circular buffer configurable for a number of hours, e.g. the user could specify that for Class db they wanted to store data for all tracked Vars for a period of 24 hours.

During internal testing I grew impatient waiting for hours to roll over. I was using some very simple code:

    time_t now_sec = time(NULL);
    localtime_r(&now_sec, &now_date);
    int now_time = now_date.tm_hour;
    if (now_time != last_time) {
      // do roll over stuff
    }

Being a brilliant programmer, I realized I could just change the reference from now_date.tm_hour to now_date.tm_min -- and my hours just became minutes! Testing went a lot faster. It was easy to look at time as just a binning unit. The user was allowed to specify the timeUnit used for each Class: minutes, hours, or days. The data structure expanded to include:

    int  maxTime;    // max number of units we will store
    int  time;       // the current unit we are logging to
    int  lastTime;   // the last clock unit we saw for data logged
    char timeUnits;  // d - day, h - hour, m - minute

---- Making it all happen FAST!!!!

One key issue for speed was finding the correct Var location in a large 3-D data array. All statistical info is allocated dynamically when the server starts, based on the number of recording Classes specified by the user and the maximum number of Instances, Vars, and time units. This was made fast by preloading some lookup information into the data structure when the system started, and also lookup information for each new thread, e.g. once we knew the offsets used to find the instance 'Domenic' from thread 5 in the data structure, there was no need to recalculate them again. Thanks to 'C' pointer arithmetic for making it easy and quick to increment through the various structures.

The Extended Statistics implementation

I can now take you through the critical execution threads used to support Extended Statistics processing.

-- Server Initialization (performed once)

The server reads the configuration data and stores the addresses of the variables of interest in a new data structure:

    typedef struct stat_vars {
      char *name;
      char *addr;
    } STAT_VAR, *pSTAT_VAR;

This is part of a larger structure holding the statistical data for each Class:

    typedef struct stats_class_data {
      char  name[STATS_MAX_NAME];   // class name
      ulong *data;                  // pointer to data area for that class
                                    // 3-d array [instance index][var index][hour index, 0-summary]
      int   maxInstance;
      char  **instanceIndex;        // indexes for named instances
      int   maxVar;
      STAT_VAR **varIndex;          // indexes for vars, name & addr
    } STATS_CLASS_DATA, *pSTATS_CLASS_DATA;

While flexibility is a good thing, we do require the user to specify the maximum number of Instances they want to record within each Class they are tracking. This is something any DBA would probably know. Finally, THE variable that anchors the whole mess in memory is the single global statData:

    STATS_CLASS_DATA statData[STATS_MAX_CLASSES];
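The data area for each Class is a single flat allocation addressed as that 3-D array. The increment code shown later uses an ARRAY_INDEX macro for this; the macro itself does not appear in the article, so the following is a best-guess layout consistent with the [instance][var][time] comment above and the argument order used below -- treat it as an assumption, not the actual source.

    /* Assumed layout: one flat ulong array per Class, ordered
       instance-major, then var, then time bin (bin 0 holds the running
       summary).  Matches the argument order
       ARRAY_INDEX(i, var, maxVar, time, maxTime) used in statistics_inc. */
    #define ARRAY_INDEX(inst, var, maxVar, t, maxTime) \
      ((((inst) * (maxVar)) + (var)) * (maxTime) + (t))

    /* The counter for instance i, variable v, time bin t is then:
       stats_class_data->data[ARRAY_INDEX(i, v, maxVar, t, maxTime)]   */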
---- Thread Initialization (once per connection)

When a new thread begins in response to a connection, the following function determines the index values used later to increment statistics. The new thread potentially represents different Instances of the Classes the user is tracking. We scan the Class structure for a match between the Instances recorded there and the instance information carried by the new thread, e.g. this new thread is a connection belonging to the Class 'host' and the Instance 'ruler.thebook.com' -- have indexes in the statistics data already been assigned for this combination?

    void statistics_thread_init(THD *thd)
    {
      pSTATS_CLASS_DATA stats_class_data;
      char *tmpName = NULL;
      int   class_i, found = 0;

      // We scan through all the defined classes
      for (stats_class_data = statData, class_i = 0;
           stats_class_data->maxInstance && class_i < STATS_MAX_CLASSES;
           stats_class_data++, class_i++)
      {
        // given the Class name, determine our Instance
        switch (stats_class_data->name[0]) {
        case 'u':                       // user
          tmpName = thd->user;
          break;
        case 'h':                       // host
          tmpName = thd->host;
          break;
        }

        // We have the Instance, does it match something in the existing data?
        char **instPtr = (char **) stats_class_data->instanceIndex;
        for (int i = 0; i < stats_class_data->maxInstance; instPtr++, i++)
        {
          // a new entry, record in data, and store index in thread
          if (!*instPtr) {
            *instPtr = my_strdup(tmpName, MYF(0));
            thd->statIndex[class_i] = i;   // remember the offset of our instance
            break;
          } // end if - new value

          // an existing entry, record the assigned values
          if (!strcmp(*instPtr, tmpName)) {
            thd->statIndex[class_i] = i;
            found = 1;
            break;
          }
        } // end for - each instance
      } // end for - each class
    }

----- The increment algorithm (done constantly)

The optimizations already performed allow the algorithm responsible for incrementing statistics to be more efficient. The number of loops required to find an item is limited, and is controlled by the amount of statistics the end user is tracking.

    void statistics_inc(ulong *V, ulong C)
    {
      THD *thd = current_thd;
      int class_i = 0;
      int now_time = 0;
      time_t now_sec = time(NULL);
      struct tm now_date;

      localtime_r(&now_sec, &now_date);

      // check the time for each Class, may be different
      for (pSTATS_CLASS_DATA stats_class_data = statData;
           stats_class_data->maxInstance;
           stats_class_data++)
      {
        switch (stats_class_data->timeUnits) {
        case 'm':
          now_time = now_date.tm_min;
          break;
        case 'h':
          now_time = now_date.tm_hour;
          break;
        }

        if (now_time != stats_class_data->lastTime) {   // rolled over
          stats_class_data->lastTime = now_time;
          // increment time and do checks
          stats_class_data->time++;
          if (stats_class_data->time >= stats_class_data->maxTime) {
            // reset to start
            stats_class_data->time = 1;   // zero index is used for totals
          }
          // reset counters for that time to zero for all Vars
        } // end if - new time
      } // end for - all classes

      // look for the Var being tracked in all classes
      STAT_VAR **var_ptr;
      class_i = 0;
      for (pSTATS_CLASS_DATA stats_class_data = statData;
           stats_class_data->maxInstance;
           stats_class_data++, class_i++)
      {
        var_ptr = stats_class_data->varIndex;
        for (int var = 0, maxVar = stats_class_data->maxVar,
                 maxTime = stats_class_data->maxTime;
             var < maxVar;
             var_ptr++, var++)
        {
          if ((*var_ptr)->addr == (char *)V) {          // we are tracking it
            int i = thd->statIndex[class_i];            // pull our instance index
            // increment it for the 0 (summary) time and the current time
            stats_class_data->data[ARRAY_INDEX(i, var, maxVar, stats_class_data->time, maxTime)] += C;
            stats_class_data->data[ARRAY_INDEX(i, var, maxVar, 0, maxTime)] += C;
          } // end if - match
        } // end for - all vars
      } // end for - all classes
    }
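The comment "reset counters for that time to zero for all Vars" stands in for the clearing step when a Class advances to a new time bin. A plausible version, using the same assumed ARRAY_INDEX layout sketched earlier, is shown below; clear_time_bin is an illustrative name and the actual implementation may differ.

    /* Sketch of the roll-over clearing step: zero the new time bin for
       every Instance and Var of the Class, so stale data from the
       previous trip around the circular buffer is discarded. */
    static void clear_time_bin(pSTATS_CLASS_DATA c, int bin)
    {
      for (int inst = 0; inst < c->maxInstance; inst++)
        for (int var = 0; var < c->maxVar; var++)
          c->data[ARRAY_INDEX(inst, var, c->maxVar, bin, c->maxTime)] = 0;
    }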
------ Where to find, summary, and future direction

I think Extended Statistics will be a welcome addition to MySQL and might even put it ahead of industry standards such as Oracle, DB2, and SQL Server for ease in collecting and reporting usage data. It certainly opened our eyes when we used it to monitor some of our active MySQL servers. We saw thousands of queries issued per hour by users we thought were making only nominal use of the DB. We also enjoy the ability to quickly look at recent historical data even after a surge event has passed. We now feel in control of our servers and can provide better service to all.

It may be a while before it is accepted into the main MySQL distribution. In the interim you are welcome to download the source from our web site at http://www.softwareworkshop.com/mysql . Evaluation copies of precompiled mysqld binaries are also available at no cost for selected hardware/software platforms. Your additional thoughts and contributions are welcome! We do plan on becoming a focal point for development in this area.

====================
SIDE BAR - my.cnf config and new SQL syntax

/etc/my.cnf example:

    statistics_class_list="user, max-50, time-60, units-m,
        (Questions,Com_select,Com_update,Bytes_sent,Bytes_received),
      db, max-50, time-10, units-h,
        (Bytes_sent,Bytes_received,Com_select,Com_update,Com_insert),
      host, max-5, time-3, units-d,
        (Com_select,Com_update,Bytes_sent,Bytes_received,Questions)"

New SQL syntax:

    SHOW STATISTICS ( * | Var list) FROM Class
        [WHERE var ( '<' | '>' | '=' ) num]
        [LIKE 'Instance pattern']
        [ORDER BY Var]
        [HISTORY]
        [LIMIT num]

===================
SIDE BAR - Example output

mysql> SHOW STATISTICS * FROM user LIMIT 2;    (limit of 2 rows)
+-------------+------------+--------+--------+------------+----------------+
| user        | Questions  | select | update | Bytes_sent | Bytes_received |
+-------------+------------+--------+--------+------------+----------------+
| thebookdemo |         23 |      8 |      0 |      27350 |            580 |
| thebook     |         32 |      1 |      0 |      28303 |            530 |
+-------------+------------+--------+--------+------------+----------------+
2 rows in set (0.00 sec)

mysql> SHOW STATISTICS * FROM user HISTORY LIMIT 5;    (limit # of time units)
+-------------+---------+------------+--------+--------+------------+----------------+
| user        | minutes | Questions  | select | update | Bytes_sent | Bytes_received |
+-------------+---------+------------+--------+--------+------------+----------------+
| thebookdemo |       0 |          1 |      0 |      0 |          0 |             48 |
| thebookdemo |      -1 |          2 |      0 |      0 |       8506 |             80 |
| thebookdemo |      -2 |          7 |      5 |      0 |       9151 |            200 |
| thebookdemo |      -3 |          6 |      3 |      0 |       4758 |            144 |
| thebookdemo |      -4 |          1 |      0 |      0 |       4057 |             30 |
| thebook     |       0 |          0 |      0 |      0 |          0 |              0 |
| thebook     |      -1 |          0 |      0 |      0 |          0 |              0 |
| thebook     |      -2 |          0 |      0 |      0 |          0 |              0 |
| thebook     |      -3 |          0 |      0 |      0 |          0 |              0 |
| thebook     |      -4 |         32 |      1 |      0 |      28303 |            530 |
+-------------+---------+------------+--------+--------+------------+----------------+
10 rows in set (0.00 sec)

mysql> SHOW STATISTICS Questions, Bytes_sent FROM user WHERE Bytes_sent > 0 HISTORY LIMIT 5;
+-------------+---------+------------+------------+
| user        | minutes | Questions  | Bytes_sent |
+-------------+---------+------------+------------+
| thebookdemo |       0 |          3 |       8853 |
| thebookdemo |      -1 |          2 |       8506 |
| thebookdemo |      -2 |          7 |       9151 |
| thebookdemo |      -3 |          6 |       4758 |
| thebookdemo |      -4 |          1 |       4057 |
| thebook     |      -4 |         32 |      28303 |
+-------------+---------+------------+------------+
6 rows in set (0.00 sec)

mysql> SHOW STATISTICS * FROM db HISTORY LIMIT 2;
+-------------+-------+------------+----------------+--------+--------+--------+
| db          | hours | Bytes_sent | Bytes_received | select | update | insert |
+-------------+-------+------------+----------------+--------+--------+--------+
| thebookdemo |     0 |      48774 |            881 |      8 |      0 |      0 |
| thebookdemo |    -1 |          0 |              0 |      0 |      0 |      0 |
| thebook     |     0 |      28303 |            530 |      1 |      0 |      0 |
| thebook     |    -1 |          0 |              0 |      0 |      0 |      0 |
+-------------+-------+------------+----------------+--------+--------+--------+
6 rows in set (0.00 sec)
mysql> SHOW STATISTICS * FROM host LIKE '%thebook%' [ORDER BY Bytes_sent]**;
+--------------------+--------+--------+------------+----------------+------------+
| host               | select | update | Bytes_sent | Bytes_received | Questions  |
+--------------------+--------+--------+------------+----------------+------------+
| lathe.thebook.com  |      8 |      0 |      53113 |           1007 |         29 |
| anvil.thebook.com  |      1 |      0 |      28379 |            616 |         32 |
| hammer.thebook.com |      2 |      0 |        495 |            603 |         16 |
+--------------------+--------+--------+------------+----------------+------------+
3 rows in set (0.00 sec)

** ORDER BY to be implemented soon!