MySQL High Performance

Percona's Data performance and scalability blog

Online DDL and Percona XtraDB Cluster 5.6

Fri, 2015-10-09 14:11

I presented a talk at Percona Live Amsterdam a few weeks ago (simple recipes for your Galera cluster), and one of the topics was schema changes. I discussed the drawbacks and benefits of Total Order Isolation (TOI), Rolling Schema Upgrades (RSU) and pt-online-schema-change. One thing I forgot was online DDL from 5.6. Does it provide another way to perform online schema changes?


While the online DDL feature of MySQL 5.6 is not modified in PXC, it doesn’t work as one could expect because Galera doesn’t allow concurrent writes when an ALTER TABLE statement is run. So if you are using TOI, any ALTER TABLE will block all writes on all nodes, exactly as it did with PXC 5.5.

If you want an online ALTER TABLE, your best bet is pt-online-schema-change.

Online DDL with TOI

My test was simple: I set up a 3-node cluster, ran sysbench against node1 to insert records on a table and ran a concurrent ALTER TABLE on node2 (adding an index). Here is what the Com_insert status variable indicated during the run:

# node1
| Com_insert | 563  |
| Com_insert | 75   |
| Com_insert | 1532 |
| Com_insert | 10   |
| Com_insert | 4    |
| Com_insert | 0    |
| Com_insert | 0    |
| Com_insert | 0    |
| Com_insert | 11   |
| Com_insert | 135  |
# node2
| Com_insert | 12   |
| Com_insert | 575  |
| Com_insert | 307  |
| Com_insert | 1231 |
| Com_insert | 0    |
| Com_insert | 0    |
| Com_insert | 0    |
| Com_insert | 0    |
| Com_insert | 1    |
| Com_insert | 133  |
# node3
| Com_insert | 559  |
| Com_insert | 339  |
| Com_insert | 736  |
| Com_insert | 465  |
| Com_insert | 0    |
| Com_insert | 0    |
| Com_insert | 0    |
| Com_insert | 5    |
| Com_insert | 121  |

We can clearly see that while ALTER TABLE is running, all writes are stalled. What happened? If we refer to the online DDL documentation, we could expect that concurrent writes would be allowed.

The reason is that under TOI, Galera simply forbids applying concurrent transactions on the table that is being altered.

Do we see different results if we write to table t2 while changing the schema of table t1? No, we don’t. Again Galera prevents all writes during an ALTER TABLE statement.

As a sidenote, I’ve sometimes seen during my tests that concurrent writes were still working on node1, but not on other nodes. However after a few seconds, flow control kicked in and prevented all writes on the cluster. I’ve not been able to reproduce this behavior predictably, so I’m not sure at this point what happened in this case.
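As an aside, spotting these stall windows doesn't have to be done by eye. Here is a small Python sketch (the threshold and sample data are illustrative, not part of the original test harness) that flags windows where the per-interval Com_insert delta collapses:

```python
def stall_windows(samples, threshold=5):
    # Given per-interval Com_insert deltas, return (start, end) index
    # pairs of consecutive intervals where inserts dropped below threshold.
    windows, start = [], None
    for i, delta in enumerate(samples):
        if delta < threshold and start is None:
            start = i
        elif delta >= threshold and start is not None:
            windows.append((start, i - 1))
            start = None
    if start is not None:
        windows.append((start, len(samples) - 1))
    return windows

# Deltas observed on node1 during the test run above
node1 = [563, 75, 1532, 10, 4, 0, 0, 0, 11, 135]
print(stall_windows(node1))  # → [(4, 7)]: the stall while the ALTER runs
```

Feeding in the node1 numbers from the run above clearly isolates the window during which the ALTER TABLE blocked all inserts.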

Online DDL with RSU

Online DDL with RSU is not as critical because in this case an ALTER TABLE statement will, at worst, block writes on the node where the schema change is running. And this node will not trigger flow control anyway.

The only good point of potentially allowing concurrent writes would be that replication will catch up faster. This can be important on busy clusters, when you need all nodes to handle the workload.

But again… an ALTER TABLE will block all concurrent writes, and you can’t benefit from the online DDL feature.


PXC 5.6 cannot benefit from the online DDL feature. So when you need to modify the schema of larger tables and when RSU is not an option, pt-online-schema-change is still your best friend.

The post Online DDL and Percona XtraDB Cluster 5.6 appeared first on MySQL Performance Blog.

When mysqld kills mysqld

Fri, 2015-10-09 13:40

The other day a colleague and friend of mine, Peter Boros, had a case where one of our clients had to track down the process shutting down MySQL. This blog is based on the discussion we had about that internally.

Our client wanted Peter to identify the culprit behind periodic shutdowns. This proved to be slightly more difficult than usual, for reasons that you might guess from the title of this blog.

Here is what Peter saw in the logs:

150928 15:15:33 [Note] /usr/sbin/mysqld: Normal shutdown
150928 15:15:36 [Note] Event Scheduler: Purging the queue. 0 events
150928 15:15:39 [Warning] /usr/sbin/mysqld: Forcing close of thread 411515 user: 'dashboard'
150928 15:15:40 InnoDB: Starting shutdown...
150928 15:15:59 InnoDB: Shutdown completed; log sequence number 4873840375
150928 15:16:00 [Note] /usr/sbin/mysqld: Shutdown complete

Some of you may recall that I wrote a blog post about tools that can help you identify other processes that send signals to mysqld. Peter chose SystemTap to track down the process. The script he used to trace it was from David Busby’s blog:

probe signal.send {
  if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
    printf("[%s] %s was sent to %s (pid:%d) by %s uid:%d\n",
           ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), uid())
}

Using this SystemTap script Peter discovered that the “killer” was mysqld:

[Wed Sep 16 18:57:33 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Wed Sep 16 18:57:34 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Wed Sep 16 18:57:34 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Wed Sep 16 18:57:40 2015] SIGTERM was sent to mysqld (pid:31191) by mysqld uid:497
[Mon Sep 28 15:15:31 2015] SIGTERM was sent to mysqld (pid:12203) by mysqld uid:497
[Mon Sep 28 15:15:31 2015] SIGTERM was sent to mysqld (pid:12203) by mysqld uid:497
[Mon Sep 28 15:16:00 2015] SIGTERM was sent to mysqld (pid:12203) by mysqld uid:497

As you can see above, mysqld received a SIGTERM from mysqld. How is this possible? Let’s try to interpret what happened here!

According to the manual, server shutdown can be initiated in different ways. For instance:

  • SIGTERM is sent to mysqld by a UNIX user
  • server is shut down administratively via mysqladmin shutdown by a privileged mysql user

Let’s assume that we are talking about the first example, where a privileged process/script sends a SIGTERM to mysqld. If that was the case we would get:

[root@centos7 ~]# kill -15 `pidof -x mysqld`
[root@centos7 ~]#

[root@centos7 ~]# ./signals.stp
[Thu Oct 1 17:56:36 2015] SIGTERM was sent to mysqld (pid:2105) by bash uid:0
[Thu Oct 1 17:56:37 2015] SIGTERM was sent to mysqld (pid:2105) by mysqld uid:995

The first line in the above output shows the client process (bash) that issued the TERM signal to MySQL. In response, MySQL started a signal handler thread and processed the shutdown (COM_SHUTDOWN) using that thread. In turn, the kill_mysqld() function may send a signal to the server's own pid on behalf of the initiating process.
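This "mysqld kills mysqld" pattern is easy to reproduce in miniature. Here is a hedged Python sketch (not mysqld's actual code; Unix-only): a process installs a SIGTERM handler and then signals its own pid, so a signal tracer would report the process as both sender and receiver:

```python
import os
import signal

received = []

def on_term(signum, frame):
    # Handler invoked when SIGTERM is delivered: record the signal.
    received.append(signum)

signal.signal(signal.SIGTERM, on_term)

# A shutdown request arrives (e.g. COM_SHUTDOWN); the process then
# delivers SIGTERM to its own pid, so the tracer sees "sent by itself".
os.kill(os.getpid(), signal.SIGTERM)

print(received == [signal.SIGTERM])  # True: the process TERMed itself
```

From the outside (as in the SystemTap output), this looks exactly like one mysqld killing another, even though it is a single process shutting itself down.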

As a side note, in the above output you don’t see anything related to threads. You could get even more detail about MySQL’s operation if you were to modify the tapscript to include tgkill system calls and display related thread IDs as well:

#!/usr/bin/env stap
probe signal.send, signal.systgkill {
  if (sig_name == "SIGKILL" || sig_name == "SIGTERM")
    printf("[%s] %s was sent to %s (pid:%d) by %s (pid:%d, tid:%d) uid:%d\n",
           ctime(gettimeofday_s()), sig_name, pid_name, sig_pid, execname(), pid(), tid(), uid())
}

While this might be useful to better comprehend how mysqld behaves, it is irrelevant in our search for the culprit process, so I’m not going to include the output of that script here – that exercise will be left to you, dear reader!

Now what happens if a MySQL user with administrative privileges initiates the shutdown via the console instead? We get:

[root@centos7 ~]# mysqladmin shutdown
[root@centos7 ~]#

[root@centos7 ~]# ./signals.stp
[Thu Oct 1 17:59:08 2015] SIGTERM was sent to mysqld (pid:3906) by mysqld uid:995
[Thu Oct 1 17:59:10 2015] SIGTERM was sent to mysqld (pid:3906) by mysqld uid:995

You can see that this time the sender was mysqld, which closely resembles the original output Peter had. Thus, we now know that what we were looking for was a mysqladmin shutdown command!

Unfortunately, this means that the program may not be local and the client could connect from a different host. A local recursive grep may or may not solve our problem. However, if we enable general-log with log-warnings=2 it might yield something like:

[root@centos7 ~]# tail -F /var/log/mysqld_generic.log
151001 17:07:57     5 Connect   robert@ on
                    5 Shutdown

Thus, we now know where to run our recursive grep for that rogue mysqladmin shutdown (or a similar, API-leveraging) process! In my case it was running on a remote host and connected as MySQL user ‘robert’.
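Scanning the general log for shutdown events can be scripted too. A minimal sketch that assumes the log format shown above (the regular expressions are illustrative):

```python
import re

CONNECT_RE = re.compile(r'(\d+)\s+Connect\s+(\S+)')
SHUTDOWN_RE = re.compile(r'^\s*(\d+)\s+Shutdown\s*$')

def find_shutdowns(lines):
    # Pair each Shutdown command with the Connect entry that shares
    # its connection id, so we learn which user issued the shutdown.
    users, shutdowns = {}, []
    for line in lines:
        m = CONNECT_RE.search(line)
        if m:
            users[m.group(1)] = m.group(2)
        m = SHUTDOWN_RE.match(line)
        if m:
            shutdowns.append((m.group(1), users.get(m.group(1))))
    return shutdowns

log = [
    "151001 17:07:57 5 Connect robert@ on",
    "                5 Shutdown",
]
print(find_shutdowns(log))  # [('5', 'robert@')]
```

Running this over the rotated general logs gives you every shutdown and the user behind it in one pass.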

Of course, you could find alternative methods to track down MySQL connections, but that’s beyond what I intended to include in this blog. Perhaps in the next one?


Using Apache Spark and MySQL for Data Analysis

Wed, 2015-10-07 22:49

What is Spark

Apache Spark is a cluster computing framework, similar to Apache Hadoop. Wikipedia has a great description of it:

Apache Spark is an open source cluster computing framework originally developed in the AMPLab at University of California, Berkeley but was later donated to the Apache Software Foundation where it remains today. In contrast to Hadoop’s two-stage disk-based MapReduce paradigm, Spark’s multi-stage in-memory primitives provides performance up to 100 times faster for certain applications. By allowing user programs to load data into a cluster’s memory and query it repeatedly, Spark is well-suited to machine learning algorithms.

Contrary to popular belief, Spark does not require all data to fit into memory; it will use caching to speed up operations (just like MySQL). Spark can also run in standalone mode and does not require Hadoop; it can even run on a single server (or laptop or desktop) and utilize all your CPU cores.

Starting it in a distributed mode is really easy. Start the “master” first. You can run the “slave” on the same node:

root@thor:~/spark# ./sbin/
root@thor:~/spark# less ../logs/spark-root-org.apache.spark.deploy.master.Master-1-thor.out
15/08/25 11:21:21 INFO Master: Starting Spark master at spark://thor:7077
15/08/25 11:21:21 INFO Master: Running Spark version 1.4.1
15/08/25 11:21:21 INFO Utils: Successfully started service 'MasterUI' on port 8080.
15/08/25 11:21:21 INFO MasterWebUI: Started MasterWebUI
root@thor:~/spark# ./sbin/ spark://thor:7077

Then run Spark Worker on any additional nodes (make sure to add the hostname to /etc/hosts or use DNS):

root@d31:~/spark# ./sbin/ spark://thor:7077

Why Spark and Not MySQL?

There are a number of tasks where MySQL (out-of-the-box) does not show great performance. One of the MySQL limitations is: 1 query = 1 cpu core. It means that even if you have 48 fast cores and a large dataset to process (i.e. group by, sort, etc) it will not utilize the full computing power. Spark, on the contrary, will be able to utilize all your CPU cores.
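The single-core limitation is easy to picture with a toy map-reduce in plain Python: split the data, aggregate each chunk on a separate core, then combine the partial results (this is, in miniature, what Spark does for you at much larger scale):

```python
from multiprocessing import Pool

def partial_sum(chunk):
    # Work done on one core: aggregate one slice of the data.
    return sum(chunk)

def parallel_sum(data, workers=4):
    # Split the data into chunks, aggregate each chunk in a separate
    # process, then combine the partial results (a tiny map-reduce).
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with Pool(workers) as pool:
        return sum(pool.map(partial_sum, chunks))

if __name__ == "__main__":
    print(parallel_sum(list(range(1000))))  # 499500
```

A single MySQL query, by contrast, is stuck with the equivalent of one `partial_sum` call, however many cores the box has.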

Another difference between MySQL and Spark:

  • MySQL uses so-called “schema on write” – the data needs to be converted and loaded into MySQL first. If our data is not inside MySQL, we can’t use SQL to query it.
  • Spark (and Hadoop/Hive as well) uses “schema on read” – it can apply a table structure on top of a compressed text file, for example (or any other supported input format), and see it as a table; then we can use SQL to query this “table.”

In other words, MySQL is storage+processing while Spark’s job is processing only, and it can pipe data directly from/to external datasets, i.e., Hadoop, Amazon S3, local files, JDBC (MySQL/other databases). Spark supports text files (compressed), SequenceFiles, and any other Hadoop InputFormat as well as Parquet Columnar storage. Spark is more flexible in this regard compared to Hadoop: Spark can read data directly from MySQL, for example.
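“Schema on read” can be shown in a few lines of plain Python (standing in for Spark here): field names and types are applied to raw delimited text only at read time, and malformed lines are simply skipped:

```python
# Field names and types are applied only when the data is read.
SCHEMA = [("project", str), ("url", str),
          ("num_requests", int), ("content_size", int)]

def read_with_schema(raw_lines):
    # Turn raw space-delimited lines into typed records on the fly.
    for line in raw_lines:
        parts = line.split(" ")
        if len(parts) != len(SCHEMA):
            continue  # skip malformed lines instead of failing the load
        yield {name: cast(value) for (name, cast), value in zip(SCHEMA, parts)}

raw = ["en Main_Page 42 12345", "fr Accueil 7 999", "broken line"]
rows = list(read_with_schema(raw))
print(rows[0]["num_requests"] + rows[1]["num_requests"])  # 49
```

Nothing was loaded or converted up front; the "table" exists only while we iterate over the raw text.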

The typical pipeline to load external data to MySQL is:

  1. Uncompress (typically the external data is in compressed text files)
  2. Load it into MySQL’s staging table with “LOAD DATA INFILE”
  3. Only then can we filter/group by and save the result in another table

That can cause additional overhead. In many cases we do not need the “raw” data but we still have to load it into MySQL.

Why Spark Together With MySQL

At the same time, the result of our analysis (i.e. the aggregated data) should end up in MySQL. It does not have to, but it is much more convenient to store the result of your analysis in MySQL. Let’s say you want to analyze a big dataset (i.e. a year-over-year sales comparison) and present it as a table or graph. The result set will be significantly smaller, as it is aggregated, and it will be much easier to store in MySQL, as many standard applications can work with it.

Real World Test Case

One of the interesting free datasets is Wikipedia Page Counts (>1TB compressed, available since 2008). This data can be downloaded (as gzipped space-delimited text files) and is also available (as a limited dataset) on AWS. The data is aggregated by the hour and has the following fields:

  • project (i.e. “en”, “fr”, etc, which is usually a language)
  • title of the page (uri), urlencoded
  • number of requests
  • size of the content returned

(the date field is encoded inside the filename, 1 file per hour)
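Since the date only lives in the filename, processing one hour of data means recovering the timestamp from the name. A sketch, assuming the usual pagecounts-YYYYMMDD-HHMMSS.gz naming convention for the dump files:

```python
import re
from datetime import datetime

def date_from_filename(filename):
    # Extract the hour timestamp encoded in a pagecounts filename,
    # e.g. 'pagecounts-20080101-070000.gz' (assumed naming convention).
    m = re.search(r'(\d{8})-(\d{6})', filename)
    if m is None:
        raise ValueError("no timestamp in " + filename)
    return datetime.strptime(m.group(1) + m.group(2), "%Y%m%d%H%M%S")

print(date_from_filename("pagecounts-20080101-070000.gz"))  # 2008-01-01 07:00:00
```

Whatever loads the data then attaches this timestamp to every row parsed from that file.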

Our goal will be to find the top 10 pages by the number of requests per day in English Wikipedia, but also to support searching for an arbitrary word so we can show how, for example, the number of requests for the wikipedia article about “myspace” will compare to the article about “facebook” (2008 to 2015).

To do that in MySQL we will have to load it as is into MySQL. The files are distributed with the date part encoded. The uncompressed size of all files is > 10TB. Here are the possible steps (as per our typical MySQL pipeline):

  1. Uncompress the file and run “LOAD DATA INFILE” into a staging (temporary) table:
    load data local infile '$file'
    into table wikistats.wikistats_full
    CHARACTER SET latin1
    FIELDS TERMINATED BY ' '
    (project_name, title, num_requests, content_size)
    set request_date = STR_TO_DATE('$datestr', '%Y%m%d %H%i%S');
  2. Aggregate with “insert into” a final table
    insert into wikistats.wikistats_by_day
    select date(request_date) as request_day, title, count(*), sum(num_requests)
    from wikistats.wikistats_full
    group by request_day, title;
  3. Somehow url decode the title (may be using UDF).

This is a big overhead. We will uncompress and transform the data into MySQL just to discard most of it.

According to my calculations it should take > 1 month to run the whole pipeline for 6 years of data (this does not include the uncompress time, nor the slowdown of the load as the table gets bigger and bigger and indexes need to be updated). There are lots of things we can do here to speed it up, of course, i.e., load into different MySQL instances, load into a MEMORY table first and then group by into InnoDB, etc.

But one of the easiest ways here will be using Apache Spark and a Python script (pyspark). Pyspark can read the original gzipped text files, query those text files with SQL, apply any filters and functions, i.e. urldecode, group by day and save the resultset into MySQL.

Here is the Python script to perform those actions:

from pyspark import SparkContext
sc = SparkContext()
# sc is an existing SparkContext.
from pyspark.sql import SQLContext, Row
sqlContext = SQLContext(sc)
import urllib
from datetime import timedelta, date

def load_day(filename, mydate):
    # Load a text file and convert each line to a Row.
    lines = sc.textFile(filename)
    parts = lines.map(lambda l: l.split(" ")) \
                 .filter(lambda line: line[0] == "en") \
                 .filter(lambda line: len(line) > 3).cache()
    wiki = p: Row(project=p[0],
                                url=urllib.unquote(p[1]).lower(),
                                num_requests=int(p[2]),
                                content_size=int(p[3])))
    # wiki.count()
    # Infer the schema, and register the DataFrame as a table.
    schemaWiki = sqlContext.createDataFrame(wiki)
    schemaWiki.registerTempTable("wikistats")
    group_res = sqlContext.sql("SELECT '" + mydate + "' as mydate, url, "
                               "count(*) as cnt, sum(num_requests) as tot_visits "
                               "FROM wikistats group by url")
    # Save to MySQL
    mysql_url = "jdbc:mysql://thor?user=wikistats&password=wikistats"
    group_res.write.jdbc(url=mysql_url, table="wikistats.wikistats_by_day_spark", mode="append")
    # Write to parquet file - if needed
    group_res.saveAsParquetFile("/ssd/wikistats_parquet_bydate/mydate=" + mydate)

mount = "/data/wikistats/"
d = date(2008, 1, 1)
end_date = date(2008, 2, 1)
delta = timedelta(days=1)
while d < end_date:
    print d.strftime("%Y-%m-%d")
    filename = mount + "wikistats//" + d.strftime("%d") + "-*.gz"
    print(filename)
    load_day(filename, d.strftime("%Y-%m-%d"))
    d += delta

In the script I used Spark to read the original gzip files (1 day at a time). We can use a directory as “input” or a list of files. I then use Resilient Distributed Dataset (RDD) transformations; Python lambda functions with map and filter allow us to split the “input files” and filter them.

The next step will be to apply the schema (declare fields); here we can also apply any other functions; i.e., I use urllib.unquote to decode the title (urldecode). Finally, we can register the temp table and then use familiar SQL to do the group by.

The script will normally utilize all cpu cores. In addition it is very easy to run it in distributed mode even without Hadoop: just copy the files to all machines in a Spark cluster or use NFS/external storage.

The script took about an hour on 3 boxes to process 1 month of data and load the aggregated data to MySQL (single instance). We can estimate that to load all 6 years (aggregated) to MySQL is ~3 days.
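The extrapolation behind that ~3-day estimate is simple (and linear) arithmetic:

```python
# 1 month of data took ~1 hour on 3 boxes; scale linearly to 6 years.
hours_per_month = 1
months_total = 6 * 12
total_hours = hours_per_month * months_total
total_days = total_hours // 24
print(total_hours, total_days)  # 72 hours, i.e. 3 days
```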

You may now ask, why is it significantly faster (and we still have the result loaded to the same MySQL instance)? The answer is, it is a different, more efficient pipeline. In our original MySQL pipeline (which will probably take months) we load the raw data to MySQL. Here we filter and group on read, and write only what we need to MySQL.

One question may also come up here: do we actually need this whole “pipeline?” Can we simply run our analytical queries on top of the “raw” data? Well, that is possible, but it will probably require a Spark cluster with ~1000 nodes to do it efficiently, as it will need to scan through 5TB of data (see “more reading” below).

Multi-threaded Performance for MySQL Inserts

When using group_res.write.jdbc(url=mysql_url, table="wikistats.wikistats_by_day_spark", mode="append"), Spark will use multiple threads to insert into MySQL.

+------+-----------+------------+-----------+---------+------+--------+-----------------------------------------------------------------------------------------------------+-----------+---------------+
| Id   | User      | Host       | db        | Command | Time | State  | Info                                                                                                | Rows_sent | Rows_examined |
+------+-----------+------------+-----------+---------+------+--------+-----------------------------------------------------------------------------------------------------+-----------+---------------+
| 1050 | root      | localhost  | wikistats | Query   | 0    | init   | show processlist                                                                                    | 0 | 0 |
| 2133 | wikistats | thor:40994 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Colegio+san+ignacio', 1, 1)          | 0 | 0 |
| 2134 | wikistats | thor:40995 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Miloš_Crnjanski', 2, 3)              | 0 | 0 |
| 2135 | wikistats | thor:40996 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Robert_Edgar', 6, 7)                 | 0 | 0 |
| 2136 | wikistats | thor:40997 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Eastern_Orange_Tip', 6, 7)           | 0 | 0 |
| 2137 | wikistats | thor:40998 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Image:Dresden_Augustusbrücke_Al     | 0 | 0 |
| 2138 | wikistats | thor:40999 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Diamond_and_pearl', 11, 24)          | 0 | 0 |
| 2139 | wikistats | thor:41000 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Operation_polo', 2, 2)               | 0 | 0 |
| 2140 | wikistats | thor:41001 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Template_talk:Edit-first-section     | 0 | 0 |
| 2141 | wikistats | thor:41002 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Bertha_of_Artois', 1, 1)             | 0 | 0 |
| 2142 | wikistats | thor:41003 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'A Change of Pace', 1, 1)             | 0 | 0 |
| 2143 | wikistats | thor:41005 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'FAIRCHILD-REPUBLIC A-10 THUNDERB     | 0 | 0 |
| 2144 | wikistats | thor:41006 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Special:Recentchangeslinked/Wiki     | 0 | 0 |
| 2145 | wikistats | thor:41007 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', '                                     | 0 | 0 |
| 2146 | wikistats | thor:41008 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'List_of_Fleet_Air_Arm_aircraft_s     | 0 | 0 |
| 2147 | wikistats | thor:41009 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Systemic_sclerosis', 17, 29)         | 0 | 0 |
| 2148 | wikistats | thor:41011 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'tataviam', 1, 1)                     | 0 | 0 |
| 2149 | wikistats | thor:41010 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'The_Devil_Wears_Prada_(film)#_no     | 0 | 0 |
| 2150 | wikistats | thor:41013 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Seaford_High_School', 5, 7)          | 0 | 0 |
| 2151 | wikistats | thor:41014 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Talk:Shocker_(hand_gesture)', 3,     | 0 | 0 |
| 2152 | wikistats | thor:41015 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Paul_Szabo', 14, 23)                 | 0 | 0 |
| 2153 | wikistats | thor:41016 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'ausgereift', 1, 1)                   | 0 | 0 |
| 2154 | wikistats | thor:41017 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Category:March_2005_news', 1, 2)     | 0 | 0 |
| 2155 | wikistats | thor:41018 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Foot_Locker_Inc', 10, 10)            | 0 | 0 |
| 2156 | wikistats | thor:41019 | NULL | Query | 0 | update | INSERT INTO wikistats.wikistats_by_day_spark VALUES ('2008-01-04', 'Abbey_Park,_Nottinghamshire', 3,     | 0 | 0 |
+------+-----------+------------+-----------+---------+------+--------+-----------------------------------------------------------------------------------------------------+-----------+---------------+
25 rows in set (0.00 sec)
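The fan-out visible in the processlist above can be sketched with a thread pool and a stand-in insert function (the real inserts are issued by Spark's JDBC writer, one connection per partition; insert_row here is a hypothetical stub, not Spark code):

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock

table, lock = [], Lock()

def insert_row(row):
    # Hypothetical stand-in for one INSERT executed over one of the
    # parallel connections; the lock mimics the server serializing rows.
    with lock:
        table.append(row)

rows = [("2008-01-04", "Systemic_sclerosis", 17, 29),
        ("2008-01-04", "Paul_Szabo", 14, 23)] * 100

# Several writer threads issue their own INSERTs concurrently.
with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(insert_row, rows))

print(len(table))  # 200
```

Each of the wikistats connections in the processlist corresponds to one such writer thread.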

Monitoring your jobs

Spark provides you with a web interface to monitor and manage your jobs; the MasterUI (started on port 8080, as shown in the log above) displays the running application.

Result: Using Parquet Columnar Format vs MySQL InnoDB table

Spark supports the Apache Parquet columnar format, so we can save an RDD as a parquet file (it can be saved to a local directory or to HDFS):

group_res.saveAsParquetFile("/ssd/wikistats_parquet_bydate/mydate=" + mydate)

Here we save the result of our pipeline (the aggregated data) as a Parquet file. I also partition by day (“mydate=20080101”); Spark can auto-discover partitions in this format.

When I have my results, I want to query them. Let’s say I want to find the top 10 most frequently queried wiki pages in January 2008. I can do this query with MySQL (I will need to filter out the main page and search pages):

mysql> SELECT lower(url) as lurl, sum(tot_visits) as max_visits, count(*)
    -> FROM wikistats_by_day_spark
    -> where lower(url) not like '%special%' and lower(url) not like '%page%'
    -> and lower(url) not like '%test%' and lower(url) not like '%wiki%'
    -> group by lower(url) order by max_visits desc limit 10;
+--------------------------------------------------------+------------+----------+
| lurl                                                   | max_visits | count(*) |
+--------------------------------------------------------+------------+----------+
| heath_ledger                                           |    4247338 |      131 |
| cloverfield                                            |    3846404 |      131 |
| barack_obama                                           |    2238406 |      153 |
| 1925_in_baseball#negro_league_baseball_final_standings |    1791341 |       11 |
| the_dark_knight_(film)                                 |    1417186 |       64 |
| martin_luther_king,_jr.                                |    1394934 |      136 |
| deaths_in_2008                                         |    1372510 |       67 |
| united_states                                          |    1357253 |      167 |
| scientology                                            |    1349654 |      108 |
| portal:current_events                                  |    1261538 |      125 |
+--------------------------------------------------------+------------+----------+
10 rows in set (1 hour 22 min 10.02 sec)

Please note that here we are using our already aggregated (summarized by day) table, not the “raw” data.

As we can see, the query took 1 hour 22 mins. I have also saved the same results to Parquet (see the script), so now I can use it with Spark-SQL:

./bin/spark-sql --master local

This will use a local version of spark-sql, using 1 host only.

spark-sql> CREATE TEMPORARY TABLE wikistats_parquet
         > USING org.apache.spark.sql.parquet
         > OPTIONS (path "/ssd/wikistats_parquet_bydate");
Time taken: 3.466 seconds
spark-sql> select count(*) from wikistats_parquet;
227846039
Time taken: 82.185 seconds, Fetched 1 row(s)
spark-sql> SELECT lower(url) as lurl, sum(tot_visits) as max_visits, count(*)
         > FROM wikistats_parquet
         > where lower(url) not like '%special%' and lower(url) not like '%page%'
         > and lower(url) not like '%test%' and lower(url) not like '%wiki%'
         > group by lower(url) order by max_visits desc limit 10;
heath_ledger	4247335	42
cloverfield	3846400	42
barack_obama	2238402	53
1925_in_baseball#negro_league_baseball_final_standings	1791341	11
the_dark_knight_(film)	1417183	36
martin_luther_king,_jr.	1394934	46
deaths_in_2008	1372510	38
united_states	1357251	55
scientology	1349650	44
portal:current_events	1261305	44
Time taken: 1239.014 seconds, Fetched 10 row(s)

That took ~20 minutes, which is much faster.


Apache Spark provides a great and easy way to analyze and aggregate data. What I love about Spark vs other big data and analytical frameworks:

  • Open-source and actively developed
  • No dependency on other tools; i.e., the input data and output data do not have to be in Hadoop
  • Standalone mode for quick start, easy to deploy
  • Massively parallel, easy to add nodes
  • Support for a variety of input and output formats; i.e., it can read from/write to MySQL (via a JDBC driver) and the Parquet columnar format

However, there are a number of drawbacks:

  • It is still new, so you can expect some bugs and undocumented behavior. Many of the errors are hard to explain.
  • It requires Java; Spark 1.5 only supports Java 7 and higher.  That also means it will require additional memory, which is reasonable nowadays.
  • You will need to run jobs through “spark-submit”

I believe Apache Spark is a great tool and can complement MySQL for data analytical and BI purposes.

More reading


Capture database traffic using the Performance Schema

Thu, 2015-10-01 12:42

Capturing data is a critical part of performing a query analysis, or even just to have an idea of what’s going on inside the database.

There are several known ways to achieve this. For example:

  • Enable the General Log
  • Use the Slow Log with long_query_time = 0
  • Capture packets that go to MySQL from the network stream using TCPDUMP 
  • Use pt-query-digest with the --processlist parameter

However, these methods can add significant overhead and might even have negative performance consequences.

Now, sometimes you just need to sneak a peek at the traffic. Nothing fancy. In that case, probably the fastest and easiest way to gather some traffic data is to use pt-query-digest with --processlist. It doesn’t require any change in the server’s configuration nor critical handling of files. It doesn’t even require access to the server, just a user with the proper permissions to run “show full processlist”. But, and this is a significantly big “but,” you have to take into account that polling SHOW PROCESSLIST misses quite a number of queries and gives very poor timing resolution, among other things (like holding the processlist mutex).

What’s the alternative? Use the Performance Schema. Specifically: The events_statements_* and threads tables.

First, we have to make sure that we have the corresponding consumers enabled:

mysql> select * from setup_consumers where name like 'events%statement%' and enabled = 'yes';
+--------------------------------+---------+
| NAME                           | ENABLED |
+--------------------------------+---------+
| events_statements_current      | YES     |
| events_statements_history      | YES     |
| events_statements_history_long | YES     |
+--------------------------------+---------+
3 rows in set (0.00 sec)

Additionally, for statistics to be collected for statements, it is not sufficient to enable only the final statement/sql/* instruments used for individual statement types. The abstract statement/abstract/* instruments must be enabled as well. This should not normally be an issue because all statement instruments are enabled by default.

If you can’t see the events_statements_* consumers in your setup_consumers table, you’re probably running a MySQL version prior to 5.6.3. Before that version, the events_statements_* tables didn’t exist. MySQL 5.6 might not be as widely used as one would expect, as was already pointed out on this same blog.

Before continuing, it’s important to note that the most important condition when capturing data is that:

The statements must have ended.

If the statement is still being executed, it can’t be part of the collected traffic. For the ones out there that want to know what’s running inside MySQL, there’s already a detailed non-blocking processlist view to replace [INFORMATION_SCHEMA. | SHOW FULL] PROCESSLIST available with Sys Schema (that will come as default in MySQL 5.7).

Our options for capturing data are the three available tables: events_statements_current, events_statements_history and events_statements_history_long.

First option: use the events_statements_current table, which contains current statement events. Since we only want to get statements that have ended, the query will need to add the condition END_EVENT_ID IS NOT NULL to the query. This column is set to NULL when the event starts and updated to the thread current event number when the event ends, but when testing, there were too many missing queries. This is probably because between iterations, the associated threads were removed from the threads table or simply because the time between END_EVENT_ID being updated and the row being removed from the table is too short. This option is discarded.

Second option: The events_statements_history table contains the most recent statement events per thread and since statement events are not added to the events_statements_history table until they have ended, using this table will do the trick without additional conditions in order to know if the event is still running or not. Also, as new events are added, older events are discarded if the table is full.

That means that this table size is fixed. You can change the table size by modifying the variable performance_schema_events_statements_history_size. In the server version I used (5.6.25-73.1-log Percona Server (GPL), Release 73.1, Revision 07b797f) the table size is, by default, defined as autosized (-1) and can have 10 rows per thread. For example: if you are running 5 threads, the table will have 50 rows.
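The autosizing rule translates to a one-line calculation (ROWS_PER_THREAD reflects the default mentioned above):

```python
# Default rows per thread when performance_schema_events_statements_history_size = -1
ROWS_PER_THREAD = 10

def history_capacity(threads):
    # Rows the events_statements_history table can hold for N running threads.
    return threads * ROWS_PER_THREAD

print(history_capacity(5))  # 50, matching the example above
```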

Since it is a fixed size, chances are that some events might be lost between iterations.

Third option: the events_statements_history_long table, which is a kind of extended version of the events_statements_history table. Depending on the MySQL version, by default it can hold up to 10000 rows or be autosized (also modifiable with the variable performance_schema_events_statements_history_long_size).

One major (and not cool at all) drawback of this table is that “When a thread ends, its rows are removed from the table”. So it is not truly historical data: it only goes back as far as the oldest living thread and its oldest still-alive event.

The logical option to choose would be the third one: use the events_statements_history_long table. I’ve created a small script (available here) to collect infinite iterations on all the events per thread between a range of event_id’s. The idea of the range is to avoid capturing the same event more than once. It turns out that executing a query against this table is pretty slow, taking between 0.53 seconds and 1.96 seconds, so it can behave in a quite invasive way.
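The range idea can be sketched as a cursor that only collects events with an EVENT_ID above the last one seen. This is an illustrative Python stand-in (not the actual script linked above); fetch_rows simulates querying the events table:

```python
# Hypothetical sketch of the event-range deduplication idea: only collect
# events whose EVENT_ID is greater than the last one already captured,
# so no event is collected twice across polling iterations.

def collect_new_events(fetch_rows, last_seen_id):
    """fetch_rows() stands in for querying events_statements_history_long;
    it returns (thread_id, event_id, sql_text) tuples."""
    new_events = [r for r in fetch_rows() if r[1] > last_seen_id]
    if new_events:
        last_seen_id = max(r[1] for r in new_events)
    return new_events, last_seen_id

# Simulated table contents growing between two polling iterations:
snapshot1 = [(1, 10, 'SELECT 1'), (2, 11, 'SELECT 2')]
snapshot2 = snapshot1 + [(1, 12, 'SELECT 3')]

collected, cursor = collect_new_events(lambda: snapshot1, 0)
more, cursor = collect_new_events(lambda: snapshot2, cursor)
print([r[1] for r in collected], [r[1] for r in more])  # → [10, 11] [12]
```

The second iteration sees the two old rows again but skips them because their event_id is not above the cursor.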

Which leaves us with the second option: the events_statements_history table.

Since the goal is to capture data in a slow-log-like format, additional information needs to be obtained from the threads table, which has a row for each server thread. The most important thing to remember: access to threads does not require a mutex and has minimal impact on server performance.

Combined, these two tables give us enough information to simulate a very comprehensive slow log format. We just need the proper query:

SELECT CONCAT_WS('',
    '# Time: ', date_format(CURDATE(),'%y%m%d'), ' ', TIME_FORMAT(NOW(6),'%H:%i:%s.%f'), '\n',
    '# User@Host: ', t.PROCESSLIST_USER, '[', t.PROCESSLIST_USER, '] @ ', PROCESSLIST_HOST, ' []  Id: ', t.PROCESSLIST_ID, '\n',
    '# Schema: ', CURRENT_SCHEMA, '  Last_errno: ', MYSQL_ERRNO, '  ', '\n',
    '# Query_time: ', ROUND(s.TIMER_WAIT / 1000000000000, 6), ' Lock_time: ', ROUND(s.LOCK_TIME / 1000000000000, 6), '  Rows_sent: ', ROWS_SENT, '  Rows_examined: ', ROWS_EXAMINED, '  Rows_affected: ', ROWS_AFFECTED, '\n',
    '# Tmp_tables: ', CREATED_TMP_TABLES, '  Tmp_disk_tables: ', CREATED_TMP_DISK_TABLES, '  ', '\n',
    '# Full_scan: ', IF(SELECT_SCAN=0,'No','Yes'), '  Full_join: ', IF(SELECT_FULL_JOIN=0,'No','Yes'), '  Tmp_table: ', IF(CREATED_TMP_TABLES=0,'No','Yes'), '  Tmp_table_on_disk: ', IF(CREATED_TMP_DISK_TABLES=0,'No','Yes'), '\n',
    t.PROCESSLIST_INFO, ';')
FROM performance_schema.events_statements_history s
JOIN performance_schema.threads t USING(thread_id)
WHERE t.TYPE = 'FOREGROUND'
  AND t.PROCESSLIST_INFO IS NOT NULL
  AND t.PROCESSLIST_ID != connection_id()
ORDER BY t.PROCESSLIST_TIME DESC;

The idea of this query is to get a Slow Log format as close as possible to the one that can be obtained by using all the options from the log_slow_filter variable.

The other conditions are:

  • t.TYPE = ‘FOREGROUND’: The threads table provides information about background threads, which we don’t intend to analyze. User connection threads are foreground threads.
  • t.PROCESSLIST_INFO IS NOT NULL: This field is NULL if the thread is not executing any statement.
  • t.PROCESSLIST_ID != connection_id(): Ignore me (this query).

The output of the query will look like a proper Slow Log output:

# Time: 150928 18:13:59.364770
# User@Host: root[root] @ localhost []  Id: 58918
# Schema: test  Last_errno: 0
# Query_time: 0.000112 Lock_time: 0.000031  Rows_sent: 1  Rows_examined: 1  Rows_affected: 0
# Tmp_tables: 0  Tmp_disk_tables: 0
# Full_scan: No  Full_join: No  Tmp_table: No  Tmp_table_on_disk: No
INSERT INTO sbtest1 (id, k, c, pad) VALUES (498331, 500002, '94319277193-32425777628-16873832222-63349719430-81491567472-95609279824-62816435936-35587466264-28928538387-05758919296', '21087155048-49626128242-69710162312-37985583633-69136889432');

And this file can be used with pt-query-digest to aggregate similar queries, just as if it were regular slow log output. I ran a small test which consists of:

  • Generate traffic using sysbench. This is the sysbench command used:
    sysbench --test=/usr/share/doc/sysbench/tests/db/oltp.lua \
      --mysql-host=localhost --mysql-port=3306 \
      --mysql-user=root --mysql-password= \
      --mysql-db=test --mysql-table-engine=innodb \
      --oltp-test-mode=complex --oltp-read-only=off --oltp-reconnect=on \
      --oltp-table-size=1000000 --max-requests=100000000 \
      --num-threads=4 --report-interval=1 --report-checkpoints=10 \
      --tx-rate=0 run
  • Capture the data using slow log + long_query_time = 0
  • Capture data using pt-query-digest –processlist
  • Capture data from Performance Schema
  • Run pt-query-digest on the 3 files

The results were:

– Slow Log:

# Profile
# Rank Query ID           Response time Calls  R/Call V/M   Item
# ==== ================== ============= ====== ====== ===== ==============
#    1 0x813031B8BBC3B329 47.7743 18.4%  15319 0.0031  0.01 COMMIT
#    2 0x737F39F04B198EF6 39.4276 15.2%  15320 0.0026  0.00 SELECT sbtest?
#    3 0x558CAEF5F387E929 37.8536 14.6% 153220 0.0002  0.00 SELECT sbtest?
#    4 0x84D1DEE77FA8D4C3 30.1610 11.6%  15321 0.0020  0.00 SELECT sbtest?
#    5 0x6EEB1BFDCCF4EBCD 24.4468  9.4%  15322 0.0016  0.00 SELECT sbtest?
#    6 0x3821AE1F716D5205 22.4813  8.7%  15322 0.0015  0.00 SELECT sbtest?
#    7 0x9270EE4497475EB8 18.9363  7.3%   3021 0.0063  0.00 SELECT performance_schema.events_statements_history performance_schema.threads
#    8 0xD30AD7E3079ABCE7 12.8770  5.0%  15320 0.0008  0.01 UPDATE sbtest?
#    9 0xE96B374065B13356  8.4475  3.3%  15319 0.0006  0.00 UPDATE sbtest?
#   10 0xEAB8A8A8BEEFF705  8.0984  3.1%  15319 0.0005  0.00 DELETE sbtest?
# MISC 0xMISC              8.5077  3.3%  42229 0.0002   0.0 <10 ITEMS>

– pt-query-digest –processlist

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ===============
#    1 0x737F39F04B198EF6 53.4780 16.7%  3676 0.0145  0.20 SELECT sbtest?
#    2 0x813031B8BBC3B329 50.7843 15.9%  3577 0.0142  0.10 COMMIT
#    3 0x558CAEF5F387E929 50.7241 15.8%  4024 0.0126  0.08 SELECT sbtest?
#    4 0x84D1DEE77FA8D4C3 35.8314 11.2%  2753 0.0130  0.11 SELECT sbtest?
#    5 0x6EEB1BFDCCF4EBCD 32.3391 10.1%  2196 0.0147  0.21 SELECT sbtest?
#    6 0x3821AE1F716D5205 28.1566  8.8%  2013 0.0140  0.17 SELECT sbtest?
#    7 0x9270EE4497475EB8 22.1537  6.9%  1381 0.0160  0.22 SELECT performance_schema.events_statements_history performance_schema.threads
#    8 0xD30AD7E3079ABCE7 15.4540  4.8%  1303 0.0119  0.00 UPDATE sbtest?
#    9 0xE96B374065B13356 11.3250  3.5%   885 0.0128  0.09 UPDATE sbtest?
#   10 0xEAB8A8A8BEEFF705 10.2592  3.2%   792 0.0130  0.09 DELETE sbtest?
# MISC 0xMISC              9.7642  3.0%   821 0.0119   0.0 <3 ITEMS>

– Performance Schema

# Profile
# Rank Query ID           Response time Calls R/Call V/M   Item
# ==== ================== ============= ===== ====== ===== ==============
#    1 0x813031B8BBC3B329 14.6698 24.8% 12380 0.0012  0.00 COMMIT
#    2 0x558CAEF5F387E929 12.0447 20.4% 10280 0.0012  0.00 SELECT sbtest?
#    3 0x737F39F04B198EF6  7.9803 13.5% 10280 0.0008  0.00 SELECT sbtest?
#    4 0x3821AE1F716D5205  4.6945  7.9%  5520 0.0009  0.00 SELECT sbtest?
#    5 0x84D1DEE77FA8D4C3  4.6906  7.9%  7350 0.0006  0.00 SELECT sbtest?
#    6 0x6EEB1BFDCCF4EBCD  4.1018  6.9%  6310 0.0007  0.00 SELECT sbtest?
#    7 0xD30AD7E3079ABCE7  3.7983  6.4%  3710 0.0010  0.00 UPDATE sbtest?
#    8 0xE96B374065B13356  2.3878  4.0%  2460 0.0010  0.00 UPDATE sbtest?
#    9 0xEAB8A8A8BEEFF705  2.2231  3.8%  2220 0.0010  0.00 DELETE sbtest?
# MISC 0xMISC              2.4961  4.2%  2460 0.0010   0.0 <7 ITEMS>

The P_S data is closer to the Slow Log data than the data captured with a regular SHOW FULL PROCESSLIST, but it is still far from being accurate. Remember that this is meant as a fast and easy way to capture traffic without too much trouble, so that’s a trade-off you might have to accept.

Summary: Capturing traffic always comes with a trade-off, but if you’re willing to sacrifice accuracy it can be done with minimal impact on server performance, using the Performance Schema. Because P_S has been enabled by default since MySQL 5.6.6, you might already be living with the overhead (if using 5.6). If you are one of the lucky ones that have P_S on production, don’t be afraid to use it. There’s a lot of data already in there.

The post Capture database traffic using the Performance Schema appeared first on MySQL Performance Blog.

Percona XtraDB Cluster 5.6.25-25.12 is now available

Mo, 2015-09-21 21:20

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on September 21st, 2015. Binaries are available from the downloads area or from our software repositories.

Based on Percona Server 5.6.25-73.1 including all the bug fixes in it, Galera Replicator 3.12, and on Codership wsrep API 25.11, Percona XtraDB Cluster 5.6.25-25.12 is now the current General Availability release. All of Percona’s software is open-source and free, and all the details of the release can be found in the 5.6.25-25.12 milestone at Launchpad.

New Features:

  • Percona XtraDB Cluster has implemented support for the PROXY protocol. The implementation is based on a patch developed by Thierry Fournier.
  • MTR coverage has been added to all tests in the galera suite. Many bugs associated with mtr tests have been fixed.
  • A new variable, gcache.keep_pages_count, analogous to gcache.keep_pages_size, has been added. The variable limits the number of overflow pages rather than the total memory occupied by all overflow pages. Whenever either of the variables is updated at runtime to a non-zero value, cleanup is called on excess overflow pages to delete them. This feature also fixes the bugs with integer overflow in the gcache module.
  • Updates have been made to wsrep code to ensure greater concordance with binary log and GTID so that failover of async slaves, among nodes of the cluster is seamless and consistent. To ensure this in #1421360, all FLUSH commands (except FLUSH BINARY LOG and FLUSH LOGS, and read lock-based flush such as FLUSH TABLES WITH READ LOCK and FLUSH TABLES FOR EXPORT), ANALYZE TABLE, Percona Server-specific flush statements for user statistics and page tracking bitmaps are executed under Total Order Isolation (TOI) so that they are replicated to other nodes in the cluster when they are written to binary log.
  • Percona XtraDB Cluster has temporarily disabled savepoints in triggers and stored functions. The reason is that even having fixed bug #1438990 and bug #1464468 we have found more cases where savepoints in triggers break binary logging and replication, resulting in server crashes and broken slaves. This feature will be disabled until the above issues are properly resolved.

Bugs Fixed:

  • SHOW STATUS LIKE ... and SHOW STATUS were taking time proportional to size of gcache.size. Bug fixed #1462674.
  • When disk space would get filled with files, Galera would crash when the next page file was created. Bug fixed #1488535.
  • XtraBackup SST didn’t clobber backup-my.cnf which caused SST to fail. Bug fixed #1431101.
  • Error from process::wait was not checked in joiner thread leading to joiner starting erroneously even when SST had failed. Bug fixed #1402166.
  • Due to a regression introduced in Percona XtraDB Cluster 5.6.24-25.11, updating the wsrep_cluster_address variable, following an update of the wsrep_provider_options variable, would cause the server to deadlock.
  • mysqldump SST could stall due to a regression in desync mutex introduced in Percona XtraDB Cluster 5.6.24-25.11 by fixing the bug #1288528.
  • mysql_tzinfo_to_sql sets wsrep_replicate_myisam variable at session scope so that statements are replicated correctly.
  • Percona-XtraDB-Cluster-devel-56 package was not included in the Percona-XtraDB-Cluster-full-56 metapackage on CentOS due to a conflict with upstream mysql package.
  • Running service mysql start and then service mysql@bootstrap start afterwards would cause server shutdown.
  • NO_WRITE_TO_BINLOG / LOCAL for ANALYZE TABLE, OPTIMIZE TABLE, REPAIR TABLE, FLUSH commands will ensure it is not written to binary log (as in mysql async replication) and not replicated in wsrep.
  • FLUSH TABLES WITH READ LOCK failure (with non-existent tables) didn’t resume the galera provider, causing deadlock.
  • Percona XtraDB Cluster no longer blocks DDL statements on tables which are in use with ... FOR EXPORT or ... WITH READ LOCK; instead, it returns an error message about the read lock.
  • Fixed the update of wsrep_slave_threads variable regarding the default value assignment, invalid value truncation, and error issued while threads are still being closed.
  • The mysql client in Percona XtraDB Cluster has been built with system readline instead of editline.
  • Bugs in 32-bit galera associated with statvfs in available_storage and integer overflow after multiplication in offset calculation have been fixed.
  • Galera 3 was failing to build on all non-intel platforms. Architecture specific CCFLAGS have been removed and provision for inheriting CCFLAGS, CFLAGS, CXXFLAGS and LDFLAGS have been added to SConstruct.
  • Non-global read locks such as FLUSH TABLES WITH READ LOCK and FLUSH TABLES FOR EXPORT paused the galera provider but didn’t block commits globally; wsrep_to_isolation_begin (used for DDL) has been made aware of this.

The following bug fixes have been ported from Percona Server 5.6.26-74.0: bug #1454441, bug #1470677, bug #1472256, bug #1472251, and bug #1464468.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post Percona XtraDB Cluster 5.6.25-25.12 is now available appeared first on MySQL Performance Blog.

Containing your logical backups: mydumper in docker

Mo, 2015-09-21 20:54

Even with software like Percona XtraBackup, logical backups remain an important component of a thorough backup strategy. To gather a logical backup in a timely fashion we rely on a tool called mydumper. In the Percona Managed Services department we’re lucky enough to have one of the project’s most active contributors, and many of the latest features and bug fixes have been released to satisfy some of our own use cases. Compiling mydumper remains one of my personal bugbears and undoubtedly the highest barrier to entry for many potential adopters. There are a few conditions that make compiling it difficult, tiring or even prohibitive. The open source logical backup tool does not supply any official packages; however, our friends over at TwinDB are currently shipping a package for CentOS/RHEL. So what if you’re running something else, like Debian, Ubuntu or Arch? Well, recently I had a flash of inspiration.

Since I’ve been turning some ideas into docker containers, it dawned on me that it would be a trivial image to create and would add instant portability to the binaries. With a docker environment you can take an image and run a logical backup through to a mapped volume. It’s almost as easy as that.

So let me show you what I mean. I have built a docker image with the mydumper, myloader and mysql client libraries installed and it’s available for you on docker hub. This means that we can call a new container to make our backup without technically installing mydumper anywhere. This can get you from zero to mydumper very fast if there are hurdles in your way to deploying the open source backup tool into production.

With the grand assumption that you’ve got Docker installed somewhere, let’s pull the image from the Docker Hub:

docker pull mysqlboy/mydumper

Once all the layers download (it’s < 200MB) you’re all set to launch a backup using the mydumper binary. You can roll your own command, but it could look similar to this:

docker run --name mydumper --rm \
  -v {backup_parent_dir}:/backups \
  mysqlboy/mydumper \
  mydumper --host={host_ipaddr} --user={mysql_username} --password={mysql_password} \
           --outputdir=/backups --less-locking --compress --use-savepoints

If you’re unfamiliar with Docker itself; a very high level summary for you; Docker is a product intended to facilitate process isolation or ‘micro services’ known as containerization. It intends to be lighter and more efficient than Virtual Machines as we traditionally know them. There’s much more to this work flow than I intend to explain here but please see the further learning section in the footer.

Let me explain a little of the above call. We want to launch a mydumper run isolated to a container. We are giving the docker daemon the instruction to remove the container after it finishes its run (--rm), we are calling the container as an ‘instance’ of the mysqlboy/mydumper image, and we are passing a traditional mydumper command as the container’s instruction. We have mapped a location on the local filesystem into the container to ensure that the backup persists after the container is stopped and removed. The mydumper command itself will make a full backup of the instance you point it to (mydumper can make remote backups, pulling the data locally) and will use the less-locking, savepoints and compression features.

What’s more, the beauty of containerizing the mydumper/myloader binaries mean that you can use this image in conjunction with docker machine to source logical backups from Mac and Windows where this process is typically difficult to assume.

I’m going to be filling in for my colleague Max Bubenick this week at Percona Live in Amsterdam talking about Logical Backups using Mydumper and if you’re planning to attend, the mydumper docker image will provide you with a quick path to trial. Thanks for reading and if you’re in Amsterdam this week don’t forget to say hello!

Further learning about docker:

The post Containing your logical backups: mydumper in docker appeared first on MySQL Performance Blog.

Percona Server for MongoDB 3.0.5-rc7 is now available

Mo, 2015-09-21 20:29

Percona is pleased to announce the release of Percona Server for MongoDB 3.0.5-rc7 on September 21, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Percona Server for MongoDB is an enhanced, open source, fully compatible, highly scalable, zero-maintenance downtime database supporting the MongoDB v3.0 protocol and drivers. Based on MongoDB 3.0.5, it extends MongoDB with RocksDB and PerconaFT storage engines, as well as features like external authentication and audit logging. Percona Server for MongoDB requires no changes to MongoDB applications or code.

New Features:

  • Auditing enables administrators to track and log user activity on a server. The server generates an audit log file with information about different user events, including authentication, authorization failures, and so on.
  • External authentication enables the MongoDB server to verify the client’s user name and password against a separate (external) service, such as OpenLDAP or Active Directory.

The post Percona Server for MongoDB 3.0.5-rc7 is now available appeared first on MySQL Performance Blog.

VividCortex Agent Benchmark

Fr, 2015-09-18 18:42


The purpose of this project was to measure the potential overhead of the VividCortex agent, which is used by the VividCortex database monitoring system. This benchmark is part of a consulting engagement with VividCortex and was paid for by the customer.

The assumption is that the VividCortex agent uses CPU processing time, so we should see an impact on user queries when the workload is CPU-intensive (how much is what we want to measure). The impact on IO-bound workloads should be small or insignificant.

Workload Description
For this, we use the LinkBenchX benchmark with a combination of different options.


There are 3 different workloads we want to look into:

  • CPU bound. All data fits into memory; database performance is limited by CPU and memory speed. In this mode the server will be 100% CPU bound and we should see reduced performance when the agent is running.
  • CPU bound with limited CPUs. This mode is identical to the previous one, with the difference that only 4 CPUs are available for the mysqld and vc-mysql-query processes. This is to emulate a “cloud-based” environment. In this workload even fewer CPUs are available, so we expect an even bigger performance hit when running with the agent.
  • IO-bound workload. In this workload the performance is limited by storage IO performance, so the impact from the agent should be minimal.

LinkBenchX modes

We use two modes in LinkBenchX

  • Throughput mode. This mode allows us to measure the maximal throughput that the database server can achieve.
  • “Request rate” mode. In this mode, LinkBenchX generates load at a specified rate, which allows us to measure and compare response times for different configurations. Usually we set the request rate at 75% of the maximal throughput.
    We measure throughput and 99% response time for the ADD_LINKS (write) and GET_LINKS_LIST (range select) operations in 10-second intervals.

Agent modes
We wanted to compare the performance impact of VividCortex’s agent to PERFORMANCE_SCHEMA, so we measured four combinations of enabled/disabled states:

  • Performance Schema disabled (OFF in my.cnf), marked as "NO-PS" in charts
  • Performance Schema enabled (ON in my.cnf, no additional probes enabled), marked as "with-PS" in charts
  • VividCortex agent is not active ("NO-vc-agent" in charts)
  • VividCortex agent is active ("vc-agent" in charts)

CPU bound

In the CPU-bound workload, the impact from enabling the agent is the most significant.

Throughput impact:

  • Enabling PERFORMANCE_SCHEMA adds a 1.8% overhead in throughput
  • Enabling vc-agent adds a 10.7% overhead in throughput
  • Enabling both PERFORMANCE_SCHEMA and vc-agent adds an 11.7% overhead in throughput

It is worth highlighting that under CPU-bound load, the vc-mysql-query agent’s CPU consumption is related to the amount of traffic it has to sniff, and on CPU-bound workloads with high query traffic it can use up to a single CPU core. We did not benchmark a CPU-bound server with low query traffic. That’s why we decided to measure the impact of vc-agent when a limited number of CPUs is available (say, on a cloud or container server).

Response time impact:
on ADD_LINKS operation

  • PERFORMANCE_SCHEMA impact: added 20% to response time
  • vc-agent impact: added 42% to response time
  • PERFORMANCE_SCHEMA and vc-agent impact: added 63% to response time

on GET_LINKS_LIST operation

  • PERFORMANCE_SCHEMA impact: added 7.7% to response time
  • vc-agent impact: added 21.6% to response time
  • PERFORMANCE_SCHEMA and vc-agent impact: added 30.9% to response time

CPU bound (4 CPU cores are available)
In this CPU-bound workload, the impact from enabling the agent is again significant.

  • Enabling PERFORMANCE_SCHEMA adds a 5% overhead in throughput
  • Enabling vc-agent adds a 13% overhead in throughput
  • Enabling both PERFORMANCE_SCHEMA and vc-agent adds a 17.8% overhead in throughput

Response time impact:
on ADD_LINKS operation

  • PERFORMANCE_SCHEMA impact: added 14% to response time
  • vc-agent impact: added 52% to response time
  • PERFORMANCE_SCHEMA and vc-agent impact: added 84% to response time

on GET_LINKS_LIST operation

  • PERFORMANCE_SCHEMA impact: added 10% to response time
  • vc-agent impact: added 51% to response time
  • PERFORMANCE_SCHEMA and vc-agent impact: added 74% to response time

IO Bound
There is no statistical difference in throughput and response time when running with PERFORMANCE_SCHEMA enabled and/or with vc-agent.

The impact from running vc-agent and/or PERFORMANCE_SCHEMA might be negligible or significant, depending on your workload. The numbers above (the impact on throughput and response time) are border cases and can be used as low and high limits when estimating the impact; most likely, for a real workload, the overhead will be somewhere in the middle.
There is no measurable impact from vc-agent in the IO-bound workload, but in CPU-bound workloads you may want to make sure you have spare CPU cycles for vc-agent, as it is computationally intensive and may add visible overhead to response times.

Supporting graphs.

CPU bound

Throughput timeline

Throughput summary

Response time density chart

CPU bound (4 CPU available)

Throughput timeline

Throughput summary

Response time density chart

IO bound

Throughput timeline

Throughput summary

Response time density chart

The post VividCortex Agent Benchmark appeared first on MySQL Performance Blog.

Clarification on “Call me Maybe: MariaDB Galera Cluster”

Do, 2015-09-17 18:41

Recently, Aphyr (Kyle Kingsbury) published “Call me Maybe: MariaDB Galera Cluster”.

The article is technically valid, and I am not going to dispute the conclusion Aphyr made, but it is also quite technically involved, so users who just jump to the conclusion may get the wrong impression, and we’re left with more questions than ever.

So, let me state what is the real conclusion of this article:
“Galera Cluster does not support the SNAPSHOT ISOLATION level, in contrast to what was stated in the documentation”.
Following from that conclusion, using Galera Cluster may result in “corrupted” data.

I do not quite like the usage of the word “corrupted” here. For me, the more correct word to use is “inconsistent”.

So with this clarification, Aphyr’s conclusion that Galera Cluster (and this affects both the MariaDB Galera and Percona XtraDB Cluster products) does not support SNAPSHOT ISOLATION and may leave data in an inconsistent state is valid. But here I need to add a quite IMPORTANT addition: it may leave data in an inconsistent state only if you use the SPECIAL TYPE of transactions, in the default isolation level, that Aphyr uses in his test. Moreover, if we test the same workload on a simple single-instance InnoDB, we will get the same result.

Before getting too scared of “inconsistent data”, let’s review what kind of transactions are used and what the practical implications are.

Aphyr uses the following logic:
Assume we have a table

CREATE TABLE `accounts` (
  `account_id` int(11) NOT NULL,
  `balance` int(11) NOT NULL,
  PRIMARY KEY (`account_id`)
) ENGINE=InnoDB

We have N rows in table accounts, and each row populated with initial balance 100.
That results in SUM(balance) FROM accounts == 100*N

Application logic: execute the following transactions concurrently:

BEGIN (start transaction)
// in application: generate random $fromAccount in a range [1;N]
SELECT balance FROM accounts WHERE account_id=$fromAccount
// in application: read balance into $from_balance variable
// in application: generate random $toAccount in a range [1;N], but != $fromAccount
SELECT balance FROM accounts WHERE account_id=$toAccount
// in application: read balance into $to_balance variable
// in application: generate random amount to move from one account to another, as $moveAmt
// in application: calculate new balance for account1: $newBalance1=$from_balance-$moveAmt
// in application: calculate new balance for account2: $newBalance2=$to_balance+$moveAmt
// execute queries:
UPDATE account SET balance=$newBalance1 WHERE account_id=$fromAccount
UPDATE account SET balance=$newBalance2 WHERE account_id=$toAccount
COMMIT;

As you see, it includes some application logic, so on the database side the transaction looks like this (assuming we move 25 from account 5 to account 8):

BEGIN
SELECT balance FROM accounts WHERE account_id=5
SELECT balance FROM accounts WHERE account_id=8
UPDATE account SET balance=75 WHERE account_id=5
UPDATE account SET balance=125 WHERE account_id=8
COMMIT;

Aphyr’s proves that these transactions executed concurrently should keep balances consistent (that is SUM(balances)==N*100) if database support SNAPSHOT ISOLATION or SERIALIZABLE isolation levels.
In his test he shows that running on Galera cluster these transactions executing concurrently results in inconsistent balance, therefore Galera cluster does not support SNAPSHOT ISOLATION level.

This, however, is totally expected. Moreover, if you try this test on a single server against InnoDB in REPEATABLE-READ (default) mode, you will also end up in an inconsistent state (you can find my code here).

This is because of how InnoDB handles the REPEATABLE-READ mode (one may argue that InnoDB’s REPEATABLE-READ is weaker than the standard-defined REPEATABLE-READ, and that it is closer to READ-COMMITTED. This is a good opportunity for Aphyr to start another FUD piece, “Call Me Maybe: InnoDB”). In simplified terms, InnoDB executes reads in repeatable-read mode, and writes or locking reads in read-committed mode.
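To see why, here is a toy Python simulation (no MySQL involved, purely illustrative) of two interleaved read-compute-overwrite transfers that share one account: the second transaction’s write silently discards the first one’s, and the total no longer sums to the initial amount.

```python
# Toy simulation of the lost-update anomaly: both "transactions" read a
# snapshot, compute new balances in the application, then blindly
# overwrite the rows, so one update to the shared account is lost.

balances = {1: 100, 2: 100, 3: 100}   # invariant: sum == 300

# Both transactions take their snapshot before either one writes.
snap_t1 = dict(balances)   # T1: move 25 from account 1 to account 2
snap_t2 = dict(balances)   # T2: move 10 from account 1 to account 3

balances[1] = snap_t1[1] - 25   # T1 writes account 1 (75)
balances[2] = snap_t1[2] + 25   # T1 writes account 2 (125)
balances[1] = snap_t2[1] - 10   # T2 overwrites account 1: T1's debit is lost
balances[3] = snap_t2[3] + 10   # T2 writes account 3 (110)

print(sum(balances.values()))   # → 325, not 300: the invariant is broken
```

This is exactly the anomaly the test observes; neither snapshot reads nor row locks at write time are enough when the new value is computed in the application from a stale read.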

What does this mean from a practical standpoint?
In my opinion these transactions are a little bit artificial (although totally valid).

If you use this in real life, the more obvious way to write these transactions is:

BEGIN
SELECT balance FROM accounts WHERE account_id=5
SELECT balance FROM accounts WHERE account_id=8
UPDATE account SET balance=balance-25 WHERE account_id=5
UPDATE account SET balance=balance+25 WHERE account_id=8
COMMIT;

If you do this, it will NOT produce an inconsistent state.
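As a sanity check, here is the companion toy Python simulation (again purely illustrative, no MySQL involved): with relative updates, each write builds on the current value instead of a stale snapshot, so concurrent transfers compose and the invariant holds.

```python
# Toy simulation of the relative-update variant
# (UPDATE ... SET balance = balance - 25): writes compose instead of
# overwriting each other, so the sum invariant survives interleaving.

balances = {1: 100, 2: 100, 3: 100}   # invariant: sum == 300

balances[1] -= 25   # T1: move 25 from account 1 to account 2
balances[2] += 25
balances[1] -= 10   # T2: move 10 from account 1 to account 3
balances[3] += 10

print(sum(balances.values()))   # → 300: the invariant holds
```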

Another way to handle this (for single-server InnoDB) is to use the SERIALIZABLE isolation level (with an expected performance penalty). Unfortunately, Galera Cluster does not support the SERIALIZABLE isolation level, as it does not pass read-sets between nodes, and node communication happens only at the COMMIT stage.

Third way: MySQL also provides an extension, SELECT ... FOR UPDATE, to handle cases exactly like these. So if you want to keep REPEATABLE-READ and the original transactions, you will need to rewrite them as:

BEGIN
SELECT balance FROM accounts WHERE account_id=5 FOR UPDATE;
SELECT balance FROM accounts WHERE account_id=8 FOR UPDATE;
UPDATE account SET balance=75 WHERE account_id=5
UPDATE account SET balance=125 WHERE account_id=8
COMMIT;

This will result in a consistent state for table accounts and will work for both single InnoDB and multi-node Percona XtraDB Cluster deployments.

One thing to remember is that with Percona XtraDB Cluster you may get a DEADLOCK error when executing the COMMIT statement, so your application should be ready to handle this error, roll back, and repeat the transaction if needed.
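A minimal, hypothetical retry pattern for that commit-time deadlock could look like the following sketch. The DeadlockError class and flaky_transfer function are stand-ins for your driver’s exception type and your transaction function, not part of any real library:

```python
# Hedged sketch: retry a transaction a bounded number of times when the
# cluster reports a deadlock at COMMIT. DeadlockError and flaky_transfer
# are illustrative stand-ins for a real driver's exception and a real
# transaction body.

class DeadlockError(Exception):
    pass

def with_deadlock_retry(txn, max_attempts=3):
    for attempt in range(1, max_attempts + 1):
        try:
            return txn()
        except DeadlockError:
            if attempt == max_attempts:
                raise
            # the server already rolled back; simply repeat the transaction

attempts = []
def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise DeadlockError("deadlock found when trying to get lock")
    return "committed"

print(with_deadlock_retry(flaky_transfer))  # → committed (on the 3rd try)
```

In a real application the transaction body would re-read the balances before recomputing them, since the values may have changed between attempts.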

So in conclusion of my post:

Using the transactions described in Aphyr’s article may result in an inconsistent state (not in a data-corrupted state!), for both Galera Cluster and single-instance InnoDB. But this is because these transactions do not properly use InnoDB’s REPEATABLE-READ isolation level. To meet InnoDB’s requirements, we need to use “SELECT ... FOR UPDATE” or rewrite the transactions in the way described above.

UPDATE 18-Sep-2015.
Based on Twitter comments and comments from readers of this post, I would like to add the following.

The post Clarification on “Call me Maybe: MariaDB Galera Cluster” appeared first on MySQL Performance Blog.

Webinar: Introduction to MySQL SYS Schema follow up questions

Mi, 2015-09-16 21:57

Thanks to all who attended my webinar Introduction to MySQL SYS Schema. This blog is for me to address the extra questions I didn’t have time to answer on the stream.

Can I have the performance_schema enabled in 5.6 and then install the sys schema? Or are they one and the same?

You need to have enabled the performance_schema in order to use it through the sys schema. They are different entities. In general, performance_schema collects and stores the data, and sys schema reads and presents the data.

Will the installation of the sys schema on the primary database be replicated to the slaves?

By default, no. If you wish the Sys Schema to replicate to the slaves, you can modify the before_setup.sql file to skip the “SET sql_log_bin = 0;” statement.

Can MySQL save the slow running query in any table?

Yes it does:

How to see the query execution date & time from events_statements_current/history views in performance_schema?

You can check the performance_schema.events_statements_summary_by_digest table, which has the fields FIRST_SEEN and LAST_SEEN, both datetime values.

When the Sys Schema views show certain stats for the queries, is there an execution time range for the queries under evaluation, or is it all the queries executed to date?

It’s all the queries executed to date, except when using some of the stored procedures that take a run-time value as a parameter, like “diagnostics” or “ps_trace_statement_digest”.

I want to write an automated script to rebuild tables or indexes. How do I determine which table(s) or index(es) need to be rebuilt because of a high fragmentation ratio?

For this you need to use something completely different. To measure the fragmentation inside an InnoDB table, I recommend using XtraBackup with the --stats option.
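Not the author’s method, but as a rough first pass you can also query information_schema: DATA_FREE is only a coarse approximation of fragmentation, so treat the result as a hint of where to look, not a verdict:

```sql
-- Approximate fragmentation: free space relative to table size,
-- for InnoDB tables above 100MB (threshold is illustrative).
SELECT table_schema, table_name,
       data_free / (data_length + index_length + 1) AS frag_ratio
FROM information_schema.TABLES
WHERE engine = 'InnoDB'
  AND data_length + index_length > 100 * 1024 * 1024
ORDER BY frag_ratio DESC
LIMIT 10;
```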

Downside to using? Overhead?

The overhead is the one that comes with using the Performance Schema. I like the perspective of one presentation on the subject: overhead is dynamic, so do not rely on other people’s benchmarks; benchmark your application and find out what your overhead is.

What is the performance cost with regards to memory and I/O when using the sys schema? Are there any tweaks or server variables which help the sys schema perform better?

Use only the instrumentation you need. There is extensive info about this topic in a dedicated blog post.

How does the sys schema record data for replication?

It doesn’t until MySQL 5.7.2, where the Performance Schema provides tables that expose replication information. As for the sys schema, currently the only place where you can find info about replication is the “diagnostics” procedure, and as you can imagine, it only gets data if the MySQL version is 5.7.2 or higher.

Is the sys schema built into any of the Percona releases?

At the moment, no.

Is it possible to use the sys schema in a 3-node Galera cluster?

Yes, since the only requirement is to have performance_schema, which is also available on PXC / Galera Cluster.

Can you create trending off information pulled from the Sys Schema? Full table scans over time, latency over time, that kind of thing?

Yes, you can use procedures such as “diagnostics”.

How do I reset the performance data to start collecting from scratch?

By calling the ps_truncate_all_tables procedure. Truncating a performance_schema table is equivalent to clearing the collected events. TRUNCATE TABLE can also be used with summary tables, but there the effect is to reset the summary columns to 0 or NULL, not to remove rows.
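For example (the procedure’s single argument toggles verbose per-table output):

```sql
-- Reset all performance_schema accumulators via the sys schema.
CALL sys.ps_truncate_all_tables(FALSE);

-- Or reset one summary table directly: for summary tables TRUNCATE
-- zeroes the aggregate columns rather than removing rows.
TRUNCATE TABLE performance_schema.events_statements_summary_by_digest;
```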

Can we install the sys schema on versions before 5.6?

You can use the ps_helper project on 5.5.

Does sys support performance_schema from 5.0?

Unfortunately, MySQL 5.0 doesn’t have performance_schema. P_S has been available since MySQL 5.5.3.

If you install the sys schema on one node of a Galera cluster will all the nodes get the Sys schema? Also, is the Sys schema cluster aware or does it only track the local node?

For PXC 5.6 with Galera 3, the answer is: yes, it will be replicated to all the nodes. The performance schema, however, will only collect data for the local node.

The post Webinar: Introduction to MySQL SYS Schema follow up questions appeared first on MySQL Performance Blog.

Percona Server 5.6.26-74.0 is now available

Tue, 2015-09-15 15:25

Percona is glad to announce the release of Percona Server 5.6.26-74.0 on September 15, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.26, including all the bug fixes in it, Percona Server 5.6.26-74.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – and this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.26-74.0 milestone on Launchpad.

New Features:

  • TokuDB storage engine source has been merged into the Percona Server code. TokuDB storage engine tokudb_version variable now has the same value as the Percona Server version variable.
  • TokuDB Hot Backup has been renamed to Percona TokuBackup and is now open source. Its source code has been integrated into the Percona Server code as a git submodule, and the TokuDB Hot Backup plugin source code has been merged into the Percona Server code.
  • Tokutek Fractal Tree has been renamed to Percona FT and its source code has been integrated into Percona Server code as a git submodule.
  • TokuDB tests for Percona Server 5.6 have been merged into Percona Server 5.6 code.
  • Google SNAPPY compression/decompression algorithm is now available as TokuDB Compression table format.
  • Percona Server now supports changing the server_id variable per session, by implementing the new pseudo_server_id variable. This feature also fixes upstream bug #35125.
  • Percona Server has temporarily disabled savepoints in triggers and stored functions. The reason is that even having fixed bug #1438990 and bug #1464468 we have found more cases where savepoints in triggers break binary logging and replication, resulting in server crashes and broken slaves. This feature will be disabled until the above issues are properly resolved.
  • LOCK TABLES FOR BACKUP now flushes the current binary log coordinates to InnoDB. Thus, under active LOCK TABLES FOR BACKUP, the binary log coordinates in InnoDB are consistent with its redo log and any non-transactional updates (as the latter are blocked by LOCK TABLES FOR BACKUP). It is planned that this change will enable Percona XtraBackup to avoid issuing the more invasive LOCK BINLOG FOR BACKUP command under some circumstances.
  • innodb_stress has been added to the list of default MTR suites. For most supported systems satisfying the newly added dependencies is straightforward, but on CentOS 5, the default Python is too old. Thus python26 and python26-mysqldb packages should be installed there and python26 should be made the default python for the testsuite environment.
  • Three new TokuDB variables, tokudb_client_pool_threads, tokudb_cachetable_pool_threads, and tokudb_checkpoint_pool_threads, have been implemented to improve control over the thread pool sizes.
  • Percona Server has implemented new tokudb_enable_partial_eviction option in TokuDB to allow disabling of partial eviction of nodes.
  • Percona Server has implemented new tokudb_compress_buffers_before_eviction option in TokuDB which allows the evictor to compress unused internal node partitions in order to reduce memory requirements as a first step of partial eviction before fully evicting the partition and eventually the entire node.

Bugs Fixed:

  • Querying GLOBAL_TEMPORARY_TABLES table would crash threads working with internal temporary tables used by ALTER TABLE. Bug fixed #1113388.
  • Selecting from GLOBAL_TEMPORARY_TABLES table while running an online ALTER TABLE on a partitioned table in parallel could lead to a server crash. Bug fixed #1193264.
  • Kill Idle Transactions feature could cause an assertion on a debug build due to a race condition. Bug fixed #1206008.
  • libmysqlclient_16 symbols were missing in the Percona Server shared library package on RHEL/CentOS 7. Bug fixed #1420691.
  • Prepared statements in stored procedures could crash Response Time Distribution plugin. Bug fixed #1426345.
  • When the innodb_corrupt_table_action variable was set to Warn/Salvage, the server could crash when updating table statistics during query execution on affected tables. Bug fixed #1426610.
  • A sequence of failing TRUNCATE TABLE, then insert to that table, and CHECK TABLE would crash the server. Bug fixed #1433197.
  • When InnoDB change buffering was enabled and used, executing a FLUSH TABLE ... FOR EXPORT would cause a server hang and SHOW PROCESSLIST would show that table in a System Lock state. Bug fixed #1454441 (upstream #77011).
  • FLUSH INDEX_STATISTICS / FLUSH CHANGED_PAGE_BITMAPS and FLUSH USER_STATISTICS / RESET CHANGE_PAGE_BITMAPS pairs of commands were inadvertently joined, i.e. issuing either command had the effect of both. The first pair, besides flushing both index statistics and changed page bitmaps, had the effect of FLUSH INDEX_STATISTICS requiring SUPER instead of RELOAD privilege. The second pair resulted in FLUSH USER_STATISTICS destroying changed page bitmaps. Bug fixed #1472251.
  • Enabling super_read_only together with read_only in my.cnf would result in the server crashing on startup. The workaround is to enable super_read_only dynamically on a running server. Bug fixed #1389935 (the fix was ported from Facebook patch #14d5d9).
  • Enabling super_read_only as a command line option would not enable read_only. Bug fixed #1389935 (the fix was ported from Facebook patch #14d5d9).
  • If a new connection thread was created while a SHOW PROCESSLIST command or a INFORMATION_SCHEMA.PROCESSLIST query was in progress, it could have a negative TIME_MS value returned in the PROCESSLIST output. Bug fixed #1379582.
  • With support for Fake Changes enabled, a write to an InnoDB table that would cause B-tree reorganization could lead to server assertion with the unknown error code 1000. Bug fixed #1410410.
  • Running ALTER TABLE ... DISCARD TABLESPACE with support for Fake Changes enabled would lead to a server assertion. Bug fixed #1372219.
  • ALTER TABLE did not allow changing a column to NOT NULL if the column was referenced in a foreign key. Bug fixed #1470677 (upstream #77591).
  • DROP TABLE IF EXISTS which fails due to a foreign key presence could break replication if slave had replication filters. Bug fixed #1475107 (upstream #77684).
  • Enabling Log Archiving for XtraDB when --innodb-read-only option was enabled would cause server to crash. Bug fixed #1484432.
  • LRU manager thread flushing was being accounted to buffer_flush_background InnoDB metrics which was wrong and redundant. Bug fixed #1491420.
  • Fixed a typo in the cleaner thread loop where n_flushed is added to, instead of reset, by the idle server flushing. This may cause a cleaner thread sleep skip on a non-idle server. Bug fixed #1491435.
  • Running TokuDB for a long time with lots of file open and close operations could lead to a server crash due to server incorrectly setting a reserved value. Bug fixed #690.
  • Fixed TokuDB memory leak due to data race in context status initialization. Bug fixed #697.
  • Removed unnecessary calls to malloc_usable_size() function in PerconaFT library to improve the performance. Bug fixed #682.

Other bugs fixed: #1370002, #1464468, #1287299, #1472256, #867, #870, #878, #658, #661, #663, #665, #687, #696, #698, #870, #685, and #878.

Release notes for Percona Server 5.6.26-74.0 are available in the online documentation. Please report any bugs on the Launchpad bug tracker.

The post Percona Server 5.6.26-74.0 is now available appeared first on MySQL Performance Blog.

MongoDB and Percona TokuMX Security Guidelines

Mon, 2015-09-14 19:31

Several reports were published in the news about how easy it is to access data stored in some NoSQL systems, including MongoDB. This is not surprising, because security was rather relaxed in earlier versions of MongoDB. This post lists some of the common vulnerabilities in MongoDB and Percona TokuMX.

Network Security

One key point is to ensure that the bind_ip setting is correctly adjusted: in MongoDB 2.4 and Percona TokuMX, it is not set, which means that the server will listen on all available network interfaces. If proper firewall rules (iptables, Security Groups in AWS, …) are not in place, your dataset could easily be queried from anywhere in the world!

In MongoDB 2.6+, bind_ip is set to by default in the official .deb and .rpm packages. This is great from a security point of view, but remember that you’ll still have to adjust the setting if the application servers are not running on the same host.

MongoDB has an HTTP interface that can display statistics; it is available on a port that is 1000 higher than the port used for regular connections. So if mongod is listening on port 27017, the HTTP interface can be reached on port 28017. Although this interface is read only, the stats that are exposed should not be readable by anyone. The best option is then to disable this interface for production systems. This is done by default in MongoDB 2.6+ (set nohttpinterface: false if you want to use this HTTP interface).

Operating System Security

The mongod and mongos binaries should be run with a dedicated user that has limited privileges. This is done automatically if you have installed MongoDB/Percona TokuMX from a package, but that may not be the case if you have installed from a tarball.

Also make sure that the permissions of the configuration file do not allow any user to make any modifications that would take effect next time MongoDB is restarted.

User Security

MongoDB does not enable authentication by default; this is very handy for development, but it is of course not suitable for production servers.

For standalone instances, you will have to set security.authorization: enabled in MongoDB 2.6+ or use the auth setting in MongoDB 2.4/Percona TokuMX.

For replica sets and sharded clusters, you will have to use a key file (security.keyFile or keyFile depending on the version). In this case security.authorization/auth is implied.
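Putting those pieces together, a minimal mongod.conf sketch for MongoDB 2.6+ might look like the following; the key file path is a placeholder, and bindIp should list whatever interfaces your application actually needs:

```yaml
net:
  bindIp:               # localhost only; add app-facing IPs as needed
  port: 27017
security:
  authorization: enabled         # standalone instances
  # keyFile: /etc/mongodb/keyfile  # replica sets / sharded clusters (implies auth)
```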

Then you will need to create users: start with an administrator user, and then application user(s). This is well explained in the documentation. Note that you will define users and roles in slightly different ways depending on the MongoDB version. It is always good to check the documentation for your specific version.

Also note that the localhost exception allows you to enable authentication before you create the first user.


As you can see, it is not that difficult to get a decently secured MongoDB deployment. However, the trap is that some critical pieces (like authentication) are not enabled by default, so if you’re not careful or if you are under heavy time constraints, it can be easy to forget necessary configurations, which can result in a weak setup. Don’t let this happen to you! And of course a good place to start is the official documentation.

The post MongoDB and Percona TokuMX Security Guidelines appeared first on MySQL Performance Blog.

Webinar: Writing Application Code for MySQL High Availability Followup Questions

Mon, 2015-09-14 17:57

Thanks to all who attended my webinar last week on Writing Application Code for MySQL High Availability. This post addresses the extra questions I didn’t have time to answer during the stream.

What do you think about using Galera Cluster but writing to a single node with LVS?

Whatever HA strategy you like that can present a layer 3 or layer 4 endpoint to your application tier is fine. A lot of people using PXC use it in a single-writer (master/slave) kind of way.

Is there any way we can determine slave lag and then decide whether to use the master or a slave? E.g., instead of using a query to find out if the data is available on the slave, use something like “if lag_time < xyz”?

One of my main points was that this is usually more expensive to implement inside your application code than it is worth, particularly if such a check is done synchronously with user requests. I think it’s (typically) something that would be better left to your load balancer or whatever HA abstraction you have between your apps and database slaves. For example, the health check could monitor replication lag and “fail” out laggy slaves, but then you have to consider what happens if all slaves show lag simultaneously.

As far as the second part of your question, how would you measure lag time? Even if you want to trust Seconds_behind_master, it’s still a database query (of some kind) to determine its value.
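For what it’s worth, a sketch of that kind of lag check looks like this; the NULL handling and the threshold are the important parts, and where the check runs (load balancer vs. application) is the design decision discussed above:

```sql
-- Seconds_Behind_Master is one field of this output; it is NULL when
-- replication is not running, so a health check must fail on NULL as
-- well as on values above the agreed threshold.
SHOW SLAVE STATUS\G
```

Also note that Seconds_Behind_Master only reflects the slave’s view of the master’s binlog; heartbeat-style tools such as pt-heartbeat measure lag more directly by writing timestamps on the master.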

Error handling in JavaScript running in the browser, or in a web service on the server?

I specifically talked about database interaction, which I doubt is a good idea directly from your client’s web browsers.  However, assuming we’re talking about client-side JS accessing your web service, all the same principles apply in that case.

I guess with Go, you have a lot of options, like putting a thread in wait mode or spawning another thread. But with other languages like Java, the readability of the code is not that great.

I don’t want to start a language holy war, but I’d agree some languages make error handling easier than others.

For example, I prefer Go’s model of passing errors back from functions as a distinct return value over throwing exceptions.  Further, Go will complain if you don’t use the error value in the code, or else you must explicitly tell Go you won’t use that value.  Either way, I’m encouraged to handle errors and I like that in a language.

Also, how I handled errors in Go isn’t necessarily the best way; it is simply one way. I’m sure people who do more Go work than I do have better patterns.

However, I do feel there are appropriate ways to do everything I described in most any language, one way or another.  I’d expect the style for this to be dictated by software architects on larger projects and consistency for error handling to be enforced in a mature development org.


The post Webinar: Writing Application Code for MySQL High Availability Followup Questions appeared first on MySQL Performance Blog.

Percona TokuMX 2.0.2 is now available

Thu, 2015-09-10 20:29

Percona is glad to announce the release of Percona TokuMX 2.0.2 on September 10th 2015. Downloads are available here and from the Percona Software Repositories.

Based on MongoDB 2.4.10, including all the bug fixes in it, Percona TokuMX 2.0.2 is the current GA release in the Percona TokuMX 2.0 series.

New Features

  • Percona TokuMX Enterprise has been open-sourced and is now replacing the Percona TokuMX Community edition. To install or upgrade to this new release, use the tokumx-enterprise package. Packages are available from the Percona Software Repositories; this means you’ll need to replace Tokutek‘s S3 repository with Percona‘s in order to see the new packages.
  • Packages for Fedora and Arch Linux are not available anymore.

Bugs Fixed

  • Fixed long replica node startup times: The entire op log is no longer read during startup; it is now read from the point of the last successfully replicated GTID. Git commit #1d5afda.
  • Fixed rare multiple primary issue: In certain scenarios, replicas would attempt to declare themselves as primary when another node had already been declared as primary. A node now performs an extra check against its role during remote primary notification. Git commit #5909737.

The post Percona TokuMX 2.0.2 is now available appeared first on MySQL Performance Blog.

Percona Live Keynote Speakers and Topics!

Thu, 2015-09-10 18:57

The countdown for the annual Percona Live Data Performance Conference and Expo in Europe continues with today’s announcement of our keynote speakers!

This three-day conference focuses on the latest trends, news and best practices in the MySQL, NoSQL and data in the cloud markets, while looking forward to what’s on the long-term horizon within the global data performance industry. Attendees will get briefed on the hottest topics, learn about building and maintaining high-performing deployments and hear from technical experts. Whether you are new to data performance or a senior DBA with years of experience, there are tracks for you. With 84 breakout sessions, 8 tutorial sessions and 6 keynotes there will be no lack of great content presented by top industry professionals. You’ll return home equipped with new tools and ideas to make your business run better.

The Percona Live Europe 2015 keynote speakers and topics include:

  • MC Brown, VMware Continuent, “Tungsten Replicator, More Open Than Ever!”: This talk will explain why a heterogeneous data replication solution is necessary to effectively move data between both databases and datacenters.
  • Kristian Köhntopp, Syseleven GmbH, “The Virtues of Boring Technology”:
    This talk will draw on real-world experience to highlight why “boring” technology makes sense for organizations that are busy managing growth.
  • Geir Hoydalsvik, Oracle, “MySQL 5.7: 20 years in the making!”: MySQL 5.7 is on its way to GA. This talk covers major developments including JSON, Optimizer Hints, Query Rewrite, Optimizer Cost Model, GIS, InnoDB, Group Replication, Fabric, Performance Schema, Sys Schema, Workbench and much more!
  • Mark Callaghan, Facebook, “MySQL and MongoDB for web-scale data management”:
    This talk will ignore debates over legacy vs. new, NoSQL/NewSQL vs. SQL or relational vs. document data models. I want a solution for sharded replica sets that is efficient, performant and manageable. I have a wish list for features. How close will MySQL and MongoDB get to making this easier in the next few years?
  • Peter Zaitsev, Percona, “Percona: Your Partner in Optimizing Data Performance”: This talk will focus on the growth of open source products in the enterprise and the importance of using the right open source tools and technology for each job rather than being locked into a proprietary solution. Learn how Percona is a true open source partner that helps you optimize your data performance to better run your business.
  • Keynote Panel featuring Jim Doherty, Percona (Moderator); Kenny Gorman, Rackspace; Mark Callaghan, Facebook; Ian Meyers, Amazon Web Services; and Peter Zaitsev, Percona, “The Next Disruptive Technology: What Comes After the Cloud and Big Data?”: With the Cloud and Big Data transitioning from “next big things” to broadly deployed operational technologies we think it’s a good time to look a few years out and speculate about what may prove to be the next big technology disruption. So what’s next?

Fun Featured Events!

  • Monday, September 21, SportPesa, Africa’s #1 sports betting platform, is hosting the Opening Party at the Delirium Café, following the day’s tutorial sessions, to kick off the conference and features giveaways, games, top prizes and beer!
  • Tuesday, September 22, Percona Live Diamond Sponsor will host the Community Dinner at the company’s headquarters in historic Rembrandt Square. After the day’s breakout sessions, attendees will be picked up outside the venue and taken to the dinner by canal boats! Space is limited so attendees should sign up as soon as possible.
  • Wednesday, September 23, the closing reception will be held in the Mövenpick Hotel exhibit hall.

So don’t forget: register for the conference and sign up for the community dinner before space is gone!

See you in Amsterdam!

The post Percona Live Keynote Speakers and Topics! appeared first on MySQL Performance Blog.

Percona Server audit log plugin best practices

Thu, 2015-09-10 07:00

Auditing your database means tracking access and changes to your data and db objects. The Audit Log Plugin has been shipped with Percona Server since 5.5.37/5.6.17, for a little over 12 months. Prior to the Audit Log Plugin, you had to work in darker ways to achieve some incarnation of an audit trail.

We have seen attempts at creating audit trails using approaches such as ‘sniffing the wire’, init files, in-schema ‘on update’ fields, triggers, proxies, and parsing the traditional logs of MySQL (slow, general, binary, error). All of these attempts miss a piece of the pie: if you’re sniffing TCP traffic you’ll miss local connections, and if you’re parsing binary logs you’re missing any reads. Your reasons for audit logging might come down to compliance requirements (HIPAA, PCI DSS), or you may need a way to examine database activity or track incoming connections.

Over the past months I’ve answered many support requests with the advice ‘install the audit plugin’. These requests have been varied, ranging from finding out if a user is still active (and the impact of decommissioning it), to measuring the frequency of specific queries, to checking whether a slave is being written to directly, to name but a few.

So then, let’s look at installation. In general we want to install the Audit Log Plugin on an existing instance. As discussed in previous Percona blog posts, installation of the plugin is trivial, but let’s recap. Let’s perform a couple of basic checks before we run the install command from the client. First, query MySQL for the location of the plugin directory;

mysql> show global variables like 'plugin_dir';
+---------------+--------------------------+
| Variable_name | Value                    |
+---------------+--------------------------+
| plugin_dir    | /usr/lib64/mysql/plugin/ |
+---------------+--------------------------+
1 row in set (0.00 sec)

Once that’s known we’ll check that the audit log plugin shared library is present;

[moore@randy ~]$ ls -l /usr/lib64/mysql/plugin/audit*
-rwxr-xr-x. 1 root root 42976 Jul 1 09:24 /usr/lib64/mysql/plugin/

Great, we are in good shape to move to the client and install;

mysql> install plugin audit_log soname '';
Query OK, 0 rows affected (0.00 sec)

mysql> select * from mysql.plugin;
+-----------+--------------+
| name      | dl           |
+-----------+--------------+
| audit_log | |
...
+-----------+--------------+
8 rows in set (0.00 sec)

Voila! It’s that simple. So, what does that provide us? Well now thanks to our default variables we’ve got the following options set;

mysql> show global variables like 'audit%';
+---------------------------+---------------+
| Variable_name             | Value         |
+---------------------------+---------------+
| audit_log_buffer_size     | 1048576       |
| audit_log_file            | audit.log     |
| audit_log_flush           | OFF           |
| audit_log_format          | OLD           |
| audit_log_handler         | FILE          |
| audit_log_policy          | ALL           |
| audit_log_rotate_on_size  | 0             |
| audit_log_rotations       | 0             |
| audit_log_strategy        | ASYNCHRONOUS  |
| audit_log_syslog_facility | LOG_USER      |
| audit_log_syslog_ident    | percona-audit |
| audit_log_syslog_priority | LOG_INFO      |
+---------------------------+---------------+
12 rows in set (0.00 sec)

So what we can tell from that output is that our audit plugin is enabled, it’s logging out to the default location ({datadir}/audit.log) and we’re grabbing all events (ALL) on the server and sending the output in XML format (OLD). From the list of variables above we’ve only got one dynamic variable. This means to change the logfile location or the format we need to put these options into our my.cnf and restart the instance. Not very convenient. Personally, it’s my preference to store the audit.log file away from my datadir.

I also dislike the XML formats in favour of the JSON log format. It is also advised, especially on busier systems, to enable the rotation options, audit_log_rotate_on_size and audit_log_rotations so that you don’t end up filling your disk with a huge audit log. Restarting your production instances isn’t extremely convenient but you’ll be happy to learn there is another way.

Let’s rewind to before we installed the plugin. We had checked the existence of our plugin shared library and were itching to run the install command. Now we can open our my.cnf file and add our preferred options prior to installation. Whilst it’s far from a secret, not many will know that in the plugin installation phase, MySQL re-reads the my.cnf file to check for configuration relevant to the plugin. So let’s add some variables here;

## Audit Logging ##
audit_log_policy=ALL
audit_log_format=JSON
audit_log_file=/var/log/mysql/audit.log
audit_log_rotate_on_size=1024M
audit_log_rotations=10

A quick review of the above. I intend to log all events in JSON format to the /var/log/mysql location. I will rotate each time the active log file hits 1G and this will circulate 10 files meaning I will not have more than 10G of audit logs on my filesystem.

Now with our predefined configuration in the my.cnf we can install the plugin from cold and begin with our preferred options;

mysql> show global variables like 'audit%';
Empty set (0.00 sec)

mysql> install plugin audit_log soname '';
Query OK, 0 rows affected (0.00 sec)

mysql> show global variables like 'audit%';
+---------------------------+--------------------------+
| Variable_name             | Value                    |
+---------------------------+--------------------------+
| audit_log_buffer_size     | 1048576                  |
| audit_log_file            | /var/log/mysql/audit.log |
| audit_log_flush           | OFF                      |
| audit_log_format          | JSON                     |
| audit_log_handler         | FILE                     |
| audit_log_policy          | ALL                      |
| audit_log_rotate_on_size  | 1073741824               |
| audit_log_rotations       | 10                       |
| audit_log_strategy        | ASYNCHRONOUS             |
| audit_log_syslog_facility | LOG_USER                 |
| audit_log_syslog_ident    | percona-audit            |
| audit_log_syslog_priority | LOG_INFO                 |
+---------------------------+--------------------------+
12 rows in set (0.00 sec)

Something to remember: if you add these variables before installation of the plugin and you then restart your instance or suffer a crash, your instance will not start.

[moore@randy ~]$ sudo systemctl restart mysql
[moore@randy ~]$ sudo egrep 'ERROR' /var/log/mysqld.log
2015-09-02 11:55:16 8794 [ERROR] /usr/sbin/mysqld: unknown variable 'audit_log_policy=ALL'
2015-09-02 11:55:16 8794 [ERROR] Aborting

When all is up and running, we can check that the content is finding its way to our log file by opening it up and taking a look. Our JSON output stores one line of JSON per event; here’s an example:

{"audit_record":{"name":"Query","record":"1067824616_2015-09-02T10:04:26","timestamp":"2015-09-02T10:54:53 UTC","command_class":"show_status","connection_id":"6","status":0,"sqltext":"SHOW /*!50002 GLOBAL */ STATUS","user":"pct[pct] @ localhost []","host":"localhost","os_user":"","ip":""}}

Compare that with the ‘OLD’ XML output format, which spans multiple lines, making parsing a more difficult task:

<AUDIT_RECORD NAME="Query" RECORD="2745742_2015-09-02T21:12:10" TIMESTAMP="2015-09-02T21:12:22 UTC" COMMAND_CLASS="show_status" CONNECTION_ID="8" STATUS="0" SQLTEXT="SHOW /*!50002 GLOBAL */ STATUS" USER="pct[pct] @ localhost []" HOST="localhost" OS_USER="" IP="" />


One of the common assumptions about enabling the Audit Log Plugin is that it’s going to take an almighty toll on load. Logging all connections, queries and admin statements… surely? Well, not so. I spent some time observing the impact on the resources of a humbly specced home server: a small machine running a quad-core Xeon, 32G of RAM and a Samsung PRO SSD, with a 7,200 rpm disk for the logs. Here is a collection of graphs to illustrate the impact of turning on audit logging in asynchronous mode; as you will see, the results encouragingly show little impact from activating full logging. In each image, audit logging was first off and subsequently turned on.


We can install the Percona audit log plugin with our preferred options on a running system, without interrupting it, by adding our variables to the my.cnf. Doing this prior to installing the plugin gives us best-practice options without needing to restart the instance for static variables to take effect. Due to the lightweight nature of the audit plugin, you can add this new log file to track access and changes to the data without the performance hit of the slow or general log. The audit log is a great aid to debugging, and can serve as a security measure and a malpractice deterrent.

The post Percona Server audit log plugin best practices appeared first on MySQL Performance Blog.

Testing MySQL partitioning with pt-online-schema-change

Wed, 2015-09-09 10:00

There are cases where we need to implement MySQL partitioning, or at least test if it is an option to solve an issue. However, how are we able to test it when the table on our production environment has hundreds of millions of rows, several composite indexes and/or the size on disk is hundreds of gigabytes?

Testing environments usually don’t have all the data that production has, and if they do, you are probably not testing all the use cases at the database level. Therefore, is it possible to test MySQL partitioning in production while impacting it as little as possible?

When we execute pt-online-schema-change, it creates a table, triggers, and then copies the data. As we are going to test partitioning we are going to need both tables – with and without partitioning – and we are going to use triggers to keep both tables consistent. A good thing about changing a table to use partitioning is that, usually, you won’t need to change the structure of the row, which means that you are able to use practically the same statement to insert, update or delete on both tables.

Let’s suppose that we have this sysbench table:

CREATE TABLE `sbtest` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`),
  KEY `k_1` (`k`)
) ENGINE=InnoDB;

If we want to partition it, we execute:

ALTER TABLE percona.sbtest PARTITION BY HASH(id) partitions 4;
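If you are wondering where rows will land, HASH partitioning is simply modulo arithmetic on the (integer) result of the partitioning expression. A quick Python sketch of the placement rule for our four partitions (the ids below are just hypothetical examples):

```python
def hash_partition(id_value, num_partitions=4):
    # MySQL's PARTITION BY HASH(expr) places a row in
    # partition MOD(expr, num_partitions).
    return id_value % num_partitions

# Rows with consecutive ids are spread round-robin across p0..p3.
for row_id in (1, 2, 3, 4, 5):
    print(f"id={row_id} -> p{hash_partition(row_id)}")
```

So consecutive AUTO_INCREMENT ids distribute evenly, which is exactly what HASH partitioning is designed for.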

We will execute pt-online-schema-change like this:

pt-online-schema-change h=localhost,D=percona,t=sbtest --recursion-method none --execute --alter "PARTITION BY HASH(id) partitions 4"

But as we are going to test partitioning, we want to:

  • keep the original table
  • not swap the tables
  • not drop the triggers

That is why we are going to execute pt-online-schema-change like this:

pt-online-schema-change h=localhost,D=percona,t=sbtest --recursion-method none --execute --no-swap-tables --no-drop-old-table --no-drop-new-table --no-drop-triggers --alter "PARTITION BY HASH(id) partitions 4"

At the end we are going to have two tables: sbtest, which is not partitioned, and _sbtest_new, which is partitioned.

The next step pt-osc would have taken is to swap the tables, but we used –no-swap-tables, so we are going to do it manually. First, though, we are going to add triggers to _sbtest_new so that it can feed data back to sbtest, which will be renamed to _sbtest_old. These new triggers are very similar to the ones that already exist, but they reference the table name _sbtest_old; since _sbtest_old doesn’t exist yet, letting them fire as-is would end in an error. That is why we create the triggers with a handler for that error (SQLSTATE '42S02', unknown table):

delimiter //
CREATE DEFINER=`root`@`%` TRIGGER `pt_osc_percona__sbtest_new_ins` AFTER INSERT ON `percona`.`_sbtest_new`
FOR EACH ROW
begin
  DECLARE CONTINUE HANDLER FOR SQLSTATE '42S02' BEGIN END;
  REPLACE INTO `percona`.`_sbtest_old` (`id`, `k`, `c`, `pad`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`);
end;
//
CREATE DEFINER=`root`@`%` TRIGGER `pt_osc_percona__sbtest_new_upd` AFTER UPDATE ON `percona`.`_sbtest_new`
FOR EACH ROW
begin
  DECLARE CONTINUE HANDLER FOR SQLSTATE '42S02' BEGIN END;
  REPLACE INTO `percona`.`_sbtest_old` (`id`, `k`, `c`, `pad`) VALUES (NEW.`id`, NEW.`k`, NEW.`c`, NEW.`pad`);
end;
//
CREATE DEFINER=`root`@`%` TRIGGER `pt_osc_percona__sbtest_new_del` AFTER DELETE ON `percona`.`_sbtest_new`
FOR EACH ROW
begin
  DECLARE CONTINUE HANDLER FOR SQLSTATE '42S02' BEGIN END;
  DELETE IGNORE FROM `percona`.`_sbtest_old` WHERE `percona`.`_sbtest_old`.`id` <=> OLD.`id`;
end;
//
delimiter ;

The schema now has triggers in both directions: pt-osc’s triggers on sbtest feed _sbtest_new, and our new triggers on _sbtest_new will feed _sbtest_old once the rename happens.

We are going to create a table _sbtest_diff, which is the table that will be renamed to _sbtest_new. It doesn’t need secondary indexes or partitioning, so keep it simple:

CREATE TABLE `_sbtest_diff` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `k` int(10) unsigned NOT NULL DEFAULT '0',
  `c` char(120) NOT NULL DEFAULT '',
  `pad` char(60) NOT NULL DEFAULT '',
  PRIMARY KEY (`id`)
) ENGINE=InnoDB;

At this point we are able to swap the tables; the command to execute is:

RENAME TABLE sbtest TO _sbtest_old, _sbtest_new TO sbtest, _sbtest_diff TO _sbtest_new;

This single RENAME atomically rotates the three tables: sbtest becomes _sbtest_old, the partitioned _sbtest_new becomes the live sbtest, and the empty _sbtest_diff takes the name _sbtest_new.

Now you can test performance on the table.  If we want to return to the previous stage, we just execute:

RENAME TABLE _sbtest_new TO _sbtest_diff, sbtest TO _sbtest_new, _sbtest_old TO sbtest;
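As a sanity check that the two statements are inverses, here is a small Python sketch of RENAME TABLE’s left-to-right semantics (the table contents are just hypothetical labels for illustration):

```python
def rename_tables(mapping, renames):
    # RENAME TABLE applies its (old TO new) pairs left to right,
    # atomically with respect to other sessions.
    result = dict(mapping)
    for old, new in renames:
        result[new] = result.pop(old)
    return result

# Labels standing in for each physical table's contents.
tables = {"sbtest": "original", "_sbtest_new": "partitioned", "_sbtest_diff": "empty"}

forward  = [("sbtest", "_sbtest_old"), ("_sbtest_new", "sbtest"), ("_sbtest_diff", "_sbtest_new")]
backward = [("_sbtest_new", "_sbtest_diff"), ("sbtest", "_sbtest_new"), ("_sbtest_old", "sbtest")]

after_swap = rename_tables(tables, forward)
print(after_swap["sbtest"])       # 'partitioned': the partitioned table is now live

restored = rename_tables(after_swap, backward)
print(restored == tables)         # True: the second RENAME undoes the first
```

Because each pair is applied in order, every target name is free by the time it is needed, so no temporary names are required.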

With these two RENAME TABLE commands, we are able to switch back and forth between the partitioned and non-partitioned table. Once you are satisfied with your testing, the remaining task is to clean up the triggers and the tables. At the end, there are two possible outcomes:

  • The partitioned table is working as expected. The cleanup commands will be:

DROP TRIGGER `pt_osc_percona__sbtest_new_ins`;
DROP TRIGGER `pt_osc_percona__sbtest_new_upd`;
DROP TRIGGER `pt_osc_percona__sbtest_new_del`;
DROP TABLE _sbtest_new;
DROP TABLE _sbtest_old;

  • We decided to keep the original table, which implies executing:

DROP TRIGGER `pt_osc_percona_sbtest_ins`;
DROP TRIGGER `pt_osc_percona_sbtest_upd`;
DROP TRIGGER `pt_osc_percona_sbtest_del`;
DROP TABLE _sbtest_new;
DROP TABLE _sbtest_old;


With this procedure, you will have both tables – with and without partitioning – synchronized and you will be able to swap between them until you decide to keep one of them.

The post Testing MySQL partitioning with pt-online-schema-change appeared first on MySQL Performance Blog.

super_read_only and GTID replication

Di, 2015-09-08 07:00

Percona Server 5.6.21+ and MySQL 5.7.8+ offer the super_read_only option that was first implemented in WebscaleSQL. Unlike read_only, this option prevents all users from running writes (even those with the SUPER privilege). Sure enough, this is a great feature, but what’s the relation with GTID? Read on!


Enabling super_read_only on all slaves when using GTID replication makes your topology far less sensitive to errant transactions. Failover is then easier and safer because creating errant transactions is much harder.

GTID replication is awesome…

For years, all MySQL DBAs in the world have been fighting with positioning when working with replication. Each time you move a slave from one master to another, you must be very careful to start replicating at the correct position. That was boring and error-prone.

GTID replication is a revolution because it allows auto-positioning: when you configure server B to replicate from A, both servers automatically negotiate which events should be sent by the master. Of course this assumes the master still has all the missing events in its binlogs. Otherwise the slave will complain that it can’t get all the events it needs and you will see an error 1236.
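Conceptually, auto-positioning is set arithmetic on GTID sets: the slave reports its gtid_executed and the master streams every transaction it has that the slave lacks. A simplified Python sketch (one source UUID maps to a single contiguous sequence here, which real GTID sets do not require):

```python
def missing_events(master_executed, slave_executed):
    # GTID sets simplified to {source_uuid: highest_executed_txn},
    # assuming contiguous sequences 1..highest_executed_txn.
    missing = {}
    for uuid, upto in master_executed.items():
        start = slave_executed.get(uuid, 0) + 1
        if start <= upto:
            missing[uuid] = (start, upto)   # range the master must resend
    return missing

master = {"uuid-A": 1500}
slave  = {"uuid-A": 1200}
print(missing_events(master, slave))   # {'uuid-A': (1201, 1500)}
```

If the binlogs covering part of that range have been purged, the master cannot serve it, which is exactly the error 1236 case.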

… but there’s a catch

Actually GTID replication has several issues, the main one in MySQL 5.6 being the inability to switch from position-based to GTID-based replication without downtime. This has been fixed since then fortunately.

The issue I was thinking of is errant transactions. Not familiar with this term? Let me clarify.

Say you have a slave (B) replicating from a master (A) using the traditional position-based replication. Now you want to create a new database. This is easy: just connect to B and run:

mysql> CREATE DATABASE new_db;

Ooops! You’ve just made a big mistake: instead of creating the database on the master, you’ve created it on the slave. But the change is easy to undo: run DROP DATABASE on B, followed by CREATE DATABASE on A.

Nobody will ever know about your mistake, and next time you’ll be more careful.

However with GTID-replication, this is another story: when you run a write statement on B, you create an associated GTID. And this associated GTID will be recorded forever (even if the binlog containing the transaction is purged at some point).

Now you can still undo the transaction but there is no way to undo the GTID. What you’ve created is called an errant transaction.

This minor mistake can have catastrophic consequences: say that 6 months later, B is promoted as the new master. Because of auto-positioning, the errant transaction will be sent to all slaves. But it’s very likely that the corresponding binlog has been purged, so B will be unable to send the errant transaction. As a result replication will be broken everywhere. Not nice…
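This is also why checking for errant transactions before a failover is cheap: compute the equivalent of GTID_SUBTRACT(candidate’s gtid_executed, master’s gtid_executed) and refuse to promote if the result is non-empty. A simplified sketch of that check in Python (GTID sets reduced to plain sets of sequence numbers per source UUID; the UUIDs are made up):

```python
def errant_gtids(candidate_executed, master_executed):
    # Anything the candidate executed that the master never saw is
    # errant, mirroring GTID_SUBTRACT(candidate, master).
    errant = {}
    for uuid, txns in candidate_executed.items():
        extra = txns - master_executed.get(uuid, set())
        if extra:
            errant[uuid] = extra
    return errant

master    = {"uuid-A": set(range(1, 1001))}
candidate = {"uuid-A": set(range(1, 1001)), "uuid-B": {1}}   # one stray write on B

print(errant_gtids(candidate, master))   # {'uuid-B': {1}}: unsafe to promote blindly
```

An empty result means the candidate is a strict subset of the master and can be promoted safely.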

super_read_only can help

Enter super_read_only. If it is enabled on all slaves, the above scenario won’t happen because the write on B will trigger an error and no GTID will be created.

With super_read_only, tools that were not reliable with GTID replication become reliable enough to be used again. For instance, MHA supports failover in a GTID-based setup but it doesn’t check errant transactions when failing over, making it risky to use with GTID replication. super_read_only makes MHA attractive again with GTID.

However note that super_read_only can’t prevent all errant transactions. The setting is dynamic so if you have privileged access, you can still disable super_read_only, create an errant transaction and enable it back. But at least it should avoid errant transactions that are created by accident.

The post super_read_only and GTID replication appeared first on MySQL Performance Blog.

Facebook’s Simon Martin on semi-synchronous replication

Fr, 2015-09-04 15:41

Facebook, with 1.49 billion monthly active users,  is one of the world’s top MySQL users. Simon Martin, a production engineer on Facebook’s MySQL Infrastructure team, has been working with MySQL for most of his career, starting from 2 servers built out of spare parts and moving through to one of the largest deployments in the world.

Simon will be sharing some of the challenges Facebook has tackled as it rolled out semi-synchronous replication across the company’s different services at Percona Live Amsterdam on Sept. 22. His talk is aptly titled, “The highs and lows of semi-synchronous replication.” I sat down, virtually, with Simon the other day. Our conversation is below, but first, as a special reward to my readers, save €20 on your Percona Live registration by entering promo code “BlogInterview” at registration. Please feel free to share this offer!

Tom: On a scale from 1-10, how important is MySQL to Facebook? And how does Facebook use MySQL?

Simon: 10. We have a sophisticated in-memory caching layer that will serve most requests, but MySQL is the persistent store for our graph. This means all your profile data, all your friends, likes and comments, and the same for pages, events, places and the rest, are stored permanently in MySQL.

We rely on MySQL in this role for three key features. First, as the final store it must not lose data, and InnoDB is well proven in this space. Second, it needs to be highly available; MySQL and InnoDB are both very stable, and we use replication as well to provide redundancy. Finally, even with extensive caching, it needs to be performant in both latency and throughput; MySQL is, and we can use replication again to spread read traffic to slaves in remote regions, which helps here too.

Tom: What are some of the advantages of using Semi-Synchronous Replication at Facebook — and what are the challenges for deployments of that size when using it?

Simon: That’s a big question, I could probably talk for 50 minutes on it! We started looking at Semi-Synchronous as a solution to reduce downtime when a MySQL master, or the host it’s on, crashes. Historically, if you are running a replicated environment and the master crashes, you are faced with a choice. You could promote another slave right away to reduce downtime, but it’s impossible to be sure that any of your slaves got all the transactions off the master. At Facebook we cannot lose people’s data, so we always chose to recover the master and re-connect the slaves before promoting if required. The downside is recovering InnoDB on a busy host can be slow, and if the host is rebooted it will be even slower, giving us many minutes of downtime.

Now that we run Semi-Synchronous replication it means that a master will not commit a transaction until at least one slave has acknowledged receipt of the binary logs for that transaction. With this running when a master crashes we can be sure our most up-to-date slave has all the data, so once it’s applied by the SQL thread we can promote safely without waiting for crash recovery.
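The commit path Simon describes can be sketched in a few lines of Python. This is only a toy model, not MySQL’s implementation: the names are hypothetical and a queue stands in for the slave’s ACK over the network.

```python
import queue
import threading

def master_commit(txn, ack_queue, timeout=10.0):
    # Semi-sync: block the commit until at least one slave acknowledges
    # receipt of the binlog event, or time out and fall back to
    # asynchronous replication (what rpl_semi_sync_master_timeout governs).
    try:
        acker = ack_queue.get(timeout=timeout)
        return ("committed", acker)
    except queue.Empty:
        return ("committed-async-fallback", None)

acks = queue.Queue()
# Simulate a nearby slave acknowledging after a short network round trip.
threading.Timer(0.01, lambda: acks.put("slave-1")).start()
print(master_commit("txn-42", acks))   # ('committed', 'slave-1')
```

The model makes the two costs visible: every commit pays one slave round trip, and a missing slave turns into stalled commits rather than silent lag.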

There are many challenges in this, though. First there is performance: we now need a network round trip for each transaction, so we need the acknowledging slaves to be very close. Slaves in a different data hall, let alone a different region, will be too slow.

We also need to pay attention to slave availability. Previously, not having a slave connected to a master for a short time was not a problem; now it will cause writes to stop and connections to pile up, so we need to be much more careful about how we manage our replication topology. A target of 99.999% uptime for a service now requires the same SLA on slaves being available and connected locally to acknowledge the commits.

On top of this running at “webscale” adds a layer of requirements of its own. Like the rest of our environment everything needs to be automated, anything that requires a human is not going to scale. So our automation needs to respond to any failure and heal the system without intervention in any circumstance. An edge case that has even a tiny chance of occurring on a given day needs to be handled automatically, to keep our SLA and to stop our engineers constantly having to fix things.

Tom: What are you looking forward to the most at this year’s conference (besides your own talk)?

Simon: I always enjoy the keynotes, they don’t all seem to be announced yet but it’s a great way to get a state of the community update. I’ll certainly stop by “Binlog Servers at,” it sounds like they might be doing the same kind of things we are for Semi-Synchronous replication, so it’ll be great to compare ideas. I’ll also be looking at the talks on MySQL 5.7 to get the scoop on what cool new stuff is coming down the pipeline!

The post Facebook’s Simon Martin on semi-synchronous replication appeared first on MySQL Performance Blog.

Percona Toolkit and systemd

Do, 2015-09-03 15:18

After some recent work with systemd I’ve realized its power, and I can come clean: I am a fan. I realize that there are multitudes of posts out there arguing both for and against systemd, but let’s look at some nice ways to have systemd provide us with (among other things) pt-kill-as-a-service.

This brief post introduces you to systemd unit files and how we can leverage them to enable pt-kill at startup, have it start after mysqld, and ensure that MySQL is running by making the mysql service a dependency of pt-kill. By using systemd to handle this, we don’t have to complicate matters by ‘monitoring the monitor’ with hacky shell scripts, cron or utilities like monit.

So then, a quick primer on systemd, because let’s face it, we’ve all been avoiding it. Systemd is not new, but it made recent headlines in the Linux world when several major distros announced their intention to migrate upcoming releases to it.

What is it? Due to its depth, it is best described as a suite of management daemons, libraries and tools that replaces the traditional init scripts. Essentially, remember how you start a service, mount a volume or read the system logs? Well, start forgetting all of that, because systemd is disrupting this space. With systemd come some really neat tricks for administering your machines, and I’m only beginning to see the tip of this iceberg. There is admittedly a lot to learn with systemd, but this should serve as a pragmatic entrée.

Systemd what? When did this happen? Dates each Linux distribution shipped systemd as its default:

Arch Linux: October 2012
CoreOS: October 2013 (v94.0.0)
Debian: April 2015 (v8 aka jessie)
Fedora: May 2011 (v15)
Gentoo Linux: N/A
Mageia: May 2012 (v2.0)
openSUSE: September 2012 (v12.2)
Red Hat Enterprise Linux: June 2014 (v7.0)
Slackware: N/A
SUSE Linux Enterprise Server: October 2014 (v12)
Ubuntu: April 2015 (v15.04)

Lennart Poettering, the name most frequently attached to systemd, is seeking to modernize the most fundamental processes of the Linux startup system, bringing in the paradigms of modern computing: concurrency, parallelism and efficiency. The dependency tree of processes and services is more intuitive, and the structure of the underlying startup scripts is unified. I feel that the direction proposed by systemd is an evolutionary one which promotes consistency within the startup scripts, enabling conventions that can be more easily understood by a broader audience.

Systemd and Percona Toolkit

This post aims to show that we can rely on systemd to handle processes such as pt-kill, pt-stalk, and other daemonized scripts that we like to have running perpetually, are fired at startup and can be reinstated after failure.

The scenario is this: I want pt-kill to drop all sleeping connections from a certain application user, let’s call them ‘caffeinebob’, because they never close connections. For various reasons we can’t make changes in the application, so we’re employing a Percona Toolkit favourite, pt-kill, to do this for us. For convenience, we want this to persist across server restarts. In the olden days we might have had a cron job firing a shell script, in combination with a sentinel file to ensure it’s running. I’m pretty sure this kitty could be skinned many ways.

The systemd Unit File

After some research and testing, the unit file below plays nicely on a CentOS 7 node with systemd at its core. In this example I am running Percona Server 5.6, installed using Percona’s yum repo, with the mysql.service unit file generated at installation. I suspect there could be some systemd deviation with other MySQL variants; however, this configuration is working for me.

[Unit]
Description = pt-kill caffeinebob
After = mysql.service
Requires = mysql.service

[Service]
Type = simple
PIDFile = /var/run/
ExecStart = /usr/bin/pt-kill --daemonize --pid=/var/run/ --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print
Restart = on-abort

[Install]
WantedBy = multi-user.target

Let’s examine the above and see what we’re working with. Systemd unit files come in various flavours; the example above is a simple service unit file, meaning we are running a process controlled and supervised by systemd. The significance of the After directive is that this service will not attempt to start until mysql.service has been started. The Requires directive makes ptkill.service dependent on mysql.service starting successfully.

The next part, the [Service] grouping, details the actions to be taken by the service. The Type can be one of many, but as it’s a simple call to a script I’ve used the simple type. We are describing the command and how it is handled. The ExecStart is evidently the pt-kill command that we would usually run from the shell prompt or from within a shell script. This is a very coarse example; we could instead parameterize the command with the assistance of an Environment file. Note the use of the Restart directive, used so that systemd can react should a failure interrupt the process.

Finally, under the [Install] grouping, we’re telling systemd that this service should start up on a multi-user system, which could be thought of as runlevel 2 or 3 (multi-user mode).

So, provided we’ve got all the relevant paths, users and dependencies in place, once you reboot your host mysql.service should initiate mysqld, and when that dependency is met systemd will initiate pt-kill with our desired parameters to cull connections that meet the criteria stipulated in our configuration. This means you rely on systemd to manage pt-kill for you, and you don’t need to remember to start this or similar processes when you restart your node.

Start up & enable

Now, to invoke our service manually and enable it at startup, we run the following systemctl commands:

[moore@localhost ~]$ sudo systemctl start ptkill.service
[moore@localhost ~]$ sudo systemctl enable ptkill.service

No feedback, but no errors either, so we can check the status of the service:

[moore@localhost ~]$ sudo systemctl status ptkill -l
ptkill.service - keep pt-kill persistent across restarts
   Loaded: loaded (/etc/systemd/system/ptkill.service; enabled)
   Active: active (running) since Wed 2015-08-12 02:39:13 BST; 1h 19min ago
 Main PID: 2628 (perl)
   CGroup: /system.slice/ptkill.service
           └─2628 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Perfect. We can also instruct systemd to stop and/or disable our service when the application is fixed and caffeinebob closes all those open connections.

[moore@localhost ~]$ sudo systemctl stop ptkill.service
[moore@localhost ~]$ sudo systemctl disable ptkill.service

Now after successful implementation we see that our process is running delightfully;

[moore@localhost ~]$ ps -ef | grep pt-kill
root 2547 1 0 02:37 ? 00:00:00 perl /usr/bin/pt-kill --daemonize --pid=/var/run/ --interval=5 --defaults-file=/root/.my.cnf --log=/var/log/ptkill.log --match-user caffeinebob --busy-time 10 --kill --print

Catch me if I fall

Let’s issue a kill signal to the process and observe its behaviour using journalctl:

[moore@localhost ~]$ sudo kill -SEGV 2547

This will write entries similar to the following into the system log:

[moore@localhost ~]$ sudo journalctl -xn -f
Aug 12 02:39:13 localhost.localdomain sudo[2624]: moore : TTY=pts/1 ; PWD=/home/moore ; USER=root ; COMMAND=/bin/kill -SEGV 2547
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service: main process exited, code=killed, status=11/SEGV
Aug 12 02:39:13 localhost.localdomain systemd[1]: Unit ptkill.service entered failed state.
Aug 12 02:39:13 localhost.localdomain systemd[1]: ptkill.service holdoff time over, scheduling restart.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Stopping keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun shutting down
-- Defined-By: systemd
-- Support:
--
-- Unit ptkill.service has begun shutting down.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Starting keep pt-kill persistent across restarts...
-- Subject: Unit ptkill.service has begun with start-up
-- Defined-By: systemd
-- Support:
--
-- Unit ptkill.service has begun starting up.
Aug 12 02:39:13 localhost.localdomain systemd[1]: Started keep pt-kill persistent across restarts.
-- Subject: Unit ptkill.service has finished start-up
-- Defined-By: systemd
-- Support:
--
-- Unit ptkill.service has finished starting up.
--
-- The start-up result is done.

Pt-kill flaps after the kill signal, but systemd has been instructed to restart it on failure, so we don’t see caffeinebob saturate our processlist with sleeping connections.

Another bonus with this workflow is use within orchestration. Any standardized unit files can be propagated to your fleet of hosts with tools such as Ansible, Chef, Puppet or Saltstack.

Closing note

I’d love to hear from the pragmatists from the systemd world to understand if this approach can be improved or whether there are any flaws in this example unit file that would require addressing. This is very much a new-school of thought for me and feedback is both welcome and encouraged.

Thank you for your time, happy systemd-ing.

The post Percona Toolkit and systemd appeared first on MySQL Performance Blog.