MySQL High Performance

Syndicate content
Percona's MySQL & InnoDB performance and scalability blog
Updated: 15 hours 9 min ago

OpenStack Trove Day 2014 Recap: MySQL and DBaaS

Do, 2014-08-28 08:00

OpenStack Trove Day

I just returned from a week in Cambridge, Massachusetts where I was attending the OpenStack Trove Day and the Trove mid-cycle meetup, both sponsored by the great folks at Tesora.

I am relatively new to the OpenStack and Trove arenas so this was a fantastic opportunity for me to learn more about the communities, the various components within OpenStack, and what part Trove plays. I found the entire event very worthwhile – I met a lot of key people in the community, learned more about Trove and its potential, and in general felt a great energy and excitement surrounding Trove and OpenStack as a whole.

There were more than 120 attendees at Trove Day. That is almost four times the initial estimate! I think I would call that a success. There were 7 very high quality topics that covered material ranging from new and coming features within Trove, to deep inspection of how it is currently used in several big name companies to an investor’s perspective of the OpenStack market. There were also 2 panel style discussions that covered a lot of ground with all participants being ‘guys on the ground’ actively working with OpenStack deployments including one of my fellow Perconians, Mr. Tim Sharp.

One of the main takeaways for me from the entire day was the forward looking adoption estimates for Trove. This came up over and over through the various talks and panels. There seems to be a tremendous amount of interest in Trove deployments for late 2014/2015 but very few actual live users today. There also seems to be a bit of a messaging issue and confusion amongst potential users as to what Trove really is and is not. Simply reading the Trove Mission Statement should quickly clarify:

The OpenStack Open Source Database as a Service Mission: To provide scalable and reliable Cloud Database as a Service provisioning functionality for both relational and non-relational database engines, and to continue to improve its fully-featured and extensible open source framework.

So allow me to expand on that a bit based on some specific comments or questions that I overheard:
- Trove is NOT a database abstraction layer nor any sort of database unification tool; all applications still communicate with their respective datastores directly through their native APIs.
- Trove is NOT a database monitoring, management or analysis tool; all of your favorite debugging and monitoring tools like Percona Toolkit will still work exactly as advertised, and yes, you do need a monitoring tool.
- Although Trove does have some useful backup scheduling options, Trove is NOT a complete backup and recovery tool that can accommodate every backup strategy; you may still use 3rd party options such as scripting your own around Percona XtraBackup or make your life a lot easier and sign up for the Percona Backup Service.
- Trove IS a very nice way to add resource provisioning for many disparate datastores and has some ‘smarts’ built in for each. This ensures a common user experience when provisioning and managing datastore instances.

To that final point, our friends at Tesora introduced their new Database Certification Program at Trove Day. This new program will ensure a high level of compatibility between the various participating database vendors and the Trove project. Of course, Percona Server has already been certified.

I see the future of Trove as being very bright with a huge potential for expansion into other areas, once it is stabilized. I am very excited to begin contributing to this project and watch it grow.

Until next time…

The post OpenStack Trove Day 2014 Recap: MySQL and DBaaS appeared first on MySQL Performance Blog.

Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL

Mi, 2014-08-27 15:15

Swanhart-Tools includes FlexCDC, a change data capture tool for MySQL. FlexCDC follows a server’s binary log and usually writes “changelogs” that track the changes to tables in the database. I say usually because the latest version of Swanhart-Tools (only in github for now) supports FlexCDC plugins, which allow you to send the updates to a remote data source, or to any other place of your liking.  You can find out more about FlexCDC basics in a previous blog post.

Please note that FlexCDC still needs to have source and destination instances defined in the configuration, even when using plugins.  This is because the FlexCDC state (how much into which binary log has FlexCDC progressed, and what tables are being captured) is stored in the “dest”.  Normally when using a plugin, the source and destination instances will be the same. FlexCDC will create a ‘flexviews’ database with a number of state tables in the destination instance.  This also means you still have to use the create_mvlog.php add_table.php or Flexview’s create_mvlog(…) procedure to mark which tables to capture!  See the previous blog post about FlexCDC.

When you create the mvlog, there will still be a changelog table created in the dest, just like when not using a plugin. This is because the INFORMATION_SCHEMA is used to get column datatypes and additional information (such as if an int is signed or unsigned) and this lookup is done against the table in the dest. The reason this is needed, is because mysqlbinlog, the utility used to scrape the binlog, produces strange output for large signed integers (it provides the signed and unsigned version), thus FlexCDC must figure out the right one to choose from the actual DDL of the changelog table. FlexCDC can’t look at the DDL of the source table though, because the consumer may be behind, and the current structure may not match the structure of the rows in the log.

The new plugin system allows you to do a lot of nifty things like:

  • Replicate to external databases
  • Publish changes to a message queue (this is like Facebook’s Wormhole)
  • Keep a remote cache server in sync
  • and more…

The latest version of Swanhart-Tools includes an Example plugin (in flexviews/consumer/include/example_plugin.php) that simply prints the events that come through it, not logging them into the changelog table at all. There is an example of the output at the end of the post.

The example plugin looks like this:

<?php class FlexCDC_Plugin { static function begin_trx($uow_id, $gsn) { echo "START TRANSACTION: trx_id: $uow_id, Prev GSN: $gsn"; } static function commit_trx($uow_id, $gsn) { echo "COMMIT: trx_id: $uow_id, Last GSN: $gsn"; } static function rollback_trx($uow_id) { echo "ROLLBACK: trx_id: $uow_id"; } static function insert($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: INSERT, AT: $gsn"; print_r($row); } static function delete($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: DELETE, AT: $gsn"; print_r($row); } static function update_before($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (OLD), AT: $gsn"; print_r($row); } static function update_after($row, $db, $table, $trx_id, $gsn) { echo "TRX_ID: $trx_id, Schema:$db, Table: $table, DML: UPDATE (NEW), AT: $gsn"; print_r($row); } }

Important Note: You must define all seven of these functions in your plugin, even if you do not plan to have actions for each of the callbacks – just leave the function body empty to do no action (the call is simply a noop that case.) Note that the plugin functions must be declared STATIC.  This is due to the way that FlexCDC calls the functions.

Transaction state callbacks
There are three callback functions which notify the plugin of changes in transaction state. Before I go into what they do, I want to note the $trx_id and $gsn parameters which are present in every callback. Each transaction is assigned a monotonically increasing transaction identifier. This sequence uniquely identifies each transaction that FlexCDC processes. In addition, each row change is assigned a unique sequence number which FlexCDC calls the Generic Sequence Number (or GSN).

As you can see, the start_trx(…) callback (called when a transaction starts) is passed both the new transaction number and also the highest GSN used in the previous transaction. This is called the GSN high water mark (GSNHWM). At transaction commit, the commit_trx(…) callback is called and the transaction id and the last GSN assigned in the transaction are passed into the callback. This same value will appear as the GSNHWM in the next start_trx(…) callback. Finally, at rollback any sequence numbers assigned in that transaction will be re-used, so no GSN is passed to the rollback callback, but a transaction id is, which lets you determine exactly which transaction is rolling back.

Row change callbacks

Each of the four row change callback functions capture a particular change to the data set. Each of the functions take five parameters. The first ($row) is an array which contains the row being acted upon. The second ($db) is the schema which contains the row. The third ($table) is the table that contains the row. Each callback also receives the transaction identifier, and of course, each row change is assigned a unique GSN.

For example:
An update will fire both update_before(…) and update_after(…) callbacks with the row images before and after the change, respectively. There is an example of this at the end of the post.

Configuring FlexCDC to use a plugin
FlexCDC uses a configuration file called consumer.ini by default.  To the [flexcdc] section add:

The plugin must be in the FlexCDC include/ directory.  You will find example_plugin.php in this directory, to serve as an example.

How it works
Flexviews uses mysqlbinlog to decode the binary log from the source server. It uses the –decode-rows=ROWS option to decode RBR into a format which can be parsed by an external utility. FlexCDC collects information about each transaction and the row changes that happen in the database (which means it requires ROW based binary logging to be used.)  When a plugin is defined the normal actions used by FlexCDC are overridden with the callback functions.

Here is the output from the example plugin, for an update that affected 3 rows (update test.t3 set c1 = c1 – 1):

START TRANSACTION: trx_id: 44, Prev GSN: 107 TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 108 Array ( [c1] => -3 [c2] => 1 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 109 Array ( [c1] => -4 [c2] => 1 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 110 Array ( [c1] => -5 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 111 Array ( [c1] => -6 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (OLD), AT: 112 Array ( [c1] => -5 [c2] => 2 ) TRX_ID: 44, Schema:test, Table: t3, DML: UPDATE (NEW), AT: 113 Array ( [c1] => -6 [c2] => 2 ) COMMIT: trx_id: 44, Last GSN: 113

One thing you should notice, is that FlexCDC provides column names for the data coming from the binary log. This is because the log table exists in the dest instance and FlexCDC can get the list of column names from there. When you use other CDC tools, like the C binlog API, you don’t get column names.

The post Trawling the binlog with FlexCDC and new FlexCDC plugins for MySQL appeared first on MySQL Performance Blog.

mysqld_multi: How to run multiple instances of MySQL

Di, 2014-08-26 14:42

The need to have multiple instances of MySQL (the well-known mysqld process) running in the same server concurrently in a transparent way, instead of having them executed in separate containers/virtual machines, is not very common. Yet from time to time the Percona Support team receives a request from a customer to assist in the configuration of such an environment. MySQL provides a tool to facilitate the execution of multiple instances called mysqld_multi:

“mysqld_multi is designed to manage several mysqld processes that listen for connections on different Unix socket files and TCP/IP ports. It can start or stop servers, or report their current status.”

For tests and development purposes, MySQL Sandbox might be more practical and I personally prefer to use it for my own tests. Both tools work around launching and managing multiple mysqld processes but Sandbox has, as the name suggests, a “sandbox” approach, making it easy to both create and dispose a new instance (including all data inside it). It is more usual to see mysqld_multi being used in production servers: It’s provided with the server package and uses the same single configuration file that people are used to look for when setting up MySQL. So, how does it work? How do we configure and manage the instances? And as importantly, how do we backup all the instances we create?

Understanding the concept of groups in my.cnf

You may have noticed already that MySQL’s main configuration file (or “option file“), my.cnf, is arranged under what is called group structures: Sections defining configuration options specific to a given program or purpose. Usually, the program itself gives name to the group, which appears enclosed by brackets. Here’s a basic my.cnf showing three such groups:

[client] port = 3306 socket = /var/run/mysqld/mysqld.sock user = john password = p455w0rd [mysqld] user = mysql pid-file = /var/run/mysqld/ socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [xtrabackup] target_dir = /backups/mysql/

The options defined in the group [client] above are used by the mysql command-line tool. As such, if you don’t specify any other option when executing mysql it will attempt to connect to the local MySQL server through the socket in /var/run/mysqld/mysqld.sock and using the credentials stated in that group. Similarly, mysqld will look for the options defined under its section at startup, and the same happens with Percona XtraBackup when you run a backup with that tool. However, the operating parameters defined by the above groups may also be stated as command-line options during the execution of the program, in which case they they replace the ones defined in my.cnf.

Getting started with multiple instances

To have multiple instances of MySQL running we must replace the [mysqld] group in the my.cnf configuration file by as many [mysqlN] groups as we want instances running, with “N” being a positive integer, also called option group number. This number is used by mysqld_multi to identify each instance, so it must be unique across the server. Apart from the distinct group name, the same options that are valid for [mysqld] applies on [mysqldN] groups, the difference being that while stating them is optional for [mysqld] (it’s possible to start MySQL with an empty my.cnf as default values are used if not explicitly provided) some of them (like socket, port, pid-file, and datadir) are mandatory when defining multiple instances – so they don’t step on each other’s feet. Here’s a simple modified my.cnf showing the original [mysqld] group plus two other instances:

[mysqld] user = mysql pid-file = /var/run/mysqld/ socket = /var/run/mysqld/mysqld.sock port = 3306 datadir = /var/lib/mysql [mysqld1] user = mysql pid-file = /var/run/mysqld/ socket = /var/run/mysqld/mysqld1.sock port = 3307 datadir = /data/mysql/mysql1 [mysqld7] user = mysql pid-file = /var/run/mysqld/ socket = /var/run/mysqld/mysqld7.sock port = 3308 datadir = /data/mysql/mysql7

Besides using different pid files, ports and sockets for the new instances I’ve also defined a different datadir for each – it’s very important that the instances do not share the same datadir. Chances are you’re importing the data from a backup but if that’s not the case you can simply use mysql_install_db to create each additional datadir (but make sure the parent directory exists and that the mysql user has write access on it):

mysql_install_db --user=mysql --datadir=/data/mysql/mysql7

Note that if /data/mysql/mysql7 doesn’t exist and you start this instance anyway then myqld_multi will call mysqld_install_db itself to have the datadir created and the system tables installed inside it. Alternatively from restoring a backup or having a new datadir created you can make a physical copy of the existing one from the main instance – just make sure to stop it first with a clean shutdown, so any pending changes are flushed to disk first.

Now, you may have noted I wrote above that you need to replace your original MySQL instance group ([mysqld]) by one with an option group number ([mysqlN]). That’s not entirely true, as they can co-exist in harmony. However, the usual start/stop script used to manage MySQL won’t work with the additional instances, nor mysqld_multi really manages [mysqld]. The simple solution here is to have the group [mysqld] renamed with a suffix integer, say [mysqld0] (you don’t need to make any changes to it’s current options though), and let mysqld_multi manage all instances.

Two commands you might find useful when configuring multiple instances are:

$ mysqld_multi --example

…which provides an example of a my.cnf file configured with multiple instances and showing the use of different options, and:

$ my_print_defaults --defaults-file=/etc/my.cnf mysqld7

…which shows how a given group (“mysqld7″ in the example above) was defined within my.cnf.

Managing multiple instances

mysqld_multi allows you to start, stop, reload (which is effectively a restart) and report the current status of a given instance, all instances or a subset of them. The most important observation here is that the “stop” action is managed through mysqladmin – and internally that happens on an individual basis, with one “mysqladmin … stop” call per instance, even if you have mysqld_multi stop all of them. For this to work properly you need to setup a MySQL account with the SHUTDOWN privilege and defined with the same user name and password in all instances. Yes, it will work out of the box if you run mysqld_multi as root in a freshly installed server where the root user can access MySQL passwordless in all instances. But as the manual suggests, it’s better to have an specific account created for this purpose:

mysql> GRANT SHUTDOWN ON *.* TO 'multi_admin'@'localhost' IDENTIFIED BY 'multipass'; mysql> FLUSH PRIVILEGES;

If you plan on replicating the datadir of the main server across your other instances you can have that account created before you make copies of it, otherwise you just need to connect to each instance and create a similar account (remember, the privileged account is only needed by mysqld_multi to stop the instances, not to start them). There’s a special group that can be used on my.cnf to define options for mysqld_multi, which should be used to store these credentials. You might also indicate in there the path for the mysqladmin and mysqld (or mysqld_safe) binaries to use, though you might have a specific mysqld binary defined for each instance inside it’s respective group. Here’s one example:

[mysqld_multi] mysqld = /usr/bin/mysqld_safe mysqladmin = /usr/bin/mysqladmin user = multi_admin password = multipass

You can use mysqld_multi to start, stop, restart or report the status of a particular instance, all instances or a subset of them. Here’s a few examples that speak for themselves:

$ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is not running MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi start $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is running $ mysqld_multi stop 7,0 $ mysqld_multi report 7 Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld7 is not running $ mysqld_multi report Reporting MySQL (Percona Server) servers MySQL (Percona Server) from group: mysqld0 is not running MySQL (Percona Server) from group: mysqld1 is running MySQL (Percona Server) from group: mysqld7 is not running

Managing the MySQL daemon

What is missing here is an init script to automate the start/stop of all instances upon server initialization/shutdown; now that we use mysqld_multi to control the instances, the usual /etc/init.d/mysql won’t work anymore. But a similar startup script (though much simpler and less robust) relying on mysqld_multi is provided alongside MySQL/Percona Server, which can be found in /usr/share/<mysql|percona-server>/mysqld_multi.server. You can simply copy it over as /etc/init.d/mysql, effectively replacing the original script while maintaining it’s name. Please note: You may need to edit it first and modify the first two lines defining “basedir” and “bindir” as this script was not designed to find out the good working values for these variables itself, which the original single-instance /etc/init.d/mysql does. Considering you probably have mysqld_multi installed in /usr/bin, setting these variables as follows is enough:

basedir=/usr bindir=/usr/bin

Configuring an instance with a different version of MySQL

If you’re planning to have multiple instances of MySQL running concurrently chances are you want to use a mix of different versions for each of them, such as during a development cycle to test an application compatibility. This is a common use for mysqld_multi, and simple enough to achieve. To showcase its use I downloaded the latest version of MySQL 5.6 available and extracted the TAR file in /opt:

$ tar -zxvf mysql-5.6.20-linux-glibc2.5-x86_64.tar.gz -C /opt

Then I made a cold copy of the datadir from one of the existing instances to /data/mysql/mysqld574:

$ mysqld_multi stop 0 $ cp -r /data/mysql/mysql1 /data/mysql/mysql5620 $ chown mysql:mysql -R /data/mysql/mysql5620

and added a new group to my.cnf as follows:

[mysqld5620] user = mysql pid-file = /var/run/mysqld/ socket = /var/run/mysqld/mysqld5620.sock port = 3309 datadir = /data/mysql/mysql5620 basedir = /opt/mysql-5.6.20-linux-glibc2.5-x86_64 mysqld = /opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe

Note the use of basedir, pointing to the path were the binaries for MySQL 5.6.20 were extracted, as well as an specific mysqld to be used with this instance. If you have made a copy of the datadir from an instance running a previous version of MySQL/Percona Server you will need to consider the same approach use when upgrading and run mysql_upgrade.

* I did try to use the latest experimental release of MySQL 5.7 (mysql-5.7.4-m14-linux-glibc2.5-x86_64.tar.gz) but it crashed with:

*** glibc detected *** bin/mysqld: double free or corruption (!prev): 0x0000000003627650 ***

Using the conventional tools to start and stop an instance

Even though mysqld_multi makes things easier to control in general let’s not forget it is a wrapper; you can still rely (though not always, as shown below) on the conventional tools directly to start and stop an instance: mysqld* and mysqladmin. Just make sure to use the parameter –defaults-group-suffix to identify which instance you want to start:

mysqld --defaults-group-suffix=5620

and –socket to indicate the one you want to stop:

$mysqladmin -S /var/run/mysqld/mysqld5620.sock shutdown

* However, mysqld won’t work to start an instance if you have redefined the option ‘mysqld’ on the configuration group, as I did for [mysqld5620] above, stating:

[ERROR] mysqld: unknown variable 'mysqld=/opt/mysql-5.6.20-linux-glibc2.5-x86_64/bin/mysqld_safe'

I’ve tested using “ledir” to indicate the path to the directory containing the binaries for MySQL 5.6.20 instead of “mysqld” but it also failed with a similar error. If nothing else, that shows you need to stick with mysqld_multi when starting instances in a mixed-version environment.


The backup of multiple instances must be done in an individual basis, like you would if each instance was located in a different server. You just need to provide the appropriate parameters to identify the instance you’re targeting. For example, we can simply use socket with mysqldump when running it locally:

$ mysqldump --socket=/var/run/mysqld/mysqld7.sock --all-databases > mysqld7.sql

In Percona XtraBackup there’s an option named  –defaults-group that should be used in environments running multiple instances to indicate which one you want to backup :

$ innobackupex --defaults-file=/etc/my.cnf --defaults-group=mysqld7 --socket=/var/run/mysqld/mysqld7.sock /root/Backup/

Yes, you also need to provide a path to the socket (when running the command locally), even though that information is already available in “–defaults-group=mysqld7″; as it turns out, only the Percona XtraBackup tool (which is called by innobackupex during the backup process) makes use of the information available in the group option. You may need to provide credentials as well (“–user” & “–password”), and don’t forget you’ll need to prepare the backup afterwards. The option “defaults-group” is not available in all versions of Percona XtraBackup so make sure to use the latest one.


Running multiple instances of MySQL concurrently in the same server transparently and without any contextualization or a virtualization layer is possible with both mysqld_multi and MySQL Sandbox. We have been using the later at Percona Support to quickly spin on new disposable instances (though you might as easily keep them running indefinitely). In this post though I’ve looked at mysqld_multi, which is provided with MySQL server and remains the official solution for providing an environment with multiple instances.

The key aspect when configuring multiple instances in my.cnf is the notion of group name option, as you replace a single [mysqld] section by as many [mysqldN] sections as you want instances running. It’s important though to pay attention to certain details when defining the options for each one of these groups, specially when mixing instances from different MySQL/Percona Server versions. Differently from MySQL Sandbox, where each instance relies on it’s own configuration file, you should be careful each time you edit the shared my.cnf file as a syntax error when configuring a single group option will prevent all instances from starting upon the server’s (re)initialization.

I hope to have covered the major points about mysqld_multi here but feel free to leave us a note below if you have something else to add or any comment to contribute.

The post mysqld_multi: How to run multiple instances of MySQL appeared first on MySQL Performance Blog.

OpenStack’s Trove: The benefits of this database as a service (DBaaS)

Mo, 2014-08-25 12:00

In a previous post, my colleague Dimitri Vanoverbeke discussed at a high level the concepts of database as a service (DBaaS), OpenStack and OpenStack’s implementation of a DBaaS, Trove. Today I’d like to delve a bit further into Trove and discuss where it fits in, and who benefits.

Just to recap, Trove is OpenStack’s implementation of a database as a service for its cloud infrastructure as a service (IaaS). And as the mission statement declares, the Trove project seeks to provide a scalable and reliable cloud database service providing functionality for both relational and non-relational database engines. With the current release of Icehouse, the technology has begun to show maturity providing both stability and a rich feature set.

In my opinion, there are two primary markets that will benefit from Trove: the first being service providers such as RackSpace who provide cloud-based services similar to Amazon’s AWS. These are companies that wish to expand beyond the basic cloud services of storage and networking and provide their customer base with a richer cloud experience by providing higher level services such as DBaaS functionality. The other players are those companies that wish to “cloudify” their own internal systems. The reasons for this decision are varied, ranging from the desire to maintain complete control over all the architecture and the cloud components to legal constraints limiting the use of public cloud infrastructures.

With Trove, much of the management of your database system is taken care of by automating a significant portion of the configuration and initial setup steps necessitated when launching a new server. This includes deployment, configuration, patching, backups, restores, and monitoring that can be administered from either a CLI interface, RESTful API’s or OpenStack’s Horizon dashboard. At this point, what Trove doesn’t provide is failover, replication and clustering. This functionality is slated to be implemented in the Kilo release of OpenStack due out in April/2015.

The process flow is relatively simple. The OpenStack Administrator first configures the basic infrastructure by installing the database service. He or she would then create an image for each type of database they wish to support such as MySQL or MongoDB. They would then import the images and offer them to their tenants. From the end users perspective only a few commandes are necessary to get up and running. First issuing the <trove create> command to create a database service instance, followed by <trove list> command to get the ID of the instance and finally trove show command to get the IP address of it.

For example to create a database, you first start off by creating a database instance. This is an isolated database environment with compute and storage resources in a single tenant environment on a shared physical host machine. You can run a database instance with a variety of database engines such as MySQL or MongoDB.

From the Trove client I can issue the following command to create a database instance called PS_troveinstance, with a volume size of 2 GB, a user called PS_user, a password PS_password and the MySQL datastore (or database engine):

$ trove create –size 2 –users PS_user:PS_password –datastore MySQL PS_troveinstance

Next I issue the following command to get the ID of the database instance:

$ trove list PS_troveinstance

And finally, to create a database called PS_trovedb, I execute:

$ trove database-create PS_troveinstance PS_trovedb

Alternatively, I could have just combined the above commands as:

$ trove create –size 2 —-database PS_trovedb users PS_user:PS_password –datastore MySQL PS_troveinstance

And thus we now have a MySQL database server containing a database called PS_trovedb.

In our next post on OpenStack/Trove, we’ll dig even further and discuss the software and hardware requirements, and how to actually set up Trove.

On a related note, Percona has several experts attending this week’s OpenStack Operations Summit in San Antonio, Texas. One of them is Matt Griffin, director of product management, who pointed out in a recent post that many OpenStack operators use Percona open source software including the MySQL drop-in compatible Percona Server and Galera-based Percona XtraDB Cluster as well as tools such as Percona XtraBackup and Percona Toolkit. “We see a need in the community to understand how to improve MySQL performance in OpenStack. As a result, Percona, submitted 16 presentations for November’s Paris OpenStack Summit,” Matt said. So stay tuned for related news from him, too, on that front.

The post OpenStack’s Trove: The benefits of this database as a service (DBaaS) appeared first on MySQL Performance Blog.

When (and how) to move an InnoDB table outside the shared tablespace

Fr, 2014-08-22 14:29

In my last post, “A closer look at the MySQL ibdata1 disk space issue and big tables,” I looked at the growing ibdata1 problem under the perspective of having big tables residing inside the so-called shared tablespace. In the particular case that motivated that post, we had a customer running out of disk space in his server who was looking for a way to make the ibdata1 file shrink. As you may know, that file (or, as explained there, the set of ibdata files composing the shared tablespace) stores all InnoDB tables created when innodb_file_per_table is disabled, but also other InnoDB structures, such as undo logs and data dictionary.

For example, when you run a transaction involving InnoDB tables, MySQL will first write all the changes it triggers in an undo log, for the case you later decide to “roll them back”. Long standing, uncommited transactions are one of the causes for a growing ibdata file. Of course, if you have innodb_file_per_table disabled then all your InnoDB tables live inside it. That was what happened in that case.

So, how do you move a table outside the shared tablespace and change the storage engine it relies on? As importantly, how does that affects disk space use? I’ve explored some of the options presented in the previous post and now share my findings with you below.

The experiment

I created a very simple InnoDB table inside the shared tablespace of a fresh installed Percona Server 5.5.37-rel35.1 with support for TokuDB and configured it with a 1GB buffer pool. I’ve used a 50G partition to host the ‘datadir’ and another one for ‘tmpdir’:

Filesystem Size Used Avail Use% Mounted on /dev/mapper/vg0-lvFernando1 50G 110M 47G 1% /media/lvFernando1 # datadir /dev/mapper/vg0-lvFernando2 50G 52M 47G 1% /media/lvFernando2 # tmpdir

Here’s the table structure:

CREATE TABLE `joinit` ( `i` int(11) NOT NULL AUTO_INCREMENT, `s` varchar(64) DEFAULT NULL, `t` time NOT NULL, `g` int(11) NOT NULL, PRIMARY KEY (`i`) ) ENGINE=InnoDB DEFAULT CHARSET=latin1;

I populated it with 134 million rows using the following routine:

INSERT INTO joinit VALUES (NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 ))); INSERT INTO joinit SELECT NULL, uuid(), time(now()), (FLOOR( 1 + RAND( ) *60 )) FROM joinit; # repeat until you reach your target number of rows

which resulted in the table below:

mysql> show table status from test like 'joinit'G *************************** 1. row *************************** Name: joinit Engine: InnoDB Version: 10 Row_format: Compact Rows: 134217909 Avg_row_length: 72 Data_length: 9783214080 Max_data_length: 0 Index_length: 0 Data_free: 1013972992 Auto_increment: 134872552 Create_time: 2014-07-30 20:42:42 Update_time: NULL Check_time: NULL Collation: latin1_swedish_ci Checksum: NULL Create_options: Comment: 1 row in set (0.00 sec)

The resulting ibdata1 file was showing to have 11G, which accounted in practice for 100% of the datadir partition use then. What follows next is a few experiences I did by converting that table to use a different storage engine, moving it outside the shared tablespace, compressing it, and dumping and restoring the database back to see the effects in disk space use. I haven’t timed how long running each command took and focused mostly on the generated files size. As a bonus, I’ve also looked at how to extend the shared table space by adding an extra ibdata file.

#1) Converting to MyISAM

Technical characteristics and features apart, MyISAM tables are know to occupy less disk space than InnoDB’s ones. How much less depends on the actual table structure. Here I made the conversion in the most simplest way:

mysql> ALTER TABLE test.joinit ENGINE=MYISAM;

which created the following files (the .frm file already existed):

$ ls -lh /media/lvFernando1/data/test/ total 8.3G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 16:21 joinit.frm -rw-rw---- 1 fernando.laudares admin 7.0G Jul 31 16:27 joinit.MYD -rw-rw---- 1 fernando.laudares admin 1.3G Jul 31 16:27 joinit.MYI

The resulting MyISAM files amounted for an additional 8.3G of disk space use:

/dev/mapper/vg0-lvFernando1 50G 19G 29G 40% /media/lvFernando1

I was expecting smaller files but, of course, the result depends largely on the data types of the columns composing the table. The problem (or the consequence) is that we end up with close to the double of the initial disk space being used:

As it happens with the other solutions presented in this section that migrate the target table outside the shared tablespace, the common/safest way to reclaim the freed (unused) space inside the ibdata1 file back to the operating system is by doing a dump & restore of the full database.

There’s an alternative approach with MyISAM though, which doesn’t involve dump & restore and only requires a MySQL restart. However, you need to convert all InnoDB tables to MyISAM, stop MySQL, delete all ib* files (there should be no remaining .ibd files after you’ve converted all InnoDB tables to MyISAM), and then restart MySQL again. Upon MySQL restart, ibdata1 will be re-created with it’s default initial size (more on this below). You can then convert the MyISAM tables back to InnoDB and if you have innodb_file_per_table enabled this time then the tables will be created with their own private tablespace file.

#2) Exporting the table to a private tablespace

Once you have innodb_file_per_table enabled you can move a table residing inside ibdata1 to it’s private tablespace (it’s own .ibd file) by either running ALTER TABLE or OPTIMIZE TABLE. Both commands create a “temporary” (though InnoDB, not MyISAM) table with it’s own tablespace file inside the database directory (and not in the tmpdir, as I believed it would happen), the rows from the target table being copied over there.

Here’s showing the temporary table (#sql-4f10_1) that was created while the process was still ongoing:

$ ls -lh /media/lvFernando1/data/test total 2.2G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 20:42 joinit.frm -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 #sql-4f10_1.frm -rw-rw---- 1 fernando.laudares admin 2.2G Jul 30 23:12 #sql-4f10_1.ibd

and the resulting .ibd file from when the process completed:

$ ls -lh /media/lvFernando1/data/test total 9.3G -rw-rw---- 1 fernando.laudares admin 8.5K Jul 30 23:05 joinit.frm -rw-rw---- 1 fernando.laudares admin 9.3G Jul 30 23:35 joinit.ibd

Note that the new joinit.ibd file is about 9.3G. Again, the process resulted in the use of extra disk space:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1

#3) Dump and restore: looking at the disk space use

As pointed, the way to reclaim unused disk space inside ibdata1 back to the file system (and consequently making it shrink) is by dumping the full database to a text file and then restoring it back. I’ve started by doing a simple full mysqldump encompassing all databases:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump.sql

which created the following file:

-rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 00:02 dump.sql

I’ve then stopped MySQL, wiped out the full datadir and used the script mysql_install_db to (re-)create the system tables (though that’s not needed, I did it to level comparisons; once you have all InnoDB tables out of the system tablespace you can simply delete all ib* files along any .ibd and .frm files for related InnoDB tables), started MySQL again, and finally restored the backup:

$ mysql -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox < dump.sql

This resulted in:

/dev/mapper/vg0-lvFernando1 50G 9.4G 38G 21% /media/lvFernando1


-rw-rw---- 1 fernando.laudares admin 18M Jul 31 05:05 /media/lvFernando1/data/ibdata1

ibdata1 did shrink in the process (returning to the default* initial size) and we recovered back a couple of gigabytes that were not being used by the single InnoDB table that used to live inside it.

* The default value for innodb_data_file_path for the Percona Server release we used on the tests is “ibdata1:10M:autoextend”. However, the manual states that “the default behavior is to create a single auto-extending data file, slightly larger than 10MB“. Given that innodb_autoextend_increment is set to 8M by default what we get in practice is an initialized ibdata1 file of 18M, as seen above.

#4) Compressing the table

Back to the original scenario, with the test table still living in the shared tablespace, I’ve enabled innodb_file_per_table, set Barracuda as the innodb_file_format and then ran:

mysql> ALTER TABLE test.joinit ENGINE=INNODB ROW_FORMAT=Compressed;

This resulted in an .ibd file with half the size of an uncompressed one:

-rw-rw---- 1 fernando.laudares admin 4.7G Jul 31 13:59 joinit.ibd

The process was a lengthy one but resulted in considerable less disk space use than converting the table to MyISAM:

/dev/mapper/vg0-lvFernando1 50G 20G 28G 42% /media/lvFernando1

#5) Converting the table to TokuDB

I’ve used TokuDB version 7.1.6 (not the latest one), which was bundled in Percona Server 5.5.37-rel35.1.

mysql> ALTER TABLE test.joinit ENGINE=TOKUDB;

As was the case with converting the InnoDB table to it’s own tablespace, a temporary table was created to intermediate the process, with the table definition residing in the test database directory:

$ ls -lh /media/lvFernando1/data/test/ total 24K -rw-r----- 1 fernando.laudares admin 8.5K Jul 31 14:22 joinit.frm -rw-rw---- 1 fernando.laudares admin 8.5K Jul 31 14:28 #sql-7f51_1.frm

However, TokuDB created a temporary file in the main datadir to copy the data into, show here at some point during the process:

-rwxrwx--x 1 fernando.laudares admin 32K Jul 31 14:28 _test_sql_7f51_1_main_a_2_19.tokudb -rwxrwx--x 1 fernando.laudares admin 16K Jul 31 14:28 _test_sql_7f51_1_status_a_1_19.tokudb -rw------- 1 fernando.laudares admin 528M Jul 31 14:29 tokuldNk5W4v

Once the process completed the file tokuldNk5W4v disappeared and the data ended up in the _test_sql_7f51_1_main_a_2_19.tokudb file:

-rwxrwx--x 1 fernando.laudares admin 1.1G Jul 31 14:32 _test_sql_7f51_1_main_d_1_19_B_0.tokudb

The new .tokudb file is so small compared to the other files obtained in the previous approaches that it almost goes unnoticed when looking at the disk space use:

/dev/mapper/vg0-lvFernando1 50G 12G 36G 25% /media/lvFernando1

Curiously, that file retained the name/reference of the temporary table it used, even though the temporary table file description (#sql-7f51_1.frm) disappeared as well; only joinit.frm remained under the test database directory.

I decided to do a dump & restore of the whole database to see what that would result in this case:

$ mysqldump -S /tmp/mysql_sandbox5537.sock --user=msandbox --password=msandbox --all-databases > dump2.sql $ ls -lh dump2.sql -rw-r--r-- 1 fernando.laudares admin 8.1G Jul 31 15:07 /home/fernando.laudares/dump2.sql

Once it was restored we found the ibdata1 file back to its initial default size and the disk space used by the datadir was down to a scant 1.1G. Curiously (again), the dump & restore procedure did fixed the name of the TokuDB table file:

-rwxrwx--x 1 fernando.laudares admin 980M Jul 31 15:36 _test_joinit_main_21_1_19_B_0.tokudb

The process of converting the resident InnoDB table to TokuDB was faster than compressing it with ROW_FORMAT=Compressed and resulted in a much smaller file. This is not to say that using TokuDB is the best solution but to point that the process can be accomplished in less time and make use of lesser extra disk space than InnoDB does, which can simply come up handy if you don’t have much space left. Also, remember the test was done with a not very large (and simple in structure) test table; you may need to check if yours can be converted to TokuDB and what changes to indexes you might need to do (if any).

#6) Expanding the shared tablespace

As previously mentioned, I’ve been using the default value of  “ibdata1:10M:autoextend” for innodb_data_file_path for my tests. If you have another partition with unused space available and you look only for a solution to the “running out of disk space” problem and don’t mind keeping your big table inside the shared tablespace, then you can expand it. That can be done by means of adding a second ibdata file to the tablespace definition. Of course, that only works if you create it outside the partition already hosting ibdata1.

To do so, you need to stop MySQL and verify the size of the ibdata1 file you have (list size in bytes as “ls -lh” will round it here):

$ ls -l /media/lvFernando1/data/ibdata1 -rw-r----- 1 fernando.laudares admin 10957619200 Aug 1 16:20 /media/lvFernando1/data/ibdata1

I had available space on the /media/lvFernando2 partition so I decided to have ibdata2 created there. To do so, I’ve added the following 2 lines to my.cnf:

innodb_data_home_dir = innodb_data_file_path=/media/lvFernando1/data/ibdata1:10957619200;/media/lvFernando2/tmp/ibdata2:10M:autoextend

Then I restarted MySQL and confirmed that ibdata2 was created there upon initialization:

$ ls -lh /media/lvFernando2/tmp/ total 10M -rw-rw---- 1 fernando.laudares admin 10M Aug 1 16:30 ibdata2

Three important things to note on this approach:

  1. There can be only one ibdata file listed in innodb_data_file_path configured with “autoextend” - the last one in the list.
  2. You need to redefine ibdata1 in innodb_data_file_path using it’s current size. Any other size won’t work.
  3. From the moment you redefined ibdata1 with it’s current size and added ibdata2 configured with “autoexend”, ibdata1 won’t grow any bigger; the common tablespace will be increased through ibdata2.

What the 3rd point really means in practice is that you need to plan this maneuver accordingly: if you still have space available in the partition that hosts ibdata1 and you want to use it first then you need to delay these changes until you get to that point. From the moment you add ibdata2 to the common tablespace definition new data will start to be added into it.


The little experiment I did helped explain how some of the solutions proposed in my previous post for moving an InnoDB table outside the shared tablespace would work in practice and, most importantly, how much more disk space would be needed along the process. It was interesting to see that none of them made use of the tmpdir when they created temporary tables to intermediate the table conversion process; those were always created on the main datadir. The test table I’ve used had only a bit more than 10G, far away from a 1TB-sized table, so the results I’ve got may differ from what you would find in a larger environment.

As complementary information, MySQL 5.6 allows for the creation of InnoDB tables having their private tablespaces living outside the datadir (but that must be defined at table creation time and can’t be changed by means of an ALTER TABLE) as well as transportable tablespaces (which allow for the copy of private tablespaces from one database server to another).

The post When (and how) to move an InnoDB table outside the shared tablespace appeared first on MySQL Performance Blog.

A closer look at the MySQL ibdata1 disk space issue and big tables

Do, 2014-08-21 15:24

A recurring and very common customer issue seen here at the Percona Support team involves how to make the ibdata1 file “shrink” within MySQL. I can only imagine there’s a degree of regret by some of the InnoDB architects on their design decisions regarding disk-space management by the shared tablespace* because this has been a big frustration for many MySQL users over the years.

There’s a very old bug (“InnoDB ibdata1 never shrinks after data is removed,” Sept. 8 2003) documenting user dissatisfaction. Shortly before that issue celebrated its 10th anniversary, James Day, MySQL senior principal support engineer at Oracle, posted a comment explaining why things haven’t changed and he also offered possible alternative solutions. Maybe we’ll see it fixed in a future release of MySQL. We can only be sure that any new storage engine aiming to warrant the sympathy of MySQL users can’t make that same mistake again.

One general misunderstanding that exacerbates the problem is the belief that if we enable innodb_file_per_table then all InnoDB tables will live in their own tablespace (a “private” .ibd file), even though the manual is clear about the role this variable plays. The truth is that when you enable innodb_file_per_table it will only immediately affect how new InnoDB tables are created – it won’t magically export tables currently living in the shared tablespace into their own separate .ibd files. That can be manually accomplished at any time afterwards by running ALTER TABLE or OPTIMIZE TABLE on the target table. The “gotcha” here is that by doing so you’ll actually be using additional disk space – as much as the table size,  for the newly created .ibd file. ibdata1 won’t automatically shrink: the space previously used by that table will be marked as being “free” (for internal InnoDB use) but won’t be returned to the file system.

Note: Throughout this post I make a common reference between ibdata1 and the shared (also called system) tablespace. In fact, the latter can be composed by a list of file definitions (ibdata1;ibdata2, …). Each file can have a fixed size or be specified with the autoextend parameter and have a cap limiting how big they can grow (well, actually only the last file defined in that list can). The default setting for the variable ruling how the shared tablespace is defined has it living on a file located in the datadir, named ibdata1, and configured with autoextend, hence the popular reference of a “growing ibdata1″.

A big table scenario

The shared tablespace contains more than data and indexes from InnoDB tables created inside it, such as the rollback segment for running transactions (a.k.a. “undo logs”). My colleague Miguel Angel Nieto wrote a blog post last year that explains in detail what is stored inside ibdata files, why they can grow bigger than the sum of data from the tables it hosts and won’t “shrink,” and explains the only real way to reclaim unused disk space to the file system.

But there’s one scenario where the ibdata1 file grows in a “legitimate” way: When it’s storing a big table. If there’s a big table living inside the shared tablespace accounting for most of its size then there’s no “shrinking” it. We can argue this would run counter to best practices (or not) but let’s remember that innodb_file_per_table used to be disabled by default. The first releases of MySQL 5.5 had it enabled by default but then the later ones had it disabled. We find the exact opposite happening with MySQL 5.6 - it is enabled in the later versions.

If a developer who is not a database specialist much less a MySQL expert creates a successful product around a single table and was “unlucky” enough to be using a MySQL release that had innodb_file_per_table disabled by default, he or she might soon find a Terabyte table living inside ibdata1 (BTW, I recommend Bill Karwin’s excellent “How to Avoid Common (but Deadly) MySQL Development Mistakes” recorded webinar and slides for all developers using MySQL).

What to do then ?

We recently helped a customer with such a problem. They found themselves running out of disk space with a steadily growing ibdata1 file containing data from a 1.5 Terabyte table. The replicas weren’t able to keep up either, and binary logs where pilling up, thus contributing to more disk space use. They had access to a network mounted partition in their main master server but the storage was quite slow. We started looking for possible solutions to their case and the team came up with the following list of “solutions”:

1) Add one or more disks to the system. Hopefully they had the datadir lying inside a LVM partition using the XFS filesystem so the perspective of simply increasing it’s size by adding more disk space was a good one to contemplate.

2) Re-arranging files in different partitions. The binary logs were stored inside the datadir and got to use a considerable amount of disk space so moving them to another partition would free a few hundreds gigabytes already. The complicated part here was that the only available alternative storage was the slow network one, and that would contribute to make the replicas delay yet more. Later we contemplated adding a second file (ibdata2) to the shared tablespace to be located in a different partition but, again, the only one with space available was the slow network mount.

3) Archive data. Delete a large amount of old data from the table (possibly using pt-archiver) and then do a mysqldump dump and restore the table with less data; or simply leave it as is – even though the freed space won’t be reclaimed by the filesystem it will be made available internally for InnoDB to re-use it. Be aware that deleting data is a slow process; if you have a lot of old unused data then it might be more efficient to dump the data you want to preserve, truncate the table, and then import the data back to it.

4) Convert the table to MyISAM. This could be seem as a polemic solution: it certainly shouldn’t be taken lightly as a change of storage engine is a big deal. A giving application that works with InnoDB might simply not do with MyISAM. While we do not normally recommend MyISAM, it does take less space to store data than InnoDB. Be aware that MyISAM is not a transactional storage engine, and if you crash, you will have to perform a lengthy repair process on the table. Also keep in mind that after converting the table to MyISAM you would have to either do a dump & restore of the full database to have ibdata1 shrink OR make sure you have all InnoDB tables converted to MyISAM, stop MySQL, remove all ib* files, and start MySQL again to have it recreate ibdata1 with its default size.

5) Try compressing the table. Rebuild the InnoDB table with file per table enabled and ROW_FORMAT=Compressed. If replicas are already lagging behind, replication may not be able to keep up when compressed tables are used. Also, if you have a very high write volume, it probably won’t perform well enough. If that is the case, this solution is not a viable one. However, if you would consider switching to the new TokuDB storage engine you will count with much better compression than InnoDB overall, and also better performance for write intensive workloads.

Of course, from all the outlined solutions the first 2 look to be by far the least invasive ones. The problem is that not always the server has free slots waiting for the addition of an extra disk, plus the arrangement around RAID setups brings extra constraints. Moving the binary logs to a different partition could be handy but it’s not always practical or possible at all, as is the case of making more space available by deleting/archiving data and extending the shared tablespace by adding a new ibdata2 file residing in a different partition. Finally, the other solutions involving converting the table to a different storage engine and compressing were not an option in the case of this particular customer because they required the use of additional disk space in the process, which they didn’t had.

What the customer ended doing was dumping the whole data and importing it into another server, compressing the table in the process. And then re-importing it back into the main server. It took time to complete and it wasn’t an option they were keen to try at first. But sometimes there’s just no easy way out of a problem like this. They made changes on the replicas to allow them to keep up with replication even when using compressed tables and the problem was solved for now, even though they’re aware the disk space they recovered won’t be enough in the long term.


The most important take-away here is to never get yourself into the situation of getting too close to running out of disk space. The best solutions available to minimize disk space being used by InnoDB tables inside the shared tablespace usually require the (temporary, if you intend to do a dump & restore afterwards) use of yet more disk space. And the bigger the table the longer it will take for dump & restore and table conversions so if you’re running out of time it only make things more complicated.

This case made me curious to explore some of the other options mentioned above a little further. In a followup post I’ll share some of my findings.

The post A closer look at the MySQL ibdata1 disk space issue and big tables appeared first on MySQL Performance Blog.

How to use MySQL Global Transaction IDs (GTIDs) in production

Mi, 2014-08-20 08:00

Reconfiguring replication has always been a challenge with MySQL. Each time the replication topology has to be changed, the process is tedious and error-prone because finding the correct binlog position is not straightforward at all. Global Transaction IDs (GTIDs) introduced in MySQL 5.6 aim at solving this annoying issue.

The idea is quite simple: each transaction is associated with a unique identifier shared by all servers in a given replication topology. Now reconfiguring replication is easy as the correct binlog position can be automatically calculated by the server.

Awesome? Yes it is! However GTIDs are also changing a lot of things in how we can perform operations on replication. For instance, skipping transactions is a bit more difficult. Or you can get bitten by errant transactions, a concept that did not exist before.

This is why I will be presenting a webinar on Aug. 27 at 10 a.m. PDT: Using MySQL Global Transaction IDs in Production.

You will learn what you need to operate a replication cluster using GTIDs: how to monitor replication status or to recover from replication errors, tools that can help you and tools that you should avoid and also the main issues that can occur with GTIDs.

This webinar is free but you can register today to reserve your seat. And a recording will be available afterwards. See you next week!

The post How to use MySQL Global Transaction IDs (GTIDs) in production appeared first on MySQL Performance Blog.

5 great new features from Percona Cloud Tools for MySQL

Di, 2014-08-19 13:00

It’s been three months since we announced anything for Percona Cloud Tools, not because we’ve been idle but because we’ve been so busy the time flew by!  Here’s the TL;DR to pique your interest:

  • EXPLAIN queries in real-time through the web app
  • Query Analytics for Performance Schema
  • Dashboards: customizable, shared groups of charts
  • Install and upgrade the agent with 1 command line
  • Unified UI: same time range, same host wherever you go

Percona Cloud Tools for MySQL is a hosted service providing access to query performance insights for all MySQL uses. After a brief setup, unlock new information about your database and how to improve your applications. There’s a lot more, but let’s just look at these five new features…


EXPLAIN queries in real-time through the web app

Like many people, to get a query’s EXPLAIN plan you probably copy the query, ssh to the server, log in to MySQL, then paste the query after typing “EXPLAIN”.  With Percona Cloud Tools’ new real-time EXPLAIN feature you can simply click a button.  It’s a real timesaver.

The EXPLAIN plan is a vital part of diagnosing the query.  Now with Percona Cloud Tools you have a lot of powerful information in one place: the query, its metrics, its EXPLAIN plan, and more–and more to come, too!


Query Analytics for Performance Schema

The MySQL slow log is a wealth of indispensable data about queries that you cannot get anywhere else.  That’s why it’s the default for Percona Cloud Tools Query Analytics.  Like most things, however, it has tradeoffs: for one, it can be time-consuming to parse, especially on very busy servers.  Or, in the case of Amazon RDS, the slow log may simply not be available.  That’s ok now because with MySQL 5.6 or newer (including Percona Server 5.6 or newer) you can parse queries from the Performance Schema.  It’s not as data-rich as the slow log, but it has the basics and it’s a great alternative (and sometimes the only alternative) to the slow log.


Dashboards: customizable, shared groups of charts

Metrics Monitor has a default dashboard (a collection of charts) for MySQL.  The default dashboard is a great start because it’s created by us (Vadim, actually) so you know it’s relevant and meaningful for MySQL.  However, it presents only a fraction of the data that percona-agent collects, so we need more dashboards to organize and present other data.  Now you can create new dashboards which are accessible to everyone in the organization.  For example, Peter was recently benchmarking TokuDB, so he created a TokuDB-specific dashboard.


Install and upgrade the agent with 1 command line

As of percona-agent 1.0.6, you can install, upgrade, and uninstall the agent with a single command line, ran as root, like:

# curl -s | bash /dev/stdin -api-key <API KEY>

For many environments this is all you need for a first-time install of a new agent.  The install will auto-detect MySQL, configure, and run all services by default.  You can tweak things later in the web app.  This also means you can install percona-agent in an automated environment.


Unified UI: same time range, same host wherever you go

Like most projects, Percona Cloud Tools has evolved over time.  Consequently, certain parts of the web app were different than other parts.  These differences had workarounds, but now the system is unified. Pick a MySQL instance, pick a time range, then view whatever part of the app you want and these selections will stay the same.  This is, of course, a natural expectation because it allows you to see easily examine a specific system at a specific time range from different perspectives.

There’s a lot more, but we don’t want to take up too much of your time!

Percona Cloud Tools is still in free beta, but not for much longer, so be sure to sign up today!

The post 5 great new features from Percona Cloud Tools for MySQL appeared first on MySQL Performance Blog.

Measuring failover time for ScaleArc load balancer

Di, 2014-08-19 12:00

ScaleArc hired Percona to benchmark failover times for the ScaleArc database traffic management software in different scenarios. We tested failover times for various clustered setups, where ScaleArc itself was the load balancer for the cluster. These tests complement other performance tests on the ScaleArc software – sysbench testing for latency and testing for WordPress acceleration.

We tested failover times for Percona XtraDB Cluster (PXC) and MHA (any traditional MySQL replication-based solution works pretty much the same way).

In each case, we tested failover with a rate limited sysbench benchmark. In this mode, sysbench generates roughly 10 transactions each second, even if not all 10 were completed in the previous second. In that case, the newly generated transactions are queued.

The sysbench command we used for testing is the following.

# while true ; do sysbench --test=sysbench/tests/db/oltp.lua --mysql-host= --mysql-user=root --mysql-password=admin --mysql-db=sbtest --oltp-table-size=10000 --oltp-tables-count=4 --num-threads=4 --max-time=0 --max-requests=0 --report-interval=1 --tx-rate=10 run ; sleep 1; done

The command is run in a loop, because typically at the time of failover, the application receives some kind of error while the virtual IP is moved, or while the current node is declared dead. Well-behaving applications are reconnecting and retrying in this case. Sysbench is not a well-behaving application from this perspective – after failover it has to be restarted.

This is good for testing the duration of errors during a failover procedure – but not the number of errors. In a simple failover scenario (ScaleArc is just used as a regular load balancer), the number of errors won’t be any higher than usual with ScaleArc’s queueing mechanism.


In this test, we used MHA as a replication manager. The test result would be similar regardless of how its asynchronous replication is managed – only the ScaleArc level checks would be different. In the case of MHA, we tested graceful and non-graceful failover. In the graceful case, we stopped the manager and performed a manual master switchover, after which we informed the ScaleArc software via an API call for failover.

We ran two tests:

  • A manual switchover with the manager stopped, switching over manually, and informing the load balancer about the change.
  • An automatic failover where the master was killed, and we let MHA and the ScaleArc software discover it.
ScaleArc+MHA manual switchover

The manual switchover was performed with the following command.

# time ( masterha_master_switch --conf=/etc/cluster_5.cnf --master_state=alive --new_master_host= --orig_master_is_new_slave --interactive=0 && curl -k -X PUT -d '{"apikey": "0c8aa9384ddf2a5756299a9e7650742a87bbb550"}' ) {"success":true,"message":"4214 Failover status updated successfully.","timestamp":1404465709,"data":{"apikey":"0c8aa9384ddf2a5756299a9e7650742a87bbb550"}} real 0m8.698s user 0m0.302s sys 0m0.220s

The curl command calls ScaleArc’s API to inform the ScaleArc software about the master switchover.

During this time, sysbench output was the following.

[ 21s] threads: 4, tps: 13.00, reads/s: 139.00, writes/s: 36.00, response time: 304.07ms (95%) [ 21s] queue length: 0, concurrency: 1 [ 22s] threads: 4, tps: 1.00, reads/s: 57.00, writes/s: 20.00, response time: 570.13ms (95%) [ 22s] queue length: 8, concurrency: 4 [ 23s] threads: 4, tps: 19.00, reads/s: 237.99, writes/s: 68.00, response time: 976.61ms (95%) [ 23s] queue length: 0, concurrency: 2 [ 24s] threads: 4, tps: 9.00, reads/s: 140.00, writes/s: 40.00, response time: 477.55ms (95%) [ 24s] queue length: 0, concurrency: 3 [ 25s] threads: 4, tps: 10.00, reads/s: 105.01, writes/s: 28.00, response time: 586.23ms (95%) [ 25s] queue length: 0, concurrency: 1

Only a slight hiccup is visible at 22 seconds. In this second, only 1 transaction was done, and 8 others were queued. These results show a sub-second failover time. The reason no errors were received is that ScaleArc itself queued the transactions during the failover process. If the transaction in question were done from an interactive client, the queuing itself would be visible as increased response time – for example a START TRANSACTION or an INSERT command is taking longer than usual, but no errors result. This is as good as it gets for graceful failover. ScaleArc knows about the failover (and in the case of a switchover initiated by a DBA, notifying the ScaleArc software can be part of the failover process). The queueing mechanism is quite configurable. Administrators can set up the timeout for the queue – we set it to 60 seconds, so if the failover doesn’t complete in that timeframe transactions start to fail.

ScaleArc+MHA non-graceful failover

In the case of the non-graceful failover MHA and the ScaleArc software have to figure out that the node died.

[ 14s] threads: 4, tps: 11.00, reads/s: 154.00, writes/s: 44.00, response time: 1210.04ms (95%) [ 14s] queue length: 4, concurrency: 4 ( sysbench restarted ) [ 1s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 1s] queue length: 13, concurrency: 4 [ 2s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 2s] queue length: 23, concurrency: 4 [ 3s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 3s] queue length: 38, concurrency: 4 [ 4s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 4s] queue length: 46, concurrency: 4 [ 5s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 5s] queue length: 59, concurrency: 4 [ 6s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 6s] queue length: 69, concurrency: 4 [ 7s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 7s] queue length: 82, concurrency: 4 [ 8s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 8s] queue length: 92, concurrency: 4 [ 9s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 9s] queue length: 99, concurrency: 4 [ 10s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 10s] queue length: 108, concurrency: 4 [ 11s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 11s] queue length: 116, concurrency: 4 [ 12s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 12s] queue length: 126, concurrency: 4 [ 13s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 13s] queue length: 134, concurrency: 4 [ 14s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 14s] queue length: 144, concurrency: 4 [ 15s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 15s] queue length: 153, concurrency: 4 [ 16s] threads: 4, tps: 0.00, reads/s: 0.00, writes/s: 0.00, response time: 0.00ms (95%) [ 16s] queue length: 167, concurrency: 4 [ 17s] threads: 4, tps: 5.00, reads/s: 123.00, writes/s: 32.00, response time: 16807.85ms (95%) [ 17s] queue length: 170, concurrency: 4 [ 18s] threads: 4, tps: 18.00, reads/s: 255.00, writes/s: 76.00, response time: 16888.55ms (95%) [ 18s] queue length: 161, concurrency: 4

The failover time in this case was 18 seconds. We looked at the results and found that a check the ScaleArc software does (which involves opening an SSH connection to all nodes in MHA) take 5 seconds, and ScaleArc declares the node dead after 3 consecutive checks (this parameter is configurable). Hence the high failover time. A much lower one can be achieved with more frequent checks and a different check method – for example checking the read-only flag, or making MHA store its state in the databases.

ScaleArc+Percona XtraDB Cluster tests

Percona XtraDB Cluster is special when it comes to high availability testing because of the many ways one can use it. Many people write only to one node to avoid rollback on conflicts, but we have also seen lots of folks using all the nodes for writes.

Also, graceful failover can be interpreted differently.

  • Failover can be graceful from both Percona XtraDB Cluster’s and from the ScaleArc software’s perspective. First the traffic is switched to another node, or removed from the node in question, and then MySQL is stopped.
  • Failover can be graceful from Percona XtraDB Cluster’s perspective, but not from the ScaleArc software’s perspective. In this case the database is simply stopped with service mysql stop, and the load balancer figures out what happened. I will refer to this approach as semi-graceful from now on.
  • Failover can be completely non-graceful if a node is killed, where neither Percona XtraDB Cluster, nor ScaleArc knows about its departure.

We did all the tests using one node at a time, and using all three nodes. What makes these test sets even more complex is that when only one node at a time used, some tests (the semi-graceful and the non-graceful one) don’t have the same result if the node removed is the used one or an unused one. This process involves a lot of tests, so for the sake of brevity, I omit the actual sysbench output here – they look either like the graceful MHA and non-graceful MHA case – and only present the results in a tabular format. In the case of active/active setups, to remove the nodes gracefully we first have to lower the maximum number of connections on that node to 0.

Failover type1 node (active)1 node (passive)all nodesGracefulsub-second (no errors)no effect at allsub-second (no errors)Semi-graceful4 seconds (errors)no effect at all3 secondsNon-graceful4 seconds (errors)6 seconds (no errors)7 seconds (errors)

In the previous table, active means that the failed node did receive sysbench transactions and passive means that it didn’t.

All the graceful cases are similar to MHA’s graceful case.

If only one node is used and a passive node is removed from the cluster, by stopping the database itself gracefully with the

# service mysql stop

command, it doesn’t have an effect on the cluster. For the subsequent cases of the graceful failover, switching on ScaleArc will enable queuing similar to MHA’s case. In case of the semi-graceful, if the passive node (which has no traffic) departs, it has no effect. Otherwise, the application will get errors (because of the unexpected mysql stop), and the failover time is around 3-4 seconds for the cases when only 1 node is active and when all 3 are active. This makes sense, because ScaleArc was configured to do checks (using clustercheck) every second, and declare a node dead after three consecutive failed checks. After the ScaleArc software determined that it should fail over, and it did so, the case is practically the same as the passive node’s removal from that point (because ScaleArc removes traffic in practice making that node passive).

The non-graceful case is tied to suspect timeout. In this case, XtraDB Cluster doesn’t know that the node departed the cluster, and the originating nodes are trying to send write sets to it. Those write sets will never be certified because the node is gone, so the writes will be stalled. The exception here is the case when the active node failed. After ScaleArc figures out that the node dies (three consecutive checks at one second intervals) a new node is chosen, but because only the failed node was doing transactions, no write sets are in the remaining two nodes’ queues, which are waiting for certification, so there is no need to wait for suspect timeout here.


ScaleArc does have reasonably good failover time, especially in the case when it doesn’t have to interrupt transactions. Its promise of zero-downtime maintenance is fulfilled both in the case of MHA and XtraDB Cluster.

The post Measuring failover time for ScaleArc load balancer appeared first on MySQL Performance Blog.

Getting my hands dirty on an OpenStack lab

Mo, 2014-08-18 10:00

Like you all may know, OpenStack is currently one of the coolest open source projects, so I was thrilled when I was asked to manage the deployment of an OpenStack lab for internal Percona use. Starting from basically zero, I created tasks in our Jira and assigned them to a pool of volunteer consultants. As usual in a service company, billing is the priority so I ended up losing the 2 senior guys but fortunately most of my time was with a customer that wasn’t very demanding and I could easily multitask with the project and fill the gap. So, here it goes…


To deploy the OpenStack lab we were given 8 similar servers in our Durham, N.C. offices. The specs are:

  • CPU: 12 physical cores (24 with HT)
  • Disks: 1 sata 4 TB drive and one 480GB SSD drive
  • Nics: 2x GbE
  • OS: Centos 6

The hardware is recent and decent, a good start.

Deployment choices

Given the hardware we had, I picked the first to be the controller and jumphost, the second to be the network node (a bit overkill) and the remaining 6 nodes would become the compute nodes. I also wanted to use Ceph and RBD with 2 types of volumes, the default using SATA and a SSD type using the SSD drives. The servers only have a single GbE interface to use with Ceph, that’s not ideal but sufficient.

So, we basically followed: OpenStack doc for Centos and had our share of learning and fun. Overall, it went relatively well with only a few hiccups with Neutron and Ceph.


Neutron is probably the most difficult part to understand. You need to be somewhat familiar with all the networking tools and protocols to find your way around. Neutron relies on network namespaces, virtual network switches, GRE tunnels, iptables, dnsmasq, etc. For my part, I discovered network namespaces and virtual network switches.

The tasks of providing isolated networks to different tenants with their own set of IPs and firewalling rules, on the same infrastructure, is a challenging tasks. I enjoyed a lot reading Networking in too much detail, from there, things just started to make sense.

A first issue we encountered was that the iproute package on Centos is old and it does not support the network namespaces. It just needs to be replaced by a newer version. It took me some time to understand how things are connected, each tenant has its own Gre tunnel id and vlan. I can only recommend you read the above document and look at your setup. Don’t forget, you have one set of iptables rules per network namespace… The network node is mainly dealing with Natting, SNAT and DNAT while the security group rules are set on the compute nodes. The “ip netns exec …” and the “ovs-ofctl dump-flows …” have been my best friends for debugging.

Once I got things working, I realized the network performance was, to say the least, pretty bad. I switch “gro off” on the Nics but it made very little change. Finally, I found how the MTU of the VMs were too large, an easy fix by adding a configuration file for dnsmasq with “dhcp-option-force=26,1400″. With an MTU of 1400, less than the NIC MTU + the GRE header, packets were no longer split and performance went back to normal.

More on Ceph

The integration of Ceph happened to be more challenging than I first thought. First, let’s say the documentation is not as polished as the Openstack one, there are some rough edges but nothing unmanageable. The latest version, at the time, had no rpms for Centos but that was easy to work around if you know rpmbuild. Same for the Centos RPM for qemu-kvm and nova required a patch to support rbd devices. I succeeded deploying Ceph over the SATA drives, configured Glance and Cinder to use it and that was all good. The patch for nova allows to launch instances on clones of an image. While normally the image has to be copied to the local drive, an operation that can take some time if the image is large, with a clone, barely a few second after you started a VM, it is actually starting. Very impressive, I haven’t tested but I suppose the same can be accomplished with btrfs or ZFS, shared over iscsi.

Since the servers all have a SSD drive, I also tackled the task of setting up a SSD volume type. That has been a bit tricky, you need to setup a rule so that a given storage pool uses the SSD drives. The SSD drives must be placed all in their own branch in the ceph osd tree and then, you need to decompile the current rule set (crush map), modify it by adding a rule that use the ssd drives, recompile and then define a storage pool that uses the ssd rule. Having done this, I modified the cinder configuration for the “volume-ssd” type and finally, I could mount a ssd backed volumes, replicated, to a VM, quite cool.

The only drawback I found using Ceph is when you want to create a snapshot. Ideally, Ceph should handle the snapshot and it should be kept there as is. The way it works is less interesting, a snapshot is created but it is then copied to /tmp of the compute node, uncompressed…, and then copied back to a destination volume. I don’t know why it is done like that, maybe some workaround for limitations of other Cinder backends. The compute nodes have only 30GB available for /tmp and with Ceph, you must use raw images so that’s quite limiting. I already started to look at the code, maybe this could be my first contribution to the Openstack project.

Overall impressions

My first impression is that OpenStack is a big project with many moving, many moving parts. Installing OpenStack is not a beginners project, my experience saved me quite a few times. Logging, when you activate verbose and even debug in the configuration files is abundant and if you take your time, you can figure out what is going on and why something is failing.

Regarding the distribution, I am not sure Centos was the right fit, I think we had a rougher ride than we could have had using Ubuntu. My feeling is that the packages, not necessarily the OpenStack ones, were not as up to date as they should have compared to Ubuntu.

Ceph rbd backend is certainly a good point in my experience, I always wanted to touch Ceph and this project has been a very nice opportunity. The rbd backend works very well and the ability to launch instances almost instantaneously is just awesome. Another great plus is the storage redundancy (2 replica), like EBS, and the storage saving of working only on a clone of the original image. Also, the integration of the SSD backed pool adds a lot of flexibility. There’s only the instance snapshot issue that will need to be worked on.

So, that’s the current status of our OpenStack lab. I hope to be able to add a few more servers that are currently idle, allowing me to replace the over powerful network node and recycle it as a compute node Another thing I would really like to do is to mix the hypervisor types, I’d like to be able to use a lightweight container like LXC or docker. Also, although this is just a lab for the consultants, I’d like to see how to improve its available which is currently quite low. So, more fun to come!

The post Getting my hands dirty on an OpenStack lab appeared first on MySQL Performance Blog.

Make a difference! See the world! Speak at Percona Live London; Santa Clara!

Fr, 2014-08-15 07:00

Twice each year members of the global open-source community converge on London in November and Santa Clara in April to network with, and learn from, some of the world’s most experienced and accomplished system architects, developers and DBAs.

And now it’s time for YOU to give back to this diverse and growing MySQL community. The Call for Speakers for Percona Live London (Nov. 3-4) closes Aug. 17 and the deadline for submitting talks for the ever-growing Percona Live MySQL Conference and Expo (April 13-16, 2015) expires Nov. 9.

If you like putting things off until the last minute, then it’s time to get to work! Aug. 17 is just two days away and November will be here before you know it (think of how fast summer has gone by already).

Close your eyes and visualize the beauty of London in November (well, usually)…. And who doesn’t enjoy spring in Silicon Valley? Think BIG: Why not submit papers for BOTH conferences? Be a MySQL jetsetter! If your proposal(s) is approved by the Conference Committee then you’ll receive a complimentary full-conference pass. (You’ll have to pay for London’s fish ‘n’ chips and Santa Clara’s famed fish tacos, though. Sorry!)

I’ve said it before (yes, just up above, but it’s important): Your participation in Percona Live London and the Percona Live MySQL Conference and Expo are opportunities to make a difference… to give back to the MySQL community. Don’t worry about being fancy. Some of the most popular talks have been on simple topics. Many people come to the conferences to learn – and we need content for MySQL newcomers as well as the veterans.

Struggling to come up with an idea? Share what you know; share what you’ve learned… the good, the bad and the ugly. These stories are gold and you owe it to the MySQL community to share them – because doing so could save the day of those who hear your tale.

It’s GO time….

  • Click here to submit your speaker proposal (yes, more than one submission is OK) for Percona Live London (November 3-4 at the Millennium Gloucester Conference Center).
  • Click here to submit your speaker proposal (again, the more the merrier) for the Percona Live MySQL Conference & Expo 2015 (April 13-16, 2015 at The Hyatt Regency Santa Clara and Santa Clara Convention Center).

Questions? Need some advice or help with your submission? Let me know in the comments section and we can connect via email. I look forward to hearing from you! I also have five limited-edition Percona t-shirts to send to the first five people who submit a talk and let me know about it here in the comments section (and yes, I will check to confirm!).

One more (OK, more than one) thing:

  • Phone your colleagues, friends and family alerting them that Early Bird pricing for Percona Live London ends Aug. 31!
  • Super Saver registration discounts for the Percona Live MySQL Conference & Expo 2015 are available through Nov. 16 (at 11:30 p.m. PST)!
  • Sponsorship opportunities (this is HUGE!) for next April’s Percona Live MySQL Conference & Expo 2015 are available! Sponsors become a part of a dynamic and growing ecosystem and interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who typically attend the event. Current (very cool) sponsors include:
    • Diamond Plus: Continuent
    • Gold: Pythian
    • Silver: Box and Yelp
    • Exhibit Only: Blackbird
    • Media Sponsor: Database Trends & Applications , Datanami, Linux Journal, and O’Reilly Media

Pssst! One more thing (yes, yet another!): These are the people you must impress when submitting a speaker proposal…. (the more you know!)

Meet the Conference Committees!

Percona Live London and the 2015 Percona Live MySQL Conference & Expo 2015 each features a dedicated Conference Committee.

Percona Live London’s Conference Committee:

  • Cédric Peintre, DBA at Dailymotion
  • David Busby, Remote DBA EMEA Regional Lead/RDBA Security Lead at Percona
  • Colin Charles, Chief Evangelist, MariaDB
  • Luis Motta Campos, DBA at ebay Classifieds Group
  • Nicolai Plum, Senior Systems Architect at
  • Morgan Tocker, MySQL Community Manager at Oracle
  • Art van Scheppingen, Head of database engineering at Spil Games

The Percona Live MySQL Conference & Expo 2015 Conference Committee:

  • Tamar Bercovici, Senior Engineering Manager at Box
  • Colin Charles, Chief Evangelist for MariaDB
  • Sean Chighizola, Senior Director of IT Operations at Big Fish Games
  • Jeremy Cole, Sr. Systems Engineer at Google
  • Harrison Fisk, MySQL Performance Engineer at Facebook
  • Patrick Galbraith, Principal Engineer at HP
  • Jay Janssen, Consulting Lead at Percona
  • Shlomi Noach, Senior Software Engineer at Outbrain (Conference Chairman)
  • Chris Schneider, Database Architect at Groupon
  • John Scott, Lead Database Administrator at Wellcentive
  • Gwen Shapira, Solutions Architect at Cloudera
  • Shivinder Singh, Database Architect and Engineer at Verizon Wireless
  • Calvin Sun, Senior Engineering Manager at Twitter
  • Morgan Tocker, MySQL Community Manager at Oracle
  • Peter Zaitsev, Co-founder and CEO of Percona

The post Make a difference! See the world! Speak at Percona Live London; Santa Clara! appeared first on MySQL Performance Blog.

Benchmarking IBM eXFlash™ DIMM with sysbench fileio

Di, 2014-08-12 12:00

Diablo Technologies engaged Percona to benchmark IBM eXFlash™ DIMMs in various aspects. An eXFlash™ DIMM itself is quite an interesting piece of technology. In a nutshell, it’s flash storage, which you can put in the memory DIMM slots. Enabled by Diablo’s Memory Channel Storage™ technology, this practically means low latency and some unique performance characteristics.

These devices are special, because their full performance potential is unlocked by using multiple devices to leverage the parallelism of the memory subsystem. In a modern CPU there is more than one memory controller (4 is typical nowadays) and spreading eXFlash™ DIMMs across them will provide maximum performance. There are quite some details about the device that are omitted in this post, they can be found in this redbook.

Diablo technologies also provided us a 785 GB FusionIO ioDrive 2 card (MLC). We will compare the performance of it to 4 and 8 of the eXFlash DIMMs. The card itself is a good choice for comparison because of it’s popularity, and because as you will see, it provides a good baseline.


CPU: 2x Intel Xeon E5-2690 (hyper threading enabled)
FusionIO driver version: 3.2.6 build 1212
Operating system: CentOS 6.5
Kernel version: 2.6.32-431.el6.x86_64

Sysbench command used:

sysbench --test=fileio --file-total-size=${size}G --file-test-mode=${test_mode} --max-time=${rtime} --max-requests=0 --num-threads=$numthreads --rand-init=on --file-num=64 --file-io-mode=${mode} --file-extra-flags=direct --file-fsync-freq=0 --file-block-size=16384 --report-interval=1 run

The variables above mean the following:
Size: the devices always had 50% free space.
Test_mode: rndwr, rndrw, rndrd for the different types of workloads (read, write, mixed)
Rtime: 4 hours for asynchronsous tests, 30 minutes for synchronous ones.
Mode: sync or async dependening on which test are we talking about.

From the tdimm devices, we created RAID0 arrays with linux md (this is the recommended way to use them).
In case of 4 devices:

mdadm --create /dev/md0 --level=0 --chunk=4 --raid-devices=4 /dev/td{a,c,e,g}

In case of 4 devices:

mdadm --create /dec/md0 --level=0 --chunk=4 --raid-devices=8 /dev/td{a,b,c,d,e,f,g,h}

The filesystem used for the tests was ext4, it was created on the whole device (md0 or fioa, no parititions), which the block size of 4k.

We tested these with read-only, write-only and mixed workloads with sysbench fileio. The read/write ratio with the mixed test was 1.5. Apart from the workload itself, we varied the IO mode as well (we did tests with synchronous IO as well as asynchronous IO). We did the following tests.

  • Asynchronous read workload (16 threads, 16k block size, 4 hour long)
  • Asynchronous write workload (16 threads, 16k block size, 4 hour long)
  • Asynchronous mixed workload (16 threads, 16k block size, 4 hour long)
  • Synchronous read workload (1..1024 threads, 16k block size, 0.5 hour long)
  • Synchronous write workload (1..1024 threads, 16k block size, 0.5 hour long)
  • Synchronous mixed workload (1..1024 threads, 16k block size, 0.5 hour long)

The asynchronous read tests lasted 4 hours, because we tested the consistency of the device’s performance with those over time.

Asynchronous IOReads

The throughput is fairly consistent. With 8 dimms, the peak throughput goes above 3 GB/s in case of 8 eXFlash DIMMs.

At the beginning of the testing we expected that latency would drop significantly in case of eXFlash DIMMs. We based this assumption on the fact that the memory is even “closer” to the processor than the PCI-E bus. In case of 8 eXFlash DIMMs, it’s visible that we are able to utilize the memory channels completely, which gives a boost in throughput as well as it gives lower latency.


The consistent asynchronous write  throughput is around 1GB/sec in case of eXFlash DIMM_8. Each case at the beginning has a spike up in throughput: that’s the effect of caching. In eXFlash DIMM itself, the data can arrive much faster on the bus than it could be written to the flash devices.

In case of write latency, the PCI-E based device is Somewhat better, and also the performance is more consistent compared to the case where we were using 4 eXFlash DIMMs. When using 8, similarly to the case where only reads were tested in isolation, both the throughput increased and the latency drops.


In the mixed workload (which is quite relevant to some databases), the eXFlash DIMM shines. The throughput in case of 8 DIMMs can go as high as 1.1 GB/s for reads, while it’s doing roughly 750 MB/s writes.

In case of mixed workload, the latency is the eXFlash DIMMs is lower and far more consistent (the more DIMMs we use, the more consistent it will be).

Synchronous IOReads

In case of synchronous IO, we reached the peak throughput at 32 threads used in case of the PCI-E device, but needed 64 and 128 threads for the eXFlash DIMM_4 and eXFlash DIMM_8 configurations respectively (in both cases, the throughput at 32 threads was higher than the PCI-E device’s).

The read latency degrades much more gradually in case of the eXFlash DIMMs, which makes it very suitable for workloads like linkbench, where the buffer pool misses are more frequent than writes (many large web application have this kind of read mostly and the database doesn’t fit into memory profile).In order to be able to see the latency differences at a lower number of threads, this graph has to be zoomed in a bit. The following graph is the same (synchronous read latency), but the y axis has a maximum value of 10 ms.

The maximum value is hit at 256 threads in case of the PCI-E device, and performance is already severely degraded at 128 threads (which is the optimal degree of parallelism for the eXFlash DIMM_8 configuration). Massively parallel, synchronous reads is a workload where the eXFlash DIMMs are exceptionally good.


Because of caching mechanisms, we don’t need too much writer threads in order to utilize the devices. This makes sense, writes are easy to cache: after a write, the device can tell the operating system that the write is completed, when in reality it’s only completed later. In case of reads, this is different, when the application requests a block to read, the device can’t give the data to the application, but only read it later. Because of mechanisms like this, consistency is very important in case of write tests. The PCI-E device delivers the most consistent, but the lowest throughput. The reason for this is NUMA’s effect to the MCS performance. Because sysbench was allowed to use any CPU core, it may got lucky by hitting the DIMM which is in the CPU’s memory bank it’s currently running on, or may got unlucky, and the QPI was involved in getting the data. One case is not more rare than the other, hence we see the more wide bars on the graphs.

The latency is better in case of eXFlash DIMMs, but not as much as in case of isolated reads. It only seems to be slightly less in case of eXFlash DIMM_4, so let’s do a similar zoom here as we did in case of writes.

It’s visible on the graph that in case of lower concurrency, the eXFlash DIMM_4 and fusionio measurement is roughly the same, but as the workload becomes more parallel, the eXFlash DIMM will degrade more gradually. This more gradual degradation is also present in the eXFlash DIMM_8 case, but it starts to degrade later.


In case of mixed throughput, the PCI-E device similarly shows more consistent, but lower performance. Using 8 DIMMs means a significant performance boost: the 1.1 GB/s reads and 750 MB/s writes like in case of asynchronous IO is doable with 256 threads and above.

Similarly to the previous cases, the response times are slightly better in case of using eXFlash DIMMs. Because the latency is quite high in case of 1024 threads in each case, a similar zoom like previous cases will show more details.

At a low concurrency, the latency is better on the PCI-E device, at least the maximum latency. We can see some fluctuation in case of the eXFlash DIMMs. The reason for that the NUMA, on a single thread, which CPU has the storage the benchmark is accessing matters a lot. As the concurrency gets higher, the response time degrades in case of the flash device, but in the case of eXFlash DIMMs, the maximum remains the same and the response time becomes more consistent. At 32 threads, the mean latency of the devices are very close, with the PCI-E device being slightly better. From there, the eXFlash DIMMs are degrading more gracefully.


Overall, I am impressed, especially with the read latency at massive parallelism. So far this is the fastest MLC device we have tested. I would be curious how would 16 or 32 DIMMs would perform in a machine which has enough sockets and memory channels for that kind of configuration. Like we moved one step closer to the CPU with PCI-E based flash, MCS is one step closer to the CPU compared to the PCI-E based flash.

The post Benchmarking IBM eXFlash™ DIMM with sysbench fileio appeared first on MySQL Performance Blog.

Release Candidate Packages for Red Hat Enterprise Linux 7 and CentOS 7 now available

Fr, 2014-08-08 13:56

The Percona team is pleased to announce the Release Candidate for Percona Software packages for Red Hat Enterprise Linux 7 and CentOS 7.

With more than 1 million downloads and thousands of production deployments at small, mid-size and large enterprises, Percona software is a industry leading distribution that enhances the performance, scale, management and diagnosis of MySQL deployments.

This new packages bring the benefits of Percona Software to developers and administrators that are using Red Hat Enterprise Linux 7 and CentOS 7.

The new packages are available from our testing repository. The packages included are: :

In addition, we have included systemd service scripts for Percona Server and Percona XtraDB Cluster.

Please try out the Percona Software for Red Hat Enterprise Linux 7 and CentOS 7 by downloading and installing the packages from our testing repository as follows:

yum install

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

NOTE: These packages aren’t Generally Available (GA) and thus are not recommended for production deployments.

The post Release Candidate Packages for Red Hat Enterprise Linux 7 and CentOS 7 now available appeared first on MySQL Performance Blog.

Percona Toolkit 2.2.10 is now available

Fr, 2014-08-08 13:07

Percona is glad to announce the release of Percona Toolkit 2.2.10 on August 8, 2014 (downloads are available here and from the Percona Software Repositories). This release is the current GA (Generally Available) stable release in the 2.2 series.

Bugs Fixed:

  • Fixed bug 1287253: pt-table-checksum would exit with error if it would encounter deadlock when doing checksum. This was fixed by retrying the command in case of deadlock error.
  • Fixed bug 1311654: When used with Percona XtraDB Cluster, pt-table-checksum could show incorrect result if --resume option was used. This was fixed by adding a new --replicate-check-retries command line parameter. If you are having resume problems you can now set --replicate-check-retries N , where N is the number of times to retry a discrepant checksum (default = 1 , no retries). Setting a value of 3 is enough to completely eliminate spurious differences.
  • Fixed bug 1299387: pt-query-digest didn’t work correctly due to a changed logging format when field Thread_id has been renamed to Id. Fixed by implementing support for the new format.
  • Fixed bug 1340728: in some cases, where the index was of type “hash” , pt-online-schema-change would refuse to run because MySQL reported it would not use an index for the select. This check should have been able to be skipped using --nocheck-plan option, but it wasn’t. Option --nocheck-plan now ignores the chosen index correctly.
  • Fixed bug 1253872: When running pt-table-checksum or pt-online-schema on a server that is unused, setting the 20% max load would fail due to tools rounding the value down. This has been fixed by rounding the value up.
  • Fixed bug 1340364: Due to incompatibility of dash and bash syntax some shell tools were showing error when queried for version.

Percona Toolkit is free and open-source. Details of the release can be found in the release notes and the 2.2.10 milestone at Launchpad. Bugs can be reported on the Percona Toolkit launchpad bug tracker.

The post Percona Toolkit 2.2.10 is now available appeared first on MySQL Performance Blog.

New in Percona Replication Manager: Slave resync, Async stop

Di, 2014-08-05 14:39

Percona Replication Manager (PRM) continues to evolve and improve, I want to introduce two new features: Slave resync and Async stop.

Slave resync

This behavior is for regular non-gtid replication.  When a master crashes and is no longer available, how do we make sure all the slaves are in a consistent state. It is easy to find the most up to date slave and promote it to be the new master based on the master execution position, the PRM agent already does that but how do we apply the missing transactions to the other slaves.

In order to solve that problem, I modified a tool originally written by Yelp, that outputs the MD5 sums of the payload (XID boundaries) and the commit positions of a binlog file. It produces an output like:

root@yves-desktop:/home/yves/src/PRM/percona-pacemaker-agents/tools/prm_binlog_parser# ./prm_binlog_parser.static.x86_64 /opt/prm/binlog.000382 | tail -n 5 53844618,E5A35A971445FC8E77D730932869F2 53846198,E37AA6704BE572727B342ABE6EFA935 53847779,B17816AC37424BB87B3CD55226D5EB17 53848966,A7CFF846D281DB363FE4192C38DD7743 53850351,A9C5B7FC24A1BA3B97C332CC362633CE

Where the first field is the commit position and the second field, the md5 sum of the payload. The only type of transaction that is not supported is “LOAD DATA INTO”. Of course, since it relies on XID values, this only works with InnoDB. It also requires the “log_slave_updates” to be enabled. The sources and some static binaries can be found in the tools directory on the PRM github, link is at the bottom.

So, if the agent detects a master crash and the prm_binlog_parser tool is available (the prm_binlog_parser_path primitive parameter), upon promotion, the new master will look at its binary logs and publish to the pacemaker CIB, the last 3000 transactions. The 3000 transactions limit is from bash, the command line must be below 64k. With some work this limit could be raised but I believe, maybe wrongly, that it covers most of the cases. The published data in the CIB attribute also contains the corresponding binlog file.

The next step happens on the slaves, when they get the post-promotion notification. They look for the MD5 sum of their last transaction in the relay log, again using the prm_binlog_parser tool, find the matching file and position in the last 3000 transactions the new master published to the CIB and reconfigure replication using the corrected file and position.

The result is a much more resilient solution that helps having the slave in a consistent state. My next goal regarding this feature will be to look at the crashed master and attempt to salvage any transactions from the old master binlog in the case MySQL crashed but Pacemaker is still up and running.


The async_stop feature, enabled by the “async_stop” primitive parameter, allows for faster failover. Without this feature, when stopping mysqld, PRM will wait until is confirmed stopped before completing a failover. When there’re many InnoDB dirty pages, we all know that stopping mysql can take many minutes. Jervin Real, a colleague at Percona, suggested that we should fake to Pacemaker that MySQL is stopping in order to proceed faster with failover. After adding some safe guards, it proved to be a very good idea. I’ll spare you of the implementation details but now, if the setting is enabled, as soon as mysqld is effectively stopping, the failover completes. If it happens that Pacemaker wants to start a stopping MySQL instance, a very usual situation, an error will be generated.

PRM agents, tools and documentation can be found here:

The post New in Percona Replication Manager: Slave resync, Async stop appeared first on MySQL Performance Blog.

Q&A: Putting MySQL Fabric to use

Mo, 2014-08-04 18:56

Martin Arrieta and I gave an online presentation last week on “Putting MySQL Fabric To Use.” If you missed it, you can find a recording and the slides here, and the vagrant environment we used plus a transcript of the commands we ran here (be sure to check out the ‘sharding’ branch, as that’s what we used during the webinar).

Thank you all for attending and asking interesting questions. We were unable to answer all of them in the scheduled time, so here are our replies to all the questions.

What is GTID? And how does it relate to MySQL Fabric?
GTID stands for Global Transaction Identifier, and MySQL Fabric requires MySQL replication to be configured to used this in order to work. This means that it won’t work with MySQL versions previous to 5.6. You can find more information in the manual, and in the upcoming Percona webinar, “Using MySQL Global Transaction IDs in Production,” that our colleague Stephane Combaudon will present on August 27.

During any issue, does MySQL Fabric internally do a switch over or failover?
MySQL Fabric will only detect failures and initiate failovers for a group that has been activated. Otherwise, any changes must be made manually through the mysqlfabric command.

For an alter table, how do you avoid replication lag?
We think using pt-online-schema for the alter table is a good way to avoid or minimize this problem.

Are there benchmarks available for MySQL Fabric?
We’re not aware of any, but once we wrap up our current series of blog posts on the topic, we plan to do some benchmarks ourselves. Our interest lies in the overhead the connectors can place vs a standard connector accessing a single MySQL Server, but if you have other ideas, let us know in the comments and we’ll consider adding them to our list.

Can MySQL Fabric be used to handle Big Data?
We think the HA features are usable regardless of data set size, with the caveat that MySQL Fabric does not (currently) handle provisioning the data of the managed instances. It just configures and points replication as required, but you’re responsible for making sure all servers have the same set of data.

We feel that the sharding features, however, may not be a good fit for working with very large data sets yet, specially because of the use of mysqldump/mysql on the shard split/move operations. However, since sharding usually goes hand in hand with big data (a data set size too large is one of the reasons to shard after all) we’re sure this will get improved in the future. Or, someone with some python skills can adjust the scripts to use a more efficient mechanism to move data from one server to another, like Percona XtraBackup.

Does sharding require many changes to application coding (database code on shards etc) at the MySQL Fabric level? Should we sit with the developers/architects to help them understand tables, their usage, and to plan sharding?
Sharding with MySQL Fabric is not transparent (and we’d say sharding, in general, is typically not transparent) so some application coding and DBAs/Devs interaction will be needed. But you should be working close to your developers and architects (or DBAs and sysadmins, depending on which side of the counter you stand) anyway

Would you say that MySQL Fabric is a good HA replacement removing the need for VIPs?
We don’t see them as mutually exclusive. Simplifying, the HA features of MySQL Fabric are just another way to manage replication between a set of MySQL servers. It doesn’t force any way to make writable and readable nodes available to clients. With some coding, you can use MySQL Fabric together with VIPs, or dynamic DNS entries, or a load balancer like HA Proxy.

Does global group mean fabric node?
If you did not attend the webinar, we’d recommend you take a look at slide 7 to get the required context for this question.

Global group means a group that has the schema for all tables, all the data for global tables, and no data for sharded tables. Since it’s a group, it can be one or more nodes, but at any given time, there will only be one PRIMARY node in the group, and the PRIMARY nodes on all shard groups will be replication slaves of this node.

Is the config file at the application side or on the MySQL Fabric node?
There is some configuration at both ends. The MySQL Fabric node (the ‘store’ server in our Vagrant setup) has some configuration, but applications need their own too. The separation is rather clean though. Typically, the application only needs to know how to reach the MySQL Fabric server (host, port, credentials), while the MySQL Fabric node needs to know how to connect to the nodes it manages, how long to wait before starting a failover, etc. The configuration on MySQL Fabric’s side is done by the fabric.cfg file, while on the application side, it depends on the connector used.

When setting a faulty to secondary, does it also automatically come back in replication as a slave?
Yes, though remember you’ll first have to set it to SPARE

When we set primary and secondary and it sets the replication, it never copies data and builds replication? It only starts replication from that side and we will lose all the data which is not available on secondary
Yes to the first question. As to the second, we reiterate the fact that MySQL Fabric does not handle provisioning data for instances. When you’re using it to manage nodes, it (currently) assumes the nodes all have a copy of the data. It is your responsibility to make sure that is the case, though this may change in the future. To give you a specific example, suppose you run the following commands:

mysqlfabric group create ha
mysqlfabric group add ha node1
mysqlfabric group add ha node2

Then, on node1′s MySQL’s CLI:

create database test;

and finally run:

mysqlfabric group promote ha --slave_id=<uuid_for_node2>

You’ll end up with node2 being PRIMARY, node1 being SECONDARY, without the ‘test’ database on the PRIMARY node.

Are there any limitations around adding shards? Is it possible to re-configure the sharding keys to handle the additional shard(s)?
We don’t know about any limitations, and yes, it is possible to add more shards and configure the mapping. We did not have enough time to do this at the webinar, but the transcript on the repo includes examples for splitting and moving shards. What you want in this case is to add a new group (can be a single node) and then split a shard (optionally providing a pivot value), keeping one part on the original group and the rest on the new one. Fernando encountered one problem when doing this on a busy server, though the bug is not yet verified so that may have been a user error on his side, but we’d recommend you do similar tests yourself before attempting this in production.

Is the addition of Connectors at the MySQL Fabric level or even at application side? Is an application code change a huge effort?
It’s at the application side. It’s difficult to estimate the effort for the code changes, but if your database access is reasonably modularized/abstracted from your application logic, it shouldn’t be huge. In fact, for HA, if you don’t want to split read and writes, you can migrate an application to a MySQL Fabric-aware connector with just a couple of lines changed (changing the connection string and then requesting a MODE_READWRITE connection always).

What is the minimum server requirements to setup mysqlfabric and does it use any multithreading if i have more than one core for each node. Can we get the transcripts with examples..
We don’t know of the official minimum requirements, but it is a python script which in our experience has been lightweight. Remember it does *not* act as a middle man/proxy between your app and the servers, it is only involved when a client establishes a new connection or when a change in status happens in a server. As for multithreading, we know it’s concurrent (i.e. multiple clients can establish connections at the same time) but we don’t know if it’s parallel.

The transcript with examples can be found here.

How does MySQL Fabric handle latency issues? Does latency increase with mysql traffic?
MySQL Fabric uses regular MySQL connections so latency will affect it the same as any other app using MySQL, and yes, it will increase with traffic (but not due to Fabric).

What if MySQL Fabric is hung? Will connections going to the primary stop? And how can we deduce if there is an issue at the fabric side?
If MySQL Fabric is hung, the following scenarios will be problematic:

  • New connections come in. Clients will try to connect to Fabric before trying to connect any hosts (they won’t know what hosts to connect to otherwise!) so if MySQL Fabric is down, no new connections can be established.
  • A failure happens in the managed servers. MySQL Fabric is responsible for monitoring for failures and taking the appropriate actions (including promoting another server, repointing replicas, etc), so if something fails while MySQL Fabric is down, there will be nobody to take action.

Are there any scenarios where a data loss can happen when promoting/demoting nodes (or nodes failure happens) in a production environment?
Given the use of gtid-enforce-consistency and the fact that MySQL Fabric won’t promote a node to PRIMARY until it has caught up with any pending changes, we feel this is unlikely, but we’re planning a blog post specifically focused on evaluating the potential of data loss during failover.

How to configure MySQL Fabric to use synchronous / semi-synchronous mechanisms?
We have not attempted this, but from this manual page, we think currently only async replication is supported. I would expect sync to be in the roadmap though. If this is something you’d like to see, another section on the same page has some suggestions on how to make it happen.

Thanks again for attending the webinar, and feel free to post any further questions in the comments section.

The post Q&A: Putting MySQL Fabric to use appeared first on MySQL Performance Blog.

Advanced MySQL Query Tuning (Aug. 6) and MySQL 5.6 Performance Schema (Aug. 13) webinars

Mo, 2014-08-04 13:56

I will be presenting two webinars in August:

This Wednesday’s webinar on advanced MySQL query tuning will be focused on tuning the “usual suspects”: queries with “Group By”, “Order By” and subqueries; those query types are usually perform bad in MySQL and add an additional load as MySQL may need to create a temporary table(s) or perform a filesort. New this year: I will talk more about new MySQL 5.6 features that can increase MySQL query performance for subqueries and “order by” queries.

Next week’s performance schema webinar will be focused on practical use of Performance Schema in MySQL 5.6. I will show real-world examples of MySQL monitoring and troubleshooting using MySQL 5.6 Performance Schema. Performance Schema can give you a lots of valuable information about MySQL internal operations. For example, if using a multi-tenant shared MySQL installation, Performance Schema can give you a lots of extensive information about your MySQL load: breakdown by user, queries, tables and schema (see more examples in my previous blog post: Using MySQL 5.6 Performance Schema in multi-tenant environments).



The post Advanced MySQL Query Tuning (Aug. 6) and MySQL 5.6 Performance Schema (Aug. 13) webinars appeared first on MySQL Performance Blog.

Paris OpenStack Summit Voting – Percona Submits 16 MySQL Talks

Fr, 2014-08-01 05:00

MySQL plays a critical role in OpenStack. It serves as the host database supporting most components such as Nova, Glance, and Keystone and is the most mature guest database in Trove. Many OpenStack operators use Percona open source software including the MySQL drop-in compatible Percona Server and Galera-based Percona XtraDB Cluster as well as tools such as Percona XtraBackup and Percona Toolkit. We see a need in the community to understand how to improve MySQL performance in OpenStack. As a result, Percona, submitted 16 presentations for the Paris OpenStack Summit.

Paris OpenStack Summit presentations are chosen by OpenStack member voting. Please vote for our talks by clicking the titles below that interest you. You must be an OpenStack Foundation member to vote. If you aren’t a member, sign up here – it’s free and only takes a minute. The deadline to vote is Wednesday, August 6, 2014!

Paris OpenStack Summit MySQL Talks Submitted by PerconaOpenStack Operations

MySQL Database Operations in the OpenStack World
Speaker: Stéphane Combaudon

MySQL High Availability Options for Openstack
Speakers: Stéphane Combaudon

Host and Guest Database Backup and Recovery for OpenStack Ops
Speakers: George Lorch, David Busby

Benchmarking the Different Cinder Storage Backends
Speaker: Peter Boros

MySQL and OpenStack Deep Dive
Speakers: Peter Boros, Jay Pipes (Mirantis)

Trove Performance Tuning for MySQL
Speaker: Alexander Rubin

Schema Management: Versioning and Automation with Puppet and MySQL Utilities
Speaker: Frederic Descamps

Deploying Databases for OpenStack
Speakers: Matt Griffin, Jay Pipes (Mirantis), Amrith Kumar (Tesora), Vinay Joosery (Severalnines)

Related Open Source Software Projects

Introduction to Percona XtraDB Cluster
Speaker: Kenny Gryp

Percona Server Features for OpenStack and Trove Ops
Speakers: George Lorch, Vipul Sabhaya (HP Cloud)

Products, Tools & Services

ClusterControl: Efficient and reliable MySQL Management, Monitoring, and Troubleshooting for OpenStack HA
Speakers: Peter Boros, Vinay Joosery (Severalnines)

Advanced MySQL Performance Monitoring for OpenStack Ops
Speaker: Daniel Nichter

Targeting Apps for OpenStack Clouds

Oars in the Cloud: Virtualization-aware Galera instances
Speaker: Raghavendra Prabhu

ACIDic Clusters: Review of contemporary ACID-compliant databases with synchronous replication
Speaker: Raghavendra Prabhu

Cloud Security

Security: It’s more than just your database you should worry about
Speaker: David Busby

Planning Your OpenStack Project

Infrastructure at Scale
Speaker: Michael Coburn

The Paris OpenStack Summit will offer developers, operators, and service providers with valuable insights into OpenStack. The Design Summit sessions will be filled with lively discussions driving OpenStack development including sessions defining the future of Trove, the DBaaS (database as a service) component near and dear to Percona’s heart. There will also be many valuable presentations in the main Paris OpenStack Summit conference about operating OpenStack, utilizing the latest features, complimentary software and services, and real world case studies.

Thank you for your support. We’re looking forward to seeing many Percona software users at the Paris OpenStack Summit in November.

The post Paris OpenStack Summit Voting – Percona Submits 16 MySQL Talks appeared first on MySQL Performance Blog.

MariaDB: Selective binary logs events

Do, 2014-07-31 21:29

In the first post in a series on MariaDB features we find interesting, we begin with selectively skipping replication of binlog events. This feature is available on MariaDB 5.5 and 10.

By default when using MySQL’s standard replication, all events are logged in the binary log and those binary log events are replicated to all slaves (it’s possible to filter out some schema). But with this feature, it’s also possible to bypass some events to be replicated on the slave(s) even if they are written in the binary log. Having those event in the binary logs is always useful for point-in-time recovery.

Indeed, usually when we need to not replicate an event, we set sql_log_bin = 0 and the event is bypassed: neither written into the binlog, neither replicated to slave(s).

So with this new feature, it’s possible to just set a session variable to tag events that will be written into the binary log and bypassed on demand on some slaves.

And it’s really easy to use, on the master you do:

set skip_replication=1;

and on the slave(s) having replicate_events_marked_for_skip='FILTER_ON_MASTER' or 'FILTER_ON_SLAVE' the events skipped on the master won’t be replicated.

The valid values for replicate_events_marked_for_skip are:

  • REPLICATE (default) : skipped events are replicated on the slave
  • FILTER_ON_SLAVE : events so marked will be skipped on the slave and not replicated
  • FILTER_ON_MASTER : the filtering will be done on the master so the slave won’t even receive it and then save network bandwidth

That’s a cool feature but when this can be very useful?

Use case:

For archiving this can be very interesting. Indeed most of the time when people is archiving data, they use something like pt-archiver that deletes the data and copy the removed data on an archive server.

Thanks to this feature, instead of having an archiving server where we copy the deleted data, it’s possible to have a slave where we won’t delete the data. This will be much faster (smarter?) and allows to have an archiving server always up to date. Of course in this case sql_log_bin = 0 would have worked (if we ignore the point-in-time recovery).
But with a Galera Cluster? Yes that’s where this feature is really cool, if we would have used sql_log_bin = 0 on a Galera Cluster node, all other nodes would have ignored the delete and the result would be inconsistency between the nodes.

So if you use an asynchronous slave as an archiving server of a Galera Cluster, this feature is really mandatory.

As illustrated below, you can have a MariaDB Galera Cluster node joining a Percona XtraDB Cluster that will be used to delete historical data using pt-archiver:

pt-archiver is started with --set-vars "skip_replication=1"

The post MariaDB: Selective binary logs events appeared first on MySQL Performance Blog.

Percona Server 5.1.73-14.12 is now available

Do, 2014-07-31 13:52

Percona Server version 5.1.73-14.12

Percona is glad to announce the release of Percona Server 5.1.73-14.12 on July 31st, 2014 (Downloads are available here and from the Percona Software Repositories). Based on MySQL 5.1.73, including all the bug fixes in it, Percona Server 5.1.73-14.12 is now the current stable release in the 5.1 series. All of Percona‘s software is open-source and free, all the details of the release can be found in the 5.1.73-14.12 milestone at Launchpad.

NOTE: Packages for Debian 7.0 (wheezy), Ubuntu 13.10 (saucy) and 14.04 (trusty) are not available for this release due to conflict with newer packages available in those releases.

Bugs Fixed:

  • Percona Server couldn’t be built with Bison 3.0. Bug fixed #1262439, upstream #71250.
  • Ignoring Query Cache Comments feature could cause server crash. Bug fixed #705688.
  • Database administrator password could be seen in plain text when debconf-get-selections was executed. Bug fixed #1018291.
  • If XtraDB variable innodb_dict_size was set, the server could attempt to remove a used index from the in-memory InnoDB data dictionary, resulting in a server crash. Bugs fixed #1250018 and #758788.
  • Ported a fix from MySQL 5.5 for upstream bug #71315 that could cause a server crash f a malformed GROUP_CONCAT function call was followed by another GROUP_CONCAT call. Bug fixed #1266980.
  • MTR tests from binary tarball didn’t work out of the box. Bug fixed #1158036.
  • InnoDB did not handle the cases of asynchronous and synchronous I/O requests completing partially or being interrupted. Bugs fixed #1262500 (upstream #54430).
  • Percona Server version was reported incorrectly in Debian/Ubuntu packages. Bug fixed #1319670.
  • Percona Server source files were referencing Maatkit instead of Percona Toolkit. Bug fixed #1174779.
  • The XtraDB version number in univ.i was incorrect. Bug fixed #1277383.

Other bug fixes: #1272732, #1167486, and #1314568.

Release notes for Percona Server 5.1.73-14.12 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.1.73-14.12 is now available appeared first on MySQL Performance Blog.