MySQL High Performance

Percona's MySQL & InnoDB performance and scalability blog

Percona Server 5.5.44-37.3 is now available

Wed, 2015-07-01 13:43


Percona is glad to announce the release of Percona Server 5.5.44-37.3 on July 1, 2015. Based on MySQL 5.5.44, including all the bug fixes in it, Percona Server 5.5.44-37.3 is now the current stable release in the 5.5 series.

Percona Server is open-source and free. Details of the release can be found in the 5.5.44-37.3 milestone on Launchpad. Downloads are available here and from the Percona Software Repositories.

Bugs Fixed:

  • Symlinks to libmysqlclient libraries were missing on CentOS 6. Bug fixed #1408500.
  • RHEL/CentOS 6.6 OpenSSL package (1.0.1e-30.el6_6.9), containing a fix for CVE-2015-4000, changed the DH key sizes to a minimum of 768 bits. This caused an issue for MySQL as it uses 512 bit keys. Fixed by backporting an upstream 5.7 fix that increases the key size to 2048 bits. Bug fixed #1462856 (upstream #77275).
  • innochecksum would fail to check tablespaces in compressed format. The fix for this bug has been ported from Facebook MySQL 5.1 patch. Bug fixed #1100652 (upstream #66779).
  • Issuing SHOW BINLOG EVENTS with an invalid starting binlog position would cause a potentially misleading message in the server error log. Bug fixed #1409652 (upstream #75480).
  • While using max_slowlog_size, the slow query log was rotated every time the slow query log was enabled, without checking whether the current slow log file was actually bigger than max_slowlog_size. Bug fixed #1416582.
  • If query_response_time_range_base variable was set as a command line option or in a configuration file, its value would not take effect until the first flush was made. Bug fixed #1453277 (Preston Bennes).
  • Prepared XA transactions with update undo logs were not properly recovered. Bug fixed #1468301.
  • The log_slow_sp_statements variable now supports skipping the logging of stored procedures into the slow log entirely with the new OFF_NO_CALLS option (see the example after this list). Bug fixed #1432846.
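
As a quick illustration, a minimal sketch of enabling the new behaviour at runtime (assuming the variable is dynamic, as the other Percona Server slow-log variables are):

SET GLOBAL log_slow_sp_statements = 'OFF_NO_CALLS';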

Other bugs fixed: #1380895 (upstream #72322).

(Please also note that Percona Server 5.6 series is the latest General Availability series and current GA release is 5.6.25-73.0.)

Release notes for Percona Server 5.5.44-37.3 are available in our online documentation. Bugs can be reported on the launchpad bug tracker.

The post Percona Server 5.5.44-37.3 is now available appeared first on MySQL Performance Blog.

Percona Server 5.6.25-73.0 is now available

Wed, 2015-07-01 13:24

Percona is glad to announce the release of Percona Server 5.6.25-73.0 on July 1, 2015. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.25, including all the bug fixes in it, Percona Server 5.6.25-73.0 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free – and this is the latest release of our enhanced, drop-in replacement for MySQL. Complete details of this release can be found in the 5.6.25-73.0 milestone on Launchpad.

New Features:

Bugs Fixed:

  • Symlinks to libmysqlclient libraries were missing on CentOS 6. Bug fixed #1408500.
  • RHEL/CentOS 6.6 OpenSSL package (1.0.1e-30.el6_6.9), containing a fix for CVE-2015-4000, changed the DH key sizes to a minimum of 768 bits. This caused an issue for MySQL as it uses 512 bit keys. Fixed by backporting an upstream 5.7 fix that increases the key size to 2048 bits. Bug fixed #1462856 (upstream #77275).
  • Some compressed InnoDB data pages could be mistakenly considered corrupted, crashing the server. Bug fixed #1467760 (upstream #73689) (Justin Tolmer).
  • innochecksum would fail to check tablespaces in compressed format. The fix for this bug has been ported from Facebook MySQL 5.6 patch. Bug fixed #1100652 (upstream #66779).
  • Using concurrent REPLACE, LOAD DATA REPLACE or INSERT ON DUPLICATE KEY UPDATE statements in the READ COMMITTED isolation level or with the innodb_locks_unsafe_for_binlog option enabled could lead to a unique-key constraint violation. Bug fixed #1308016 (upstream #76927).
  • Issuing SHOW BINLOG EVENTS with an invalid starting binlog position would cause a potentially misleading message in the server error log. Bug fixed #1409652 (upstream #75480).
  • While using max_slowlog_size, the slow query log was rotated every time the slow query log was enabled, without checking whether the current slow log file was actually bigger than max_slowlog_size. Bug fixed #1416582.
  • Fixed possible server assertions when Backup Locks are used. Bug fixed #1432494.
  • If query_response_time_range_base variable was set as a command line option or in a configuration file, its value would not take effect until the first flush was made. Bug fixed #1453277 (Preston Bennes).
  • The mysqld_safe script now also searches for the libjemalloc.so.1 library (needed by TokuDB) in the basedir directory. Bug fixed #1462338.
  • Prepared XA transactions could cause a debug assertion failure during the shutdown. Bug fixed #1468326.
  • The log_slow_sp_statements variable now supports skipping the logging of stored procedures into the slow log entirely with the new OFF_NO_CALLS option. Bug fixed #1432846.
  • The TokuDB HotBackup library is now automatically loaded by the mysqld_safe script. Bug fixed #1467443.

Other bugs fixed: #1457113, #1380895, and #1413836.

Release notes for Percona Server 5.6.25-73.0 are available in the online documentation. Please report any bugs on the launchpad bug tracker.

The post Percona Server 5.6.25-73.0 is now available appeared first on MySQL Performance Blog.

Using Cgroups to Limit MySQL and MongoDB memory usage

Wed, 2015-07-01 12:00

Quite often, especially for benchmarks, I need to limit the memory available to a database server (usually MySQL, but recently MongoDB as well). This is usually needed to test database performance in scenarios with different memory limits. I have physical servers with the usual high amount of memory (128GB or more), but I am interested in seeing how a database server will perform with, say, only 16GB of memory available.

And while InnoDB usually respects the innodb_buffer_pool_size setting in O_DIRECT mode (the OS cache is not used in this case), other engines (TokuDB for MySQL; MMAP, WiredTiger and RocksDB for MongoDB) usually benefit from the OS cache, and by default the Linux kernel is generous enough to allocate as much memory as is available. I should note that while TokuDB (and TokuMX for MongoDB) supports a DIRECT mode (that is, bypassing the OS cache), we found there is a performance gain if the OS cache is used for compressed pages.

Well, an obvious recommendation for restricting available memory would be to use a virtual machine, but I do not like this option because virtualization does not come cheap: usually there are both CPU and IO penalties.

Other popular options I hear are:

  • to use "mem=" option in a kernel boot line. Despite the fact that it requires a server reboot by itself (so you can’t really script this and leave for automatic iterations through different memory options), I also suspect it does not work well in a multi-node NUMA environment – it seems that a kernel limits memory only from some nodes and not from all proportionally
  • Use an auxiliary program that allocates as much memory as you want to make unavailable and then executes an mlock call. This option may work, but again I have the impression that the Linux kernel does not always make good choices when there is a huge amount of locked memory that it can’t move around. For example, I have seen Linux start swapping in this case (instead of shrinking the cached pages), even with vm.swappiness set to 0.

Another option, riding the rising wave of Docker and containers (like LXC), is, well, to use Docker or another container: put the database server inside a container and limit its resources that way. This should work, but if you are as lazy as I am and do not want to deal with containers, you can just use cgroups (https://en.wikipedia.org/wiki/Cgroups), which are in fact what Docker and LXC use extensively under the hood.

Using cgroups, our task can be accomplished in a few easy steps.

1. Create the control group: cgcreate -g memory:DBLimitedGroup (make sure the cgroups binaries are installed on your system; consult your favorite Linux distribution's manual for how to do that)
2. Specify how much memory will be available for this group:
echo 16G > /sys/fs/cgroup/memory/DBLimitedGroup/memory.limit_in_bytes

This command limits memory to 16G (the good thing is that this limits memory for both malloc allocations and the OS cache).
3. Now it is a good idea to drop the pages already sitting in the OS cache:
sync; echo 3 > /proc/sys/vm/drop_caches
4. And finally, assign the server process to the newly created control group:

cgclassify -g memory:DBLimitedGroup `pidof mongod`

This assigns the running mongod process to a group limited to only 16GB of memory.
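
The same approach works for MySQL; a minimal sketch, assuming a single mysqld process is already running outside the group:

cgclassify -g memory:DBLimitedGroup `pidof mysqld`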

With this, our task is accomplished… but there is one more thing to keep in mind.

That is dirty pages in the OS cache. As long as we rely on the OS cache, Linux controls writing from the OS cache to disk with two variables:
/proc/sys/vm/dirty_background_ratio and /proc/sys/vm/dirty_ratio.

These variables are percentages of memory that the Linux kernel uses as thresholds for flushing dirty pages.

Let’s talk about them a little more. In simple terms:

/proc/sys/vm/dirty_background_ratio, which is 10 by default on my Ubuntu, means that the Linux kernel will start background flushing of dirty pages from the OS cache when the amount of dirty pages reaches 10% of available memory.

/proc/sys/vm/dirty_ratio, which is 20 by default on my Ubuntu, means that the Linux kernel will start foreground flushing of dirty pages from the OS cache when the amount of dirty pages reaches 20% of available memory. Foreground means that user threads executing IO might be blocked… and this is what causes IO stalls for the user (which we want to avoid at all costs).

Why is this important to keep in mind? Consider 20% of 256GB (which is what I have on my servers): that is 51.2GB, which a database can make dirty VERY fast under a write-intensive workload. If the server happens to have slow storage (an HDD RAID or a slow SATA SSD), it may take the Linux kernel a long time to flush all these pages, stalling user IO activity in the meantime.

So it is worth considering changing these values (or the corresponding /proc/sys/vm/dirty_background_bytes and /proc/sys/vm/dirty_bytes if you prefer to operate in bytes rather than percentages).
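
For example, here is a minimal sketch using the byte-based variants; the thresholds are arbitrary assumptions and should be sized for your storage (setting the *_bytes variables automatically disables the corresponding *_ratio ones):

sysctl -w vm.dirty_background_bytes=1073741824   # start background flushing at ~1GB of dirty pages
sysctl -w vm.dirty_bytes=2147483648              # start blocking (foreground) flushing at ~2GB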

Again, this was not important for our traditional usage of InnoDB in O_DIRECT mode, which is why we did not pay much attention to Linux OS cache tuning before. But as soon as we start to rely on the OS cache, it is something to keep in mind.

Finally, it’s worth remembering that dirty_bytes and dirty_background_bytes apply to ALL memory and are not controlled by cgroups. The same applies to containers: if you are running several Docker or LXC containers on the same box, the dirty pages of ALL of them are controlled globally by a single pair of dirty_bytes and dirty_background_bytes settings.

This may change in future Linux kernels, as I have seen patches to apply dirty_bytes and dirty_background_bytes per cgroup, but it is not available in current kernels.

The post Using Cgroups to Limit MySQL and MongoDB memory usage appeared first on MySQL Performance Blog.

Playing with Percona XtraDB Cluster in Docker

Tue, 2015-06-30 07:00

Like any good (thus lazy) engineer, I don’t like to start things manually. Creating directories and configuration files and specifying paths and ports on the command line is too boring. I already wrote about how I survive when I need to start a MySQL server (here). There is also MySQL Sandbox, which can be used for the same purpose.

But what do you do if you want to start Percona XtraDB Cluster this way? Fortunately we at Percona have engineers who created an automation solution for starting PXC. This solution uses Docker. To explore it you need to:

  1. Clone the pxc-docker repository:
    git clone https://github.com/percona/pxc-docker
  2. Install Docker Compose as described here
  3. cd pxc-docker/docker-bld
  4. Follow instructions from the README file:

    a) ./docker-gen.sh 5.6    (docker-gen.sh takes a PXC branch as an argument; 5.6 is the default, and it looks for the branch on github.com/percona/percona-xtradb-cluster)

    b) Optional: docker-compose build (if you see it is not updating with changes).

    c) docker-compose scale bootstrap=1 members=2 for a 3 node cluster

  5. Check which ports are assigned to the containers:
    $ docker port dockerbld_bootstrap_1 3306
    0.0.0.0:32768
    $ docker port dockerbld_members_1 4567
    0.0.0.0:32772
    $ docker port dockerbld_members_2 4568
    0.0.0.0:32776
    Now you can connect with the MySQL client as usual:
    $ mysql -h 0.0.0.0 -P 32768 -uroot
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 10
    Server version: 5.6.21-70.1 MySQL Community Server (GPL), wsrep_25.8.rXXXX
    Copyright (c) 2009-2015 Percona LLC and/or its affiliates
    Copyright (c) 2000, 2015, Oracle and/or its affiliates. All rights reserved.
    Oracle is a registered trademark of Oracle Corporation and/or its affiliates. Other names may be trademarks of their respective owners.
    Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
    mysql>
  6. To change MySQL options, either pass them as a mount at runtime with something like volume: /tmp/my.cnf:/etc/my.cnf in docker-compose.yml (see the sketch after this list), or connect to the container’s bash (docker exec -i -t container_name /bin/bash), change my.cnf and run docker restart container_name
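
A minimal sketch of what the mount could look like in docker-compose.yml, assuming the service is the one named members in the generated compose file and that /tmp/my.cnf exists on the host:

members:
  volumes:
    - /tmp/my.cnf:/etc/my.cnf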

Notes.

  • If you don’t want to build the images yourself, use the ready-to-use images
  • If you don’t want to run Docker Compose as the root user, add yourself to the docker group

The post Playing with Percona XtraDB Cluster in Docker appeared first on MySQL Performance Blog.

Practical MySQL Performance Optimization (July 2 webinar)

Mon, 2015-06-29 14:24

Applications are often impacted by MySQL performance. Optimization is the obvious solution, but where do you start? Join me on July 2 at 11 a.m. Pacific and I’ll show you how to optimize MySQL in a practical way – spending time on what gives you the best return. Many of you have heard me talk on this topic before and this updated presentation will get you up to speed on the latest areas to consider when improving MySQL performance.

I’ll be pointing out the most important things you need to get right for each of the following:

  • Hardware
  • MySQL Configuration
  • Schema and Queries
  • Application Architecture

I’ll also share some tools and techniques to focus on the queries most important for your application. At the end of this webinar, aptly titled “Practical MySQL Performance Optimization,” you will know how to optimize MySQL performance in a practical way to ensure your system is “good enough” in terms of responsiveness, efficiency and scalability.

I hope to see you there! Register now to reserve your spot.

The post Practical MySQL Performance Optimization (July 2 webinar) appeared first on MySQL Performance Blog.

Tips for avoiding malware from a lesson learned

Fri, 2015-06-26 16:28

In a recent article on the Percona blog, I recommended a tool called CamStudio to readers for making technical screen recordings. The blog post was very popular and got 300+ Facebook likes in a short time. Providentially though, a reader commented that the installer (as downloaded from the project website) installed “pretty annoying adware on [his] PC.”

As I had been using a slightly dated installer, which did not show this issue, I started looking into the reader’s claims. Besides finding that the reader was correct in his claims about the project website’s installer, I found that even the installer from sourceforge.com (a well known open source download site) had a significant amount of adware in it.

However, the worst had yet to come. Reading through the CamStudio forum on SourceForge, I found out that the CamStudio binaries had apparently been plagued by adware and possibly also viruses and malware. I am however always somewhat suspicious of such reports; consider for example that CamStudio’s competitor TechSmith sells a very similar product (originally based on the same source code I believe) at $299 US per license. Not saying it happened, but one can easily see why competing companies may try to eliminate the open source/free competition.

Still, being cautious, I ran my older original installer (which did not have the adware issues) through virustotal.com, a Google service I learned about during this ‘adventure’. “Guess what” my daughter would say. It had a malware Trojan (Trojan.Siggen6.33552) in it which had only been discovered by an anti-virus software company last April, and only one in 56 virus scanners picked it up according to https://www.virustotal.com! Once the situation was clear, I immediately removed the blog post!

Clearly this was turning out not to be my day. Reading up on this Trojan proved that it was ‘designed for installation of another malware’. Given that Trojan.Siggen6.33552 had only been discovered in April, and given that it may have been installing other malware as per the anti-virus company who discovered it, I quickly decided to reinitialize my Windows PC. Better safe than sorry.

As I mentioned to my colleague David Busby, when you have something happen like this, you become much more security conscious! Thus, I did a review of my network security and was quite amazed at what I found, especially when compared with online security reports.

For example, we have uPnP (universal plug and play) on our routers, Skype automatically installs a (quite wide) hole in the Windows Firewall (seemingly even where it is not necessary), and we allow all 3rd party cookies in all our browsers. One would think this is all fine, but it makes things easier for attackers!

     Besides virustotal.com, David showed me https://malwr.com – another great resource for analysing potential malware.

Did you know that with the standard Skype settings, someone can easily work out your IP address? Don’t believe it? If you’re on Windows, go to Skype > Tools > Options > Advanced > Connection and hover your mouse over the blue/white question mark after ‘Allow direct connections to your contacts only’. You’ll see that it says “When you call someone who isn’t a contact, we’ll keep your IP address hidden. This may delay your call setup time.“ And apparently on Linux this option is not even directly available (more info here).

So, for example, to make Skype more secure I did 1) untick ‘use port 80 and 443 for additional incoming connections’, 2) setup a fixed port and punched a hole in the Windows firewall just for that port, for a specific program, a specific user, and for a specific IP range (much more restricted than the wide hole that was already there), 3) Removed the “Skype rule” which seemingly was placed there by the Skype installer, 4) Disabled uPnP on my router, 5) Disabled Skype from using uPnP, 6) Ticked ‘Allow direct connections to your contacts only’. Phewy. (Note that disabling uPnP (being a convenience protocol) can lead to some issues with smartTV’s / consoles / mobile phone apps if disabled.)

     All our networking & software setup these days is mostly about convenience.

Further reviewing the Windows firewall rules, I saw many rules that could be either removed or tied down significantly. It was like doing QA on my own network (phun intended :). The same with the router settings. I also did a router firmware upgrade and installed the latest Windows security patches. All of a sudden, that previously-annoying ‘we’ll just shut down your pc to install updates, even if you had work open’ feature in Windows seemed a lot more acceptable (I now have a one-week timeout for automatic restarts).

Going forward, when I download third-party utilities (open source or not), I will almost surely run them through virustotal.com – a fantastic service by Google. It’s quite quick and easy to do: download, upload, check. I also plan to review Windows firewall rules, program security settings (especially for browsers and tools like Skype etc.) and pending Windows updates once in a while.

The most surprising thing of all? Having made all these security restrictions has given me 0% less functionality thus far.

Maybe it is indeed time we wake up about security.

The post Tips for avoiding malware from a lesson learned appeared first on MySQL Performance Blog.

Oracle license revenue and the MySQL ecosystem

Thu, 2015-06-25 21:30

Oracle was in the news recently with the story of its license revenue declining as much as 17% in the recent quarter. This is blamed on transitioning to the cloud in some publications, but others, such as Bloomberg and TechRepublic, look deeper, seeing open source software responsible for the bulk of it.

Things are especially interesting in the MySQL ecosystem, as Oracle both owns its traditional “Enterprise” Oracle database and MySQL – a more modern open source database.

At Percona we see the same story repeating among many of our enterprise customers:

  1. MySQL proves itself. This generally happens one of two ways. One is for the enterprise using traditional enterprise databases, such as Oracle or DB2, to acquire a company which has been built on MySQL. After the dust settles the CFO or CIO discovers that the acquired company has been successfully running business-critical operations with MySQL and spending hundreds of thousands of dollars on database support instead of tens of millions. At this point it’s been shown that it can be done, so it should continue.

The other way is for MySQL to rise through the ranks in an organization. Typically it starts with some small MySQL use, such as running a bug tracking application in the IT department. Then it moves to MySQL being used with Drupal to power the main corporate website and an e-commerce function with Magento or something similar. Over time, MySQL proves itself and is trusted to handle more and more “core” enterprise databases that are absolutely critical for the business.

Interestingly enough, contrary to what some people have said, MySQL ownership by Oracle helps it to gain trust with many enterprise accounts. Enterprises may not like Oracle’s license and maintenance fees, but they like Oracle’s quality engineering, attention to security and predictable releases.

  2. New applications are built using MySQL. As the enterprise is ready to embrace MySQL it is added to the approved database list and now internal teams can use it to develop applications. In many cases the mandate goes even further with MySQL than with other open source technologies, as it is given preference, and teams need to really justify to management when they want to use Oracle or other proprietary database technologies. There are some cases when that may be warranted, but in most cases MySQL is good enough.
  3. Moving existing applications from Oracle to MySQL. Depending on the organization and the applications, this can happen a couple of different ways. One way is that equivalent applications are built from scratch on the new open source technology stack and the old application is retired. The other is that only the database is migrated from Oracle to MySQL. Moving the database from Oracle to MySQL might be easy, or it might be close to a full application rewrite. For example, we see Java applications which often use the database as a simple data store through an ORM framework and can be moved to MySQL easily; on the other hand, applications built with extensive use of advanced stored procedures and Oracle-specific SQL extensions are much harder to move.

The wave of moving to open source database technologies will continue and we’re not alone in thinking that – Gartner believes that by 2018, 70% of new in-house applications will be built on open source database systems.

What are we currently seeing in the MySQL ecosystem? First, many customers tell us that they are looking at hefty price increases for MySQL support subscriptions. Some customers who had previously signed 5-year agreements with Sun (at the time it was acquired by Oracle) and are exploring renewal now are seeing price increases of as much as 5x for a comparable environment. This is very understandable considering the pressures Oracle has on the market right now.

The issues, however, go deeper than the price. Many customers are not comfortable trusting Oracle to give them the best possible advice for moving from the expensive Oracle database to the much less expensive Oracle MySQL database. The conflict is obvious: the highest financial reward comes to Oracle by proving that applications can’t be moved to MySQL or any other open source database.

If you’re choosing MySQL, Oracle is financially interested in having you use the Enterprise Edition, which brings back many of the vendor lock-in issues enterprises are trying to avoid by moving to open source databases. Customers believe Oracle will ensure enterprise-only features are put in use in the applications, making it difficult to avoid renewing at escalating prices.

So what do our customers see in Percona which makes them prefer our support and other services to those of Oracle?

  • We are a great partner if you’re considering moving from the Oracle database to MySQL as we both have years of experience and no conflict of interest.
  • Percona Server, Percona XtraDB Cluster, Percona XtraBackup and our other software for the MySQL ecosystem are 100% open source, which means we’re not trying to lock you into the “enterprise version” as we work together. Furthermore, many of the features which are only available in MySQL Enterprise Edition are available in the fully open source Percona Server, including audit, backup and authentication.
  • We are focused on solutions for your business, not pushing Percona-branded technology. If you choose to use Percona Server, great! If you are using MySQL, MariaDB, Amazon RDS, etc., that’s great too.

With the continuing trend of moving to open source database management systems, the cost pressures on people running proprietary databases will continue to increase, and the only real solution is to accelerate the move to the open source stack. As you do that, you’re better off moving to completely open source technology, such as what is available from Percona, to avoid vendor lock-in. If you’re looking for a partner to help you assess the migration strategy and execute the move successfully, check for conflicts of interest and ensure that your interests and your provider’s are completely aligned.

The post Oracle license revenue and the MySQL ecosystem appeared first on MySQL Performance Blog.

Percona XtraDB Cluster (PXC): How many nodes do you need?

Tue, 2015-06-23 15:12

A question I often hear when customers want to set up a production PXC cluster is: “How many nodes should we use?”

Three nodes is the most common deployment, but when are more nodes needed? They also ask: “Do we always need to use an even number of nodes?”

This is what we’ll clarify in this post.

This is all about quorum

I explained in a previous post that a quorum vote is held each time one node becomes unreachable. With this vote, the remaining nodes will estimate whether it is safe to keep on serving queries. If quorum is not reached, all remaining nodes will set themselves in a state where they cannot process any query (even reads).

To get the right size for your cluster, the only question you should answer is: how many nodes can simultaneously fail while leaving the cluster operational?

  • If the answer is 1 node, then you need 3 nodes: when 1 node fails, the two remaining nodes have quorum.
  • If the answer is 2 nodes, then you need 5 nodes.
  • If the answer is 3 nodes, then you need 7 nodes.
  • And so on: in general, to tolerate f simultaneous node failures you need 2f + 1 nodes.

Remember that group communication is not free, so the more nodes in the cluster, the more expensive group communication will be. That’s why it would be a bad idea to have a cluster with 15 nodes for instance. In general we recommend that you talk to us if you think you need more than 10 nodes.

What about an even number of nodes?

The recommendation above always specifies odd number of nodes, so is there anything bad with an even number of nodes? Let’s take a 4-node cluster and see what happens if nodes fail:

  • If 1 node fails, 3 nodes are remaining: they have quorum.
  • If 2 nodes fail, 2 nodes are remaining: they no longer have quorum (remember 50% is NOT quorum).

Conclusion: availability of a 4-node cluster is no better than the availability of a 3-node cluster, so why bother with a 4th node?

The next question is: is a 4-node cluster less available than a 3-node cluster? Many people think so, specifically after reading this sentence from the manual:

Clusters that have an even number of nodes risk split-brain conditions.

Many people read this as “as soon as one node fails, this is a split-brain condition and the whole cluster stops working”. This is not correct! In a 4-node cluster, you can lose 1 node without any problem, exactly like in a 3-node cluster. This is no better, but no worse either.

By the way, the manual is not wrong! The sentence makes sense in its context.

There could actually be reasons why you might want to have an even number of nodes, but we will discuss that topic in the next section.

Quorum with multiple data centers

To provide more availability, spreading nodes in several datacenters is a common practice: if power fails in one DC, nodes are available elsewhere. The typical implementation is 3 nodes in 2 DCs:

Notice that while this setup can handle any single node failure, it can’t handle all single DC failures: if we lose DC1, 2 nodes leave the cluster and the remaining node does not have quorum. You can try with 4, 5 or any number of nodes and it will be easy to convince yourself that in all cases, losing one DC can make the whole cluster stop operating.

If you want to be resilient to a single DC failure, you must have 3 DCs, for instance like this:

Other considerations

Sometimes other factors will make you choose a higher number of nodes. For instance, look at these requirements:

  • All traffic is directed to a single node.
  • The application should be able to fail over to another node in the same datacenter if possible.
  • The cluster must keep operating even if one datacenter fails.

The following architecture is an option (and yes, it has an even number of nodes!):

Conclusion

Regarding availability, it is easy to estimate the number of nodes you need for your PXC cluster. But node failures are not the only aspect to consider: resilience to a datacenter failure can, for instance, influence the number of nodes you will be using.

The post Percona XtraDB Cluster (PXC): How many nodes do you need? appeared first on MySQL Performance Blog.

Percona Monitoring Plugins 1.1.5 release

Mon, 2015-06-22 07:00

Percona is glad to announce the release of Percona Monitoring Plugins 1.1.5.

Changelog:

  • Added more DB instance classes to pmp-check-aws-rds.py (issue 1398911)
  • Added configurable query period and average time to pmp-check-aws-rds.py (issue 1436943)
  • Added region support to pmp-check-aws-rds.py (issue 1442980)
  • Added an option to alert when server is not configured as replica to pmp-check-mysql-replication-delay (issue 1357017)
  • Improved usage of lock-free SHOW SLAVE STATUS query (issue 1380690)
  • Fixed reporting of slave lag in ss_get_mysql_stats.php (issue 1389769)

We have also moved the code to Github https://github.com/percona/percona-monitoring-plugins but the bug tracker is still on Launchpad https://bugs.launchpad.net/percona-monitoring-plugins.

A new tarball is available from the downloads area, and packages are available from our software repositories. The plugins are fully supported for customers with a Percona Support contract, and free installation services are provided as part of some contracts. In addition, installation and setup of these tools are included as part of Percona’s Remote DBA service. You can find links to the documentation, forums and more at the project homepage.

About Percona Monitoring Plugins
Percona Monitoring Plugins are high-quality components to add enterprise-grade MySQL monitoring and graphing capabilities to your existing in-house, on-premises monitoring solutions. The components are designed to integrate seamlessly with widely deployed solutions such as Nagios, Cacti and Zabbix.

The post Percona Monitoring Plugins 1.1.5 release appeared first on MySQL Performance Blog.

Q&A: High availability when using MySQL in the cloud

Fri, 2015-06-19 13:00

Last week I hosted a webinar on using MySQL in the cloud for High Availability (HA) alongside 451 Research analyst Jason Stamper. You can watch the recording and also download the slides (free) here. Just click the “Register” button at the end of that page.

We had several excellent questions and we didn’t have time to get to several of them in the allotted time. I’m posting them here along with the answers. Feel free to ask follow-up questions in the comments below.

Q: Can the TokuDB engine be used in a PXC environment?

A: No, TokuDB cannot currently be used in a PXC environment, the only supported engine in Percona XtraDB Cluster 5.6 is InnoDB.

Q: With Galera replication (PXC), is the load balanced across the nodes?

A: No, you need to implement your own load balancing and HA layer between your clients and the Percona XtraDB Cluster server.  Examples mentioned in the webinar include HAProxy and F5 BigIP.

Q: What’s the best version of Percona XtraDB Cluster regarding InnoDB performance?

A: In general for best performance you should be using the latest release of Percona XtraDB Cluster 5.6, which is currently 5.6.24, released on June 3rd, 2015.

Q: Can I redirect my writes in Percona XtraDB Cluster to multiple nodes using HAProxy? While trying with SysBench I can see that writes only go to the first node in PXC, while reads do go to multiple nodes.

A: Yes, you can configure HAProxy to distribute both reads and writes across all of your nodes in a Percona XtraDB Cluster environment. Perhaps SysBench created only one database connection for all writes, and so HAProxy kept those confined to only one host. You may want to experiment with parallel_prepare.lua.
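
For reference, here is a minimal sketch of an HAProxy listener that spreads connections across all three PXC nodes; the IP addresses and ports are assumptions, and you would normally add a proper health check (for example the clustercheck script) per node:

listen pxc-cluster
    bind 0.0.0.0:3307
    mode tcp
    balance leastconn
    server node1 10.0.0.1:3306 check
    server node2 10.0.0.2:3306 check
    server node3 10.0.0.3:3306 check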

Q: What’s the optimal HA for small datasets (db is less than 10gb)?

A: The optimal HA deployment for small datasets would be dependent on your level of recovery required (tolerance for loss of transactions) and time that you can be in an unavailable state (seconds, minutes, hours?).  Unfortunately there isn’t a single answer to your question, however, if you are interested in further discussion on this point Percona would be happy to coordinate a time to speak.  Please feel free to contact me directly and we can continue the conversation at michael.coburn@percona.com.

 Q: Is there a concept of local master vs. remote master with PXC?

A: No, there is no concept of local vs remote master. All nodes in a Percona XtraDB Cluster can be classified as a master, regardless of their proximity to the clients.

Q: Are there any concerns when considering AWS RDS or AURORA DB for MySQL HA in the Cloud?

A: Regarding AWS RDS, yes, this is a good option for MySQL HA in the Cloud. I unfortunately haven’t worked with Aurora DB that much yet, so I don’t have an opinion on its suitability for HA in the Cloud.

Q: We tried out PXC a while back and it used to lock everything whenever any DDL was done. Has that changed?

A: We would have to look at the specifics of your environment, however, there have been numerous improvements in the 1½ years of development since Percona XtraDB Cluster went Generally Available (GA) on January 30th, 2014 in version 5.6.15.

Q: Is using the arbitrator a must?

A: No, the arbitrator role via the garbd daemon is generally only used when operating in a minimal environment of two nodes that contain the data and you need a third node for quorum but don’t want to store the data a third time.

Q: Can we do a cluster across different zones?

A: Yes you can. However be aware that the latency incurred for all cluster certification operations will be impacted by the round trip time between nodes.

Q: Does PXC also support the MyISAM database?

A: No, Percona XtraDB Cluster does not support any storage engine other than InnoDB as of PXC 5.6.

Q: How do load balancers affect the throughput in a Galera-based setup given that the write would be limited by the slowest node?

A: Load balancers will introduce some measure of additional latency in the form of CPU time in the load balancer layer as it evaluates its own ruleset, and also in network time due to the additional hop through the load balancer. Otherwise there should be no perceptible difference in the write throughput of a Percona XtraDB Cluster with and without a load balancer as it relates to the “slowest node” factor.

Q: Have you used MaxScale yet? If so, what are your thoughts?

A: Unfortunately I haven’t used MaxScale however Yves Trudeau, Percona Principal Architect, has recently written about MaxScale in this blog post.

Q: How do you configure timeout and maintain persistent connection to HAProxy?

A: I would encourage you to refer to the HAProxy Documentation.

The post Q&A: High availability when using MySQL in the cloud appeared first on MySQL Performance Blog.

Getting EXPLAIN information from already running queries in MySQL 5.7

Thu, 2015-06-18 07:00

When a new version of MySQL is about to be released we read a lot of blog posts about the performance and scalability improvements. That’s good but sometimes we miss some small features that can help us a lot in our day-to-day tasks. One good example is the blog post that Aurimas wrote about a new small feature in MySQL 5.6 that I didn’t know about until I read it: the Automatic InnoDB transaction log file size change. How cool is that?

I plan to write a series of blog posts that will show some of those small new features in MySQL 5.7 that are going to be really useful. I’m going to start with EXPLAIN FOR CONNECTION.

This feature allows us to run an EXPLAIN for an already running statement. Let’s say that you find a query that has been running for a long time and you want to check why that could be happening. In 5.7 you can just ask MySQL to EXPLAIN the query that a particular connection is running and get the execution path. You can use it if the query is a SELECT, DELETE, INSERT, REPLACE or UPDATE. It won’t work if the query is a prepared statement, though.

Let me show you an example of how it works.

We have a long-running join.

mysql [localhost] {msandbox} ((none)) > show processlist \G
*************************** 1. row ***************************
     Id: 9
   User: msandbox
   Host: localhost
     db: employees
Command: Query
   Time: 49
  State: Sending data
   Info: select count(*) from employees, salaries where employees.emp_no = salaries.emp_no

Let’s see the execution plan for the query:

mysql [localhost] {msandbox} ((none)) > explain for connection 9 \G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: employees
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 299540
     filtered: 100.00
        Extra: NULL
*************************** 2. row ***************************
           id: 1
  select_type: SIMPLE
        table: salaries
   partitions: NULL
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 2803840
     filtered: 100.00
        Extra: Using where; Using join buffer (Block Nested Loop)

The join between those tables is not using any index at all, so there is some room for improvement here.
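
If you do not want to scan SHOW PROCESSLIST output by eye, here is a minimal sketch of finding candidate connections through information_schema (the 30-second threshold is an arbitrary assumption):

SELECT id, time, info FROM information_schema.processlist WHERE command = 'Query' AND time > 30;
EXPLAIN FOR CONNECTION 9;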

Conclusion

You can use this feature to see why a query has been running for too long and, based on that information, decide how to fix it and how to proceed. This is going to be a very useful feature for DBAs who want to diagnose performance problems and slow queries.

The post Getting EXPLAIN information from already running queries in MySQL 5.7 appeared first on MySQL Performance Blog.

Update on the InnoDB double-write buffer and EXT4 transactions

Wed, 2015-06-17 14:15

In a post written a few months ago, I found that using EXT4 transactions with the “data=journal” mount option improves write performance significantly, by 55%, without putting data at risk. Many people commented on the post mentioning they were not able to reproduce the results, so I decided to investigate further in order to find out why my results were different.

So, I ran sysbench benchmarks on a few servers and found when the InnoDB double-write buffer limitations occur and when they don’t. I also made sure some of my colleagues were able to reproduce the results. Basically, in order to reproduce the results you need the following conditions:

  • Spinning disk (no SSD)
  • Enough CPU power
  • A dataset that fits in the InnoDB buffer pool
  • A continuous high write load with many ops waiting for disk

Using the InnoDB double-write buffer on an SSD disk somewhat prevents us from seeing the issue, which is good performance-wise. That comes from the fact that the latency of each write operation is much lower. That makes sense: the double-write buffer is an area of 128 pages on disk used by the write threads. When a write thread needs to write a bunch of dirty pages to disk, it first writes them sequentially to free slots in the double-write buffer in a single IOP and then spends time writing the pages to their actual locations on disk, typically using one IOP per page. Once done writing, it releases the double-write buffer slots it was holding and another thread can do its work. The presence of a RAID controller with a write cache certainly helps, at least until the write cache is full. Since I didn’t test with a RAID controller, I suspect a RAID controller write cache will delay the appearance of the symptoms, but if the write load is sustained over a long period of time, the issue with the InnoDB double-write buffer will appear.

So, to recapitulate: on a spinning disk, a write thread needs to hold a lock on some of the double-write buffer slots for at least a few milliseconds per page it needs to write, while on an SSD disk the slots are released very quickly because of the low latency of SSD storage. To actually stress the InnoDB double-write buffer on an SSD disk, one must push many more writes.

That leads us to the second point: the amount of CPU resources available. At first, one of my colleagues tried to reproduce the results on a small EC2 instance and failed. It turned out that by default the sysbench oltp.lua script does quite a lot of reads, and those reads saturate the CPU, throttling the writes. By lowering the amount of reads in the script, he was then able to reproduce the results.

For my benchmarks, I used the following command:

sysbench --num-threads=16 --mysql-socket=/var/lib/mysql/mysql.sock
--mysql-database=sbtest --mysql-user=root
--test=/usr/share/doc/sysbench/tests/db/oltp.lua --oltp-table-size=50000000
--oltp-test-mode=complex --mysql-engine=innodb --db-driver=mysql
--report-interval=60 --max-requests=0 --max-time=3600 run

Both servers used were bare-metal boxes with 12 physical cores (24 with hyper-threading). With fewer CPU resources, I suggest adding the following parameters:

--oltp-point-selects=1
--oltp-range-size=1
--oltp-index-updates=10

So that the CPU is not wasted on reads and enough writes are generated. Remember, we are not doing a generic benchmark; we are just stressing the InnoDB double-write buffer.

In order to make sure something else isn’t involved, I verified the following:

  • Server independence: tried on 2 physical servers and one EC2 instance, CentOS 6 and Ubuntu 14.04
  • MySQL provided, tried on MySQL community and Percona Server
  • MySQL version, tried on 5.5.37 and 5.6.23 (Percona Server)
  • Varied the InnoDB log file size from 32MB to 512MB
  • The impacts of the number of InnoDB write threads (1,2,4,8,16,32)
  • The use of Linux native asynchronous IO
  • Spinning and SSD storage

So, with all those verifications done, I can maintain that if you are using a server with spinning disks and a high write load, using EXT4 transactions instead of the InnoDB double-write buffer yields an increase in throughput of more than 50%. In an upcoming post, I’ll show how performance stability is affected by the InnoDB double-write buffer under a high write load.
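
For completeness, here is a minimal sketch of mounting the filesystem that holds the MySQL datadir with EXT4 data journaling; the device and mount point are assumptions:

mount -o data=journal,noatime /dev/sdb1 /var/lib/mysql

The equivalent /etc/fstab entry would be:

/dev/sdb1  /var/lib/mysql  ext4  data=journal,noatime  0 0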

Appendix: the relevant part of the my.cnf

innodb_buffer_pool_size = 12G
innodb_write_io_threads = 8 # or else in {1,2,4,8,16,32}
innodb_read_io_threads = 8
innodb_flush_log_at_trx_commit = 0 # must be 0 or 2 to really stress the double write buffer
innodb_log_file_size = 512M # or 32M, 64M
innodb_log_files_in_group = 2
innodb_file_per_table
innodb_flush_method=O_DIRECT # or O_DSYNC
innodb_buffer_pool_restore_at_startup=300 # On 5.5.x, important to warm up the buffer pool
#innodb_buffer_pool_load_at_startup=ON # on 5.6, important to warm up the buffer pool
#innodb_buffer_pool_dump_at_shutdown=ON # on 5.6, important to warm up the buffer pool,
skip-innodb_doublewrite # or commented out
innodb_flush_neighbor_pages=none # or area for spinning

The post Update on the InnoDB double-write buffer and EXT4 transactions appeared first on MySQL Performance Blog.

Speed up GROUP BY queries with subselects in MySQL

Mon, 2015-06-15 18:32

We usually try to avoid subselects because they sometimes force the use of a temporary table and limit the use of indexes. But when is it good to use a subselect?

This example was tested on table a (1310723 rows) and tables b, c and d (5 rows each), with MySQL versions 5.5 and 5.6.

Let’s suppose we have a query like this:

select a.name, sum(a.count) aSum, avg(a.position) aAVG, b.col1, c.col2, d.col3
from a
join b on (a.bid = b.id)
join c on (a.cid = c.id)
join d on (a.did = d.id)
group by a.name, b.id, c.id, d.id

What will MySQL do? First it will take the entire data set – this means it will go through each row, scanning the values of “bid,” “cid” and “did,” and then apply the joins to each table. At this point it has the complete data set, and only then will it start to group it, executing the sum and average functions.

Let’s analyze it step by step:

  1. Scan each row of  table a which has 1310720 rows.
  2. Join each row of table a with b, c and d – this means that each of the 1310720 rows will be joined, making the temporary table bigger.
  3. Execute the GROUP BY, which will scan the 1310720 rows again, creating the result data set.

What can we do to optimize this query? We can’t avoid the GROUP BY over the 1.3M rows, but we are able to avoid the join over 1.3M rows. How? We need all of the information from table a for the GROUP BY, but we don’t need to execute all the joins before grouping. Let’s rewrite the query:

select a.name, aSum, aAVG, b.col1, c.col2, d.col3
from (
  select name, sum(count) aSum, avg(position) aAVG, bid, cid, did
  from a
  group by name, bid, cid, did
) a
join b on (a.bid = b.id)
join c on (a.cid = c.id)
join d on (a.did = d.id)

We see from the above query that we are doing the GROUP BY only over table a; the result data set of that subquery is just 20 rows. But what about the query response time? The first query took 2.3 sec on average and the optimized query took 1.8 sec on average, half a second faster.

What about adding a covering index? The index that we can add will be:

alter table a add index (name,bid,cid,did,count,position);

The explain plan of both queries shows that it is using just the index to resolve the query.

Now, the response time of the original query is 1.9 sec, which is close to the time of the optimized query. However, the response time of the optimized query is now 0.7 sec, nearly 3x faster. The downside of adding this index is that we are indexing the whole table, and the index length is near 80% of the data length.
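
You can check the relative sizes yourself; a minimal sketch, assuming the table lives in a schema named test:

SELECT data_length, index_length FROM information_schema.tables WHERE table_schema = 'test' AND table_name = 'a';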

If the original query had “where” conditions, the rewrite depends on which fields they reference. Let’s suppose we add c.col2 = 3:

select a.name, sum(a.count) aSum, avg(a.position) aAVG, b.col1, c.col2, d.col3
from a
join b on (a.bid = b.id)
join c on (a.cid = c.id)
join d on (a.did = d.id)
where c.col2 = 3
group by a.name, b.id, c.id, d.id

Now, in the new query, the subquery will change: table c and the “where” clause must be added to the subquery:

select a.name, aSum, aAVG, b.col1, a.col2, d.col3
from (
  select a.name, sum(count) aSum, avg(position) aAVG, bid, cid, did, c.col2
  from a
  join c on (a.cid = c.id)
  where c.col2 = 3
  group by name, bid, cid, did
) a
join b on (a.bid = b.id)
join d on (a.did = d.id)

But the differences in times are not as big (original query 1.1 sec and new query 0.9 sec). Why? Because the original query now has less data to group by. By adding c.col2=3 to the original query, the amount of data to group by is reduced from 1.3M to 262k rows. Indeed, if you add more “where” conditions on different tables, the data set to sort will be smaller and the speed-up will decrease.

Conclusion: We usually add the GROUP BY at the end of queries, and that is ok because the syntax forces us to do it. However we can use a subquery to group only the data that we need and then perform the joins over other tables. This could speed up some of our GROUP BY queries.

The post Speed up GROUP BY queries with subselects in MySQL appeared first on MySQL Performance Blog.

Percona XtraDB Cluster: Quorum and Availability of the cluster

Fri, 2015-06-12 13:44

Percona XtraDB Cluster (PXC) has become a popular option to provide high availability for MySQL servers. However many people are still having a hard time understanding what will happen to the cluster when one or several nodes leave the cluster (gracefully or ungracefully). This is what we will clarify in this post.

Nodes leaving gracefully

Let’s assume we have a 3-node cluster and all nodes have an equal weight, which is the default.

What happens if Node1 is gracefully stopped (service mysql stop)? When shutting down, Node1 will instruct the other nodes that it is leaving the cluster. We now have a 2-node cluster and the remaining members have 2/2 = 100% of the votes. The cluster keeps running normally.

What happens now if Node2 is gracefully stopped? Same thing, Node3 knows that Node2 is no longer part of the cluster. Node3 then has 1/1 = 100% of the votes and the 1-node cluster can keep on running.

In these scenarios, there is no need for a quorum vote as the remaining node(s) always know what happened to the nodes that are leaving the cluster.

Nodes becoming unreachable

On the same 3-node cluster with all 3 nodes running, what happens now if Node1 crashes?

This time Node2 and Node3 must run a quorum vote to estimate if it is safe to continue: they have 2/3 of the votes, 2/3 is > 50%, so the remaining 2 nodes have quorum and they keep on working normally.

Note that the quorum vote does not happen immediately when Node2 and Node3 are not able to join Node1. It only happens after the ‘suspect timeout’ (evs.suspect_timeout) which is 5 seconds by default. Why? It allows the cluster to be resilient to short network failures which can be quite useful when operating the cluster over a WAN. The tradeoff is that if a node crashes, writes are stalled during the suspect timeout.

Now what happens if Node2 also crashes?

Again a quorum vote must be performed. This time Node3 has only 1/2 of the votes: this is not > 50% of the votes. Node3 doesn’t have quorum, so it stops processing reads and writes.

If you look at the wsrep_cluster_status status variable on the remaining node, it will show NON_PRIMARY. This indicates that the node is not part of the Primary Component.
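
You can check this from the MySQL client on the remaining node:

SHOW STATUS LIKE 'wsrep_cluster_status';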

Why does the remaining node stop processing queries?

This is a question I often hear: after all, MySQL is up and running on Node3 so why is it prevented from running any query? The point is that Node3 has no way to know what happened to Node2:

  • Did it crash? In this case, it is safe for the remaining node to keep on running queries.
  • Or is there a network partition between the two nodes? In this case, it is dangerous to process queries because Node2 might also process other queries that will not be replicated because of the broken network link: the result will be two divergent datasets. This is a split-brain situation, and it is a serious issue as it may be impossible to later merge the two datasets. For instance if the same row has been changed in both nodes, which row has the correct value?

Quorum votes are not held because it’s fun, but only because the remaining nodes have to talk together to see if they can safely proceed. And remember that one of the goals of Galera is to provide strong data consistency, so any time the cluster does not know whether it is safe to proceed, it takes a conservative approach and it stops processing queries.

In such a scenario, the status of Node3 will be set to NON_PRIMARY and a manual intervention is needed to re-bootstrap the cluster from this node by running:

SET GLOBAL wsrep_provider_options='pc.bootstrap=YES';

An aside question is: now it is clear why writes should be forbidden in this scenario, but what about reads? Couldn’t we allow them?

Actually this is possible from PXC 5.6.24-25.11 with the wsrep_dirty_reads setting.
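
A minimal sketch, to be run on the node stuck in the NON_PRIMARY state once you accept that reads may be stale (writes remain blocked):

SET GLOBAL wsrep_dirty_reads = ON;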

Conclusion

Split-brain is one of the worst enemies of a Galera cluster. Quorum votes will take place every time one or several nodes suddenly become unreachable and are meant to protect data consistency. The tradeoff is that it can hurt availability, because in some situations a manual intervention is necessary to instruct the remaining nodes that they can accept executing queries.

The post Percona XtraDB Cluster: Quorum and Availability of the cluster appeared first on MySQL Performance Blog.

Percona Live Europe 2015! Call for speakers; registration open

Wed, 2015-06-10 07:00

Percona Live is moving from London to Amsterdam this year and the event is also expanding to three full days. Percona Live Europe 2015, September 21-23, will be at the Mövenpick Hotel Amsterdam City Centre. The call for speakers and Super Saver registration are now open. Hurry though because the deadline for submitting a speaking proposal is June 21st and Super Saver registration ends July 5th!

This year’s conference will feature one day of tutorials and two days of keynote talks and breakout sessions related to MySQL, NoSQL and Data in the Cloud. You’ll get briefed on the hottest topics, learn about operating a high-performing deployment and hear from top-industry leaders describe the future of the ecosystem – encompassing MySQL, MariaDB, Percona Server, MongoDB (and more). Attendees include DBAs, sysadmins, developers, architects, CTOs, CEOs, and vendors from around the world.

Have something to say? Why not lead a breakout session or a tutorial?

Breakout sessions are 50 minutes including a Q&A. Tutorial sessions focus on an immediate and practical application of in-depth MySQL and NoSQL knowledge. Tutorial speakers should assume that attendees will have laptops to work through detailed and potentially hands-on presentations. Tutorials are typically three hours long including a Q&A, however, if you have content for a full day, submissions for 6-hour-long tutorials are also being accepted. If your tutorial or breakout session is approved, you’ll receive a complimentary full-conference pass.

Huge thanks to our Conference Committee!

There would be no Percona Live without the hard work of our conference committees. Meet this year’s Percona Live Europe 2015 Conference Committee – a dedicated group of experts in MySQL, NoSQL and Data in the Cloud:

  • Erik Beebe – Founder and CTO, ObjectRocket
  • Luis Motta Campos – Database Administrator, ebay Classifieds Group
  • Colin Charles – Chief Evangelist, MariaDB
  • César Trigo Esteban – Development Director, Gigigo
  • Kenny Gorman – Chief Technologist; Data. Office of the CTO, Rackspace
  • Amrith Kumar – Founder and CTO, Tesora
  • Giuseppe Maxia – Quality Assurance Architect, VMWare
  • Shlomi Noach – Senior Systems Engineer, Booking.com
  • Konstantin Osipov – Engineering Manager, Mail.Ru
  • Morgan Tocker – MySQL Community Manager, Oracle
  • Art van Scheppingen – Head of Database Engineering, Spil Games
  • Charity Majors – Production Engineering Manager, Facebook
  • Peter Zaitsev – Co-founder and CEO, Percona
Sponsorships

Sponsorship opportunities for Percona Live Europe 2015 are now available. Sponsors become part of a dynamic and fast-growing ecosystem and interact with hundreds of DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solution vendors, and entrepreneurs who typically attend the event. This year’s conference will feature expanded accommodations and turnkey kiosks.

Planning to attend?

Super Saver registration discounts for Percona Live Europe 2015 are available through July 5th (at 11:30 p.m. CEST). Visit the Percona Live Europe 2015 website for more information about the conference. Interested community members can also register to receive email updates about Percona Live Europe 2015.

Percona has also negotiated a special hotel rate at the Mövenpick Hotel Amsterdam City Centre. If you book your hotel before July 6th, your delicious breakfast is included.

I hope to see you in Amsterdam!

The post Percona Live Europe 2015! Call for speakers; registration open appeared first on MySQL Performance Blog.

Auditing MySQL with McAfee and MongoDB

Di, 2015-06-09 13:00

Greetings everyone! Let’s discuss a third-party auditing solution for MySQL and how we can leverage MongoDB® to make sense of all of that data.

The McAfee MySQL Audit plugin does a great job of capturing, at a low level, activities within a MySQL server. It does this through some non-standard APIs, which is why installing and configuring the plugin can be a bit difficult. The audit information is stored in JSON format, in a text file, by default.

There is 1 JSON object for each action that takes place within MySQL. If a user logs in, there’s an object. If that user queries a table, there’s an object. Imagine 1000 active connections from an application, each doing 2 queries per second. That’s 2000 JSON objects per second being written to the audit log. After 24 hours, that would be almost 173,000,000 audit entries!

How does one make sense of that many JSON objects? One option would be to write your own parser in $YOUR_FAVORITE_LANGUAGE, convert the JSON to INSERT statements and write the data back to MySQL (note: if you do this, you can whitelist this table within the plugin so that these INSERTs are not themselves audit logged). Or, we can use a system designed to import, store and query JSON objects, such as MongoDB.

Install McAfee Audit Plugin

First we need to download the source code for the plugin and for the specific MySQL version you are running. This is not a complete step-by-step HOWTO on installing this plugin; just some high-level points.

My client for this exercise is still on Percona Server 5.1.73, so we need the source for that EXACT version from percona.com.

We can clone the mcafee/mysql-audit repository using git.

Unzip the MySQL source and compile it; just don’t do “make install”, only “./configure” and “make” are necessary.

Now compile the plugin. You may want to read the detailed instructions.
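
For reference, the high-level flow might look something like this. This is a rough sketch only: the repository URL and tarball name are assumptions for illustration, and the plugin’s own detailed instructions cover the exact build invocation.

# grab the plugin source (assumed GitHub location) and the matching server source
git clone https://github.com/mcafee/mysql-audit.git
tar xzf Percona-Server-5.1.73.tar.gz   # assumed tarball name downloaded from percona.com
cd Percona-Server-5.1.73
./configure && make                    # no "make install" needed
cd ../mysql-audit
# build the plugin against the server source tree you just compiled,
# following the plugin's detailed build instructions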

This next step is tricky and really only necessary if you are not using vanilla MySQL. It is required to allow the plugin to use those non-standard APIs I mentioned earlier. You need to extract the offsets for the plugin to work. Follow the instructions given.

Once that is all done, you can:

INSTALL PLUGIN AUDIT SONAME 'libaudit_plugin.so';

If the plugin fails to load, check MySQL’s error log for the reason why and confer with the plugin documentation on how to resolve.

We now need to enable audit logging because nothing is enabled by default.

SET GLOBAL audit_record_cmds = "select,insert,update,delete";
SET GLOBAL audit_json_file = ON;
SET GLOBAL audit_record_objs = "*.*,{}";
SET GLOBAL audit_force_record_logins = ON;

Look inside @@datadir and you should see a file called mysql-audit.json. You can tail -f this file if you’d like to watch it to make sure data is being written.
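
You can also sanity-check the settings from within MySQL. This simply lists the plugin’s variables; the exact set of variables shown depends on the plugin version:

SHOW GLOBAL VARIABLES LIKE 'audit%';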

If you’d like some more background reading on the audit plugin, check out Fernando’s post on Experiences with McAfee Audit Plugin.

Setting Up MongoDB

Let me begin by stating this is my first time dealing with MongoDB in any real sense. I spun up an EC2 instance in AWS (m3.large, CentOS 6) and installed MongoDB using yum and the Mongo repositories.

As the ephemeral storage for my instance had been mounted at /opt, I changed just this one option in the supplied /etc/mongod.conf and restarted mongo (service mongod restart).

dbpath=/opt/mongo

I then copied the mysql-audit.json from the MySQL host using SSH:

[percona@mysql-host ~]$ scp -i .ssh/amazon.pem /data/mysql/mysql-audit.json root@54.177.22.22:/tmp/

Then I imported this JSON file directly into MongoDB:

[root@ip-10-255-8-15 ~]# mongoimport --db test --collection audit --drop --file /tmp/mysql-audit.json

The above mongoimport command specifies the database to import into (test) and the collection (audit). I also specify --drop to drop the collection before importing. This drop is necessary because the Audit Plugin appends to the JSON file, and if we repeated these import steps without the --drop, we would be duplicating data.

If there is enough interest, via the comments below, I will investigate the potential of using the socket functionality of the Audit Plugin to have the events stream directly into mongo.

For now, it’s a wash-rinse-repeat cycle; however, the plugin can rotate the JSON audit log after a certain amount of time, so each file could be imported on a daily basis.

Making Data Make Sense

Here is a sample “document” (i.e., an audit event) created by the Audit Plugin.

{ "_id" : ObjectId("5571ea51b1e714b8d6d804c8"), "msg-type" : "activity", "date" : "1433438419388", "thread-id" : "10214180", "query-id" : "295711011", "user" : "activebatchSVC", "priv_user" : "activebatchSVC", "host" : "ecn.corp", "ip" : "10.2.8.9", "cmd" : "select", "objects" : [ { "db" : "", "name" : "*", "obj_type" : "TABLE" }, { "db" : "risque", "name" : "markets_source_tfutvol_eab", "obj_type" : "VIEW" }, { "db" : "historical", "name" : "futureopt", "obj_type" : "TABLE" }, { "db" : "risque", "name" : "securities_futures_optdef", "obj_type" : "TABLE" }, { "db" : "risque", "name" : "markets_source_tfutvol_eab", "obj_type" : "VIEW" }, { "db" : "historical", "name" : "futureopt", "obj_type" : "TABLE" }, { "db" : "risque", "name" : "securities_futures_optdef", "obj_type" : "TABLE" } ], "query" : "SELECT far, bar, baz FROM mytable" }

!! MongoDB BUG !!

Notice that the last field in the document is named “query.” When I attempted some basic aggregate() functions on this field, I received errors about bad syntax. After much frustration, lots of Googling and repeated testing, I could only conclude that “query” is a reserved word in MongoDB. There is little to no documentation on this, aside from an almost 3-year-old bug report that simply helped confirm my suspicion.

To work around this issue, let’s rename all of the “query” fields to “qry”:

db.audit.update({}, { $rename: { "query": "qry"} }, false, true);

Now we can begin.

Basic Command Counters

Using any of the “top level” fields in each document, we can run reports (called aggregations in Mongo). An easy one is to get a list of all unique “commands” and how many times they occurred.

> db.audit.aggregate([ { $group: { "_id": "$cmd", "count": { $sum: 1 } } } ]); { "_id" : "Failed Login", "count" : 2 } { "_id" : "select", "count" : 458366 } { "_id" : "Connect", "count" : 455090 } { "_id" : "insert", "count" : 2 } { "_id" : "Quit", "count" : 445025 } { "_id" : null, "count" : 1 }

Breaking down the command above, we are grouping all values in the “cmd” field and counting them up. The SQL equivalent would be:

SELECT cmd, count(cmd) FROM audit GROUP BY cmd;

User Counts

Let’s get a list and count of all user activities. This will include any of the commands listed in the previous aggregate.

> db.audit.aggregate([ { $group: { "_id": "$user", "count": { $sum: 1 } } } ]); { "_id" : "baw", "count" : 1883 } { "_id" : "eq_shrd", "count" : 1 } { "_id" : "reski", "count" : 3452 } { "_id" : "alin", "count" : 1 } { "_id" : "oey", "count" : 62 } { "_id" : "dule", "count" : 380062 } { "_id" : "ashi", "count" : 802 } { "_id" : "tech_shrd", "count" : 392464 }

A couple of interesting things come out here. First, the tech_shrd user performs the most ‘activities’ of all users. Is this expected? Is this normal? Your environment will determine that.

Specific User Activities

Let’s pick a specific user and get their activity counts to make sure they aren’t doing anything weird.

> db.audit.aggregate([ ... { $match: { "user": "tech_shrd" } }, ... { $group: { "_id": "$cmd", "count": { $sum: 1 } } } ... ]); { "_id" : "select", "count" : 132970 } { "_id" : "Connect", "count" : 133919 } { "_id" : "Quit", "count" : 125575 }

The SQL equivalent:

SELECT cmd, count(cmd) FROM audit WHERE user = 'tech_shrd' GROUP BY cmd;

Activities By User

We saw above that there were 2 insert commands. Who ran those?

> db.audit.aggregate([ ... { $match: { "cmd": "insert" } }, ... { $group: { "_id": "$user", "count": { $sum: 1 } } } ... ]); { "_id" : "graz", "count" : 2 }

More simply, we could have just done this to see the entire document/record, which would include the SQL that the user executed, timestamp, hostname, etc.

> db.audit.find({ "cmd": "insert" });

The SQL equivalents:

SELECT user, count(user) FROM audit WHERE cmd = 'insert' GROUP BY user;
SELECT * FROM audit WHERE cmd = 'insert';

Table Activity

The most complex example I could come up with was trying to find out how many times each table was referenced. In theory, with weeks or even months of audit data, we could decide which tables are no longer needed by the application.

> db.audit.aggregate(
... { $unwind: "$objects" },
... { $group: { _id : "$objects.name", count: { $sum: 1 } } },
... { $sort: { "count": -1 } }
... );
{ "_id" : "*", "count" : 17359 }
{ "_id" : "swaps", "count" : 4392 }
{ "_id" : "futureopt", "count" : 3666 }
...(more)

You’ll notice in the sample document above that “objects” is an array of objects with 1 element for each table/view referenced in the ‘qry’ field. We need to “unwind” this array into single elements before we can count them. If someone knows a better way, please let me know. The Audit Plugin uses “*” to represent a derived table from a sub-SELECT, which has no proper name. We can remove all of these using:

> db.audit.update({ }, { $pull: { "objects": { "name": "*" } } }, false, true);

Audit Plugin Caveat: The ‘objects’ array is not a distinct list of the tables involved. For example, a SELECT statement that self-joins twice would produce 3 identical elements in the ‘objects’ array for that audit record. This may skew results. If anyone knows a cool Mongo trick to remove duplicates, please share in the comments.
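
One possible approach, sketched here and not battle-tested against this dataset: the aggregation $setUnion operator treats an array as a set, so projecting the array through $setUnion before unwinding drops the duplicate elements within each audit record. The field name distinct_objects is just an illustrative choice, and the initial $match keeps only documents that still have a non-empty objects array.

> db.audit.aggregate([
... { $match: { "objects.0": { $exists: true } } },
... { $project: { distinct_objects: { $setUnion: [ "$objects", [] ] } } },
... { $unwind: "$distinct_objects" },
... { $group: { _id: "$distinct_objects.name", count: { $sum: 1 } } },
... { $sort: { count: -1 } }
... ]);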

Conclusion

For a quick wrap-up, we installed the McAfee Audit Plugin, exported some audit data, set up a MongoDB instance in AWS and imported the audit data. As you can see, the possibilities for the kind of information you can gather are plentiful. Feel free to comment on an aggregation you’d like to see if we were running this type of audit on your system.

Cheers,
Matthew

The post Auditing MySQL with McAfee and MongoDB appeared first on MySQL Performance Blog.

MaxScale: A new tool to solve your MySQL scalability problems

Mo, 2015-06-08 13:00

Ever since MySQL replication has existed, people have dreamed of a good solution to automatically split read from write operations, sending the writes to the MySQL master and load balancing the reads over a set of MySQL slaves. While at first it seems easy to solve, the reality is far more complex.

First, the tool needs to make sure it correctly parses and analyses all the forms of SQL that MySQL supports in order to sort writes from reads, something that is not as easy as it seems. Second, it needs to take into account whether a session is in a transaction or not.

While in a transaction, the default transaction isolation level in InnoDB, Repeatable Read, and the MVCC framework ensure that you’ll get a consistent view for the duration of the transaction. That means all statements executed inside a transaction must run on the master, but once the transaction commits or rolls back, the following SELECT statements on the session can again be load balanced to the slaves, if the session is in autocommit mode of course.

Then, what do you do with sessions that set variables? Do you restrict those sessions to the master, or do you replay them to the slaves? If you replay the set variable commands, you need to associate the client connection with a set of MySQL backend connections, made of at least a master and a slave. What about temporary objects created with “create temporary table…”? How do you deal with a slave that lags behind, or worse, what if replication is broken? Those are just a few of the challenges you face when you want to build a tool to perform read/write splitting.

Over the last few years, a few products have tried to tackle the read/write split challenge. MySQL Proxy was the first attempt I am aware of at solving this problem, but it ended up with many limitations. ScaleARC does a much better job and is very usable, but it still has some limitations. The latest contender is MaxScale from MariaDB, and this post is a road story of my first implementation of MaxScale for a customer.

Let me first introduce what MaxScale is exactly. MaxScale is an open source project, developed by MariaDB, that aims to be a modular proxy for MySQL. Most of the functionality in MaxScale is implemented as modules, which include, for example, modules for the MySQL protocol, both client side and server side.

Other families of available modules are routers, monitors and filters. Routers are used to determine where to send a query; Read/Write splitting is accomplished by the readwritesplit router. The readwritesplit router uses an embedded MySQL server to parse the queries… quite clever and hard to beat in terms of query parsing.

There are other routers available: the readconnrouter is basically a round-robin load balancer with optional weights, the schemarouter is a way to shard your data by schema, and the binlog router is useful for managing a large number of slaves (have a look at Booking.com’s Jean-François Gagné’s talk at PLMCE15 to see how it can be used).

Monitors are modules that maintain information about the backend MySQL servers. There are monitors for a replicating setup, for Galera and for NDB cluster. Finally, filters are modules that can be inserted in the software stack to manipulate the queries and the result sets. All those modules have well-defined APIs, and thus writing a custom module is rather easy, even for a non-developer like me; basic C skills are needed though. All event handling in MaxScale uses epoll and it supports multiple threads.

Over the last few months I worked with a customer who had a challenging problem. On a PXC cluster, they have more than 30k queries/s and, because of their write pattern and to avoid certification issues, they want the possibility to write to a single node and to load balance the reads. The application is not able to do the Read/Write splitting, so without a tool to do the splitting, only one node can be used for all the traffic. Of course, to make things easy, they use a lot of Java code that sets tons of session variables. Furthermore, for ISO 27001 compliance, they want to be able to log all the queries for security analysis (and also for performance analysis, why not?). So: high query rate, Read/Write splitting and full query logging. Like I said, a challenging problem.

We experimented with a few solutions. One was a hardware load balancer that failed miserably – the implementation was just too simple, using only regular expressions. Another solution we tried was ScaleArc, but it needed many rules to whitelist the set session variables and to repeat them to multiple servers. ScaleArc could have done the job, but all the rules increase the CPU load and the cost is per CPU. The queries could have been sent to rsyslog and aggregated for analysis.

Finally, ScaleArc’s HA implementation is rather minimalist and we had some issues with it. Then, we tried MaxScale. At the time, it was not GA and was (and still is) young. Nevertheless, I wrote a query logging filter module to send all the queries to a Kafka cluster and we gave it a try. Kafka is extremely well suited to recording a large flow of queries like that. In fact, at 30k qps, the 3 Kafka nodes are barely moving, with CPU usage under 5% of one core. Although we encountered some issues (remember, MaxScale is very young), it appeared to be the solution with the best potential and so we moved forward.

The folks at MariaDB behind MaxScale have been very responsive to the problems we encountered, and we finally got to a very usable point; the test in the pilot environment was successful. The solution has now been deployed in the staging environment and, if all goes well, it will be in production soon. The following figure is a simplified view of the internals of MaxScale as configured for the customer:

The blocks in the figure are nearly all defined in the configuration file. We define a TCP listener using the MySQL protocol (client side), which is linked with a router, either the readwritesplit router or the readconn router.

The first step when routing a query is to assign the backends. This is where the read/write splitting decision is made. Also, as part of the steps required to route a query, 2 filters are called: regexp (optional) and Genlog. The regexp filter may be used to hot patch a query, and the Genlog filter is the logging filter I wrote for them. The Genlog filter sends a JSON string containing roughly what can be found in the MySQL general query log, plus the execution time.
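
To make that description more concrete, here is an illustrative, stripped-down maxscale.cnf fragment in the spirit of that setup. Server names, addresses, credentials and the Genlog module name are made up for this sketch, and the Genlog filter is the custom module mentioned above, not something shipped with MaxScale.

[maxscale]
threads=1

[node1]
type=server
address=10.0.0.1
port=3306
protocol=MySQLBackend
# node2 and node3 are defined the same way

[Galera Monitor]
type=monitor
module=galeramon
servers=node1,node2,node3
user=maxscale
passwd=maxscalepwd

[Genlog]
type=filter
module=genlog

[RW Service]
type=service
router=readwritesplit
servers=node1,node2,node3
user=maxscale
passwd=maxscalepwd
filters=Genlog

[RW Listener]
type=listener
service=RW Service
protocol=MySQLClient
port=4006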

Authentication attempts are also logged, but the process is not illustrated in the figure. A key point to note: the authentication information is cached by MaxScale and is refreshed upon authentication failure; the refresh process is throttled to avoid overloading the backend servers. The servers are continuously monitored (the interval is adjustable) and the server statuses are used when deciding which backend to assign to a query.

In terms of HA, I wrote a simple Pacemaker resource agent for MaxScale that does a few fancy things like load balancing with IPTables (I’ll talk about that in a future post). With Pacemaker, we have a full-fledged HA solution with quorum and fencing on which we can rely.

Performance-wise, it is very good – a single core in a virtual environment was able to read/write split and log to Kafka about 10k queries per second. Although MaxScale supports multiple threads, we are still using a single thread per process, simply because it yields a slightly higher throughput, and the custom Pacemaker agent deals with the use of a clone set of MaxScale instances. Remember, we started using MaxScale early, and the beta versions were not dealing gracefully with threads, so we built around multiple single-threaded instances.

So, since a conclusion is needed: MaxScale has proven to be a very useful and flexible tool that makes it possible to build solutions to problems that were very hard to tackle before. In particular, if you need to perform read/write splitting, then try MaxScale; it is the best solution for that purpose I have found so far. Keep in touch, I’ll surely write other posts about MaxScale in the near future.

The post MaxScale: A new tool to solve your MySQL scalability problems appeared first on MySQL Performance Blog.

High availability using MySQL in the cloud

Do, 2015-06-04 07:00

Next Wednesday (June 10) I’ll be co-presenting a webinar on using MySQL in the cloud for High Availability (HA). Joining me will be 451 Research analyst Jason Stamper and together we’ll talk about the realities of HA using MySQL in the cloud and how vendors are responding to changing application requirements with new developments that can enhance your deployment.

We’ll also present a comparison of available solutions along with key best practices you can follow for successfully attaining HA in the cloud with MySQL. The webinar is scheduled for June 10 at 10 a.m. Pacific. Register here.

Together we’ll cover:

  • What do HA MySQL deployments in the cloud look like today?
  • What are the developing requirements of applications based on future growth and scalability needs?
  • How are key vendors responding to these needs with new features and solution offerings, including those from OpenStack, Amazon, and others?
  • A high level, more technical comparison of the solutions
  • Keys to a successful HA MySQL deployment, including scaling from a single-node application to a cluster of MySQL instances

At the end of this webinar, you will have a good understanding of the options available for High Availability using MySQL in the cloud and how your current HA MySQL deployment in the cloud compares. You’ll also learn the tradeoffs you face depending on your HA solution and be able to identify which vendors and technologies are best suited for your needs.

This webinar, as usual, is free. Register now to reserve your spot and I hope to see you next Wednesday!

The post High availability using MySQL in the cloud appeared first on MySQL Performance Blog.

Percona XtraDB Cluster 5.6.24-25.11 is now available

Mi, 2015-06-03 20:19

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on June 3rd 2015. Binaries are available from the downloads area or from our software repositories.

Based on Percona Server 5.6.24-72.2, including all the bug fixes in it, Galera Replicator 3.11, and Codership wsrep API 25.11, Percona XtraDB Cluster 5.6.24-25.11 is now the current General Availability release. All of Percona’s software is open-source and free, and all the details of the release can be found in the 5.6.24-25.11 milestone at Launchpad.

New Features:

  • Percona XtraDB Cluster now allows reads in the non-primary state by introducing a new session variable, wsrep_dirty_reads. This variable is boolean and is OFF by default. When set to ON, a Percona XtraDB Cluster node accepts queries that only read, but do not modify, data even if the node is in the non-PRIM state (#1407770). See the sketch after this list for an illustration.
  • Percona XtraDB Cluster now allows queries against INFORMATION_SCHEMA and PERFORMANCE_SCHEMA even with variables wsrep_ready and wsrep_dirty_reads set to OFF. This allows monitoring applications to monitor the node even when it is in the non-PRIM state (#1409618).
  • The wsrep_replicate_myisam variable is now both a global and a session variable (#1280280).
  • Percona XtraDB Cluster now uses getifaddrs for node address detection (#1252700).
  • Percona XtraDB Cluster has implemented two new status variables, wsrep_cert_bucket_count and wsrep_gcache_pool_size, for better instrumentation of Galera memory usage. Variable wsrep_cert_bucket_count shows the number of cells in the certification index hash-table and variable wsrep_gcache_pool_size shows the size of the page pool and/or dynamic memory allocated for gcache (in bytes).
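
As a quick, hedged sketch of the new wsrep_dirty_reads behavior (the table name is made up; the error text is the non-primary message mentioned under Bugs Fixed, and the exact wording can vary):

-- on a node that has dropped to the non-PRIM state
SELECT i FROM t1;
-- ERROR: WSREP has not yet prepared node for application use
SET SESSION wsrep_dirty_reads = ON;
SELECT i FROM t1;             -- now allowed, but may return stale data
INSERT INTO t1 VALUES (1);    -- still rejected: only reads are permitted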

Bugs Fixed:

  • Using concurrent REPLACE, LOAD DATA REPLACE or INSERT ON DUPLICATE KEY UPDATE statements in the READ COMMITTED isolation level or with the innodb_locks_unsafe_for_binlog option enabled could lead to a unique-key constraint violation. Bug fixed #1308016.
  • Using the Rolling Schema Upgrade as a schema upgrade method, due to a conflict with wsrep_desync, would allow only one ALTER TABLE to run concurrently. Bugs fixed #1330944 and #1330941.
  • SST would resume even when the donor was already detected as being in SYNCED state. This was caused when wsrep_desync was manually set to OFF, which caused the conflict and resumed the donor sooner. Bug fixed #1288528.
  • DDL would fail on a node when running a TOI DDL, if one of the nodes has the table locked. Bug fixed #1376747.
  • The xinet.d mysqlchk file was missing type = UNLISTED, which is needed for it to work out of the box. Bug fixed #1418614.
  • Conflict between enforce_storage_engine and wsrep_replicate_myisam for CREATE TABLE has been fixed. Bug fixed #1435482.
  • A specific trigger execution on the master server could cause a slave assertion error under row-based replication. The trigger would satisfy the following conditions: 1) it sets a savepoint; 2) it declares a condition handler which releases this savepoint; 3) the trigger execution passes through the condition handler. Bug fixed #1438990.
  • Percona XtraDB Cluster Debian init script was testing connection with wrong credentials. Bug fixed #1439673.
  • Race condition between IST and SST in xtrabackup-v2 SST has been fixed. Bugs fixed #1441762, #1443881, and #1451524.
  • SST will now fail when move-back fails instead of continuing and failing at the next step. Bug fixed #1451670.
  • Percona XtraDB Cluster .deb binaries were built without fast mutexes. Bug fixed #1457118.
  • The error message text returned to the client in the non-primary mode is now more descriptive ("WSREP has not yet prepared node for application use"), instead of "Unknown command" returned previously. Bug fixed #1426378.
  • An out-of-bound memory access issue in the seqno_reset() function has been fixed.
  • wsrep_local_cached_downto would underflow when the node on which it is queried has no writesets in gcache.

Other bugs fixed: #1290526.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post Percona XtraDB Cluster 5.6.24-25.11 is now available appeared first on MySQL Performance Blog.

Optimizing Percona XtraDB Cluster for write hotspots

Mi, 2015-06-03 13:50

Some applications have a heavy write workload on a few records – for instance when incrementing a global counter: this is called a write hotspot. Because you cannot update the same row simultaneously from multiple threads, this can lead to performance degradation. When using Percona XtraDB Cluster (PXC), some users try to solve this specific issue by writing on multiple nodes at the same time. Good idea or bad idea? Read on!

Simultaneous writes on a standalone InnoDB server

Say you have these 3 transactions being run simultaneously (id is the primary key of the table):

# T1
UPDATE t SET ... WHERE id = 100
# T2
UPDATE t SET ... WHERE id = 100
# T3
UPDATE t SET ... WHERE id = 101

All transactions will require a row lock on the record they want to modify. So T3 can commit at the same time as T1 and/or T2, because it will not lock the same record as T1 and T2.

But T1 and T2 cannot execute simultaneously because they need to set a lock on the same record. Let’s assume T2 is executed by InnoDB first: how long does T1 need to wait? Essentially, the time needed for T2 to execute.

Simultaneous writes on multiple nodes (PXC)

Now, is it any different if you have a 3-node PXC cluster and you want to run T1 on Node1 and T2 on Node2? Let’s review step by step how the cluster will execute these queries:

1. T1 starts executing on Node1: a row lock is set on the record where id=100. T2 also starts executing on Node2 and also sets a row lock on the record where id=100. How is it possible that 2 transactions set the same lock on the same record? Remember that locking in Galera is optimistic, meaning that a transaction can only set locks on the node where it is executing, but never on the remote nodes: here, T1 sets a lock on Node1 and T2 sets a lock on Node2.

2. Let’s assume T1 reaches commit before T2. T1 is then synchronously replicated to all nodes and it gets a global sequence number. At this point, a certification test is run on Node1 to determine whether there are any write conflicts between T1 and other “in-flight” transactions in the cluster. Go to the next section if you want some clarification about “in-flight” transactions and the certification test.

Back to our transactions: T1 is the first transaction to try to commit, so no other transaction is “in-flight”: the certification test will succeed and InnoDB will be able to apply the transaction on Node1. On Node2, the same certification test will be run and T1 will be put in the apply queue (and it will be applied at some point in the near future).

Ok, wait a minute! No other transaction is “in-flight”, really? What about T2 on Node2? Actually T2 is simply a local transaction on Node2 and it is not known by the cluster yet. Therefore it is not what I called an “in-flight” transaction and it does not play any role in the certification test.

3. Now T2 reaches commit on node2. It is then synchronously replicated to all nodes and a certification test will run on node2. If T1 and T2 did commit simultaneously, there is a good chance that when T2 starts committing, T1 is still in the apply queue of Node2. In this case, the certification test for T2 will fail. Why? To make sure that T1 will commit on all nodes no matter what, any other transaction that wants to set a lock on the record where id=100 will be rejected.

Then if the certification test for T2 fails, T2 is rolled back. The only option to commit T2 is to retry executing it.
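
From the application’s point of view, the failed certification typically surfaces like an ordinary deadlock, so the retry follows the same pattern used for InnoDB deadlocks. This is only a sketch; the exact error code and text may vary between versions.

-- on Node2, T2's commit fails certification:
BEGIN;
UPDATE t SET ... WHERE id = 100;
COMMIT;
-- ERROR 1213 (40001): Deadlock found when trying to get lock; try restarting transaction
-- the application must then re-issue the whole transaction:
BEGIN;
UPDATE t SET ... WHERE id = 100;
COMMIT;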

Let’s assume that this second try is successful; how long did T2 have to wait before being executed? Essentially we had to execute T2 twice, so we had to replicate it twice, each replication taking 1 network RTT; we had to roll T2 back on Node2 (rollback is expensive in InnoDB); and the application had to decide that T2 had to be executed a second time. That is a lot more work and waiting compared to the scenario on a single server.

So where is the fundamental problem when we tried to write on several nodes? Galera uses optimistic locking, and we had to go very far into the execution of T2 before realizing that the query would not succeed. Multi-node writing is therefore not a good solution at all when the system sees write hotspots.

“In-flight” transactions and certification test

“In-flight” transactions are transactions that have already been applied on at least one node of the cluster but not on all nodes. Remember that even if replicating transactions is synchronous, applying committed transactions on remote nodes is not. So a transaction Tx can be committed and applied on node1 but not on node2: on node2, Tx will simply sit in an apply queue, waiting to be executed. Tx is then an “in-flight” transaction.

The goal of the certification test is to make sure that no transaction can prevent Tx from executing on node2: as Tx is already committed on node1 and as we want data consistency, we must make sure that Tx will execute successfully no matter what happens. And the magic of Galera is that the test is deterministic, so group communication is not necessary when a node runs a certification test.
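
If you want to see how often certification failures actually happen on your cluster, the Galera status counters expose it: wsrep_local_cert_failures counts failed certification tests and wsrep_local_bf_aborts counts local transactions aborted by replicated ones.

SHOW GLOBAL STATUS LIKE 'wsrep_local_cert_failures';
SHOW GLOBAL STATUS LIKE 'wsrep_local_bf_aborts';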

Conclusion

So what are your options with PXC when the workload has write hotspots? The most obvious one is to write on a single node: then you will have the same locking model as with a standalone InnoDB server. Performance will not be as good as with a standalone server as you will have to pay for synchronous replication, but you will avoid the very expensive write conflicts.

Another option would be to rearchitect your application to write less often. For instance, for a global counter, you could maintain it with Redis and only periodically flush the value to PXC.

And if you want to understand more about the benefits and drawbacks of writing on multiple nodes of a PXC cluster, you can read these two posts.

The post Optimizing Percona XtraDB Cluster for write hotspots appeared first on MySQL Performance Blog.