MySQL High Performance

Percona's MySQL & InnoDB performance and scalability blog

Store UUID in an optimized way

Fri, 2014-12-19 14:00

A few years ago Peter Zaitsev, in a post titled “To UUID or not to UUID,” wrote: “There is timestamp based part in UUID which has similar properties to auto_increment and which could be used to have values generated at same point in time physically local in BTREE index.”

For this post I’ve rearranged the timestamp part of the UUID (Universally Unique Identifier) and run some benchmarks.

Many people store UUIDs as char(36) and use them as row identity values (PRIMARY KEY) because a UUID is unique across every table, every database and every server, and allows easy merging of records from different databases. But using it as a PRIMARY KEY causes the problems described below.

Problems with UUID
  • UUID has 36 characters which makes it bulky.
  • InnoDB stores data in PRIMARY KEY order, and all secondary keys also contain the PRIMARY KEY. So having UUID as the PRIMARY KEY makes the indexes bigger, to the point where they may not fit into memory.
  • Inserts are random and the data is scattered.

Despite these problems, people still prefer UUIDs because they are unique across every table and can be generated anywhere. In this blog post, I will explain how to store UUIDs in an efficient way by rearranging the timestamp part of the UUID.

Structure of UUID

MySQL uses UUID version 1, which is a 128-bit number represented as a utf8 string of five hexadecimal numbers:

  • The first three numbers are generated from a timestamp.
  • The fourth number preserves temporal uniqueness in case the timestamp value loses monotonicity (for example, due to daylight saving time).
  • The fifth number is an IEEE 802 node number that provides spatial uniqueness. A random number is substituted if the latter is not available (for example, because the host computer has no Ethernet card, or because we do not know how to find the hardware address of an interface on the host operating system). In this case, spatial uniqueness cannot be guaranteed. Nevertheless, a collision should have very low probability.

The timestamp is mapped as follows:
When the timestamp has the 60-bit hexadecimal value 1d8eebc58e0a7d7, the following parts of the UUID are set: 58e0a7d7-eebc-11d8-9669-0800200c9a66. The 1 before the most significant digits of the timestamp (in 11d8) indicates the UUID version; for time-based UUIDs this is 1.

The fourth and fifth parts would be mostly constant if the UUIDs are generated from a single server. The first three numbers are based on the timestamp, so they will be monotonically increasing. Let’s rearrange the sequence to make the UUID closer to sequential. This makes inserts and lookups of recent data faster. The dashes (‘-‘) add nothing, so let’s remove them:
58e0a7d7-eebc-11d8-9669-0800200c9a66 => 11d8eebc58e0a7d796690800200c9a66

Benchmarking

I created three tables:

  • events_uuid – UUID binary(16) PRIMARY KEY
  • events_int – an additional BIGINT auto-increment column used as the PRIMARY KEY, plus an index on the UUID column
  • events_uuid_ordered – Rearranged UUID binary(16) as PRIMARY KEY

I created three stored procedures, each inserting 25K random rows at a time into its respective table. Three more stored procedures call the insert procedures in a loop and record the time taken to insert each 25K batch, along with the data and index size after each loop. In total I inserted 25M records.
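The stored procedure bodies are not reproduced here; below is a minimal sketch of what the one for the ordered-UUID table might look like (the procedure name is illustrative, and it relies on the ordered_uuid() function shown in the conclusions):

DELIMITER //
CREATE PROCEDURE insert_events_uuid_ordered(IN p_rows INT)
BEGIN
  DECLARE i INT DEFAULT 0;
  WHILE i < p_rows DO
    -- Only the PRIMARY KEY matters for index locality;
    -- the other columns fall back to their defaults.
    INSERT INTO events_uuid_ordered (id) VALUES (ordered_uuid(uuid()));
    SET i = i + 1;
  END WHILE;
END //
DELIMITER ;

-- One benchmark loop iteration:
-- CALL insert_events_uuid_ordered(25000);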

    • Data Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Data Size in MB

      The data size for the UUID table is larger than for the other two tables.
    • Index Size
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Index Size in MB
    • Total Size
      Horizontal Axis – Number of inserts x 25,000
      Vertical Axis – Total Size in MB
    • Time taken
      Horizontal axis – Number of inserts x 25,000
      Vertical axis – Time Taken in seconds

For the table with UUID as the PRIMARY KEY, you can notice that as the table grows, the time taken to insert rows increases almost linearly, whereas for the other tables it stays almost constant.

The size of the UUID table is almost 50% bigger than the ordered-UUID table and 30% bigger than the table with BIGINT as PRIMARY KEY. Comparing the ordered-UUID table and the BIGINT table, the insert times and sizes are almost the same, though they may vary slightly based on the index structure.

root@localhost:~# ls -lhtr /media/data/test/ | grep ibd
-rw-rw---- 1 mysql mysql  13G Jul 24 15:53 events_uuid_ordered.ibd
-rw-rw---- 1 mysql mysql  20G Jul 25 02:27 events_uuid.ibd
-rw-rw---- 1 mysql mysql  15G Jul 25 07:59 events_int.ibd

Table Structure

#1 events_int
CREATE TABLE `events_int` (
  `count` bigint(20) NOT NULL AUTO_INCREMENT,
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
  `event` int(11) DEFAULT NULL,
  `ref_url` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `campaign_id` binary(16) COLLATE utf8_unicode_ci DEFAULT '',
  `unique_id` binary(16) COLLATE utf8_unicode_ci DEFAULT NULL,
  `user_agent` varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
  `city` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `country` varchar(80) COLLATE utf8_unicode_ci DEFAULT NULL,
  `demand_partner_id` binary(16) DEFAULT NULL,
  `publisher_id` binary(16) DEFAULT NULL,
  `site_id` binary(16) DEFAULT NULL,
  `page_id` binary(16) DEFAULT NULL,
  `action_at` datetime DEFAULT NULL,
  `impression` smallint(6) DEFAULT NULL,
  `click` smallint(6) DEFAULT NULL,
  `sold_impression` smallint(6) DEFAULT NULL,
  `price` decimal(15,7) DEFAULT '0.0000000',
  `actioned_at` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `unique_ads` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL,
  `notification_url` text COLLATE utf8_unicode_ci,
  PRIMARY KEY (`count`),
  KEY `id` (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

#2 events_uuid
CREATE TABLE `events_uuid` (
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
~
~
  PRIMARY KEY (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

#3 events_uuid_ordered
CREATE TABLE `events_uuid_ordered` (
  `id` binary(16) NOT NULL,
  `unit_id` binary(16) DEFAULT NULL,
~
~
  PRIMARY KEY (`id`),
  KEY `index_events_on_actioned_at` (`actioned_at`),
  KEY `index_events_unit_demand_partner` (`unit_id`,`demand_partner_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

Conclusions

 

    • Create a function to rearrange the UUID fields and use it:

DELIMITER //
CREATE DEFINER=`root`@`localhost` FUNCTION `ordered_uuid`(uuid BINARY(36))
RETURNS binary(16)
DETERMINISTIC
RETURN UNHEX(CONCAT(SUBSTR(uuid, 15, 4),SUBSTR(uuid, 10, 4),SUBSTR(uuid, 1, 8),SUBSTR(uuid, 20, 4),SUBSTR(uuid, 25)));
//
DELIMITER ;

Inserts

INSERT INTO events_uuid_ordered VALUES (ordered_uuid(uuid()),'1','M',....);

Selects

SELECT HEX(uuid),is_active,... FROM events_uuid_ordered ;

    • Define the UUID column as binary(16); binary does not have any character set.
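The rearrangement is reversible, so you can always get the familiar dashed format back for display or data exchange. Here is a minimal sketch of a companion function (the name unordered_uuid is ours; it simply undoes ordered_uuid):

DELIMITER //
CREATE FUNCTION unordered_uuid(ouuid BINARY(16))
RETURNS CHAR(36) DETERMINISTIC
-- Move time_low back to the front, time_hi+version back to the
-- third group, and re-insert the dashes.
RETURN LOWER(CONCAT(
    SUBSTR(HEX(ouuid), 9, 8), '-',
    SUBSTR(HEX(ouuid), 5, 4), '-',
    SUBSTR(HEX(ouuid), 1, 4), '-',
    SUBSTR(HEX(ouuid), 17, 4), '-',
    SUBSTR(HEX(ouuid), 21)));
//
DELIMITER ;

SELECT unordered_uuid(id) FROM events_uuid_ordered LIMIT 1; -- e.g. 58e0a7d7-eebc-11d8-9669-0800200c9a66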

 


Making HAProxy 1.5 replication lag aware in MySQL

Thu, 2014-12-18 15:48

HAProxy is frequently used as a software load balancer in the MySQL world. Peter Boros, in a past post, explained how to set it up with Percona XtraDB Cluster (PXC) so that it only sends queries to available nodes. The same approach can be used in a regular master-slaves setup to spread the read load across multiple slaves. However with MySQL replication, another factor comes into play: replication lag. In this case the approach mentioned for Percona XtraDB Cluster does not work that well as the check we presented only returns ‘up’ or ‘down’. We would like to be able to tune the weight of a replica inside HAProxy depending on its replication lag. This is what we will do in this post using HAProxy 1.5.

Agent checks in HAProxy

HAProxy 1.5 allows us to run an agent check, which is a check that can be added to a regular health check. The benefit of agent checks is that the return value can be not only ‘up’ or ‘down’, but also a weight.

What is an agent? It is simply a program that can be accessed from a TCP connection on a given port. So if we want to run an agent on a MySQL server that will:

  • Mark the server as down in HAProxy if replication is not working
  • Set the weight to 100% if the replication lag is < 10s
  • Set the weight to 50% if the replication lag is >= 10s and < 60s
  • Set the weight to 5% in all other situations

We can use a script like this:

$ less agent.php
<?php
// Usage: php agent.php <agent_port> <mysql_port>
$port = $argv[1];
$mysql_port = $argv[2];
$mysql = "/usr/bin/mysql";
$user = "haproxy";
$password = "haproxy_pwd";
$query = "SHOW SLAVE STATUS";

// Translate the replication lag into an HAProxy agent response
function set_weight($lag){
    // Replication broken: Seconds_Behind_Master is NULL
    if (!is_numeric($lag)){
        return "down";
    }
    if ($lag < 10){
        return "up 100%";
    }
    else if ($lag >= 10 && $lag < 60){
        return "up 50%";
    }
    else
        return "up 5%";
}

set_time_limit(0);
$socket = stream_socket_server("tcp://127.0.0.1:$port", $errno, $errstr);
if (!$socket) {
    echo "$errstr ($errno)\n";
} else {
    while ($conn = stream_socket_accept($socket, 9999999999999)) {
        $cmd = "$mysql -h127.0.0.1 -u$user -p$password -P$mysql_port -Ee \"$query\" | grep Seconds_Behind_Master | cut -d ':' -f2 | tr -d ' '";
        exec("$cmd", $lag);
        $weight = set_weight($lag[0]);
        unset($lag);
        fputs($conn, $weight);
        fclose($conn);
    }
    fclose($socket);
}
?>

If you want the script to be accessible from port 6789 and connect to a MySQL instance running on port 3306, run:

$ php agent.php 6789 3306

You will also need a dedicated MySQL user:

mysql> GRANT REPLICATION CLIENT ON *.* TO 'haproxy'@'127.0.0.1' IDENTIFIED BY 'haproxy_pwd';

When the agent is started, you can check that it is working properly:

# telnet 127.0.0.1 6789
Trying 127.0.0.1...
Connected to 127.0.0.1.
Escape character is '^]'.
up 100%
Connection closed by foreign host.

Assuming it is run locally on the app server, that two replicas are available (192.168.10.2 and 192.168.10.3), and that the application sends all reads to port 3307, you will define a frontend and a backend in your HAProxy configuration like this:

frontend read_only-front
    bind *:3307
    mode tcp
    option tcplog
    log global
    default_backend read_only-back

backend read_only-back
    mode tcp
    balance leastconn
    server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions
    server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions

Demo

Now that everything is set up, let’s see how HAProxy can dynamically change the weight of the servers depending on the replication lag.

No lag

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,200

Slave1 lagging

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 25
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,50
read_only-back,slave2,UP,100
read_only-back,BACKEND,UP,150

Slave2 down

# Slave1
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: 0
# Slave2
$ mysql -Ee "show slave status" | grep Seconds_Behind_Master
        Seconds_Behind_Master: NULL
# HAProxy
$ echo "show stat" | socat stdio /run/haproxy/admin.sock | cut -d ',' -f1,2,18,19
# pxname,svname,status,weight
read_only-front,FRONTEND,OPEN,
read_only-back,slave1,UP,100
read_only-back,slave2,DOWN (agent),100
read_only-back,BACKEND,UP,100

Conclusion

Agent checks are a nice addition in HAProxy 1.5. The setup presented above is a bit simplistic though: for instance, if HAProxy fails to connect to the agent, it will not mark the corresponding server as down. It is therefore recommended to keep a regular health check along with the agent check.

Astute readers will also notice that in this configuration, if replication is broken on all nodes, HAProxy will stop sending reads. This may not be the best solution. Possible options are: stop the agent and mark the servers as UP using the stats socket, or add the master as a backup server (as sketched below).
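For instance, with a hypothetical master at 192.168.10.1, the backend above could gain one extra line (a sketch, not a full configuration):

backend read_only-back
    mode tcp
    balance leastconn
    server slave1 192.168.10.2 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions
    server slave2 192.168.10.3 weight 100 check agent-check agent-port 6789 inter 1000 rise 1 fall 1 on-marked-down shutdown-sessions
    # The master only receives reads when all non-backup servers are down
    server master 192.168.10.1 check backup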

And as a final note, you can edit the code of the agent so that replication lag is measured with Percona Toolkit’s pt-heartbeat instead of Seconds_Behind_Master.


InnoDB’s multi-versioning handling can be Achilles’ heel

Wed, 2014-12-17 16:05

I believe the InnoDB storage engine architecture is great for a lot of online workloads; however, there are no silver bullets in technology and all design choices have their trade-offs. In this blog post I’m going to talk about one important InnoDB limitation that you should consider.

InnoDB is a multiversion concurrency control (MVCC) storage engine, which means many versions of a single row can exist at the same time. In fact there can be a huge number of such row versions. Depending on the isolation mode you have chosen, InnoDB might have to keep all row versions going back to the earliest active read view, but at the very least it will have to keep all versions going back to the start of any SELECT query which is currently running.

In most cases this is not a big deal – if you have many short transactions happening you will have only a few row versions to deal with. If you just use the system for reporting queries but do not modify data aggressively at the same time you also will not have many row versions. However, if you mix heavy updates with slow reporting queries going at the same time you can get into a lot of trouble.

Consider for example an application with a hot row (something like an actively updated counter) which gets 1,000 updates per second, together with some heavy batch job that takes 1,000 seconds to run. In such a case we will have 1,000 × 1,000 = 1M row versions to deal with.

Let’s now talk about how those old row versions are stored in InnoDB – they are stored in the undo space essentially as a linked list, where each row version points to the previous row version, together with transaction visibility information that helps to decide which version will be visible to this query. Such a design favors short new queries, which will typically need to see one of the newer rows, so they do not have to go too far in this linked list. This might not be the case with reporting queries that might need to read rather old row versions, which correspond to the time when the query was started, or logical backups that use consistent reads (think mysqldump or mydumper), which often need to access such very old row versions.

So going through the linked list of versions is expensive, but how expensive can it get? A lot depends upon whether the UNDO space fits in memory, in which case the list will be traversed efficiently – or it does not, in which case you might be looking at massive disk IO. Keep in mind the undo space is not clustered by PRIMARY KEY, as normal data in InnoDB tables is, so if you’re updating multiple rows at the same time (the typical case) you will be looking at row-version chains stored across many pages, often as little as one row version per page, requiring either massive IO or a large number of UNDO space pages to be present in the InnoDB buffer pool.
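A quick way to watch how much unpurged history has piled up (not part of the original test, but useful for spotting this situation early) is the purge lag counter, exposed in MySQL 5.6 through INNODB_METRICS and in SHOW ENGINE INNODB STATUS as “History list length”:

-- Number of undo log units not yet purged:
SELECT NAME, COUNT
FROM information_schema.INNODB_METRICS
WHERE NAME = 'trx_rseg_history_len';

-- The same figure appears in the TRANSACTIONS section of:
SHOW ENGINE INNODB STATUS\G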

Where it can get even worse is an index scan. This is because indexes in InnoDB include all row versions corresponding to a key value, current and past. This means, for example, that the index for KEY=5 will contain pointers to all rows that either have the value 5 now or had the value 5 some time in the past and have not been purged yet. Where it can really bite is this: InnoDB needs to know which of the values stored for the key are visible to the current transaction – and that might mean going through the whole long version chain for each of the keys.

This is all theory, so let’s see how we can simulate such workloads and how bad things really can get in practice.

I created a 1 billion row “sysbench” table which takes some 270GB of space, and I used a small buffer pool – 6GB. I ran sysbench with 64 threads and a pareto distribution (hot rows) while running a full table scan query concurrently: select avg(k) from sbtest1. Here is the exact sysbench run, done after the prepare step:

sysbench --num-threads=64 --report-interval=10 --max-time=0 --max-requests=0 --rand-type=pareto --oltp-table-size=1000000000 --mysql-user=root --mysql-password=password --test=/usr/share/doc/sysbench/tests/db/oltp.lua run

Here is the EXPLAIN for the “reporting” query, which you would think to be a rather efficient index scan query. With just 4 bytes per value, 1 billion values would be just 4GB (more in practice because of InnoDB overhead) – not a big deal for modern systems:

mysql> explain select avg(k) from sbtest1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: sbtest1
         type: index
possible_keys: NULL
          key: k_1
      key_len: 4
          ref: NULL
         rows: 953860873
        Extra: Using index
1 row in set (0.00 sec)

2 days have passed and the “reporting” query is still running… furthermore the foreground workload started to look absolutely bizarre:

[207850s] threads: 64, tps: 0.20, reads: 7.40, writes: 0.80, response time: 222481.28ms (95%), errors: 0.70, reconnects: 0.00
[207860s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207870s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207880s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207890s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207900s] threads: 64, tps: 2.70, reads: 47.60, writes: 11.60, response time: 268815.49ms (95%), errors: 0.00, reconnects: 0.00
[207910s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207920s] threads: 64, tps: 2.30, reads: 31.60, writes: 9.50, response time: 294954.28ms (95%), errors: 0.00, reconnects: 0.00
[207930s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[207940s] threads: 64, tps: 2.90, reads: 42.00, writes: 12.20, response time: 309332.04ms (95%), errors: 0.00, reconnects: 0.00
[207950s] threads: 64, tps: 0.20, reads: 4.60, writes: 1.00, response time: 318922.41ms (95%), errors: 0.40, reconnects: 0.00
[207960s] threads: 64, tps: 0.20, reads: 1.90, writes: 0.50, response time: 335170.09ms (95%), errors: 0.00, reconnects: 0.00
[207970s] threads: 64, tps: 0.60, reads: 13.20, writes: 2.60, response time: 292842.88ms (95%), errors: 0.60, reconnects: 0.00
[207980s] threads: 64, tps: 2.60, reads: 37.60, writes: 10.20, response time: 351613.43ms (95%), errors: 0.00, reconnects: 0.00
[207990s] threads: 64, tps: 5.60, reads: 78.70, writes: 22.10, response time: 186407.21ms (95%), errors: 0.00, reconnects: 0.00
[208000s] threads: 64, tps: 8.10, reads: 120.20, writes: 32.60, response time: 99179.05ms (95%), errors: 0.00, reconnects: 0.00
[208010s] threads: 64, tps: 19.50, reads: 280.50, writes: 78.90, response time: 27559.69ms (95%), errors: 0.00, reconnects: 0.00
[208020s] threads: 64, tps: 50.70, reads: 691.28, writes: 200.70, response time: 5214.43ms (95%), errors: 0.00, reconnects: 0.00
[208030s] threads: 64, tps: 77.40, reads: 1099.72, writes: 311.31, response time: 2600.66ms (95%), errors: 0.00, reconnects: 0.00
[208040s] threads: 64, tps: 328.20, reads: 4595.40, writes: 1313.40, response time: 911.36ms (95%), errors: 0.00, reconnects: 0.00
[208050s] threads: 64, tps: 538.20, reads: 7531.90, writes: 2152.10, response time: 484.46ms (95%), errors: 0.00, reconnects: 0.00
[208060s] threads: 64, tps: 350.70, reads: 4913.45, writes: 1404.09, response time: 619.42ms (95%), errors: 0.00, reconnects: 0.00
[208070s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208080s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208090s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208100s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208110s] threads: 64, tps: 1.60, reads: 24.20, writes: 6.80, response time: 42385.40ms (95%), errors: 0.10, reconnects: 0.00
[208120s] threads: 64, tps: 0.80, reads: 28.20, writes: 3.40, response time: 51381.54ms (95%), errors: 2.80, reconnects: 0.00
[208130s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208140s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208150s] threads: 64, tps: 0.00, reads: 0.00, writes: 0.00, response time: 0.00ms (95%), errors: 0.00, reconnects: 0.00
[208160s] threads: 64, tps: 0.60, reads: 14.20, writes: 2.40, response time: 93248.04ms (95%), errors: 0.80, reconnects: 0.00

As you can see, we have long stretches of time when no queries complete at all, punctuated by spikes of higher performance. This is how it looks on the graph:

Corresponding CPU usage:

This shows that we are not only observing something we would expect given the InnoDB design, but there also seems to be some severe starvation happening in this case, which we can confirm:

Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790809552640 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790804735744 has waited at row0sel.cc line 3506 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790810756864 has waited at row0sel.cc line 4125 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive
number of readers 1, waiters flag 1, lock_word: ffffffffffffffff
Last time read locked in file row0sel.cc line 4125
Last time write locked in file /mnt/workspace/percona-server-5.6-debian-binary/label_exp/ubuntu-trusty-64bit/percona-server-5.6-5.6.21-70.0/storage/innobase/row/row0ins.cc line 2357
--Thread 139790811158272 has waited at btr0cur.cc line 3852 for 194.00 seconds the semaphore:
S-lock on RW-latch at 0x7f24d9c01bc0 '&block->lock'
a writer (thread id 139790814770944) has reserved it in mode wait exclusive

Waiting more than 3 minutes for a given buffer pool block to become available is a big issue – this lock should never be held for more than a few microseconds.

SHOW PROCESSLIST confirms that even the most basic selects by primary key can get stalled for a long time:

| 5499 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5500 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5501 | root | localhost | sbtest | Query |    185 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5502 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5503 | root | localhost | sbtest | Query |     14 | statistics   | SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 ORDER BY c | 0 | 0 |
| 5504 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5505 | root | localhost | sbtest | Query |     14 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=1 | 0 | 0 |
| 5506 | root | localhost | sbtest | Query |    236 | updating     | DELETE FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5507 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5508 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id BETWEEN 1 AND 1+99 | 0 | 0 |
| 5509 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=1 | 0 | 0 |
| 5510 | root | localhost | sbtest | Query |     14 | updating     | UPDATE sbtest1 SET c='80873149502-45132831358-41942500598-17481512835-07042367094-39557981480-593123 | 0 | 0 |
| 5511 | root | localhost | sbtest | Query |    236 | updating     | UPDATE sbtest1 SET k=k+1 WHERE id=18 | 0 | 1 |
| 5512 | root | localhost | sbtest | Query |     14 | statistics   | SELECT c FROM sbtest1 WHERE id=7 | 0 | 0 |
| 6009 | root | localhost | sbtest | Query | 195527 | Sending data | select avg(k) from sbtest1 | 0 | 0 |

How do I know it is an UNDO space related issue in this case? Because it ends up taking the majority of the buffer pool:

mysql> select page_type, count(*) from INNODB_BUFFER_PAGE group by page_type;
+-------------------+----------+
| page_type         | count(*) |
+-------------------+----------+
| EXTENT_DESCRIPTOR |        1 |
| FILE_SPACE_HEADER |        1 |
| IBUF_BITMAP       |      559 |
| IBUF_INDEX        |      855 |
| INDEX             |     2186 |
| INODE             |        1 |
| SYSTEM            |      128 |
| UNDO_LOG          |   382969 |
| UNKNOWN           |     6508 |
+-------------------+----------+
9 rows in set (1.04 sec)

And it does so in a very odd way – when there is almost no work being done, the UNDO_LOG content of the buffer pool grows very rapidly, while when we’re getting some work done the INDEX-type pages take a lot more space. My reading is that as the index scan proceeds it touches some hot rows and some not-so-hot ones; the latter contain fewer row versions and so do not put much pressure on the undo space.

Take away: Now you might argue that this workload and situation is rather artificial and rather narrow. It may well be. My main point here is: if you’re looking at just part of your workload, such as your main short application queries, and not taking reporting or backups into account “because their performance is not important,” you might be in for a big surprise. Those background activities might be taking much more than you would expect, and in addition, they might have a much more severe impact on your main application workload, as in the case above.

P.S.: I’ve done more experiments to validate how bad the problem really is, and I can repeat it rather easily even without putting the system into overdrive. Even if I run sysbench injecting just 25% of the transactions the system is capable of handling at peak, the “select avg(k) from sbtest1” query on the 1 billion row table never completes, as it looks like new entries are injected into the index faster than InnoDB can examine which of them are visible.


OpenStack Live tutorials & sessions to bring OpenStack users up to speed

Tue, 2014-12-16 17:28

I attended the OpenStack Paris summit last month (Percona had a booth there). It was my first opportunity to meet face-to-face with this thriving community of developers and users. I’m proud that Percona is part of this open source family and look forward to reconnecting with many of the developers and users I met in Paris – as well as meeting new faces – at OpenStack Live in Silicon Valley April 13-14.

OpenStack summits, generally held twice a year, are the place where (for the most part) developers meet and design “in the open,” as the OpenStack organization says. OpenStack Live 2015, held in parallel with the annual Percona Live MySQL Conference and Expo, will be a unique opportunity for users and enthusiasts to learn from leading OpenStack experts in the field about top cloud strategies, improving overall cloud performance, and operational best practices for managing and optimizing OpenStack and its MySQL database core.

OpenStack Live will also provide some serious classroom-style learning. Percona announced the OpenStack Live tutorial sessions a couple of days ago. Most sessions are three hours long, and because they really are “hands-on” they require that you bring your laptop – and a power cord (not to be confused with a “power chord,” though those are also welcome).

Let’s take a closer look at the OpenStack Live tutorial sessions.

“Barbican: Securing Your Secrets.” Join Rackspace gurus Douglas Mendizábal, Chelsea Winfree and Steve Heyman on a tour through the magical world of Barbican (yes, they are dedicated members of the Barbican project).

Don’t be intimidated if you don’t have any previous experience with Barbican (and if you’ve never heard of it, all the more reason to attend!). A basic understanding of security components (such as keys and certificates) and a basic understanding of REST is helpful, but not required.

By the end of the class you will know:
1)   Importance of secret storage
2)   How to store & retrieve secrets with Barbican
3)   How to submit an order with Barbican
4)   How to create a container
5)   Use cases for Barbican / Examples
6)   The future of Barbican – ordering SSL certs

“Deploying, Configuring and Operating OpenStack Trove.” As the title suggests, these three hours focus squarely on Trove. The tutorial – led by Tesora founder & CTO Amrith Kumar, along with Doug Shelley, the company’s vice president of product development – will begin with a quick overview of OpenStack and the various services.

If you attend this tutorial you’ll actually deploy your own OpenStack environment – and create and manage a Nova (compute) instance using a command line and a graphical user interface (Horizon). And the fun continues! You’ll then install and configure Trove, and create and manage a single MySQL instance. Finally, pupils will create and operate a simple replicated MySQL instance pair and ensure that data is being properly replicated from master to slave.

“Essential DevStack.” DevStack is an opinionated script to quickly create an OpenStack development environment. It can also be used to demonstrate starting/running OpenStack services and provide examples of using them from a command line. The power of DevStack lies in a few small tricks that, once understood, can hugely improve contribution effectiveness, quality and turnaround time. This three-hour tutorial will be led by Red Hat senior software engineer Swapnil Kulkarni.

“OpenStack Networking Introduction,” with PLUMgrid’s Valentina Alaria and Brendan Howes. Buckle your seat belts! Designed for IT professionals looking to expand their OpenStack “networking” (no, not the LinkedIn sort of networking) knowledge, OpenStack Networking Fundamentals will be a comprehensive and fast-paced course.

This half-day training provides an overview of OpenStack, its components and then dives deep into OpenStack Networking – the features and plugin model and its role in building an OpenStack Cloud. The training is concluded with a hands-on lab to bring all the concepts together.

OpenStack Networking (Neutron) Introduction [1 hour]
– Goal of Neutron
– Architecture of Neutron
– Plugin Architecture
– Use cases for Neutron
– What’s new in Juno & what’s planned for Kilo

OpenStack Networking (Neutron) Advanced [1 hour]
– Interaction with other OpenStack components (Compute & Storage)
– Designing Neutron for HA
– Installing Neutron
– Troubleshooting Neutron

Hands-on Lab [1 hour]
– Creation of tenant networks
– Configuration of external connectivity
– Advanced Features Configuration

Percona’s director of conferences, Kortney Runyan, offered a sneak peek at the OpenStack sessions last week. Attendees of the Percona Live MySQL Conference and Expo 2015 (April 13-16, 2015) with full-pass access are also welcome to attend OpenStack Live sessions. The two conferences are running in parallel, which is very exciting since there will be crossover opportunities between them.

I hope to see you next April! And be sure to take advantage of Early Bird pricing for OpenStack Live (register here).
And if you are an organizer of an OpenStack (or MySQL) Meetup and need some financial help, Percona is happy to chip in as a sponsor. Just let me know and I’ll work with you to set that up! You can drop me a note in the comments and I’ll contact you offline.


MySQL Tutorials: A time to learn at Percona Live 2015

Mon, 2014-12-15 08:00

The many hours of intensive tutorials, led by some of the top minds in MySQL, have always been a major draw each year to the Percona Live MySQL Conference and Expo. And 2015’s event will be no exception.

Percona Live 2015 runs April 13-16 in Santa Clara, Calif., and the first day is dedicated to the classroom – so bring your laptop for the combined 45 hours of learning. MySQL tutorials are included with the full-conference pass, but a “tutorial-only pass” is also available. Super-saver registration discounts have been extended until Dec. 19. Here’s a look at this year’s tutorials lineup. (There will be a couple more announced in January.)

And that’s just on Monday! There will be much more over the four days of the Percona Live MySQL Conference and Expo 2015. I posted a sneak peek of the full (initial) Percona Live roster a couple weeks ago. And remember, super-saver registration discounts have been extended until Dec. 19, so register now – and don’t forget your laptop (and power cord)!


Percona XtraBackup 2.2.7 is now available

Wed, 2014-12-10 15:03

Percona is glad to announce the release of Percona XtraBackup 2.2.7 on December 10, 2014. Downloads are available from our download site or Percona Software Repositories.

Percona XtraBackup enables MySQL backups without blocking user queries, making it ideal for companies with large data sets and mission-critical applications that cannot tolerate long periods of downtime. Offered free as an open source solution, Percona XtraBackup drives down backup costs while providing unique features for MySQL backups.

Bugs Fixed:

  • A non-default value for the innodb_log_block_size variable would cause an assertion when preparing the backup. Bug fixed #1391216.
  • When Percona XtraBackup ran FLUSH ENGINE LOGS during the backup process on a GTID master, the command was recorded in the slave’s binary log as well, which led to inconsistency between master and slave. Fixed by adding the NO_WRITE_TO_BINLOG clause to FLUSH ENGINE LOGS to avoid interfering with the binary log and creating inconsistent coordinates. Bug fixed #1394632.
  • Exporting tables was inefficient when the backup contained a large (and unrelated) change buffer. Bug fixed #1366065 (Davi Arnaut).
  • innobackupex was printing the GTID even if GTID mode was disabled, which could cause confusion since it wasn’t incrementing. Now it prints the GTID only when GTID mode is enabled, and when GTID mode is disabled it prints only the filename and position. innobackupex still prints GTID, filename and position if a MariaDB server is being backed up. Bug fixed #1391041.

Other bugs fixed: bug #1386157.

Release notes with all the bugfixes for Percona XtraBackup 2.2.7 are available in our online documentation. Bugs can be reported on the launchpad bug tracker. Percona XtraBackup is an open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases.


Recover MySQL root password without restarting MySQL (no downtime!)

Wed, 2014-12-10 08:00

Disclaimer: Do this at your own risk! It doesn’t apply if you’re using pluggable authentication, and it certainly won’t be usable if/when MySQL system tables are stored in InnoDB.

Recover your root password with care!

What is the situation?

The situation is the classic “need to recover the MySQL root password” scenario, but you cannot restart MySQL (because it is the production master, or for any other reason), which rules out the --skip-grant-tables solution.

What can I do?

There is a workaround, which is the following:

  • Launch another instance of mysqld, a small one (without InnoDB).
  • Copy your user.[frm|MYD|MYI] files from the original datadir to the datadir of the new instance.
  • Modify them and then copy them back to the original location.

That simple? No, but close. Here is the step by step:

Step by step recovery
  1. Create a new datadir and run mysql_install_db for the new datadir. This one will be removed at the end. Don’t forget to change ownership to mysql user and group:
    [root@machina dbdata]# mkdir datadir
    [root@machina dbdata]# chown -R mysql:mysql datadir/
    [root@machina dbdata]# mysql_install_db --datadir=/dbdata/datadir/ --user=mysql
    Installing MySQL system tables...OK
    Filling help tables...OK
  2. Launch the new instance. Be careful with the datadir path, the socket file and the port number. Also, disable InnoDB – you won’t need it – by adding --skip-innodb AND --default-storage-engine=myisam:
    [root@machina datadir]# /usr/sbin/mysqld --basedir=/usr --datadir=/dbdata/datadir --plugin-dir=/usr/lib/mysql/plugin --skip-innodb --default-storage-engine=myisam --socket=/var/run/mysqld/mysql2.sock --port=3307 --user=mysql --log-error=/dblogs/log/error2.log --pid-file=/dbdata/data/mysql.pid &
  3. Copy the user.* files from the original mysql instance (the ones that you need to modify) to the new instance’s datadir and login to this instance of mysql:
    [root@machina ~]# cp /dbdata/data/mysql/user.* /dbdata/datadir/mysql/
    cp: overwrite `/dbdata/datadir/mysql/user.frm'? y
    cp: overwrite `/dbdata/datadir/mysql/user.MYD'? y
    cp: overwrite `/dbdata/datadir/mysql/user.MYI'? y
    [root@machina datadir]# mysql --socket=/var/run/mysqld/mysql2.sock -p
    Enter password:
    Welcome to the MySQL monitor.  Commands end with ; or \g.
  4. Execute a “flush tables” command, so the user table will be “reopened” and you can see the data and verify:
    mysql2> flush tables;
    mysql2> select user, host, password from user where user like 'root';
    +------+-----------+------------------------------------------+
    | user | host      | password                                 |
    +------+-----------+------------------------------------------+
    | root | localhost | 696D727429CC43695423FA5F2F0155D92A0AAC08 |
    | root | 127.0.0.1 | 696D727429CC43695423FA5F2F0155D92A0AAC08 |
    | root | %         | 696D727429CC43695423FA5F2F0155D92A0AAC08 |
    +------+-----------+------------------------------------------+
    3 rows in set (0.00 sec)
  5. Now, update the password field with the desired value:
    mysql2> update mysql.user set password='*696D727429CC43695423FA5F2F0155D92A0AAC08' where user like 'root';
    Query OK, 3 rows affected (0.00 sec)
    Rows matched: 3  Changed: 3  Warnings: 0
  6. Verify again:
    mysql2> select user, host, password from user where user like 'root';
    +------+-----------+-------------------------------------------+
    | user | host      | password                                  |
    +------+-----------+-------------------------------------------+
    | root | localhost | *696D727429CC43695423FA5F2F0155D92A0AAC08 |
    | root | 127.0.0.1 | *696D727429CC43695423FA5F2F0155D92A0AAC08 |
    | root | %         | *696D727429CC43695423FA5F2F0155D92A0AAC08 |
    +------+-----------+-------------------------------------------+
    3 rows in set (0.00 sec)
  7. Flush privileges and verify that the new password is correct, by logging in again:
    mysql2> flush privileges;
    Query OK, 0 rows affected (0.00 sec)
  8. Now that we have made the changes, we can move back the user.* files to the original location, being extremely careful with owner and privileges:
    [root@machina ~]# cd /dbdata/datadir/mysql/
    [root@machina mysql]# cp user.* /dbdata/data/mysql/; chown mysql:mysql /dbdata/data/mysql/user.*; chmod 660 /dbdata/data/mysql/user.*
    cp: overwrite `/dbdata/data/mysql/user.frm'? y
    cp: overwrite `/dbdata/data/mysql/user.MYD'? y
    cp: overwrite `/dbdata/data/mysql/user.MYI'? y
  9. At this moment, you can shut down the new MySQL instance since it is no longer needed. Be very careful not to shut down your original mysqld by mistake!:
    [root@machina datadir]# mysqladmin --socket=/var/run/mysqld/mysql2.sock -p shutdown
    Enter password:
    141120 06:59:14 mysqld_safe mysqld from pid file /dbdata/data/mysql.pid ended
  10. Now, the last step is to execute a “FLUSH PRIVILEGES” in the original mysqld. Since we cannot yet access it, we need to send a SIGHUP signal to mysqld. MySQL responds to this signal by reloading the grant tables and flushing tables, logs, the thread cache and the host cache, so choose wisely the moment of the day to send the SIGHUP, since performance might be temporarily degraded (think “flush tables”). The way to send a SIGHUP is to execute the “kill” command with the -1 flag:
    [root@machina datadir]# kill -1 $(/sbin/pidof mysqld)
  11. Finally, login into MySQL as root!:
    [root@machina datadir]# mysql -p
    Enter password:
    Welcome to the MySQL monitor.  Commands end with ; or \g.
    Your MySQL connection id is 101208
    mysql1> select user, host, password from mysql.user where user like 'root';
    +------+-----------+-------------------------------------------+
    | user | host      | password                                  |
    +------+-----------+-------------------------------------------+
    | root | localhost | *696D727429CC43695423FA5F2F0155D92A0AAC08 |
    | root | 127.0.0.1 | *696D727429CC43695423FA5F2F0155D92A0AAC08 |
    | root | %         | *696D727429CC43695423FA5F2F0155D92A0AAC08 |
    +------+-----------+-------------------------------------------+
    3 rows in set (0.00 sec)
    Can you see your schemas? Of course you can! Your databases are okay!
    mysql1> show databases;
    +--------------------+
    | Database           |
    +--------------------+
    | information_schema |
    | mysql              |
    | percona            |
    | testing            |
    +--------------------+
    4 rows in set (0.03 sec)

We’ve successfully recovered the MySQL root password without the need to restart MySQL and thus avoid downtime.

I hope you never face this situation, but in case you do, there’s a workaround to recover your access! Is there another way to perform this?

Share it with the world!


OpenStack Live 2015: Sneak peek of the April conference

Tue, 2014-12-09 17:00

On behalf of the OpenStack Live Conference Committee, I am excited to announce the sneak peek schedule for the inaugural OpenStack Live 2015 Conference! This new annual conference, running in parallel with the already established Percona Live MySQL Conference and Expo, will feature one day of tutorials followed by a full day of breakout sessions April 13-14, in Santa Clara, Calif.

Though the entire conference schedule won’t be finalized until early January, this initial list of talks is sure to spark interest! So without further ado, here is the OpenStack Live 2015 SNEAK PEEK SCHEDULE!

Deploying an OpenStack Cloud at Scale at Time Warner Cable
-Matthew Fischer, Principal Software Engineer for OpenStack DevOps at Time Warner Cable, and Clayton O’Neill, Principal Software Engineer for OpenStack Cloud at Time Warner Cable

An Introduction to Database as a Service with an Emphasis on OpenStack Using Trove
-Amrith Kumar, Founder and CTO of Tesora, Inc., and Tushar Katarki, Director of Product Management at Percona

Lightweight OpenStack Benchmarking Service with Rally and Docker
-Swapnil Kulkarni, Senior Software Engineer at Red Hat

MySQL and OpenStack Deep Dive
-Peter Boros, Principal Architect at Percona

This is just a small taste of what will be presented at OpenStack Live 2015 conference this spring. Take advantage of this unique opportunity to hear from leading experts in the field about top cloud strategies, improving overall cloud performance, and operational best practices for managing and optimizing OpenStack and its MySQL database core.

As a special bonus, OpenStack Live attendees will also have access to the Percona Live MySQL Conference & Expo keynotes, receptions, exhibition hall, and Birds of a Feather sessions on April 13 and 14, allowing them to dive deeper into MySQL topics such as high availability, security, performance optimization, and much more.

Registration for OpenStack Live 2015 is now open… register now with Early Bird pricing! Hope to see you there!


MySQL 5.6 Transportable Tablespaces best practices

Tue, 2014-12-09 08:00

In MySQL 5.6 Oracle introduced the Transportable Tablespaces feature (copying tablespaces to another server), and Percona Server adopted it for partial backups, which means you can now take individual database or table backups, and your destination server can be a vanilla MySQL server. Moreover, since Percona Server 5.6, innodb_import_table_from_xtrabackup is obsolete, as Percona Server also implemented Oracle MySQL’s transportable tablespaces feature which, as I mentioned, gives you the ability to copy a tablespace (table.ibd) between servers. Let me demonstrate this with an example in which I take a partial backup of selected tables instead of an entire MySQL server, and restore them on a running destination server without taking it offline.

MySQL 5.6 Transportable Tablespaces with Percona XtraBackup
Percona XtraBackup is open source, free MySQL hot backup software that performs non-blocking backups for InnoDB and XtraDB databases. You can backup up your databases without sacrificing read/write ability and, on top of that, Percona XtraBackup supports partial backup schemas that correspond to backup-only specific databases or tables instead of taking backups of the entire database server.

For partial backups, the source server from which you take the backup must have the innodb_file_per_table option enabled, and the importing server should have innodb_file_per_table and innodb_expand_import enabled – or innodb_import_table_from_xtrabackup, depending on the Percona Server version (both options are Percona Server-only) – for restoring the database tables. This is all valid up to Percona Server version 5.5, and you can find further details about partial backups here. Percona CTO Vadim Tkachenko wrote a nice post about how to copy InnoDB tables between servers on Percona Server prior to version 5.6.

I am going to use Percona Server 5.6, as it supports Transportable Tablespaces. There are two tables in the database irfan, named “test” and “dummy”. I am going to back up only the test table as a partial backup instead of backing up the entire database server.

mysql> show tables;
+-----------------+
| Tables_in_irfan |
+-----------------+
| dummy           |
| test            |
+-----------------+

mysql> show create table test;
+-------+----------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                         |
+-------+----------------------------------------------------------------------------------------------------------------------+
| test  | CREATE TABLE `test` (
  `id` int(11) DEFAULT NULL,
  `name` char(15) DEFAULT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1 |
+-------+----------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

mysql> SELECT COUNT(*) FROM test;
+----------+
| COUNT(*) |
+----------+
|     1000 |
+----------+
1 row in set (0.00 sec)

I am going to use the latest version of Percona XtraBackup, which supports multiple ways to take partial backups: the --include option, the --tables-file option and the --databases option. I am going to use the --tables-file option to back up a specific database table.
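For comparison, the same partial backup could also be expressed with the --databases option, which takes a space-separated list of databases or database.table pairs (a sketch; --include takes a regular expression instead):

irfan@source$ innobackupex --no-timestamp --databases="irfan.test" /root/partial_backup/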

irfan@source$ xtrabackup --version
xtrabackup version 2.2.5 based on MySQL server 5.6.21 Linux (x86_64) (revision id: )
irfan@source$ mysql --skip-column-names -e "SELECT CONCAT(table_schema,'.',table_name) FROM information_schema.tables WHERE table_schema IN ('irfan') AND TABLE_NAME = 'test';" > /root/tables_to_backup.txt
irfan@source$ cat tables_to_backup.txt
irfan.test

Now take the backup of the irfan.test database table as shown below. Note how the --tables-file option is passed, which backs up only the database tables specified in the tables_to_backup.txt file.

irfan@source$ innobackupex --no-timestamp --tables-file=/root/tables_to_backup.txt /root/partial_backup/ > /root/xtrabackup.log 2>&1
irfan@source$ cat /root/xtrabackup.log
.
.
>> log scanned up to (2453801809)
>> log scanned up to (2453801809)
[01] ...done
[01] Copying ./irfan/test.ibd to /root/partial_backup/irfan/test.ibd
[01] ...done
xtrabackup: Creating suspend file '/root/partial_backup/xtrabackup_suspended_2' with pid '14442'
.
.
141101 12:37:27 innobackupex: Starting to backup non-InnoDB tables and files
innobackupex: in subdirectories of '/var/lib/mysql'
innobackupex: Backing up file '/var/lib/mysql/irfan/test.frm'
141101 12:37:27 innobackupex: Finished backing up non-InnoDB tables and files
141101 12:37:27 innobackupex: Executing LOCK BINLOG FOR BACKUP...
141101 12:37:27 innobackupex: Executing FLUSH ENGINE LOGS...
141101 12:37:27 innobackupex: Waiting for log copying to finish
.
.
xtrabackup: Creating suspend file '/root/partial_backup/xtrabackup_log_copied' with pid '14442'
xtrabackup: Transaction log of lsn (2453801809) to (2453801809) was copied.
141101 12:37:28 innobackupex: Executing UNLOCK BINLOG
141101 12:37:28 innobackupex: Executing UNLOCK TABLES
141101 12:37:28 innobackupex: All tables unlocked
.
.
141101 12:37:28 innobackupex: completed OK!

A successful backup will show you “completed OK!” at the end. If you scripted the backup for automation, you can also check whether the backup succeeded or failed by checking the exit status of the backup command.
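For example, a minimal automation wrapper could look like this (a sketch; it wraps the innobackupex command shown above, and the failure message is just an illustration):

#!/bin/bash
innobackupex --no-timestamp --tables-file=/root/tables_to_backup.txt \
    /root/partial_backup/ > /root/xtrabackup.log 2>&1
# innobackupex exits non-zero on failure, so the exit status is enough:
if [ $? -ne 0 ]; then
    echo "Partial backup FAILED, see /root/xtrabackup.log" >&2
    exit 1
fi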

For the next step, we need to prepare the backup, because there might be uncommitted transactions that need to be rolled back or transactions in the log that need to be replayed. You need to pass the --export option specifically when preparing the backup in order to create the table.exp and table.cfg files (the .exp file applies prior to MySQL/Percona Server 5.6). You can read more about it in the documentation. You can prepare the backup as below.

irfan@source$ innobackupex --apply-log --export /root/partial_backup/ > /root/xtrabackup-prepare.log 2>&1
irfan@source$ cat /root/xtrabackup-prepare.log
.
.
xtrabackup version 2.2.5 based on MySQL server 5.6.21 Linux (x86_64) (revision id: )
xtrabackup: auto-enabling --innodb-file-per-table due to the --export option
xtrabackup: cd to /root/partial_backup
xtrabackup: This target seems to be already prepared.
.
.
xtrabackup: export option is specified.
xtrabackup: export metadata of table 'irfan/test' to file `./irfan/test.exp` (1 indexes)
xtrabackup:     name=GEN_CLUST_INDEX, id.low=5043, page=3
.
.
xtrabackup: starting shutdown with innodb_fast_shutdown = 0
InnoDB: FTS optimize thread exiting.
InnoDB: Starting shutdown...
InnoDB: Shutdown completed; log sequence number 2453817454
141102 11:25:13 innobackupex: completed OK!

Again, a successfully prepared backup should show “completed OK!” at the end. Once the backup is prepared, it is usable and ready to restore. For that, first create the same table structure on the destination as on the source.

mysql [localhost] {msandbox} (irfan) > CREATE TABLE `test` (
    ->   `id` int(11) DEFAULT NULL,
    ->   `name` char(15) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.12 sec)

mysql [localhost] {msandbox} (irfan) > show tables;
+-----------------+
| Tables_in_irfan |
+-----------------+
| test            |
+-----------------+
1 row in set (0.00 sec)

irfan@destination$ ls -la data/irfan/
total 116
drwx------ 2 root root  4096 Nov  2 16:29 .
drwx------ 6 root root  4096 Nov  2 16:23 ..
-rw-rw---- 1 root root  8586 Nov  2 16:29 test.frm
-rw-rw---- 1 root root 98304 Nov  2 16:29 test.ibd

Now discard the existing tablespace, copy over the test.ibd and test.cfg files which Percona XtraBackup produced when preparing the backup, and import the tablespace on the destination as illustrated below. As a side note, you may need to change the ownership of the test.cfg and test.ibd files to mysql in order to make them accessible to the MySQL server.

mysql [localhost] {msandbox} (irfan) > ALTER TABLE test DISCARD TABLESPACE;
Query OK, 0 rows affected (0.10 sec)

irfan@source$ cd /root/partial_backup/irfan/
irfan@source$ scp test.cfg test.ibd root@destination:/root/sandboxes/msb_PS-5_6_21/data/irfan/

mysql [localhost] {msandbox} (irfan) > ALTER TABLE test IMPORT TABLESPACE;
Query OK, 0 rows affected (0.10 sec)

irfan@destination$ tail -30 data/msandbox.err
2014-11-02 16:32:53 2037 [Note] InnoDB: Importing tablespace for table 'irfan/test' that was exported from host 'Hostname unknown'
2014-11-02 16:32:53 2037 [Note] InnoDB: Phase I - Update all pages
2014-11-02 16:32:53 2037 [Note] InnoDB: Sync to disk
2014-11-02 16:32:53 2037 [Note] InnoDB: Sync to disk - done!
2014-11-02 16:32:53 2037 [Note] InnoDB: Phase III - Flush changes to disk
2014-11-02 16:32:53 2037 [Note] InnoDB: Phase IV - Flush complete

mysql [localhost] {msandbox} (irfan) > SELECT COUNT(*) FROM test;
+----------+
| COUNT(*) |
+----------+
|     1000 |
+----------+

NOTE: The .cfg file contains the InnoDB dictionary dump in a special (binary) format for the corresponding table. The .cfg file is not required to import a tablespace into MySQL 5.6 or Percona Server 5.6. A tablespace can be imported successfully even if it is from another server, but InnoDB will perform schema validation if the corresponding .cfg file is present in the same directory.

Now, as you can see from the error log, the table was imported successfully into the irfan database and the changes to the InnoDB tablespace completed correctly. My colleague Miguel Angel Nieto wrote a related post titled “How to recover a single InnoDB table from a Full Backup.”

There is another method to copy a table from a running MySQL instance to another running MySQL server, which is described in the MySQL manual. For completeness, let me quickly describe the procedure.

MySQL 5.6 Transportable Tablespaces with FLUSH TABLES FOR EXPORT
On the source server, I have a table named “dummy” which I will copy to the destination server.

mysql [localhost] {msandbox} (irfan) > show tables;
+-----------------+
| Tables_in_irfan |
+-----------------+
| dummy           |
| test            |
+-----------------+

mysql [localhost] {msandbox} (irfan) > SELECT COUNT(*) FROM dummy;
+----------+
| COUNT(*) |
+----------+
|      100 |
+----------+

First, create the same table structure on the destination server.

mysql [localhost] {msandbox} (irfan) > CREATE TABLE `dummy` (
    ->   `id` int(11) DEFAULT NULL,
    ->   `dummy` varchar(10) DEFAULT NULL
    -> ) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Query OK, 0 rows affected (0.07 sec)

On the destination server, discard the existing tablespace before importing the tablespace from the source server.

mysql [localhost] {msandbox} (irfan) > ALTER TABLE dummy DISCARD TABLESPACE;
Query OK, 0 rows affected (0.00 sec)

Run FLUSH TABLES ... FOR EXPORT on the source server to ensure all changes to the table are flushed to disk. It blocks write transactions against the named table while allowing read-only operations. This can be a problem on systems with high-write workloads, since the table may stay blocked for a long time if it is huge. When making backups with Percona XtraBackup you can manage this in a non-blocking way, and it will also save the binary log coordinates, which are useful in replication and disaster recovery scenarios.

FLUSH TABLES ... FOR EXPORT is only available in Oracle MySQL 5.6/Percona Server 5.6 and produces a table metadata file (.cfg) alongside the tablespace file (.ibd). Make sure you copy both the table.cfg and table.ibd files before releasing the lock. It’s worth mentioning that you shouldn’t log out of the mysql session before copying them, as doing so releases the lock. After copying the metadata file (.cfg) and the tablespace file (.ibd), release the lock on the source server.

# Don't terminate the session after acquiring the lock, otherwise the lock will be released.
mysql [localhost] {msandbox} (irfan) > FLUSH TABLES dummy FOR EXPORT;
Query OK, 0 rows affected (0.00 sec)

# In another terminal: a second session is needed to copy the table files.
irfan@source$ scp dummy.cfg dummy.ibd root@destination-server:/var/lib/mysql/irfan/

# Back in the first session, release the lock.
mysql [localhost] {msandbox} (irfan) > UNLOCK TABLES;

On the destination server, import the tablespace and verify it in the MySQL error log, as below.

mysql [localhost] {msandbox} (test) > ALTER TABLE dummy IMPORT TABLESPACE;
Query OK, 0 rows affected (0.04 sec)

$ tail -f data/msandbox.err
2014-11-04 13:12:13 2061 [Note] InnoDB: Phase I - Update all pages
2014-11-04 13:12:13 2061 [Note] InnoDB: Sync to disk
2014-11-04 13:12:13 2061 [Note] InnoDB: Sync to disk - done!
2014-11-04 13:12:13 2061 [Note] InnoDB: Phase III - Flush changes to disk
2014-11-04 13:12:13 2061 [Note] InnoDB: Phase IV - Flush complete

mysql [localhost] {msandbox} (irfan) > SELECT COUNT(*) FROM dummy;
+----------+
| COUNT(*) |
+----------+
|      100 |
+----------+

Take into account that there are some limitations when copying tablespaces between servers; they are briefly outlined in the MySQL manual.

Conclusion:
Transportable Tablespaces are a nice feature introduced in MySQL 5.6, and I have described their use cases in this post. Prior to MySQL 5.6, you could back up and restore specific database tables via Percona XtraBackup’s partial backup feature (in that case the destination server needs to be Percona Server). Comments are welcome.

The post MySQL 5.6 Transportable Tablespaces best practices appeared first on MySQL Performance Blog.

What happens when your application cannot open yet another connection to MySQL

Mo, 2014-12-08 15:57

Have you ever experienced a situation where one moment you can connect to the MySQL database and the next moment you cannot, only to be able to connect again a second later? As you may know, one cannot open an infinite number of connections to MySQL. There’s a practical limit, and more often than not it is imposed by the underlying operating system. If you’re getting:

ERROR 2003 (HY000): Can't connect to MySQL server on '192.168.0.10' (99)

…there’s a good chance you’re hitting such a limit. What might be misleading in the message above is which side is preventing the connection from being established.

Understanding the problem at hand

Whenever a client uses the network to connect to a service running on a given port of a server this connection is established through the creation of a socket:

A socket is a tuple of 4 terms: the source and destination IPs and ports.

The destination port is obviously the one where the service is running on the server. For instance usually port 22 for SSH and port 3306 for MySQL. The source port is an arbitrary local network port on the client side, which should show in tools like netstat and ss:

Note: I’ve used netstat and ss interchangeably in this post, but if you read the man page for netstat these days you’ll see a note that says: “This program is obsolete. Replacement for netstat is ss”. I was advised to prefer ss over netstat on a busy server: besides being faster and providing a richer set of information, ss puts less stress on the server because it talks directly to the kernel, while netstat scans /proc.

$ ss -na dport = :3306
State  Recv-Q Send-Q  Local Address:Port     Peer Address:Port
ESTAB  0      0       192.168.0.100:48681    192.168.0.10:3306

The example above shows that server 192.168.0.100 has a TCP connection established with MySQL (3306) running on server 192.168.0.10 through local port 48681. The range of local ports that can be used for TCP and UDP traffic is found in /proc/sys/net/ipv4/ip_local_port_range:

$ sysctl net.ipv4.ip_local_port_range
net.ipv4.ip_local_port_range = 32768	61000

The values above are the defaults found in many Linux distributions: 32768 is the first local port that can be used and 61000 the last one, for a total of 28233 available ports. That may look like a lot, but it really depends on how the local applications connect to services on other servers over the network.

When it comes to connecting to database servers, many applications choose to open a new connection for a single request and close it right after the request is processed. Even though the connection is closed by the client (application), the local port it was using is not immediately released by the OS for reuse by another connection: it will sit in the TIME_WAIT state for (usually) 60 seconds. This value cannot easily be changed, as it is hard-coded in the kernel:

#define TCP_TIMEWAIT_LEN (60*HZ) /* how long to wait to destroy TIME-WAIT
                                  * state, about 60 seconds  */

And contrary to what you may have heard or read, tuning /proc/sys/net/ipv4/tcp_fin_timeout is of no use here, as it governs a different type of timeout and has no impact on releasing connections hanging in the TIME_WAIT state. To better understand the role played by that state, I suggest you read Vincent Bernat’s post, from which I reproduce:

There are two purposes for the TIME-WAIT state:

The most known one is to prevent delayed segments from one connection being accepted by a later connection relying on the same quadruplet (source address, source port, destination address, destination port) (…)

The other purpose is to ensure the remote end has closed the connection. When the last ACK is lost, the remote end stays in the LAST-ACK state. Without the TIME-WAIT state, a connection could be reopened while the remote end still thinks the previous connection is valid. (…)

The problem with this situation is that if you keep accumulating connections in the TIME_WAIT state, you’ll quickly saturate the available local ports. And if all ports are taken, any attempt to open a new connection will result in an error similar to the one shown above.

Reproducing the problem

It’s easy to verify this scenario: it suffices to shrink the local port range on the computer that initiates the connections (usually the application server) to, for example, only half a dozen ports:

$ echo 61001 61006 > /proc/sys/net/ipv4/ip_local_port_range

Then we proceed with opening 6 connections from the application server (192.168.0.100) to the database server (192.168.0.10):

$ ss -na dport = :3306
State  Recv-Q Send-Q  Local Address:Port     Peer Address:Port
ESTAB  0      0       192.168.0.100:61005    192.168.0.10:3306
ESTAB  0      0       192.168.0.100:61003    192.168.0.10:3306
ESTAB  0      0       192.168.0.100:61001    192.168.0.10:3306
ESTAB  0      0       192.168.0.100:61004    192.168.0.10:3306
ESTAB  0      0       192.168.0.100:61006    192.168.0.10:3306
ESTAB  0      0       192.168.0.100:61002    192.168.0.10:3306

Now, when we try to open a seventh connection to the database, we’ll hit the error stated at the beginning of this post. What the error actually means is:

$ perror 99
OS error code  99:  Cannot assign requested address

In fact, it should be complemented with “… because there are no available local network ports left“.

And even if we close all 6 MySQL connections right away, they’ll all move from the ESTABLISHED to the TIME_WAIT state, and we’ll still need to wait for them to expire before we can open a new connection:

$ ss -na dport = :3306
State      Recv-Q Send-Q  Local Address:Port     Peer Address:Port
TIME-WAIT  0      0       192.168.0.100:61005    192.168.0.10:3306
TIME-WAIT  0      0       192.168.0.100:61003    192.168.0.10:3306
TIME-WAIT  0      0       192.168.0.100:61001    192.168.0.10:3306
TIME-WAIT  0      0       192.168.0.100:61004    192.168.0.10:3306
TIME-WAIT  0      0       192.168.0.100:61006    192.168.0.10:3306
TIME-WAIT  0      0       192.168.0.100:61002    192.168.0.10:3306

That’s where scalability problems happen if your application server keeps opening connections faster than old ones are released. For example, considering the default port range of 32768 to 61000 and a TIME_WAIT of 60 seconds, in theory we can only open and close around 470 new network connections per second ((61000 - 32768 + 1)/60 = 470.55) before we saturate the available local network ports; that’s not much…

Possible Solutions

This is by no means an exhaustive list, but here are a few possible approaches to consider. If you have something to add about these, or happen to know of any others, please let me know by leaving a comment below.

Increasing port range

If the server initiating the connections is operating with the default port range, you can start by increasing it somewhat. The first 1023 ports are privileged, meaning only root can start an application listening on one of them. At the other extreme, the highest port that can be assigned is 65535 (2^16-1). In practice, then, you can increase the port range to the maximum of 1024-65535, which provides 64512 ports, in theory allowing around 1075 brief connections to be opened and closed per second:

$ sysctl -w net.ipv4.ip_local_port_range="1024 65535"

To make this change permanent and survive a server reboot you need to add the following line to /etc/sysctl.conf:

net.ipv4.ip_local_port_range=1024 65535

You should however pay attention when stretching these values to the limits.

Adding extra IP addresses and listening to multiple ports

Something that wasn’t clear to me at first is that the port range limitation applies per quadruplet (<source address>:<source port>, <destination address>:<destination port>). As such, with a port range of 61001 to 61006 on the client, you can open no more than 6 MySQL connections to a given destination <address>:<port> pair, but at the same time also 6 SSH connections, 6 nc connections, and so on:

tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:22   ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:22   ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61005 192.168.0.10:22   ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61006 192.168.0.10:22   ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61001 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61006 192.168.0.10:9999 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61001 192.168.0.10:22   ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.11:3306 ESTABLISHED <-- here!
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61002 192.168.0.10:22   ESTABLISHED
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED

Note in the list above that there is more than one connection established from the same source <address>:<port> pair, though bound to different destination <address>:<port> pairs. There’s even a 7th MySQL connection, though to an instance running on a different server (192.168.0.11:3306).

So, another way of increasing the number of concurrent connections is to add an additional IP address to either the outgoing side (client) or the incoming side (server) and have part of the connections go through it. Likewise, even though you cannot make MySQL listen on multiple ports simultaneously, you can configure your firewall to accept connections on other ports and redirect them to port 3306 (or whichever port you’re using). In the example below I configure iptables to redirect all connections destined for port 80 to port 3306:

$ iptables -A PREROUTING -t nat -p tcp --dport 80 -j REDIRECT --to-port 3306

Modifying the connection behavior of the application(s)

Expanding the number of possible quadruplets will certainly help increase the number of possible concurrent connections, but it won’t scale indefinitely. In the long term you may need to review your application’s configuration and behavior, and set it up (or re-engineer it) so it avoids opening and closing many connections over the network too often, if that is indeed the problem. You may resort to some sort of connection pooling instead, but be sure to evaluate it well first.

Tweaking TCP parameter settings

Even though you cannot easily decrease the timeout for the TIME_WAIT state, there are at least three different TCP parameters you can use to “bypass” this limitation. You should explore these options with caution, though, as such settings can affect the reliability of TCP connections.

tcp_tw_reuse

tcp_tw_reuse (Boolean; default: disabled; since Linux 2.4.19/2.6) Allow to reuse TIME_WAIT sockets for new connections when it is safe from protocol viewpoint. It should not be changed without advice/request of technical experts.

It is possible to force the kernel to reuse a connection hanging in the TIME_WAIT state by setting /proc/sys/net/ipv4/tcp_tw_reuse to 1. What happens in practice is that you’ll keep seeing the closed connections hanging in TIME_WAIT until they either expire or a new connection is requested. In the latter case, the connection will be “revived”.
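For reference, a minimal sketch of turning it on; the same sysctl pattern shown earlier applies if you want the change to persist in /etc/sysctl.conf:

# Allow the kernel to reuse TIME_WAIT sockets for new outgoing connections
$ sysctl -w net.ipv4.tcp_tw_reuse=1
# Equivalent, writing to procfs directly
$ echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse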

Here’s an example from the previous scenario, where I had limited the port range to only 6 ports. I enabled tcp_tw_reuse, opened 6 connections to MySQL, and closed five of them soon afterwards:

$ netstat -tn | grep 3306
tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 TIME_WAIT

The TIME_WAIT timers were still counting down (both ss and netstat have the -o/--timers option to display them) when I opened a new connection:

$ netstat -ton | grep 3306
tcp 0 0 192.168.0.100:61006 192.168.0.10:3306 TIME_WAIT   timewait (35.07/0/0)
tcp 0 0 192.168.0.100:61005 192.168.0.10:3306 TIME_WAIT   timewait (34.52/0/0)
tcp 0 0 192.168.0.100:61002 192.168.0.10:3306 ESTABLISHED keepalive (3586.59/0/0)
tcp 0 0 192.168.0.100:61004 192.168.0.10:3306 TIME_WAIT   timewait (33.91/0/0)
tcp 0 0 192.168.0.100:61001 192.168.0.10:3306 TIME_WAIT   timewait (35.65/0/0)
tcp 0 0 192.168.0.100:61003 192.168.0.10:3306 ESTABLISHED keepalive (7196.66/0/0)

Note that the new connection was established in place of the one that had been hanging in the TIME_WAIT state the longest (using local port 61003).

tcp_tw_recycle

tcp_tw_recycle (Boolean; default: disabled; since Linux 2.4) Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since this causes problems when working with NAT (Network Address Translation).

When you enable /proc/sys/net/ipv4/tcp_tw_recycle, closed connections will no longer show under TIME_WAIT – they disappear from netstat altogether. But as soon as you open a new connection (within the 60-second mark) it will recycle one of those. However, everyone writing about this alternative seems to advise against its use. The bottom line is: it’s preferable to reuse a connection than to recycle it.
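Should you still want to observe its effect on a disposable test box, toggling it follows the same sysctl pattern; do not do this on machines behind NAT:

# Test environments only: fast recycling of TIME_WAIT sockets
$ sysctl -w net.ipv4.tcp_tw_recycle=1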

tcp_max_tw_buckets

tcp_max_tw_buckets (integer; default: see below; since Linux 2.4) The maximum number of sockets in TIME_WAIT state allowed in the system. This limit exists only to prevent simple denial- of-service attacks. The default value of NR_FILE*2 is adjusted depending on the memory in the system. If this number is exceeded, the socket is closed and a warning is printed.

This parameter governs how many connections can remain in the TIME_WAIT state concurrently: the kernel will simply kill connections hanging in that state beyond this number. For example, continuing with the previous scenario where the client server was configured with a port range of only 6 ports, if I set /proc/sys/net/ipv4/tcp_max_tw_buckets to 5, then open 6 concurrent connections to MySQL and immediately close all 6, I’ll find only 5 of them hanging in the TIME_WAIT state; as with tcp_tw_recycle, one of them simply disappears from netstat. This allows me to immediately open one new connection without waiting a minute. However, I won’t be able to open a second one until one of the other 5 connections in TIME_WAIT expires and frees the local port it was using. The secret here, then, is to find a compromise between the number of available network ports and the number of connections we allow to remain in the TIME_WAIT state. The default value of this setting is 65536, which means that by default the system allows all possible connections to go through the TIME_WAIT state when closed.
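Reproducing that experiment is a matter of a single sysctl; the value 5 matches the scenario described above:

# Cap the number of sockets allowed to sit in TIME_WAIT system-wide
$ sysctl -w net.ipv4.tcp_max_tw_buckets=5
# Verify the current value
$ cat /proc/sys/net/ipv4/tcp_max_tw_buckets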

If “LAMP” server, use local socket

Finally, it’s important to mention that if your application is hosted on the same server as the database, you may (and arguably should) open connections using MySQL’s socket file directly, without going over the network, and thus avoid the port range/TIME_WAIT problem altogether. Interestingly, you can accomplish this in one of two ways: specifying the socket with the --socket option or using --host=localhost. As MySQL’s manual mentions:

On Unix, MySQL programs treat the host name localhost specially, in a way that is likely different from what you expect compared to other network-based programs. For connections to localhost, MySQL programs attempt to connect to the local server by using a Unix socket file.
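To make that concrete, both invocations below stay off TCP entirely; the socket path is an assumption, as it varies by distribution and configuration:

# Explicit socket file (path is distribution-dependent)
$ mysql --socket=/var/lib/mysql/mysql.sock -u root -p
# 'localhost' is special-cased to use the Unix socket as well
$ mysql --host=localhost -u root -p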

Clarification

I prefer to overstate this problem in favor of avoiding any confusion: the error described in this post, and all the situations surrounding it, are to be found on the computer that initiates the connections (usually the application server), not on the server hosting the services. When the client closes the connection, the connection hangs in the TIME_WAIT state on the client side, and there won’t be any remaining trace of it showing in netstat on the server side. However, if for some reason it is the server that closes the connection first, it is the server that gets the TIME-WAIT state, while the client will consider the corresponding quadruplet free and hence may reuse it for a new connection.

The post What happens when your application cannot open yet another connection to MySQL appeared first on MySQL Performance Blog.

Streamlined Percona XtraDB Cluster (or anything) testing with Consul and Vagrant

Fr, 2014-12-05 12:00
Introducing Consul

I’m always interested in what Mitchell Hashimoto and Hashicorp are up to; I typically find their projects valuable. If you’ve heard of Vagrant, you know their work.

I recently became interested in a newer project they have called ‘Consul‘.  Consul is a bit hard to describe.  It is (in part):

  • Highly consistent metadata store (a bit like Zookeeper)
  • A monitoring system (lightweight Nagios)
  • A service discovery system, both DNS- and HTTP-based (think of something like haproxy, but instead of TCP load balancing it provides DNS lookups that return only healthy services)
What this has to do with Percona XtraDB Cluster

I’ve had some more complex testing for Percona XtraDB Cluster (PXC) to do on my plate for quite a while, and I started to explore Consul as a tool to help with this.  I already have Vagrant setups for PXC, but ensuring all the nodes are healthy, kicking off tests, gathering results, etc. were still difficult.

So, my loose goals for Consul are:

  • A single dashboard to ensure my testing environment is healthy
  • Ability to adapt to any size environment — 3 node clusters up to 20+
  • Coordinate starting and stopping load tests running on any number of test clients
  • Have the ability to collect distributed test results

I’ve succeeded on some of these fronts with a Vagrant environment I’ve been working on. This spins up:

  • A Consul cluster (default is a single node)
  • Test server(s)
  • A PXC cluster

Additionally, it integrates the Test servers and PXC nodes with Consul such that:

  • The servers set up a Consul agent in client mode pointed at the Consul cluster
  • Additionally, they set up a local DNS forwarder that sends all DNS requests for the ‘.consul’ domain to the local agent, to be serviced by the Consul cluster.
  • The servers register services with Consul that run local health checks
  • The test server(s) set up a ‘watch’ in Consul that waits for a Consul ‘event’ to start sysbench (a rough sketch of such a watch follows).
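For a rough idea of what such a watch can look like from the shell (the handler script path is a hypothetical stand-in; see the Vagrantfile for the real wiring):

# Run a handler script whenever the 'sysbench_update_index' event fires
$ consul watch -type=event -name=sysbench_update_index /usr/local/bin/run_sysbench.sh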
Seeing it in action

Once I run my ‘vagrant up’, I get a consul UI I can connect to on my localhost at port 8501:

Consul’s Node Overview

 

I can see all 5 of my nodes.  I can check the services and see that test1 is failing one health check because sysbench isn’t running yet:

Consul reporting sysbench is not running.

This is expected, because I haven’t started testing yet.  I can see that my PXC cluster is healthy:

 

Health checks are using clustercheck from the PXC package

 

Involving Percona Cloud Tools in the system

So far, so good.  This Vagrant configuration (if I provide a PERCONA_AGENT_API_KEY in my environment) also registers my test servers with Percona Cloud Tools, so I can see data being reported there for my nodes:

Percona Cloud Tool’s Dashboard for a single node

So now I am ready to begin my test.  To do so, I simply need to issue a consul event from any of the nodes:

jayj@~/Src/pxc_consul [507]$ vagrant ssh consul1
Last login: Wed Nov 26 14:32:38 2014 from 10.0.2.2
[root@consul1 ~]# consul event -name='sysbench_update_index'
Event ID: 7c8aab42-fd2e-de6c-cb0c-1de31c02ce95

My pre-configured watchers on my test node knows what to do with that event and launches sysbench.  Consul shows that sysbench is indeed running:

 

And I can indeed see traffic start to come in on Percona Cloud Tools:

I have the testing traffic limited for my example, but that’s easily tunable via the Vagrantfile. To show something a little more impressive, here’s a 5-node cluster hitting around 2500 tps total throughput:

So to summarize thus far:
  • I can spin up any size cluster I want and verify it is healthy with Consul’s UI
  • I can spin up any number of test servers and kick off sysbench on all of them simultaneously
Another big trick of Consul’s

So far so good, but let me point out a few things that may not be obvious. If you check the Vagrantfile, I use a Consul hostname in a few places. First, on the test servers:

# sysbench setup
'tables'     => 1,
'rows'       => 1000000,
'threads'    => 4 * pxc_nodes,
'tx_rate'    => 10,
'mysql_host' => 'pxc.service.consul'

then again on the PXC server configuration:

# PXC setup
"percona_server_version"         => pxc_version,
'innodb_buffer_pool_size'        => '1G',
'innodb_log_file_size'           => '1G',
'innodb_flush_log_at_trx_commit' => '0',
'pxc_bootstrap_node'             => (i == 1 ? true : false),
'wsrep_cluster_address'          => 'gcomm://pxc.service.consul',
'wsrep_provider_options'         => 'gcache.size=2G; gcs.fc_limit=1024',

Notice ‘pxc.service.consul’. This hostname is provided by Consul and resolves to the IPs of all servers that currently register the ‘pxc’ service and are passing its health check:

[root@test1 ~]# host pxc.service.consul
pxc.service.consul has address 172.28.128.7
pxc.service.consul has address 172.28.128.6
pxc.service.consul has address 172.28.128.5

So I am using this to my advantage in two ways:

  1. My PXC cluster bootstraps the first node automatically, but all the other nodes use this hostname for their wsrep_cluster_address. This means no specific hostnames or IPs in the my.cnf file, and the hostname will always be up to date with the nodes active in the cluster, which is precisely the list that should be in wsrep_cluster_address at any given moment.
  2. My test servers connect to this hostname, therefore they always know where to connect, and they will round-robin (if I have enough sysbench threads and PXC nodes) across different nodes based on the response of the DNS lookup, which returns 3 of the active nodes in a different order each time.
(Some of) The Issues

This is still a work in progress and there are many improvements that could be made:

  • I’m relying on PCT to collect my data, but it’d be nice to utilize Consul’s central key/value store to store results of the independent sysbench runs.
  • Consul’s leader election could be used to help the cluster determine which node should bootstrap on first startup. I am assuming node1 should bootstrap.
  • A variety of bugs in various software still makes this a bit clunky sometimes to manage.  Here is a sample:
    • Consul events sometimes don’t fire in the current release (though it looks to be fixed soon)
    • PXC joining nodes sometimes get stuck, putting speed bumps into the automated deploy.
    • Automated installs of percona-agent (which sends data to Percona Cloud Tools) are straightforward, except when different cluster nodes clobber each other’s credentials.

So, in summary, I am happy with how easily Consul integrates, and I’m already finding it useful for a product at its 0.4.1 release.

The post Streamlined Percona XtraDB Cluster (or anything) testing with Consul and Vagrant appeared first on MySQL Performance Blog.

MySQL and OpenStack deep dive: Dec. 10 webinar

Do, 2014-12-04 21:59

Fact: MySQL is the most commonly used database in OpenStack deployments. Of course that includes a number of MySQL variants – standard MySQL by Oracle, MariaDB, Percona Server, MySQL Galera, Percona XtraDB Cluster, etc.

However, there are many misconceptions and myths around the pros and cons of these MySQL flavors. Join me and my friend Jay Pipes of Mirantis next Wednesday (Dec. 10) at 10 a.m. Pacific and we’ll dispel some of these myths and provide a clearer picture of the strengths and weaknesses of each of these flavors.

This free Percona webinar, titled “MySQL and OpenStack Deep Dive,” will also illuminate the pitfalls to avoid when migrating between MySQL flavors – and what architectural information to take into account when planning your OpenStack MySQL database deployments.

We’ll also discuss replication topologies and techniques, and explain how the Galera Cluster variants differ from standard MySQL replication.

Finally, in the latter part of the session, we’ll take a deep dive into MySQL database performance analysis, examining the results of a Rally run showing a typical Nova workload. In addition, we’ll use Percona Toolkit’s famed pt-query-digest tool to determine whether a synchronously replicating database cluster like the free Percona XtraDB Cluster is a good fit for certain OpenStack projects.

The webinar is free but I encourage you to register now to reserve your spot. See you Dec. 10! In the meantime, learn more about the new annual OpenStack Live Conference and Expo which debuts April 13-14 in the heart of Silicon Valley. If you register now you’ll save with Early Bird pricing. However, one lucky webinar attendee will win a full pass! So be sure to register for next week’s webinar now for your chance to win! (Click here!) The winner will be announced at the end of the webinar.

The post MySQL and OpenStack deep dive: Dec. 10 webinar appeared first on MySQL Performance Blog.

Sneak peek at the Percona Live MySQL Conference & Expo 2015

Do, 2014-12-04 13:55

You know you’ll be there so why not save some $$ by registering now for the Percona Live MySQL Conference & Expo 2015 (April 13-16 in Santa Clara, Calif.). Super Saver registration discounts are available through Dec. 14 at 11:30 p.m. PST. (That’s just 10 days away!)

What to expect this year? The Percona Live 2015 conference committee is putting together another fantastic event for the global MySQL community’s next meeting in April. The full conference agenda will be announced in January, but the initial roster includes:

  • Sunny Bains, Senior Engineering Manager at Oracle; “InnoDB 5.7- What’s New”
  • Yoshinori Matsunobu, Database Engineer at Facebook; “Fast Master Failover Without Data Loss”
  • Jeremy Cole, Senior Systems Engineer at Google, Inc.; “Exploring Your Data With InnoDB Explorer”
  • Tom Krouper, Staff Database Administrator at Twitter; “Upgrading to MySQL 5.6 @ Scale”
  • Jenni Snyder, Database Administrator at Yelp; “Schema changes multiple times a day? OK!”
  • Ike Walker, Database Architect at Flite; “Assembling the Perfect MySQL Toolbox”
  • Jean-François Gagné, Senior System Engineer/Architect at Booking.com; “Binlog Servers at Booking.com”
  • Jeremy Glick, Lead DBA at MyDBAteam, and Andrew Moore, MySQL DBA at Percona; “Using MySQL Audit Plugins and Elasticsearch ELK”
  • Tomáš Komenda, Team Lead and Database Specialist, and Lukáš Putna, Senior Developer and Database Specialist at Seznam.cz; “MySQL and HBase Ecosystem for Real-time Big Data Overviews”
  • Alexander Rubin, Principal Consultant at Percona; “Advanced MySQL Query Tuning”

And while the call for papers deadline has expired, there are still sponsorship opportunities available for the world’s largest annual MySQL event. Sponsors become a part of a dynamic and growing ecosystem and interact with more than 1,000 DBAs, sysadmins, developers, CTOs, CEOs, business managers, technology evangelists, solutions vendors, and entrepreneurs who attend the event.

Current sponsors include:

  • Diamond Plus: VMware
  • Gold: Codership, Pythian
  • Silver: Box, SpringbokSQL, Yelp
  • Exhibit Only: FoundationDB, Severalnines, Tokutek, VividCortex
  • Other Sponsors: MailChimp
  • Media Sponsor: Database Trends & Applications , Datanami, InfoQ , Linux Journal, O’Reilly Media

Percona Live 2015 will feature a variety of formal tracks and sessions related to High Availability, DevOps, Programming, Performance Optimization, Replication and Backup, MySQL in the Cloud, MySQL and NoSQL, MySQL Case Studies, Security, and What’s New in MySQL.

As usual the conference will be held in the heart of Silicon Valley at the Hyatt Regency Santa Clara and Santa Clara Convention Center. But this year Percona has also unveiled OpenStack Live 2015, a new conference that will run in parallel with Percona Live MySQL Conference & Expo 2015 on April 13 and 14.

And don’t forget, Super Saver registration discounts are available through Dec. 14 at 11:30 p.m. PST. I hope to see you in Santa Clara!

The post Sneak peek at the Percona Live MySQL Conference & Expo 2015 appeared first on MySQL Performance Blog.

Auto-bootstrapping an all-down cluster with Percona XtraDB Cluster

Mi, 2014-12-03 12:01

A new feature in recent Percona XtraDB Cluster (PXC) releases is the ability for an existing cluster to auto-bootstrap after an all-node-down event. Suppose you lose power on all nodes simultaneously, or something similar happens to your cluster. Traditionally this meant manually re-bootstrapping the cluster, but not any more.

How it works

Given the above all-down situation, if all nodes are able to restart and see each other, such that they all agree on what the last state was and that all nodes have returned, then they will decide it is safe to recover the PRIMARY state as a whole.

This requires:

  • All nodes went down hard, that is: a kill -9, kernel panic, server power failure, or a similar event
  • All nodes from the last PRIMARY component are restarted and are able to see each other again.
Demonstration

Suppose I have a 3 node cluster in a stable state. I then kill all nodes simultaneously (simulating a power failure or similar event):

[root@node1 ~]# killall -9 mysqld
[root@node2 ~]# killall -9 mysqld
[root@node3 ~]# killall -9 mysqld

I can see that each node maintained a state file in its datadir called ‘gvwstate.dat’. This contains the last known view of the cluster:

[root@node1 ~]# cat /var/lib/mysql/gvwstate.dat
my_uuid: 78caedfe-75a5-11e4-ac69-fb694ee06530
#vwbeg
view_id: 3 78caedfe-75a5-11e4-ac69-fb694ee06530 9
bootstrap: 0
member: 78caedfe-75a5-11e4-ac69-fb694ee06530 0
member: 87da2387-75a5-11e4-900f-ba49ecdce584 0
member: 8a25acd8-75a5-11e4-9e22-97412a1263ac 0
#vwend

This file will not exist on a node that was shut down cleanly, only if mysqld was terminated uncleanly. The file should exist, and be the same, on all nodes for the auto-recovery to work.

I can now restart all 3 nodes more or less at the same time. Note that none of these nodes are bootstrapping and all of the nodes have the wsrep_cluster_address set to a proper list of the nodes in the cluster:

[root@node1 ~]# service mysql start
[root@node2 ~]# service mysql start
[root@node3 ~]# service mysql start

I can indeed see that they all start successfully and enter the primary state:

[root@node1 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
[root@node2 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+
[root@node3 ~]# mysql -e "show global status like 'wsrep_cluster%'"
+--------------------------+--------------------------------------+
| Variable_name            | Value                                |
+--------------------------+--------------------------------------+
| wsrep_cluster_conf_id    | 0                                    |
| wsrep_cluster_size       | 3                                    |
| wsrep_cluster_state_uuid | 1ba6f69a-759b-11e4-89ba-62a713a26cd1 |
| wsrep_cluster_status     | Primary                              |
+--------------------------+--------------------------------------+

Checking the logs, I can see this indication that the feature is working:

2014-11-26 19:59:36 1809 [Note] WSREP: promote to primary component
2014-11-26 19:59:36 1809 [Note] WSREP: view(view_id(PRIM,78caedfe,13) memb {
        78caedfe,0
        87da2387,0
        8a25acd8,0
} joined {
} left {
} partitioned {
})
2014-11-26 19:59:36 1809 [Note] WSREP: save pc into disk
2014-11-26 19:59:36 1809 [Note] WSREP: clear restored view

Changing this behavior

This feature is enabled by default, but you can toggle it off with the pc.recovery setting in wsrep_provider_options.
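As a hedged sketch, disabling it would look something like this in my.cnf; if you already set wsrep_provider_options (as most PXC configurations do), append pc.recovery=false to the existing string rather than replacing it:

[mysqld]
# Disable automatic PRIMARY component recovery (enabled by default)
wsrep_provider_options="pc.recovery=false"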

This feature helps cover an edge case where manual bootstrapping used to be necessary to recover properly. It was added in Percona XtraDB Cluster 5.6.19, but was broken due to this bug. It was fixed in PXC 5.6.21.

The post Auto-bootstrapping an all-down cluster with Percona XtraDB Cluster appeared first on MySQL Performance Blog.

Tips from the trenches for over-extended MySQL DBAs

Di, 2014-12-02 08:00

This post is a follow-up to my November 19 webinar, “Tips from the Trenches: A Guide to Preventing Downtime for the Over-Extended DBA,” during which I described some of the most common reasons DBAs experience avoidable downtime. The session was aimed at the “over-stretched DBA”: the MySQL DBA short on time, or an engineer of another discipline without deep knowledge of the MySQL system. The over-stretched DBA may be prone to making fundamental mistakes that cause downtime through poor response times, operations that cause blocking on important data, or administrative mishaps due to the lack of best-practice monitoring and alerting. (You can download my slides and view the recorded webinar here.)

Monitor the things
One of the aids to keeping the system up and running is ensuring that your finger is on the pulse of the environment. Here on the Percona Managed Services team, we leverage the Percona Monitoring Plugins (open source plugins for Nagios, Cacti and Zabbix) to ensure we have visibility into our clients’ operations. Having a handle on basics such as disk space, memory usage and MySQL operational metrics ensures that we avoid trivial downtime that would affect the client’s uptime or, worse, their bottom line.

Road Blocks
One of the most common reasons that an application is unable to serve data to its end user is that access to a table is being blocked due to another ongoing operation. This can be blamed on a variety of sources: backups, schema changes, poor configuration and long running transactions can all lend themselves to costly blocking. Understanding the impact of actions on a MySQL server can be the difference between a happy end user and a frustrated one.

During the webinar I made reference to some resources and techniques that can help the over-extended DBA avoid downtime. Here are some highlights…

Monitoring and Alerting
It’s important that you have some indications that something is reaching its capacity. It might be the disk, connections to MySQL or auto_increment limit on a highly used table. There is quite the landscape to cover but here are a handful of helpful tools:
* Percona Monitoring Plugins
* Monyog
* New Relic

Query Tuning
Poorly performing SQL can be indicative that the configuration is incorrect, that there’s a missing index or that your development team needs a quick lesson on MySQL anti-patterns. Arm yourself with proof that the SQL statements are substandard using these resources and work with the source to make things more efficient:
* Percona Cloud Tools
* pt-query-digest, explain, indexes

High Availability
If you need to ensure that your application survives hiccups such as hardware failure or network impairment, a well deployed HA solution will give you the peace of mind that you can quickly mitigate bumps in the road.
* MHA
* Percona XtraDB Cluster, Galera
* Percona Replication Manager
* LinuxHA/Corosync/DRBD

Backups
A wise man once said: “A backup today saves you tomorrow.” Covering all bases can be the difference between recovering from a catastrophic failure and job hunting. Mixing logical, physical and incremental backups, while adding some offsite copies, provides a safety net in the event of a small mistake like a dropped table or, worse, losing all working copies of data and backups in a SAN failure. It happens, so be prepared (a hedged example follows the tool list below).
* Percona XtraBackup
* mydumper
* mysqldump
* mysqlbinlog (5.6)
* mylvmbackup
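As a hedged example of mixing those tools, a physical full backup plus a logical per-schema dump might look like this; the user, password, schema name and paths are placeholders:

# Physical backup of the whole instance with Percona XtraBackup
$ innobackupex --user=backup --password=secret /data/backups/
# Logical dump of one schema as a safety net for object-level restores
$ mysqldump --user=backup --password=secret --single-transaction mydb > /data/backups/mydb.sql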

We had some great questions from the attendees and regrettably were unable to answer them all, so here are some of them with my response.

Q: I use MySQL on Amazon RDS. Isn’t much of the operations automated or do these tips still apply?
A: It’s not completely automated. There are still challenges to address and configuration opportunities, but understanding the limitations of RDS is key. For example, the location and size of the tmpdir is something you are unable to customise on RDS. You would typically review this configuration in a production environment if your workload required it. Any costly queries that perform operations requiring tmp area to sort (think OLAP) might not be a good fit on RDS due to this limitation. Getting to know the limitations of hosted or DBaaS services is time well spent, and avoids you having to explain what keeps taking the application down during peak hours.

Q: What other parts of Percona Toolkit do you recommend for MySQL operations?
A: Percona Toolkit is a well-evolved suite of tools that all MySQL DBAs should familiarize themselves with. In particular I will fit many tools into my weekly workflow:

Operations

  • pt-online-schema-change
  • pt-table-checksum
  • pt-table-sync

Troubleshooting

  • pt-stalk
  • pt-pmp
  • pt-config-diff

Knowledge Gathering

  • pt-summary
  • pt-mysql-summary
  • pt-duplicate-key-checker

The key with Percona Toolkit is that many common tasks or problems that could cause you to reinvent the wheel are covered, mature and production ready. As with any tool, you should always read the label or in this case the documentation so you’re well aware what the tools can do, the risks and the features that you can make use of.

Q: HA – are there any solutions that you would stay away from?
A: Choosing any particular HA solution is going to be another R&D exercise. You will need to understand the tradeoffs and configuration options, and compare products. Some might have a higher TCO or lack functionality. Once the chosen solution is implemented, it’s pertinent that the engineers understand the technology well enough to troubleshoot it or utilize its functionality when failover needs to be instigated. I like HA solutions that are fast to fail over to; some entail starting MySQL from cold.

Q: You mentioned having tested backups. How do you perform this?
A: Percona’s method is to use a dedicated host with access to the backup files. Then, with a combination of mysqlsandbox and pt-table-checksum, we can discover whether we can trust the files we capture for disaster recovery. Many people underestimate the importance of this task.

Q: Percona Cloud Tools – how much does it cost?
A: Right now it’s a free service. Visit cloud.percona.com for more information, but in a nutshell Percona Cloud Tools is a hosted service providing access to query performance insights for all MySQL users.

Q: Is there API access to Percona Cloud Tools for application integration?
A: There is currently not a public API available. It is on the roadmap, though. We’d be interested to hear more about your use case so please sign up for the service and try it out. After signing in, all pages include a Feedback link to share your thoughts and ideas such as how you’d like to use a public API.

Q: Can you use MHA with Percona XtraDB Cluster?
A: MHA is not something that can be used with Percona XtraDB Cluster (PXC). It’s common to partner PXC with HAProxy for making sure your writes are going to the appropriate node.

Q: Can MHA make automatic failover? If MHA has automatic failover, what do you recommend? Configure it for automatic failover?
A: MHA can do automatic failover. Personally, I prefer managed failover. When working with automated failover it’s important that failback is manual, to avoid “flapping.” “Split-brain” is an ailment you don’t want to suffer from either, and auto-failover removes human judgment from the decision to relocate operations from a failed node onto a standby node. If you do opt for automatic failover, it is advisable to test all potential failure scenarios and to employ a STONITH method to really ensure that the unresponsive node is not serving read/write traffic.

Q: What is the best way to detect database blocking from DML statements? Is there a tool that will show blocking after the fact so you don’t have to catch it real-time?
A: Once again, Percona has a tool for this: pt-deadlock-logger can detect and log deadlocks. Detecting blocking can be achieved using “SHOW ENGINE INNODB STATUS” or the information_schema.innodb_locks table. Some engineering might be required for this to be logged, but the resources exist for use.
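A hedged sketch of both approaches; the DSN credentials are placeholders:

# Poll the server and log deadlocks as they are detected
$ pt-deadlock-logger u=root,p=secret,h=localhost
# Inspect locks currently being contended from the SQL side (InnoDB)
$ mysql -e "SELECT * FROM information_schema.innodb_locks"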

Q: Since you mentioned tinkering with ELK I was wondering if you had any tips on good Kibana dashboards to build to monitor MySQL databases/clusters?
A: ELK is something that I’m looking to publish some information on soon so watch this space!

Thanks again everyone for the great questions! And as a reminder, you can download my slides and view the recorded webinar here.

The post Tips from the trenches for over-extended MySQL DBAs appeared first on MySQL Performance Blog.

Faster restarts for MySQL and Percona Server 5.6.21+

Mo, 2014-12-01 12:34

By default in MySQL 5.6, each time MySQL is started (regular start or crash recovery), it iterates through all the binlog files when GTIDs are not enabled. This can take a very long time if you have a large number of binary log files. MySQL and Percona Server 5.6.21+ have a fix with the simplified-binlog-gtid-recovery option. Let’s explore this issue.

Understanding the issue

It was first reported by Yoshinori @ Facebook (bug #69097).

Let’s start by looking at a MySQL 5.6 instance where binary logging is enabled but GTIDs are disabled.

If we restart MySQL with strace, we’ll see:

# strace -e open service mysql start
[...]
open("/var/lib/mysql5621/mysql-bin.000004", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000003", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000002", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000001", O_RDONLY) = 13
[...]

MySQL opens all binary log files in reverse order. This can bite if you have lots of binlog files or if the binlog files are large.

This does not happen with MySQL 5.5, so why such a change? The reason is… GTIDs. If you look at the bug report, MySQL tries to initialize a few GTID-related settings even when gtid_mode = OFF.

The same kind of issue happens when you have binlog files with GTIDs and binlog files without GTIDs:

# strace -e open service mysql start
[...]
open("/var/lib/mysql5621/mysql-bin.000010", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000001", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000002", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000003", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000004", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000005", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000006", O_RDONLY) = 13  -> First binlog file with GTIDs
[...]

Actually, you can see that MySQL does 2 scans: a reverse scan and a forward scan. Here not all binlogs need to be opened: MySQL stops scanning files as soon as it finds GTID information. But that can again bite if you have just turned on GTIDs (and therefore most binlog files do not contain any GTID information).

Now what happens if you set gtid_mode = ON from the start or if all the binlog files without any GTID information have been removed?

# strace -e open service mysql start
[...]
open("/var/lib/mysql5621/mysql-bin.000011", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000006", O_RDONLY) = 13
[...]

Only the newest and the oldest binlog files are opened, which is expected.

The fix

The fix is easy: simply add simplified-binlog-gtid-recovery = 1 to your configuration file. When this is set, MySQL will open at most 2 binlog files: the newest one and the oldest one. See the documentation.
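For clarity, the change is a one-liner, shown here as a my.cnf fragment:

[mysqld]
# Open at most two binlog files (newest and oldest) at startup
simplified-binlog-gtid-recovery = 1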

Let’s see what happens with our server containing some binlog files without GTID information:

# strace -e open service mysql start
[...]
open("/var/lib/mysql5621/mysql-bin.000012", O_RDONLY) = 13
open("/var/lib/mysql5621/mysql-bin.000001", O_RDONLY) = 13
[...]

What is the performance overhead of the binlog scans? Of course YMMV, but I did a quick test on my laptop by creating 80x 100MB binlog files: by default, startup takes 6s while with simplified-binlog-gtid-recovery=1 it only takes 2s. 3x improvement with a single setting, not bad!

Conclusion

It is good to see that regressions introduced in MySQL 5.6 are being fixed over time. This one is pretty nasty as most people not using GTIDs will never think that something related to GTIDs can actually create performance issues. Is there any drawback if you enable this setting? I can’t think of any, so I’m hoping it will be enabled by default in 5.7.

The post Faster restarts for MySQL and Percona Server 5.6.21+ appeared first on MySQL Performance Blog.

MySQL benchmarking: Know your baseline variance!

Mi, 2014-11-26 11:00

Often enough I find MySQL benchmark results where the difference between results is 1% or even less, and yet conclusions are drawn from it. Now, it is not that 1% is unimportant – especially when you’re developing the product you should care about those 1% improvements or regressions, because they tend to add up. However, with such a small difference it is very important to understand whether it is real or just the natural variance of your baseline test.

Take a look at this graph:

 

This is the result of a simple in-memory, read-only “select by primary key” SysBench benchmark on dedicated physical hardware that is otherwise idle: a simple 1-socket system. I tried to stabilize it as much as possible, for example by disabling CPU frequency scaling. Still, I saw some 3% difference between “good” runs and “bad” runs.

What is the difference between those runs? Simply mysqld restarts.

Does this mean you can’t measure a smaller difference? You can, with the appropriate test plan. Often having several runs makes sense; in other cases you need to make sure the system warms up before taking measurements, or that benchmark runs are long enough. Whatever method you use, it is a good idea to validate your test methodology by conducting several runs of your baseline test to ensure the results are stable enough for your purpose. For example, if I decide to do five 30-minute runs and average the results, and they all come in within 0.1%, I will consider 0.3% differences meaningful.

Another practical trick that often helps me separate real differences from side effects is mixing the tests. Say I have two configurations to test, A and B: instead of doing AAA BBB I would do ABABAB, which helps when there is some regression that accumulates over time, such as with flash storage (a minimal sketch of such an interleaved run follows).
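A minimal sketch of that interleaving as a shell loop; run_benchmark.sh and the configuration names are hypothetical stand-ins for your own harness:

# Alternate configurations A and B instead of running each in a block
for run in 1 2 3; do
    for cfg in A B; do
        ./run_benchmark.sh --config=$cfg --run=$run >> "results_$cfg.log"
    done
done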

You should also note that in modern systems there is almost always something happening in the background that can change performance: the SSD is doing garbage collection, MySQL (or the kernel) is flushing dirty pages, or the CPU may simply cool off and as a result be able to sustain Turbo-boost operation a little longer. So when you start running your benchmarks, make sure you keep the idle time between runs the same; scripting benchmarks and iterating scenarios helps here.

Happy MySQL benchmarking!

The post MySQL benchmarking: Know your baseline variance! appeared first on MySQL Performance Blog.

Percona XtraDB Cluster 5.6.21-25.8 is now available

Di, 2014-11-25 15:45

Percona is glad to announce the new release of Percona XtraDB Cluster 5.6 on November 25th, 2014. Binaries are available from the downloads area or from our software repositories. We’re also happy to announce that Ubuntu 14.10 and CentOS 7 users can download, install, and upgrade Percona XtraDB Cluster 5.6 from Percona’s software repositories.

Based on Percona Server 5.6.21-70.1 including all the bug fixes in it, Galera Replicator 3.8, and on Codership wsrep API 25.7, Percona XtraDB Cluster 5.6.21-25.8 is now the current General Availability release. All of Percona‘s software is open-source and free, and all the details of the release can be found in the 5.6.21-25.8 milestone at Launchpad.

New Features:

  • Galera 3.8 introduces auto-eviction for nodes in the cluster experiencing network issues like packet loss. It is off by default and is turned on with the evs.auto_evict option. This feature requires EVS protocol version (evs.version) 1. During the EVS protocol upgrade, all membership changes are communicated over EVS protocol version 0 to preserve backwards compatibility; the protocol is upgraded to the highest commonly supported version when forming a new group, so if there exists a single node with an older version in the group, the group protocol version remains 0 and auto-eviction is not functional. (#1274192)
  • Percona XtraDB Cluster now supports backup locks in XtraBackup SST (in the default xtrabackup-v2 wsrep_sst_method). Backup locks are used in lieu of FLUSH TABLES WITH READ LOCK on the donor during SST. This should allow for minimal disruption of existing and incoming queries, even under high load, and thus for an even faster SST with less time spent in the ‘donor/desynced’ state. This also introduces the following constraints: Percona XtraDB Cluster 5.6.21 requires Percona XtraBackup 2.2.5 or higher; an older (< 5.6.21) joiner cannot SST from a newer (>= 5.6.21) donor. This is enforced through SST versioning (sent from joiner to donor during SST) and logged explicitly to the error log. (#1390552)
  • Percona XtraDB Cluster is now shipped with Galera MTR test suite.

Bugs Fixed:

  • Percona XtraDB Cluster now shows a warning in case additional utilities, like pv which may not affect critical path of SST, are not installed. Bug fixed #1248688.
  • mysqldump SST can now use username/password from wsrep_sst_auth under group of [sst] in my.cnf in order not to display the credentials in the error log. Bug fixed #1293798.
  • Normal shutdown under load would cause the server to hang because the replayer failed to finish. Bug fixed #1358701.
  • wsrep_causal_reads variable was not honored when declared as global. Bug fixed #1361859.
  • garbd would not work when cluster address was specified without the port. Bug fixed #1365193.
  • If mysqld got killed during the SST it would leave an unclean data directory behind, causing Percona XtraDB Cluster to fail the next time the server was started because the data directory would be corrupted. This was fixed by resuming the startup in case wsrep-recover failed to recover due to a corrupted data directory. The old behavior is still achievable through the --exit-on-recover-fail command line parameter to mysqld_safe or exit-on-recover-fail under [mysqld_safe] in my.cnf. Bug fixed #1378578.
  • Percona XtraDB Cluster now reads environment variables for mysqld from following files (if present): /etc/default/mysql in Debian/Ubuntu; /etc/sysconfig/mysql in CentOS 6 or lower; /etc/sysconfig/mysql in CentOS 7 with mysql.service; /etc/sysconfig/XYZ in CentOS 7 with mysql@XYZ.service (/etc/sysconfig/bootstrap is supplied by default). Bug fixed #1381492.
  • gvwstate.dat file was removed on joiner when Xtrabackup SST method was used. Bug fixed #1388059.
  • Percona XtraDB Cluster now detects older joiners which don’t have the backup lock support. Bug fixed #1390552.
  • Inserts to a table with auto-increment primary key could result in duplicate key error if another node joined or dropped from the cluster during the insert processing. Bug fixed #1366997.
  • TOI now skips replication if all tables are temporary. Bugs fixed #11 and #13.
  • Bootstrapping a node tried to resolve gcomm address list specified in wsrep-cluster-address. Bug fixed #88.
  • In case stored procedure containing a non-InnoDB statement (MyISAM) performed autocommit, that commit would be entered two times: at statement end and next time at stored procedure end. Bug fixed #2.
  • Two appliers conflicting with local transaction and resulting later in (acceptable) BF-BF lock conflict, would cause cluster to hang when the other BF thread would not grant the lock back after its local transaction got aborted. Bug fixed #7.

Other bugs fixed #1378138, #1377226, #1376965, #1356859, #1026181, #1367173, #1390482, #1391634, and #1392369.

Release notes for Percona XtraDB Cluster 5.6.21-25.8 are available in our online documentation along with the installation and upgrade instructions.

Help us improve our software quality by reporting any bugs you encounter using our bug tracking system. As always, thanks for your continued support of Percona!

The post Percona XtraDB Cluster 5.6.21-25.8 is now available appeared first on MySQL Performance Blog.

(More) Secure local passwords in MySQL 5.6 and up

Di, 2014-11-25 08:00

I log into a lot of different servers running MySQL and one of the first things I do is create a file in my home directory called ‘.my.cnf’ with my credentials to that local mysql instance:

[client]
user=root
password=secret

This means I don’t have to type my password in every time, nor am I tempted to include it on the command line with -p and get the dreaded (but completely accurate):

Warning: Using a password on the command line interface can be insecure.

MySQL 5.6 introduces a utility to make this easier and more secure. First, let’s start with a new mysqld instance with a blank root password and make it more secure:

[vagrant@localhost ~]$ mysqladmin -u root password
New password: secret
Confirm new password: secret
[vagrant@localhost ~]$ mysql -u root
ERROR 1045 (28000): Access denied for user 'root'@'localhost' (using password: NO)

OK, so I’ve added a password; now I want to create my .my.cnf file:

[vagrant@localhost ~]$ mysql_config_editor set --user=root --password
Enter password: secret
[vagrant@localhost ~]$ mysql
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 10
Server version: 5.6.21-70.0 Percona Server (GPL), Release 70.0, Revision 688

Copyright (c) 2009-2014 Percona LLC and/or its affiliates
Copyright (c) 2000, 2014, Oracle and/or its affiliates. All rights reserved.

Oracle is a registered trademark of Oracle Corporation and/or its
affiliates. Other names may be trademarks of their respective owners.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

mysql>

What did ‘mysql_config_editor set’ actually do? It creates a .mylogin.cnf file (which stands in for a .my.cnf) in my home directory that contains my credentials, just in encrypted form:

[vagrant@localhost ~]$ ls -alh .my*
-rw-------. 1 vagrant vagrant 100 Nov 19 16:20 .mylogin.cnf
-rw-------. 1 vagrant vagrant  29 Nov 19 16:20 .mysql_history
[vagrant@localhost ~]$ cat .mylogin.cnf
(encrypted binary, not human-readable)
[vagrant@localhost ~]$ mysql_config_editor print
[client]
user = root
password = *****

The mysql client picks this up right away and will use it by default. This file has sane default filesystem permissions (0600, as the listing above shows), is local to my homedir, and is a whole lot better than specifying the password on the command line or typing it in every time.

This utility also supports a feature called ‘login-path’ wherein you can add multiple mysql logins (perhaps to different servers) and refer to them with the --login-path option in the mysql client:

[vagrant@localhost ~]$ mysql_config_editor set --login-path=remote --host=remote --user=remote --password
Enter password: secure
[vagrant@localhost ~]$ mysql_config_editor print --all
[client]
user = root
password = *****
[remote]
user = remote
password = *****
host = remote
[vagrant@localhost ~]$ mysql --login-path=remote
ERROR 2005 (HY000): Unknown MySQL server host 'remote' (0)

The ‘remote’ host doesn’t exist here, but you get the idea. You can create as many login-paths as you want with varied hostnames, credentials and other login parameters and quickly access them with any client supporting login-path.
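If you want to tidy up later, the same utility can remove entries again. A quick sketch, reusing the ‘remote’ login-path created above:

[vagrant@localhost ~]$ mysql_config_editor remove --login-path=remote
[vagrant@localhost ~]$ mysql_config_editor print --all
[client]
user = root
password = *****

Other bundled clients such as mysqldump and mysqladmin accept --login-path too, so a single entry can serve your whole local toolchain.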

Now, how secure is this really? The .mylogin.cnf file is obfuscated rather than strongly encrypted, so it is not secure from anyone who roots your DB server or can otherwise read the file. I would say the benefits are more about reducing careless password storage and tidier management of local credentials.

The post (More) Secure local passwords in MySQL 5.6 and up appeared first on MySQL Performance Blog.

Percona Server 5.6.21-70.1 is now available

Mon, 2014-11-24 15:28

Percona is glad to announce the release of Percona Server 5.6.21-70.1 on November 24, 2014. Download the latest version from the Percona web site or from the Percona Software Repositories.

Based on MySQL 5.6.21, including all the bug fixes in it, Percona Server 5.6.21-70.1 is the current GA release in the Percona Server 5.6 series. Percona Server is open-source and free. Complete details of this release can be found in the 5.6.21-70.1 milestone on Launchpad.

Bugs Fixed:

  • A slave replicating in RBR mode would crash if the table definition on the master and slave differed by an allowed conversion and the binary log contained a table map event followed by two row log events. This bug is an upstream regression introduced by the fix for bug #72610. Bug fixed #1380010.
  • An incorrect source code function attribute would cause MySQL to crash on an InnoDB row write when compiled with a recent GCC using certain compilation options. Bug fixed #1390695 (upstream #74842).
  • MTR tests for Response Time Distribution were not packaged in binary packages. Bug fixed #1387170.
  • The RPM packages provided for CentOS 5 were built using a debugging information format that is not supported by the gdb version included with CentOS 5.10. Bug fixed #1388972.
  • A session on a server with mixed-mode binlogging would switch to row-based binlogging whenever a temporary table was created and then queried, and the switch lasted until the session ended or until all temporary tables in the session were dropped. This was unnecessarily restrictive and has been fixed so that only statements involving temporary tables are logged in row-based format, while the remaining statements continue to use statement-based logging (a short SQL sketch follows this list). Bug fixed #1313901 (upstream #72475).
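A minimal SQL sketch of that last scenario (the table names are hypothetical; the comments paraphrase the before/after behavior from the bug description):

-- Session on a server running with mixed-mode binary logging.
SET SESSION binlog_format = 'MIXED';

CREATE TEMPORARY TABLE tmp_stats (id INT); -- create a temporary table...
SELECT COUNT(*) FROM tmp_stats;            -- ...and query it

-- Before the fix: every later statement in this session was binlogged
-- in row-based format until the session ended or all temporary tables
-- were dropped.
-- After the fix: only statements touching tmp_stats are row-logged;
-- ordinary statements such as the one below use statement-based logging.
INSERT INTO permanent_table VALUES (1);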

Other bugs fixed: #1387227, and #1388001.

Release notes for Percona Server 5.6.21-70.1 are available in the online documentation. Please report any bugs on the Launchpad bug tracker.

The post Percona Server 5.6.21-70.1 is now available appeared first on MySQL Performance Blog.