I’m often asked how to evaluate IO subsystem (hard drive RAID or SAN) performance for MySQL needs, so I’ve decided to write up some simple steps you can take to get a good feeling for it. It is not perfect, but it usually tells you quite a lot about what you should expect from the system.
What I usually look at for MySQL is performance of random reads and random writes. Sequential reads and writes are rarely the problem for OLTP workloads, so we will not look at them.
I also prefer to look at performance with the O_DIRECT flag set to bypass the OS cache. This may take a separate code path in the kernel and so has a somewhat different performance pattern compared to buffered IO (even when followed by regular fsync calls), but it makes it easy to bypass the OS cache both for reads and for writes, so you do not need to create a large working set on boxes with significant amounts of memory (or reduce the amount of usable memory).
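If O_DIRECT is not an option on your setup (some older kernels and filesystems handle it poorly), a rougher way to get a cold cache between runs is to flush the OS page cache by hand. A minimal sketch, assuming a kernel new enough (2.6.16+) to expose drop_caches and that you run it as root:

# flush dirty pages to disk first, since drop_caches only discards clean pages
sync
# drop page cache, dentries and inodes
echo 3 > /proc/sys/vm/drop_caches

Note this only empties the cache at the start of a run; unlike O_DIRECT it does not stop the cache from filling back up while the test is running.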
The system I’m testing this on has a 256MB BBU (Battery Backed Up cache) on the RAID controller, so we will test two working set sizes: a small one which fits in the cache, and a second one large enough that it does not. This lets us see both RAID-cache-backed and uncached IO performance.
We’re interested in cached reads because they show the latency to the RAID cache, if the RAID read cache is enabled. If it is disabled you may be reading from the drives’ caches or even the drives themselves, which affects performance significantly. In any case, especially if you play with the data size a bit, you will learn how your cache behaves.
We’re even more interested in cached writes, which show us how many writes per second we can do into the cache. This is important for log writes, which are synchronous IO operations critical to database performance and which typically have rather close data locality.
We’re also interested in uncached reads and writes because these correspond to the general database workload.
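If you want to model log writes more directly, a sequential write test with an fsync after every request is closer to what the database log does than random writes. A rough sketch with SysBench (the 4KB block size and fsync-after-every-request settings are just illustrative choices, not something this article prescribes):

./sysbench --test=fileio --file-num=1 --file-total-size=128M prepare
./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=1 --file-total-size=128M --file-block-size=4096 --file-test-mode=seqwr --file-fsync-freq=1 --num-threads=1 run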
It is worth running the test with 1 thread and with some higher number (say 64) to see how things scale.
The tool I’m using for this is SysBench, which was designed by my team back when I still worked for MySQL; we specifically implemented a bunch of tests to ease hardware evaluation for the things that matter for MySQL.
To prepare a small 128MB single-file working set we can use the following command:
[root@DB10 sysbench]# ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M prepare
And then we can run the test:
./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --file-test-mode=rndrd run
sysbench v0.4.8:  multi-threaded system evaluation benchmark

Running the test with following options:
Number of threads: 1

Extra file open flags: 16384
1 files, 128Mb each
128Mb total file size
Block size 16Kb
Number of random requests for random IO: 1000000
Read/Write ratio for combined random IO test: 1.50
Calling fsync() at the end of test, Enabled.
Using synchronous I/O mode
Doing random read test
Threads started!
Time limit exceeded, exiting...
Done.

Operations performed:  773835 Read, 0 Write, 0 Other = 773835 Total
Read 11.808Gb  Written 0b  Total transferred 11.808Gb  (201.52Mb/sec)
12897.22 Requests/sec executed

Test execution summary:
    total time:                          60.0001s
    total number of events:              773835
    total time taken by event execution: 59.0285
    per-request statistics:
         min:                            0.0001s
         avg:                            0.0001s
         max:                            0.0086s
         approx. 95 percentile:          0.0001s

Threads fairness:
    events (avg/stddev):           773835.0000/0.00
    execution time (avg/stddev):   59.0285/0.00
The output is pretty verbose, which scares many people, but there is basically one important number here: 12897.22 req/sec. So the RAID cache on this system can do about 13,000 16K reads/sec from the cache, not bad at all.
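If that throughput figure is all you care about, you can simply filter the output; this is just a convenience, matching the line SysBench prints above:

./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --file-test-mode=rndrd run | grep "Requests/sec"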
With 64 threads I get 23727.46 req/sec, which shows that even in such a cached mode you can get better performance by having many outstanding requests.
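To see how throughput scales with concurrency it is convenient to sweep the thread count in a small shell loop, reusing the same filter as above; the particular thread counts here are arbitrary:

for t in 1 4 16 64; do
  echo "threads=$t"
  ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --num-threads=$t --file-test-mode=rndrd run | grep "Requests/sec"
done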
Running with more than one thread you may also take a look at another table SysBench displays:
per-request statistics:
     min:                            0.0008s
     avg:                            0.0027s
     max:                            0.0112s
     approx. 95 percentile:          0.0030s
So we get about a 3ms 95-percentile request time – not bad – all requests were executed with pretty uniform performance.
Let us now do the test with a single thread and a write-through RAID cache (as if there were no BBU):
[root@DB10 sysbench]# ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --num-threads=1 --file-test-mode=rndwr run
sysbench v0.4.8:  multi-threaded system evaluation benchmark
Ouch… We get 142.62 Requests/sec, which is extremely poor considering this is a 6-drive array. You would likely get a bit better with sequential log writes, but it is not going to be much higher.
Let’s see what we get with 64 threads… hm, again 142.83 Requests/sec, and looking at VMSTAT I can see only 1 blocked process all the time, which does not sound right.
Honestly, I did not expect this result while running benchmarks for this article (and it is always very exciting to find something unexpected while doing benchmarks). Fortunately I had already seen this some time ago, though I hoped the issue was long fixed since then… Then again, this box is running CentOS4, which is not the newest OS out there.
The problem is basically that with O_DIRECT writes only one IO can be in flight per file at any given time. To show this is the problem, let us create a working set consisting of a number of files and repeat the run:
[root@DB10 sysbench]# ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=128 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --num-threads=64 --file-test-mode=rndwr cleanup
sysbench v0.4.8:  multi-threaded system evaluation benchmark

Removing test files...

[root@DB10 sysbench]# ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=128 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --num-threads=64 --file-test-mode=rndwr prepare
sysbench v0.4.8:  multi-threaded system evaluation benchmark

128 files, 1024Kb each, 128Mb total
Creating files for the test...

[root@DB10 sysbench]# ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=128 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --num-threads=64 --file-test-mode=rndwr run
Now we get 827.94 Requests/sec, which is not bad considering this is a RAID5 volume.
Let us now repeat the tests for WriteBack cache configuration:
We get 4735.27 Requests/sec with 1 thread and a single file; 4740.19 Requests/sec with 64 threads and 1 file; and 5228.68 Requests/sec with 64 threads and 128 files.
As you can see, the numbers are a lot better, especially when there is a single thread doing IO (just as with the database log). We can still see the serialization with O_DIRECT, but the impact is much smaller here than in write-through mode because we’re mostly limited by the RAID controller capacity.
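How you switch the controller between write-through and write-back depends on the vendor tooling; on LSI-based controllers such as the PERC5 it is usually something along these lines with MegaCli. Treat this strictly as an illustration – the tool name, option spelling and whether it applies to your firmware are assumptions you need to verify against your controller’s documentation:

MegaCli -LDSetProp WT -LALL -aALL   # force write-through, as if there were no BBU
MegaCli -LDSetProp WB -LALL -aALL   # back to write-back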
Let us now do the test with a larger working set. We’ll use 2GB in 128 files (since we already discovered the synchronization issue with O_DIRECT). This sounds small, but as the controller cache is just 256MB it should be good enough.
[root@DB10 sysbench]# ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=1 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=128M --num-threads=64 --file-test-mode=rndwr cleanup
sysbench v0.4.8:  multi-threaded system evaluation benchmark

Removing test files...

[root@DB10 sysbench]# ./sysbench --test=fileio --max-time=60 --max-requests=1000000 --file-num=128 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=2G --num-threads=1 --file-test-mode=rndwr prepare
sysbench v0.4.8:  multi-threaded system evaluation benchmark
There is also one trick when running such larger-scale benchmarks. SysBench by default issues a repeatable sequence of requests because it does not seed the random number generator. If you’re doing multiple runs, make sure to add the --init-rng=1 option, otherwise results can be skewed.
So we run it as:
[root@DB10 sysbench]# ./sysbench --init-rng=1 --test=fileio --max-time=60 --max-requests=1000000 --file-num=128 --file-extra-flags=direct --file-fsync-freq=0 --file-total-size=2G --num-threads=1 --file-test-mode=rndrd run
For a single thread we get 259.28 Requests/sec, which approximately matches what a single drive can do. This is an important lesson – even if you have many hard drives, you can’t really take advantage of them unless you have multiple outstanding requests.
Performance with 64 threads was 2334.63 Requests/sec, which is excellent scaling for the 6 hard drives this array has. We even get a bit more than linear scaling (roughly 9x over the single-threaded result) because 64 outstanding requests allow the RAID controller to optimize seeks.
Let us see how writes are doing:
For writes we’re getting 583.43 Requests/sec with 1 thread and 605.88 Requests/sec with 64 threads.
As you can see, we get more writes with a single thread than we got reads, and using multiple threads for writes does not improve performance as dramatically as it does for reads. This is because, with a BBU, writes are simply stored in the controller cache and flushed to the disks in the background – in this case it makes little difference whether we use one thread or many.
As you also can see, a BBU is helpful even for random writes performed from a single thread – so it does not only help database log writes but has a much broader positive impact on the workload.
I should also note that it is a very good idea to watch VMSTAT while running the test. Here is a snippet of good VMSTAT output from the read test:
 r  b   swpd    free   buff    cache   si   so    bi   bo   in   cs us sy id wa
 0 64  90560 3196492  86832 12631072    0    0 37760   23 3402 6932  0 12 64 24
 0 64  90560 3196500  86836 12631068    0    0 37222   26 3328 7694  0  1 75 24
 0 64  90560 3196436  86836 12631068    0    0 36998    6 3340 7716  0  1 74 24
 0 64  90560 3196428  86840 12631064    0    0 37395   12 3320 7774  0  1 74 24
 0 64  90560 3196436  86840 12631064    0    0 37498    6 3365 7758  0  1 74 24
 0 64  90560 3196436  86840 12631064    0    0 37702    6 3370 7680  0  2 74 24
As you can see, all 64 threads are waiting for IO. Now let’s see our write test, which seems to have some serialization issues:
 r  b   swpd    free   buff    cache   si   so    bi   bo   in   cs us sy id wa
 2 39  90560 5305556  86920 10534204    0    0     0 9345 1625 4587  0  0 70 30
 1  1  90560 5305548  86920 10534204    0    0     0 9017 1661 4934  0  0 74 26
 1  1  90560 5305556  86924 10534200    0    0     0 9223 1638 4881  0  0 73 27
 1  3  90560 5305556  86932 10534192    0    0     0 9634 1715 5033  0  0 74 26
 1  1  90560 5305556  86932 10534192    0    0     0 9235 1643 5035  0  0 73 26
You can get similar information by analyzing iostat numbers, and that may well be more accurate, though looking at VMSTAT is often good enough to spot the problem.
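For the record, what I mean is simply running the standard monitoring tools in a second terminal while the benchmark is going; the 10-second interval here is just a convenient choice, and iostat -x requires the sysstat package:

vmstat 10
iostat -x 10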
P.S. If you’re wondering which hardware I used for these benchmarks – it is a Dell PowerEdge 2950 with a PERC5 controller and six 10K RPM hard drives in RAID5.