Thursday, December 12, 2013

Quick and dirty concurrent operations from the shell

Let’s say that you want to measure something in your database, and for that you need several operations to happen in parallel. If you have a capable programming language at your disposal (Perl, Python, Ruby, PHP, or Java would fit the bill) you can code a test that sends several transactions in parallel.

But if all you have is the shell and the mysql client, things can be trickier. Today I needed such a parallel result, and I only had mysql and bash to accomplish the task.

In the shell, it’s easy to run a loop:

for N in $(seq 1 10)
do
    mysql -h host1 -e "insert into sometable values($N)" 
done

But this runs the queries sequentially, and each session opens and closes before the next one starts, so there is no concurrency at all.

Then I thought of the usual shell method for parallel execution: run things in the background, and then collect the results. I just needed to find a way of keeping the first session open while the others were being started.
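Just backgrounding each client is not enough by itself, because the sessions are so short that they barely overlap, but the basic pattern looks like this (a minimal sketch, reusing the placeholder host and table from above):

# launch every client in the background instead of waiting for each one
for N in $(seq 1 10)
do
    mysql -h host1 -e "insert into sometable values($N)" &
done

# collect the results: wait until all background clients have finished
wait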

Here’s what I did: I ran a loop with a countdown, using the seq command, and I included a sleep statement in each query, waiting for a decreasing number of seconds. If I start with 10 seconds, the first query will sleep for 10 seconds, the second one for 9 seconds, and so on. I run each command in the background, so the sessions count down their sleep time independently.

#!/bin/bash
# recreate the test table
mysql -h host1 test -e 'drop table if exists t1'
mysql -h host1 test -e 'create table t1 (i int not null primary key, ts timestamp)'

# countdown: N goes 10, 9, ..., 1
for N in $(seq 10 -1 1)
do
    query1="set autocommit=0"
    query2="insert into test.t1 (i) values($N)"
    # keep the transaction open for N seconds, then commit
    query3="select sleep($N) into @a; commit"
    # run each client in the background so all sessions stay open together
    mysql -h host1 -e "$query1;$query2;$query3" &
done

# wait for all background clients to finish
wait

mysql -h host1 test -e 'select * from t1'

The effect of this small script is that the commits for these 10 commands all happen at the same time, as you can see from the resulting table:

+----+---------------------+
| i  | ts                  |
+----+---------------------+
|  1 | 2013-12-12 18:08:00 |
|  2 | 2013-12-12 18:08:00 |
|  3 | 2013-12-12 18:08:00 |
|  4 | 2013-12-12 18:08:00 |
|  5 | 2013-12-12 18:08:00 |
|  6 | 2013-12-12 18:08:00 |
|  7 | 2013-12-12 18:08:00 |
|  8 | 2013-12-12 18:08:00 |
|  9 | 2013-12-12 18:08:00 |
| 10 | 2013-12-12 18:08:00 |
+----+---------------------+

This is a very good result, but what happens if I need to run 500 queries simultaneously, instead of 10? I don’t want to wait 500 seconds (8+ minutes). So I made an improvement:

# N goes 5000, 4990, ..., 10: 500 values in total
for N in $(seq 5000 -10 1)
do
    echo $N
    query1="set autocommit=0"
    query2="insert into test.t1 (i) values($N)"
    # sleep a fraction of a second: 0.5000, 0.4990, ..., 0.0010
    query3="select sleep(concat('0.', lpad($N,4,'0'))) into @a; commit"
    mysql -h host1 -e "$query1;$query2;$query3" &
done

Now each SLEEP call gets a fractional argument, starting at “0.5000”, continuing with “0.4990”, and so on down to “0.0010”. You can try it. All 500 rows are committed at the same time.
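You can check the argument that SLEEP receives for a given value of N with a quick one-liner (just an illustration; any N produced by the loop works the same way):

mysql -h host1 -e "select concat('0.', lpad(4990,4,'0')) as sleep_arg"

which returns 0.4990.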

However, “the same time” is a bit fuzzy. When we use timestamps with one-second granularity, it’s quite easy to get identical values. But with fractional seconds it’s a different story. Here’s what happens if I repeat the test on MySQL 5.6 with a fractional-second timestamp column (TIMESTAMP(3), which stores milliseconds):

+----+-------------------------+
| i  | ts                      |
+----+-------------------------+
|  1 | 2013-12-12 18:27:24.070 |
|  2 | 2013-12-12 18:27:24.070 |
|  3 | 2013-12-12 18:27:24.069 |
|  4 | 2013-12-12 18:27:24.068 |
|  5 | 2013-12-12 18:27:24.065 |
|  6 | 2013-12-12 18:27:24.066 |
|  7 | 2013-12-12 18:27:24.062 |
|  8 | 2013-12-12 18:27:24.064 |
|  9 | 2013-12-12 18:27:24.064 |
| 10 | 2013-12-12 18:27:24.064 |
+----+-------------------------+
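For reference, the fractional-second table can be created along these lines (a sketch; the post doesn’t show the modified DDL, and MySQL 5.6 or later is assumed):

mysql -h host1 test -e 'drop table if exists t1'
mysql -h host1 test -e 'create table t1 (i int not null primary key, ts timestamp(3) default current_timestamp(3))'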

For the purpose of my tests (the actual queries were different) this is not an issue. Your mileage may vary.
