You would expect identical SQL queries, running against the same database at the same time to give the same results, right? Well, it turns out that such an impossibility can occur. Alas, this wasn’t some easter egg or heisenbug hidden in MySQL, the issue turned out to be perfectly sane. This is how the confusion unfolded…
A colleague and I were analysing a particular set of results from a database and some of our numbers didn’t quite match when we expected them to. No big deal, one set of numbers was coming from a hand crafted query, the other from a web interface. We figured there was most likely a difference in the query which accounted for the mis-match.
We started comparing those queries, looking for discrepancies. Everything appeared fine. We started modifying the manual query to be simpler, whilst still giving relevant numbers, which resulted in a query similar to this:
SELECT foo, DATE(bar), COUNT(*) FROM table_A WHERE bar >= some-date GROUP BY 1,2
We hit run on the query and checked against the web interface again. The count values still didn’t match for each
I started to suspect some sort of caching, perhaps the query wasn’t even being run for the interface side? That turned out to be a dead-end though, as this particular set of results was generated ‘on the fly’ as far as the interface was concerned.
The database didn’t have any slaving setup either, so there wasn’t any chance of us reading from an out-of-date data set.
Back to Basics
When a program isn’t behaving correctly start by demonstrating the issue in the simplest form possible. Aside from ending up with a clear bug description, the process of boiling the problem down to a single issue aids you, the developer, in understanding and debugging the program. If you submit a bug to Mozilla for example, they suggest that you should be able to describe your bug in 10 words. (As an aside, their bug writing guidelines are worth reading).
I started adjusting the query for the simplest/smallest broken count I could find. At this point I realised that grouping by something other than the DATE() function always gave counts that matched. Whereas if I started grouping by a time period things would go awry.
The penny started to drop at this point. It certainly seemed related to how each query perceived time. Both queries were using the same database, but from different machines. I looked up which timezone each machine was using by running
date on the command line. This gave the same results — I started to think the timezone might also be a red herring.
Question: What timezone is your MySQL instance and connection using? You can find out by running:
SELECT @@global.time_zone, @@session.time_zone;
I checked the timezone from the web interface database handle and from the machine executing the manual query. Sure enough they didn’t match! One was running in UTC, the other in BST. This meant that
DATE gave different ranges for each database handle. The results? Identical queries executed on the same database at the same time gave different results and were both doing ‘the right thing’. Setting the timezones to match solved the mystery.
A final thought here is that this bug only shows up during British Summer Time. If we’d been comparing our queries at a different time of year they would have been identical even with the timezone mismatch.