Things break. We don’t usually intend them too, they just do. Sometimes, maybe we do intend them to break. I’m still suspicious about cars breaking on purpose so we will buy a new one. Other times, they break in ways we don’t expect, like these times I made queries respond too fast.
For example, several years ago I was a fairly new DBA. I wasn’t used to the enterprise class roasting one gets when things are not quite right. We had a multi tier application, DB, app server with a desktop application that started to respond with errors after a hardware upgrade of the SQL Server.
Darn. We tested everything we could. We reviewed the code and sent it through upgrade analyzer. We restored production backups and tested those on the new hardware.
But it wasn’t enough load for the system to break in a peculiar way. The server had a limited number of threads to deal with a response from SQL. Seriously, hard-coded value of 3 thread to go to SQL, get a visual basic script stored in the database, and execute that on the app server. I kid you not, sometimes the sausage factory isn’t worth the tour.
This dynamic script was now loading up these 3 threads all day, and it was a lengthy script so sometimes it would hit the other hard coded timeout values and the users would start to get errors in large numbers instead of just waiting a little longer that usual.
We tried, and tried some more to work with the vendor. But in the end I had to make SQL slower so the app could keep up. We moved this SQL Server back to old hardware until we could also upgrade the app server.
The most recent time I made SQL too fast was a couple days ago. Another third party vendor who is XML happy. Surprisingly, the problem had nothing to do with XML.
Re-wind several years ago, I convinced a co-worker that SQL 2012 would be a great choice for their application even though the vendor didn’t technically support it yet. Who cares! Microsoft supports 2012, not this vendor. We tried and tried to get the thing to work but the application just cooperate with SQL 2012. Fortunately, we didn’t have any data so we just reinstalled the app and used SQL 2008R2. I conceded defeat, probably too soon and rolled it back after delaying their upgrade a couple weeks.
Back to a few days ago, that same co-worker fell for my tricks again with a different app. This time SQL 2016, because Bob Ward says it just runs faster ( http://www.sqlpass.org/summit/2016/Sessions/Details.aspx?sid=51194 ). Let me tell you how hilarious this error is as briefly as I can.
This XML happy application has a boatload of tables. Fairly normalized but lots of chatter with XML back and for the the application even with just a single user. We are testing a new version and a single user was getting a primary key violation. Consistently happens on SQL 2016 so I had him send it back to the vendor, even though he was threatening physical violence. Given our prior experience of jumping on a new version, I don’t blame him. No way could this be SQL 2016, primary keys are are similar across all versions right?
Vendor comes back with some confused email. “We can’t recreate the issue.” Blah blah, co-worker convinces me to give him access to create the database on a SQL 2012 instance and bam, no error. The beatings start. Even though he knows its bogus, still I have bruises and my fear of the new shiny is reinforced.
Are you ready for the shocking news? SQL 2016 is fast. Really fast. Even with absolute garbage design and code, its fast. That is how we got primary key violations.
The table we are working with has a primary key. But the interesting part is, it is a composite key meaning there are a few columns that create uniqueness between rows. So two of the columns can be identical across rows, as long as the third is different. Then we are unique.
Well, in an absolutely vomit enduing way, the vendor chose a datetime datatype to create this uniqueness. Well, when several rows come back in less than 3ms, even with a boatload of garbage XML to process, we no longer have uniqueness because datetime isn’t precise enough.
Not sure what they might consider a fix, but I explained to them what is happening. Hopefully, they don’t jump on the datetime2, knee jerk reason and go with an actually unique key.