
5 9s Lead to Nestfrastructure (and fewer 9s)

Off the top of my head,

A Microsoft DNS issue a handful of hours before the Xbox One launch (http://redmondmag.com/articles/2013/11/21/windows-azure-outages.aspx)

Widespread Amazon outages (http://www.zdnet.com/amazon-web-services-suffers-outage-takes-down-vine-instagram-flipboard-with-it-7000019842/)

NASDAQ (http://www.bloomberg.com/news/2013-08-26/nasdaq-three-hour-halt-highlights-vulnerability-in-market.html)

The POTUS’s baby (http://www.healthcare.gov)

I learned about 5 9s in a college business class. If a manufacturer wants to be respected for building quality products, it should be able to build 99.999% of them accurately. That concept has translated to IT as some kind of reasonable expectation of uptime. (http://en.wikipedia.org/wiki/High_availability)

I take great pride in my ability to keep servers running. Not only do I avoid unplanned downtime, I also build highly available systems that require little to no planned downtime. These HA features add complexity, though, and can sometimes backfire. Simplicity, with a bit more planned downtime, is often the best choice. If 99.999% uptime is the goal, there is no room for flexibility, agility, budgets or sanity. To me, 5 9s is not a reasonable expectation of uptime even if you only count unplanned downtime. I will strive for that perfection; however, I will not stand idly by while this expectation is demanded.

Jaron Lanier, the author and inventor of the concept of virtual reality, warned that digital infrastructure was moving beyond human control. He said: “When you try to achieve great scale with automation and the automation exceeds the boundaries of human oversight, there is going to be failure … It is infuriating because it is driven by unreasonable greed.”
Source: http://www.theguardian.com/technology/2013/aug/23/nasdaq-crash-data

IMHO, the problem stems from dishonest salespeople. False hopes are injected into organizations’ leaders, and these salespeople are often internal to the organization. An example is an inexperienced engineer who hasn’t been around long enough to measure his or her own uptime for a year. They haven’t realized the benefit of tracking outages objectively and buy into new technologies that don’t always pan out. That hope bubbles up to upper management and then propagates down to the real engineers in the form of an SLA that no real engineer could actually achieve.

About two weeks later, the priority shifts to the new code release and away from uptime. Even though releasing untested code puts availability at risk, the code changes must go out. These ever-changing goals are prone to failure.

So where is 5 9s appropriate? With the influx of cloud services, the term infrastructure is being used too broadly. IIS is not infrastructure; it is part of your platform. Power and cooling are infrastructure, and those should live by the 5 9s rule. A local network would be a stretch to apply 5 9s to. Storage arrays and storage networks are less of a stretch because the amount of change is limited.

Even when redundancies exist, platform failures are disruptive. A database mirroring failover (connections closed), a web server failure (sessions lost), a compute node failure (OS reboots) and even a live migration of a VM requires a “stun” that stops the CPU for a period of time (a second?). The details I listed in parentheses are often omitted from the sales pitch. The reaction varies with each application, and as the load on a system increases, these adverse reactions can increase as well.

If you want to achieve 5 9s for your platform, you have to move the redundancy logic up the stack. Catch errors, wait and retry.
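
To make that concrete, here is a minimal catch-wait-retry sketch in PowerShell. Invoke-AppQuery is a hypothetical placeholder for whatever call your application makes against the platform, and the attempt count and backoff are arbitrary; adjust both for your own app.

# Catch-wait-retry sketch. Invoke-AppQuery is a hypothetical stand-in for
# your own data-access or service call; tune attempts and delay to taste.
$maxAttempts = 3
for ($attempt = 1; $attempt -le $maxAttempts; $attempt++) {
    try {
        $result = Invoke-AppQuery              # the call that might hit a failover or stun
        break                                  # success, stop retrying
    }
    catch {
        if ($attempt -eq $maxAttempts) { throw }   # out of retries, surface the error
        Start-Sleep -Seconds (2 * $attempt)        # wait a bit longer before each retry
    }
}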

[Diagram: the stack]

Yes, use the tools you are familiar with lower in the stack. But don’t build yourself a nest at every layer of the stack; understand the big picture and apply pressure as needed. Just as you wouldn’t jump on every shiny new security feature, don’t jump on every redundancy feature, or you will end up with nestfrastructure.

 

vMotion, an online operation?

There are two types of vMotion: storage and regular. Storage vMotion moves a VM’s files, or a single .vmdk file, to another datastore. A regular vMotion moves the VM’s memory from one host to another and then stuns the VM to pause processing so the new host can open the file and take ownership of the VM. Today I’ll be referring mostly to the regular vMotion.

These are both fantastic technologies that allow for rolling upgrades of all kinds and also the ability to load balance workloads based on usage. The Distributed Resource Scheduler (DRS) runs every 5 minutes by default to do this load balancing. Datastore clusters can be automated to balance VMs across datastores for space and usage reasons. Like I said, these technologies are fantastic but need to be used responsibly.

“VMware vSphere® live migration allows you to move an entire running virtual machine from one physical server to another, without downtime” – http://www.vmware.com/products/vsphere/features/vmotion

That last little bit is up for debate; it depends on what your definition of downtime is. This interesting historical read shows that vMotion was the next logical step after a pause, move and start operation was worked out. Even though VMware now transfers the state over the network and things are much more live, we still have to pause. The virtual machine’s memory is copied to the new host, which takes time; then the deltas are copied over repeatedly until a very small amount of changed memory is left and the VM is stunned. That means no CPU cycles are processed while the last tiny bit of memory is copied over, the file is closed by the source host and the file is opened on the new host, which lets the CPU come back alive. Depending on what else is going on, this can take seconds. Yes, that is plural: seconds of an unresponsive virtual machine.

What does that mean? Usually in my environment, a dropped ping, or maybe not even a dropped ping but a couple of slow pings in the 300 ms range. This is normally fine because TCP is designed to retransmit packets that don’t make it through, and connections generally stay connected in my environment. However, I have had a couple of strange occurrences in certain applications that have led to problems and downtime. Downtime during vMotion is rare and inconsistent. Some applications don’t appreciate delays during certain operations and throw a temper tantrum when they don’t get their CPU cycles. I am on the side of vMotion and strongly believe these applications need to increase their tolerance levels, but I am in a position where I can’t always make that happen.
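
If you want to watch for that blip yourself, a quick-and-dirty PowerShell loop is enough. This is only a sketch: the VM name is a placeholder and the 100 ms threshold and half-second interval are arbitrary.

# Watch a guest for timeouts or slow replies while a vMotion runs.
# 'app-vm01' is a hypothetical name; tune the threshold and interval.
while ($true) {
    $reply = Test-Connection -ComputerName 'app-vm01' -Count 1 -ErrorAction SilentlyContinue
    $stamp = Get-Date -Format 'HH:mm:ss.fff'
    if ($null -eq $reply) { "$stamp  timeout" }
    elseif ($reply.ResponseTime -gt 100) { "$stamp  slow reply: $($reply.ResponseTime) ms" }
    Start-Sleep -Milliseconds 500
}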

The other cause of vMotion problems is usually overcommitted or poorly configured resources. vMotion is a stellar example of super-efficient network usage. I’m not sure what magic sauce they have poured into it, but the process can fully utilize a 10Gb connection to copy that memory. Because of that, vMotion should definitely be on its own VLAN and physical set of NICs. If it is not, the network bandwidth could be too narrow to complete the vMotion smoothly, and that last little bit of memory could take longer than normal to copy over, causing the stun to take longer. Very active memory can also cause the last delta to take longer.

Hardware vendors advertise their “east-west” traffic to promote efficiencies they have discovered inside blade chassis. There isn’t much reason for a vMotion from one blade to another in the same chassis to leave the chassis switch. This can help reduce problems with vMotions and reduce the traffic on core switches.

In the vSphere client, vMotions are recorded under Tasks & Events. When troubleshooting a network “blip”, the completion time of the task is the important part; never have I seen an issue during the first 99% of a vMotion. If I want to troubleshoot broader issues, I use some T-SQL and touch the database inappropriately. PowerShell and PowerCLI should be used in lieu of database calls for several reasons, but a query is definitely the most responsive of the bunch. This query lists VMs by their vMotion frequency since mid-August.


SELECT
    [VM_NAME] AS 'VM',
    COUNT(*) AS 'Number of vMotions'
FROM [dbo].[VPXV_EVENTS]
WHERE
    EVENT_TYPE = 'vm.event.DrsVmMigratedEvent'
    AND CREATE_TIME > '2014-08-14'
GROUP BY [VM_NAME]
ORDER BY 2 DESC  -- descending, so the most frequently vMotioned VMs are at the top
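
Roughly the same list can be pulled with PowerCLI instead of touching the database; this sketch assumes an existing Connect-VIServer session and will be noticeably slower than the query above, which is exactly why I still reach for T-SQL.

# Count DRS-driven migrations per VM since 2014-08-14.
# Assumes Connect-VIServer has already been run against vCenter.
Get-VIEvent -Start (Get-Date '2014-08-14') -MaxSamples ([int]::MaxValue) |
    Where-Object { $_.GetType().Name -eq 'DrsVmMigratedEvent' } |
    Group-Object { $_.Vm.Name } |
    Sort-Object Count -Descending |
    Select-Object Name, Count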

This query can reveal some interesting problems. DRS kicks in every 5 minutes and decides whether VMs need to be relocated. I have clusters with DRS on that never need to vMotion any VMs because of their load, and I have clusters that are incredibly tight on resources and vMotion VMs all the time. One thing I have noticed is that VMs that end up at the top of this query can sometimes be in a state of disarray. A hung thread or process that is burning CPU can cause DRS to search every 5 minutes for a new host for the VM. Given the stun, this isn’t usually a good thing.

IMHO, a responsible VM admin is willing to contact VM owners when their VMs are hitting the top of the vMotion list. “Don’t be a silent DBA” is some advice I received early in my career: maintenance and other DBA-type actions can be “online” yet still cause slowdowns that other support teams may never find the cause for. The same advice applies to VMware admins as well.

 

Posted on September 16, 2014 in Virtual

 

SQL Saturday Columbus Recap #SQLSAT299

I decided to take a brief trip down memory lane for this recap.

http://www.sqlsaturday.com/84/schedule.aspx Attendee, Volunteer
http://www.sqlsaturday.com/160/schedule.aspx Attendee, Volunteer
http://www.sqlsaturday.com/204/schedule.aspx Attendee, Volunteer, Speaker
http://www.sqlsaturday.com/256/schedule.aspx Attendee, Volunteer
http://www.sqlsaturday.com/292/schedule.aspx Attendee, Volunteer Coordinator, Speaker
http://www.sqlsaturday.com/299/schedule.aspx Attendee, Speaker

Some of those session titles are amusing after 3 years, especially anything that has “new” in the title. That first SQL Saturday in 2011 was pretty special. I realized that volunteering helped my more introverted personality get a chance to network with others.

At the Kalamazoo SQL Saturday (#84) I was having a conversation about the pains of double-hop authentication, and another speaker asked me what my session was about, even though I was only a volunteer. I didn’t think I was ready to speak (I wasn’t). That person thought, for some strange reason, that I knew my stuff and suggested I whip up a session and try it out. It was advice that I remembered but didn’t act on for quite a while. It was also an interesting question because it is a total bait question: it is something the speakers are already thinking about, and it makes a great icebreaker.

The Detroit SQL Saturday in 2013 was the first time I spoke at a SQL Saturday. I had found a niche I was passionate enough about to actually enjoy getting up in front of people and presenting. The basic SQL topics are great, but I didn’t feel I had enough groundbreaking experience and depth on any of them to present until I found a way to make security interesting. It was my in, because nobody else seemed to be talking about it. I saw other presenters doing a bit of cross-training into virtualization and storage, so I figured a bit of offensive security and networking concepts would be totally acceptable. A couple of user group sessions of practice and I was ready for a larger audience. I packed a smaller room full of very interested and thankful people. I’m glad the first time went well because it was very nerve-wracking; I may not have continued to challenge myself in this way had it gone poorly.

Kalamazoo, Detroit and now Columbus. These SQL Saturday conferences have all been very rewarding. I always learn something, meet at least a few new awesome people and give as much back to the community as I can. Getting a reasonably sized, semi-interested audience is priceless to me when I am trying to practice my presentation and public speaking skills. There is only so much I can teach my wife about computers until she murders me in my sleep!

My session in Columbus went well sans one whoopsie. I have learned I need to get an accurate start and stop time from multiple sources. I started my session at 3:30, thinking the 3:34 in the handout was a typo. It was a typo, unfortunately, but in the other direction: the session was supposed to start at 3:45 according to the website. I started at 3:30 and someone kindly got up and shut the door. A little less than 10 minutes in, I noticed a small crowd peeking through the glass pane in the door, and someone finally opened it. That nearly doubled the attendance, so I started over, but didn’t show the video ( https://www.youtube.com/watch?v=c36UNSoJenI ) again. Anyway, the slides and demo scripts are posted on the schedule link above.

I decided to attend sessions at this SQL Saturday; below are the ones I attended. I particularly liked Kevin Boles’ SQL injection session because of the hands-on approach. He developed a great demo that showed several different methods of attack and defense. It is also very complementary to my session because I avoid that particular topic for the most part.

[Image: the sessions I attended at SQL Saturday #299]

Also, I would like to thank Mark (https://twitter.com/m60freeman) for organizing a great speaker dinner and event. I’m happy they were able to give me the opportunity to present.

I sometimes imagine where I would be today had I not started attending user groups and events like SQL Saturday. I would most likely be a mess. I have supported an environment that has grown from roughly 15 SQL Servers five years ago to almost 200. Without the skills and drive to make SQL Server the best possible platform at my organization, I’m not sure I would have as much responsibility; business users would have run away instead of diving into SQL Server. I imagine myself still being a “DBA” but constantly putting out fires instead of scripting our build and auditing processes. I imagine myself never having the time to research storage and virtualization and becoming confident enough to take on those new administration challenges. I definitely would not have begun the journey of improving my public speaking skills, which has improved my overall quality of life. A life without PASS is a scary place to imagine.

 

Posted on June 24, 2014 in PASS

 


I’m Speaking in Columbus June 14th

Free training, free networking and only $10 for lunch. Best you cancel your plans for June 14th and find your way to Columbus, OH.

More details can be found here: http://www.sqlsaturday.com/299/eventhome.aspx

This presentation is similar to the presentation that I delivered at SQL Saturday Detroit.

Hacking SQL Server – A Peek into the Dark Side
The best defense is a good offense. Learn how to practice hacking without going to jail or getting fired. In this presentation we’ll be demonstrating how to exploit weak SQL servers with actual tools of the penetration testing trade. You will learn why the SQL Service is a popular target on your network and how to defend against basic attacks.

Hope to see you there!

 

Posted on May 29, 2014 in PASS

 

SQL Saturday Detroit 292 Recap

And it is all over way too soon.

I normally don’t like to whine and complain to anyone other than my wife and mom when I am sick, but man, was I sick leading up to this SQL Saturday. I picked up some kind of stomach flu, probably from Vegas the week prior at EMCWorld. The thought crossed my mind to warn people that I might be unable to make it if I got any worse. Fortunately, the sickness passed by Friday morning and I was able to muscle through.

Volunteer Coordinator

Volunteer coordinator sounds fancy, but it mostly amounts to getting a list from the event coordinator and lots of communication. I decided to use http://www.volunteerspot.com, which worked well for the BSides Detroit conference I helped at the previous summer. You can sign up for free and set up task lists on different days, then simply paste in your list of volunteer emails so they can choose the items they want to volunteer for. Room proctors, registration desk slots and a few miscellaneous tasks added up to 38 tasks on the day of the event, which was a bit of a bear to enter. For Friday, I had one three-hour task to make sure I had a list of people to help set up the rooms and stuff the bags.

Allowing the volunteers to pick their own tasks is something I didn’t think would work out that well, but it actually did. It is much more efficient just to auto-assign all the slots and then do any trades later, but with the help of VolunteerSpot it was easy to give everyone the chance to pick their own so they could attend the sessions they wanted to attend. This was the second year, so we had some experience on the team, which helped the process go smoothly. Two days before the event, while I lay sick in agony, I filled the last five or so tasks.

One thing I could improve on is using the report feature they provide. I didn’t think there was one, but there is a giant button on the left side of the UI. Using my giant phablet to pull up the task list and find out who was doing what proved more cumbersome than I had anticipated. I would recommend printing off that task list and actually taking attendance first thing on the day of the event.

Presenter

I’m writing this nearly a week without coffee or any other substantial form of caffeine. My mental state is surprisingly sound considering I was up to a steady four cups a day. I don’t usually start the caffeine intake until around 9 in the morning, which was when my presentation started. I was feeling well, with no headaches, but I did get a couple of comments that the presentation was slow at the start, which may or may not be related.

I chose to try something at the start that I wasn’t sure would work out too well: I showed a four-minute BBC video about the honey badger. Not the crazy and dated “honey badger doesn’t give a crap” video, but one from the BBC that I find hilarious and shocking. It shows how honey badgers escape their confinement no matter how hard the zookeeper tries to keep them caged. I watch it and can’t help but compare hackers to honey badgers. Also, getting that camera in the pen to show how they escape is what I am trying to achieve by showing people how SQL Server is hacked. I intended to use this metaphor throughout my presentation, but I completely forgot all about it. Oh well, better luck in Columbus :]

This was the largest room I have spoken to yet, with roughly 60 people. The chalkboard was a nice addition that allowed me to illustrate the network, which is something I am still working out. I was happy to get the larger room because the previous year the 40-person room was completely packed. I am satisfied with how I did and am really happy to have gotten a large amount of positive feedback and some really good advice from the attendees. My complex demos that require typing all worked, and the projector didn’t have any issues, so I would say I lucked out.

Attendee

Even though the event was in the same place as last year, we got an upgrade in the classrooms that were available to us: now furnished with chalkboards, and I think with more seating than the previous year. My session was the first of the day, and then Grant Fritchey’s followed in one of the larger rooms. I was in a zombie state, so I settled into the nearest seat and vegetated for a bit. The session was titled Building a Database Deployment Pipeline and covered reasons to improve database deployments and pair them with code deployments. It didn’t really get into the how, other than mentioning a few tools that I have heard of but am unfamiliar with.

Lunch was in another building, which gave me a chance to walk by the vendor tables. They were a bit out of the way and seemed cramped; I wonder what we could have done better in this area. Had the vendors been set up at the beginning of the day, that would have been the prime time to get most attendees passing through, but from what I hear that wasn’t the case.

I got to see David Klee’s Hitch impersonation after lunch. I’m not sure what happened, but he had a terrible-looking allergy attack. With some help from Tim Ford and Grant Fritchey, he continued on with his session, “How to Argue with Your Infrastructure Admins – and Win”. I do like stories of strife, especially when they don’t involve me. I’m not sure I really got what I expected out of the session, but it was enjoyable.

Grant’s session on execution plans is something every SQL Saturday needs. T-SQL and database internals can be explained much more easily with the GUI view of a query plan, and he has some really good advice on how to read them.

I walked in late to the T-SQL For Beginning Developers session and sat next to my wife, who is an absolute T-SQL beginner. We both felt it was a little too advanced for her. She does have a small amount of experience writing code but doesn’t have any database experience. A lot of the nuances that were covered were not that valuable to her or me. Inserts, updates, deletes and selects with some joins should have been covered more. I see so many third-party software products that don’t take advantage of any functions because they want to support all the major database platforms. The session missed my expectations.

Wrap-Up

We were expecting a higher turnout this year because the previous year had a bit of a freak snowstorm, but the initial estimates put us a little under last year in attendance. I feel I could have done a better job promoting the event, especially at my place of employment, but it just wasn’t in the cards. Overall, the event went very well and I look forward to Columbus and maybe a West Michigan event later this year.

 

Posted on May 23, 2014 in PASS

 

#SQLSatDet has made the front page

The short list of upcoming events now includes SQL Saturday #292 in Detroit http://www.sqlsaturday.com/.

Free training, free networking and only $12 for lunch. Best you cancel your plans for May 17 and find your way to Lawrence Technological University.

The speakers who submitted by the original deadline have been confirmed for at least one session. That means you will have a chance to listen to me talk about SQL Server security in my Hacking SQL Server session. I really enjoyed speaking at this event last year and look forward to this year’s event, including all the pre- and post-event activities.

Here is my recap from last year: http://nujakcities.wordpress.com/2013/03/20/sqlpass-sqlsatdetroit-recap/

 

Posted on April 10, 2014 in PASS, Security, SQL Admin

 


Toying with In-Memory OLTP

In six days, the bits for SQL Server 2014 RTM will be available for download. I decided to fling myself into its hot new feature, In-Memory OLTP, using the CTP2 release. I’ve attended one user group meeting that gave an overview of the feature set (thanks, @brian78), but other than that I have not read much technical information about In-Memory OLTP.

One selling point that keeps popping up in the literature surrounding the release is ease of implementation. Not all tables in a database have to be In-Memory, and a single query can seamlessly access both classic disk-based tables and In-Memory tables. Since the product isn’t released yet, the information available on the feature set is heavily weighted toward sales. I wanted to see if achieving the advertised 5x-20x performance boost is really as easy as it sounds. Instead of my usual approach of collecting lots of information and reading tutorials, I decided to blaze my own trail.

The first thing to do is create a new database. I noticed a setting that I heard referenced in the overview called delayed durability.

[Screenshot: the delayed durability setting in the new database dialog]

Scripting the new database out in T-SQL also shows this new setting. I’m assuming this will make things faster, since transactions won’t have to be persisted to disk right away.

[Screenshot: the scripted CREATE DATABASE showing the delayed durability option]

Before I run that script, I decide to poke around a bit more. I see some In-Memory settings over by the FILESTREAM settings. I’m not sure if that is a necessary requirement or not, but I am going to add a filegroup and file just in case.

[Screenshot: the In-Memory/FILESTREAM filegroup settings]

[Screenshot: the scripted filegroup and file]

Now that the database is created, I want to create a table. There is a special option in the script-to menu for In-Memory optimized tables. I’ll create a few dummy columns and try to run it.

[Screenshot: the error on the first attempt to create the table]

There seems to be a problem with my varchar column: “Indexes on character columns that do not use a *_BIN2 collation are not supported with indexes on memory optimized tables.” Well, that is unfortunate. I suppose I will change the collation for this test, but that won’t be easy in real life.

[Screenshot: changing the column collation to a *_BIN2 collation]

After changing the collation, I am able to create my memory-optimized table.

[Screenshot: the memory-optimized table creates successfully]

I wondered if there would be any way to tell in my query plan if I’m actually optimized. It doesn’t appear so…

[Screenshot: query plan showing an index seek]

Was that a 5x performance boost?? Am I doing it right?? Not sure, but for now I need to take a break.
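
One thing I can check without guessing at the plan is whether the table really ended up memory-optimized. A sketch, assuming Invoke-Sqlcmd is available; the instance and database names are placeholders.

# Confirm which tables are memory-optimized and how durable they are.
# 'localhost' and 'InMemoryTest' are hypothetical names.
Invoke-Sqlcmd -ServerInstance 'localhost' -Database 'InMemoryTest' -Query @"
SELECT name, is_memory_optimized, durability_desc
FROM sys.tables;
"@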

I’m hoping ISVs start supporting this feature, but it might be a lot more work than advertised. After getting that error, I found a list of many things that are not supported on these tables and in the compiled stored procedures: http://msdn.microsoft.com/en-us/library/dn246937(v=sql.120).aspx

This list does not encourage me to blaze new trails and start testing the feature as soon as it comes out. I prefer to wait a bit and let the other trailblazers blog about the issues they run into.

 

Posted on March 26, 2014 in SQL Admin

 

Arp Spoofing with arpspoof

Consider someone who has hijacked your DNS server. That person modifies the record for “prod_database.domain.com” to point to their own IP address. At no additional charge, after capturing all the packets, they will kindly forward them on to the real prod_database. That would be a layer 7 to layer 3 link switcheroo.

ARP spoofing is a similar concept, but instead of names to IPs, we modify the IP (layer 3) to MAC (layer 2) relationship. To demonstrate a successful spoof, I have to tell another client on my LAN that I am the gateway, and tell the gateway that I am that client.

To see the damage of ARP spoofing, you can look at your ARP table while being spoofed. Type “arp -a” at a Windows command prompt to view the contents of your ARP cache. ARP spoofing is also called ARP poisoning because of the false records the tool is able to get added to a victim’s ARP cache.

Using Kali Linux as the attacker, a fresh trial of Windows 2012 R2 as the victim, VMware Player, and the command-line tool arpspoof, I was able to capture the victim’s traffic. For the traffic to flow through Kali, the first step is to turn on IP forwarding. Then, in steps 2 and 3, we tell the subnet some lies.

[Screenshot: the attacker setup, steps 1 through 3]

The poisoning has started. If you want to see this traffic, you can use the “arp” filter in Wireshark.

[Screenshot: ARP traffic in Wireshark]

Finally, to offer some proof, I browse to Wikipedia on the victim’s machine and view the traffic on the attacker’s machine.

[Screenshot: the victim’s Wikipedia traffic captured on the attacker’s machine]

The defenses against this attack include SSL/TLS, OS hardening and duplicate MAC detection, among other things. Unfortunately, this is also how some proxy-like tools work, so you might not be able to use all of those methods to stop the attack.
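
The duplicate MAC angle doesn’t need anything fancy. Here is a rough PowerShell sketch that parses the arp -a output and flags the symptom: more than one dynamic entry sharing a single MAC.

# Flag the classic poisoning symptom: multiple IPs mapped to one MAC
# in the local ARP cache (dynamic entries only).
arp -a |
    Select-String '^\s+\d+\.' |
    ForEach-Object {
        $fields = -split $_.Line
        [pscustomobject]@{ IP = $fields[0]; MAC = $fields[1]; Type = $fields[2] }
    } |
    Where-Object Type -eq 'dynamic' |
    Group-Object MAC |
    Where-Object Count -gt 1 |
    Select-Object @{ n = 'MAC'; e = { $_.Name } }, @{ n = 'IPs'; e = { $_.Group.IP -join ', ' } }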

Reference: http://www.irongeek.com/i.php?page=security/arpspoof

 

Posted on March 10, 2014 in Network Admin, Security

 

Comparing Server Processors

To pick the best processor for your server, a cursory understanding of how the application works is helpful. In virtualized environments, that gets interesting because multiple applications run together on the same socket. Generalized compute clusters have their benefits, but some applications have “special needs,” and minimum support requirements could throw a wrench into your choice.

Single Threaded Applications

Multi-threaded code is challenging to write. That is why, even 10 years after multi-core processors became mainstream, developers still write single-threaded applications that require performance. If you are interested in the code side of this discussion, I wrote a simple post comparing single and multi-threading: http://nujakcities.wordpress.com/2010/11/16/single-vs-multi-threading/

If you have ever added a core to a VM and watched a process that used to run at 100% CPU on one core drop to 50% CPU across two cores, you probably have a single-threaded application. A multi-threaded process would be able to utilize all cores at 100%. Single-threaded applications that cannot be upgraded to use multiple threads will benefit from higher clock speeds, not more cores.
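
One rough way to check from inside the guest is to look at how the CPU time is spread across a process’s threads; if one thread owns nearly all of it, extra cores won’t help. This is only a sketch, and the process name is a placeholder.

# See whether one thread is doing all of the work in a process.
# 'legacyapp' is a hypothetical process name.
$process = Get-Process -Name 'legacyapp'
$process.Threads |
    Sort-Object TotalProcessorTime -Descending |
    Select-Object Id, TotalProcessorTime -First 5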

Launch Date

This is the first thing I look at after I’ve fleshed out any special requirements. I take the processor number from the server quote and find it on the manufacturer’s website so I can tell whether I am looking at the latest processors. Since Moore’s law is still in play, I can be somewhat comfortable that the newer processor is going to be a lot faster in some way, shape or form. If you are not comfortable jumping on the latest technology, you will also save quite a bit of money. Just don’t let your hardware vendor talk up new processor features and then slip an old one into the quote.

Clock Speed and Core Count

These are the two main factors. They share a heading because of the heat balancing act: faster is hotter, and more cores are hotter too. Unfortunately, we can’t have the best of both worlds or the box would catch on fire.

If we compare two of the Q’14 high-end Intel CPUs, we find the balance was struck at 3.4 GHz/6 cores and 2.8 GHz/15 cores. As a virtualization admin, I see an obvious choice here: go with the 15 physical cores, because 99% of my applications would be happier. In aggregate that is roughly 42 GHz of compute (15 x 2.8) versus about 20 GHz (6 x 3.4), and even for a single-threaded application the extra 0.6 GHz of clock would be barely noticeable.

Knowing your application can come in handy when designing clusters and buying hardware. Enterprise code can sometimes be a decade old, so be cautious about purchasing max cores over high GHz. You may run into more single-threaded, latency-sensitive applications than you want to admit still exist.

If you need to run VMs with higher vCPU counts, a higher number of physical cores per socket is best. My general rule of thumb is to avoid creating VMs with more vCPUs than there are physical cores on a single socket. Even on a dual- or quad-socket server, you will see diminishing returns, and sometimes problematic ready times, if a VM spans several physical sockets.
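
One way to spot offenders is a quick PowerCLI check; this sketch assumes an existing Connect-VIServer session and simply compares each VM’s vCPU count to the physical cores per socket on its current host.

# List VMs whose vCPU count exceeds the physical cores on one socket
# of their current host. Assumes Connect-VIServer has been run.
Get-VM | Where-Object {
    $cpuInfo = $_.VMHost.ExtensionData.Hardware.CpuInfo
    $coresPerSocket = $cpuInfo.NumCpuCores / $cpuInfo.NumCpuPackages
    $_.NumCpu -gt $coresPerSocket
} | Select-Object Name, NumCpu, VMHost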

Hyperthreading and Turbo

I remember owning a beige PC that had a nice button and a red LED display. Press it once and the display would read 33; press it again and it would read 66. Even today, processors have an automatic turbo button, but there are some caveats. Turbo won’t do much if all of your cores are in high demand. If only a single core is in high demand and the others are idle, then Intel’s “Max Turbo Frequency” value kicks in.

Hyper-threading sounds like a great idea but doesn’t actually perform as advertised. I like to refer to these as fake cores. With it on, your OS will see twice the cores. Sometimes they can help, but I can guarantee it won’t give you a 2x performance improvement. It really depends, but I have heard ballpark estimates in the range of a 25-50% improvement from turning on hyper-threading. So I do recommend turning the feature on, but be careful not to mix up the physical core count with the hyper-threaded core count.

Cache sizes

About the only thing I understand in this area is that cache is good, and if some is good, more is probably better. Here, server CPUs tend to blow desktop CPUs out of the water. I’ll have to do some more research and testing to figure out how much weight this value should carry when buying processors.

Price

With software like SQL Server licensed by physical core, more than just the hardware costs need to be reviewed. If you are paying by the core or socket for premium software, it makes a lot of sense (and cents) to buy the best processors you can find. If you can increase the consolidation ratio, you won’t need as many software licenses.

That is a big IF, though. Make sure you are actually getting solid returns on high-end processors; some benchmarks have shown diminishing returns as manufacturers push the limits. Lay it all on the table, even the smaller or more abstract costs like reliability, power consumption and productivity, before making your decision. Most of all, have fun shopping!

 

Posted on February 24, 2014 in Hardware

 

Guest Memory Dump From the Hypervisor

Part of VMware’s vMotion process copies all of the guest system’s memory from one physical host to another over the network. Snapshots and VM suspends force a memory checkpoint, making sure there is a persisted full copy of memory on disk. The point here is that the hypervisor is very much aware of the guest’s memory.

Without the hypervisor, there are a few ways to capture the data in RAM needed for some serious debugging. A single process is easy: just fire up the proper bitness of Task Manager.

[Screenshot: creating a dump file for a single process from Task Manager]

If the Windows computer is actually crashing, you can have it automatically create a dump file. One requirement is enough space for the page file. http://blogs.technet.com/b/askcore/archive/2012/09/12/windows-8-and-windows-server-2012-automatic-memory-dump.aspx
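
The linked article boils down to a value under the CrashControl registry key. Here is a read-only sketch for checking where a server currently stands (CrashDumpEnabled = 7 is the automatic memory dump option described in the post):

# Check the current crash dump configuration from the registry.
# CrashDumpEnabled: 7 = automatic, 2 = kernel, 1 = complete, 3 = small.
Get-ItemProperty 'HKLM:\SYSTEM\CurrentControlSet\Control\CrashControl' |
    Select-Object CrashDumpEnabled, DumpFile, AutoReboot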

If the problem you are trying to debug doesn’t crash your computer, you have a little more reading to do: https://support.microsoft.com/kb/969028 There are several tools, including a registry entry for CTRL+Scroll Lock and a Sysinternals utility whose name I love: NotMyFault.exe

But wait! It gets better!


Enter the hypervisor checkpoint process. Just hit the pause button on your VM and voilà: browse the datastore and download the .vmss file. VMware has kindly written a Windows version of its application to handle the conversion: https://labs.vmware.com/flings/vmss2core To convert the .vmss file to a WinDbg memory dump file, just run this command:

vmss2core.exe -W C:\pathtodmp\vm_suspend.vmss

You can also perform this same process using a snapshot instead. This can be an even better option to avoid downtime if your guest is still mostly working.
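
The snapshot variant looks about the same; depending on the VMware version, the guest memory ends up inside the .vmsn file or in a separate .vmem next to it, and vmss2core takes that .vmem as an optional second argument. The paths and file names below are hypothetical.

# Convert a snapshot's checkpoint files to a WinDbg-style dump.
# File names are placeholders; include the .vmem only if one exists.
& 'C:\tools\vmss2core.exe' -W 'C:\pathtodmp\vm_snapshot.vmsn' 'C:\pathtodmp\vm_snapshot.vmem'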

Now What?


Well, this is the point where I call in the experts. I generally do this so I can ship the file off for analysis by the developers of the suspect code. As a teaser for some future posts, here are the ingredients we are going to have to collect:

The file we created is consumable by WinDbg: http://msdn.microsoft.com/en-us/windows/hardware/hh852365.aspx

Symbols help map out the functions: http://support.microsoft.com/kb/311503

Commands for analysis in WinDbg: http://msdn.microsoft.com/en-us/library/windows/hardware/ff564043(v=vs.85).aspx

 

Posted on February 7, 2014 in Virtual

 


 