Sometimes tech support is just lucky
As highly skilled technology professionals, we find a few things annoying. The first is when things don't go to plan. We pride ourselves on our technical knowledge and ability, so it is frustrating when something doesn't go as planned. The second is having to engage vendor support. Is it because vendor support is horrible and we don't want to deal with them? Well, I'm not going to get into that; yes, some are better than others. The biggest difficulty with taking an issue to support is the time it takes to get the problem resolved and the solution in place to deliver value. The third annoying thing about getting involved with vendor support? When they come back with an answer that has no vendor-specific documentation backing it up... and it's CORRECT!!!
Let me set the stage for you a little bit. The project was to get a new Windows 10 image ready for a virtual desktop initiative. The current Windows 7 image was slated for replacement, and Windows 10 needed to be deployed so that application testers could begin their validation. We build out the Windows 10 image, install the basic apps (Office, Adobe Reader, etc.), patch it, and we're off to the races. Users begin testing and we make updates according to that testing. Everything seems fine. Then we get the kind of feedback we never like to get, about a feature so common that you just assume it is working: users are unable to reconnect to their virtual desktop if they get disconnected.
So off we go with our standard troubleshooting. We revert to each snapshot of the VM in turn to see where the problem starts occurring. This is a lengthy process because we have been making numerous updates based on feedback. We get all the way back to the original VM and the problem persists. This isn't looking good. We attempt various policy changes as well, but nothing helps. Given the timeline, we have to engage vendor support quickly.
The initial interactions with support are not promising. Is this surprising? Not really. Between the customer and me, we have a combined 30 years of experience with the product, so the usual first-level suggestions were things we had already covered. Also, given the fundamental nature of the issue, we figured that at best we were looking at a private hotfix, which is never a short turnaround.
Thankfully, we were able to quickly convince first-level support to escalate the case. Once we got to escalation support, we performed more intensive logging. Unfortunately, with all the logging and all the investigation, nothing out of the ordinary showed up. According to everything we collected, all was working as designed. GRRRRR!!!!
So one afternoon an email comes from support that says, "Would you look at Windows and see if a particular service is started?" The service was the "Device Association Service." I confirm that the service is not started and ask the reasoning behind the question. The escalation engineer replies that they had found a blog post describing a similar issue with Windows 10 in a different virtual desktop deployment. Now, normally this type of response bothers me. I am not one to accept blog posts as sufficient answers from vendor support, so my normal reaction is to request a very detailed explanation of why I should trust a third-party blog and why vendor support is not able to provide a solution itself. However, given the course of the case so far, with nothing being reported as problematic, what's the worst that could happen?
So I update the image to make sure that the service is set to Automatic and is started. The image gets pushed out, and before I even attempt to let the customer know, I test it out myself. Wouldn't you know it, it works! I inform the customer that it appears to be working but ask to have more end users test to increase the sample size. Not long after that I get the all clear. HALLELUJAH!!!
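For anyone who wants to script that check rather than click through the Services console, here is a minimal sketch in Python that shells out to `sc.exe`. It is not what we ran on the case, just one way to do it. The short service name `DeviceAssociationService` is my assumption for the service behind the "Device Association Service" display name, so confirm it on your own image first. The helper names are made up for illustration, and the state parsing assumes English-language `sc query` output.

```python
import subprocess

# Assumption: short name of the "Device Association Service"; verify on your image.
SERVICE_NAME = "DeviceAssociationService"


def config_auto_command(service):
    """Build the sc.exe command that sets the startup type to automatic."""
    # sc.exe wants the value as its own token after "start=".
    return ["sc.exe", "config", service, "start=", "auto"]


def start_command(service):
    """Build the sc.exe command that starts the service."""
    return ["sc.exe", "start", service]


def query_command(service):
    """Build the sc.exe command that reports the current service status."""
    return ["sc.exe", "query", service]


def parse_state(query_output):
    """Return the state word (e.g. RUNNING, STOPPED) from `sc query` output."""
    for line in query_output.splitlines():
        parts = line.split()
        # The line of interest looks like: "STATE : 4  RUNNING"
        if parts and parts[0] == "STATE":
            return parts[-1]
    return None  # no STATE line found (service missing or unexpected output)


def ensure_running(service=SERVICE_NAME, run=subprocess.run):
    """Set the service to automatic start and start it if it is not running."""
    run(config_auto_command(service), check=True)
    result = run(query_command(service), capture_output=True, text=True)
    if parse_state(result.stdout) != "RUNNING":
        run(start_command(service), check=True)
```

Calling `ensure_running()` from an elevated prompt inside the master image would apply the change. The `run` parameter is only there so the logic can be exercised with a stand-in on a machine that has no `sc.exe`.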
So what does this tell us? The phrase "sometimes it is better to be lucky than good" is a true statement. Imagine how long it might have taken if we had acted only on the evidence we had. All of that evidence said everything was working. In any complex system, there can be unanticipated interactions that are poorly documented or not documented at all. Sometimes you have to step out and try something. Particularly if you can keep a change from having further consequences in the environment, why not experiment and resolve things more quickly? By doing so we are able to provide value faster, which is a good thing, and learn more about our systems, which is a great thing.
So next time you have to engage vendor support, don't immediately groan (though I can't say you won't groan at some point). If they ask you to test something, give it a shot unless that test has further consequences. If you have a well-designed system, you should be able to segment changes so that the impact is minimal and a change can easily be reverted if it doesn't help.
Lastly, in this case, a shout-out to Citrix support for sticking with an issue that showed no evidence of being something they could resolve. We have all been involved in too many support cases where it seems like support is just trying to prove that the issue is not theirs. In this case they drove the problem to completion. Thanks!