One day I came to work after a semi-sleepless night, only to find that some of my colleagues thought they were incapable of performing simple troubleshooting tasks and were waiting for me to resolve fairly trivial issues. Naturally, the frustration spilled into a long letter on the subject – with some noodling on the psychology and philosophy of debugging. Here are some extras…
The main philosophical thing in troubleshooting is to remember that with these stupid metal things we call computers there are no mysteries: everything can be explained (eventually); it’s just a matter of time. Under the hood, these things are pretty stupid: on a given fragment of conductor it’s either one (there is electric current) or zero (there is no current), and that’s all there is to it. If there are differences in behavior, it means there are differences in the binaries and therefore differences in the code. You just need to ask yourself what could be wrong, then confirm or eliminate that possibility with whatever means you can think of. For example, if you can’t use a debugger, use printout statements. A long time ago, I had to make lights blink in different ways in different scenarios in order to identify which path the code took. Another time, I had to make the machine reboot on one path and stay online on another. These are just examples of how far you can push the equipment you have available. After a while, once all the false possibilities are eliminated, you will inevitably find the source of the problem (“When you have eliminated the impossible, whatever remains, however improbable, must be the truth.” – Arthur Conan Doyle).
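To make the printout-statement idea concrete, here is a minimal sketch in Python. Everything in it is made up for illustration: `handle_request` stands in for whatever code you are investigating, and `trace` and `TRACE` are hypothetical helper names, not part of any library. The point is only that each branch leaves a distinct marker, so you can tell afterwards which path the code took.

```python
import sys

# Hypothetical in-memory record of markers, handy when stderr is not visible.
TRACE = []

def trace(marker):
    """Record a marker and print it right away.

    flush=True asks Python to push the line out immediately, so the
    marker is not left sitting in a buffer if the program dies soon after.
    """
    TRACE.append(marker)
    print(f"[trace] {marker}", file=sys.stderr, flush=True)

def handle_request(payload):
    """Stand-in for the code under investigation (an assumption, not real code)."""
    trace("handle_request: entered")
    if payload is None:
        trace("handle_request: empty payload branch")
        return "rejected"
    if len(payload) > 10:
        trace("handle_request: large payload branch")
        return "queued"
    trace("handle_request: normal branch")
    return "processed"
```

Calling `handle_request(None)` leaves two markers behind, the entry marker and the empty-payload one, which is the software equivalent of making the lights blink differently on each path.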
Applying all this to the current situation: if the issue is reproducible only on the test bed, then that’s where you troubleshoot it. And make sure you can always revert to a “well-known” state: change things one at a time between tests; if new issues arise and you cannot explain them, remove your changes and test again to make sure the behavior reverts to the previous controlled flow. The main thing is not to rush: take little steps, but make sure that at any time you can both explain what’s happening and revert to the previously established flow. If the code is throwing an exception in a specific place, make sure it’s always throwing it right there and nowhere else. Don’t troubleshoot multiple issues at once; take them one at a time.
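The "one change at a time, always be able to revert" routine can be sketched roughly like this. It is a toy under stated assumptions: the settings, the `passes` check, and the function name `apply_changes_one_at_a_time` are all invented for the example, and in real life "run the test" is usually a lot slower than calling a function.

```python
def apply_changes_one_at_a_time(baseline, candidate_changes, passes):
    """Try each change alone on top of the last known-good state.

    baseline          -- dict of settings that is known to work
    candidate_changes -- dict of setting -> new value to try
    passes            -- callable(config) -> bool, the test you rerun each time
    Returns (known_good_config, rejected_keys).
    """
    known_good = dict(baseline)
    # First confirm that the starting point really behaves as expected.
    if not passes(known_good):
        raise RuntimeError("baseline fails; re-establish a known-good state first")

    rejected = []
    for key, value in candidate_changes.items():
        trial = dict(known_good)   # copy, so reverting costs nothing
        trial[key] = value         # exactly one change per test run
        if passes(trial):
            known_good = trial     # the change is explained and kept
        else:
            rejected.append(key)   # revert: the trial copy is simply dropped
    return known_good, rejected


# Hypothetical example: the system breaks whenever retries are disabled.
baseline = {"timeout": 30, "retries": 3, "cache": True}
changes = {"timeout": 5, "retries": 0, "cache": False}

def passes(config):
    return config["retries"] > 0

good, rejected = apply_changes_one_at_a_time(baseline, changes, passes)
```

Here `rejected` ends up naming the single change that broke things, and `good` holds every other change applied on top of the baseline. Because each test run differs from the previous known-good state by exactly one setting, a failure never needs more than one explanation.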
And you can add some spirituality to your approach too: those machines are actually alive and they have their own characters. They sense confidence and fear – don’t show them your doubts; instead, make them afraid of you – and they will give up sooner.
This is not a job – this is a mysterious adventure.