Last week was a firefighting week at work (i.e. debugging and troubleshooting). I’m writing this article to share my experience debugging a system configuration. At the end of the article, I have listed some lessons learned, so hopefully someone can learn from them and apply them in their own career.

Story

We have a Jenkins job that load-tests the application. The test runs on a physical machine connected to Jenkins.

During the week we realized that the load test jobs were continually failing. One of the devs had changed the job configuration to use a different machine while he was working in Jenkins.

He was highly confident that the configuration he had changed was how it was supposed to be in the first place, so the team was confused about why the job had suddenly stopped working. After 4 days of debugging by different developers, we still hadn’t gotten the job to work.

I was not actually sure what the correct configuration was. Yes, I tend to be skeptical sometimes. That is when I thought to look back and see what things were like before the job started to fail. How did it all fall apart?

The jobs started to fail after the configuration was changed. Thankfully, Jenkins keeps a console log of the jobs that were executed, and the log shows which machine each job ran on. From the logs I managed to match the machine to the failed jobs and to the configuration change history.

Furthermore, I thoroughly searched our dev conversations on Slack to find out why we had switched to the original machine in the first place. After I pointed this out, the developer began to realize his misunderstanding. It is not easy to convince someone that something is not correct when they are over-confident, especially when they are more senior and smarter than you. That is why I needed facts and proof to convince him.

What I also learned is that being over-confident about something can blind you to the actual problem.

The issue here was that he was trying to fix something that was not broken in the first place. In the end, though, everything is working again.

Lessons Learned

  • In debugging, it is good to be a bit skeptical. Even if someone says something is right, don’t believe them just because you feel you have to. Ask questions or figure it out yourself.
  • When you are trying to fix something that used to work and are having no luck, take a step back. Look at when it last worked, just before it started to fail: figure out the state of the system at that moment and map out what has changed since. Something caused the issue.
  • If you find someone’s mistake along the way and want to correct them, make sure you have your facts and proof straight to avoid arguments. And make sure you tell them in a humane way.
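
To make the second lesson concrete, here is a minimal Python sketch of the "step back" approach: find the last passing build and the first build of the current failing streak, then list the configuration changes that landed in between. Everything in it is an assumption for illustration: the `Build` and `ConfigChange` records, the machine names, the author, and the timestamps are made up, and in practice you would fill them in by hand (or with a script) from the Jenkins console logs and the job's configuration history.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Tuple


@dataclass
class Build:
    number: int
    finished_at: datetime
    passed: bool
    machine: str  # machine name as read from the console log (hypothetical)


@dataclass
class ConfigChange:
    changed_at: datetime
    author: str
    summary: str


def find_regression_window(builds: List[Build]) -> Optional[Tuple[Build, Build]]:
    """Return (last passing build, first build of the current failing streak).

    Returns None if the newest build passed or if no passing build exists.
    """
    first_bad = None
    # Walk from the newest build backwards until we hit a passing one.
    for build in sorted(builds, key=lambda b: b.number, reverse=True):
        if build.passed:
            return (build, first_bad) if first_bad is not None else None
        first_bad = build
    return None


def changes_between(changes: List[ConfigChange],
                    start: datetime, end: datetime) -> List[ConfigChange]:
    """Config changes made after `start` and no later than `end`, oldest first."""
    hits = [c for c in changes if start < c.changed_at <= end]
    return sorted(hits, key=lambda c: c.changed_at)


# Hypothetical data, standing in for what the logs and history would show.
builds = [
    Build(101, datetime(2024, 1, 8, 9, 0), True, "load-machine-a"),
    Build(102, datetime(2024, 1, 9, 9, 0), True, "load-machine-a"),
    Build(103, datetime(2024, 1, 10, 9, 0), False, "load-machine-b"),
    Build(104, datetime(2024, 1, 11, 9, 0), False, "load-machine-b"),
]
changes = [
    ConfigChange(datetime(2024, 1, 5, 14, 0), "dev-x", "Update job description"),
    ConfigChange(datetime(2024, 1, 9, 16, 30), "dev-x", "Run load test on load-machine-b"),
]

window = find_regression_window(builds)
if window is not None:
    last_good, first_bad = window
    suspects = changes_between(changes, last_good.finished_at, first_bad.finished_at)
    print(f"Last good: #{last_good.number} on {last_good.machine}")
    print(f"First bad: #{first_bad.number} on {first_bad.machine}")
    for change in suspects:
        print(f"Suspect change by {change.author}: {change.summary}")
```

Comparing finish times is a simplification; the point is only that narrowing the search to the gap between "last worked" and "first failed" usually leaves a short list of changes to question.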