In this article, we look at how to use an impactful and systematic questioning technique to quickly eliminate wrong leads and zoom in on the most probable root causes.
I used the fishbone diagram categories as a framework for technical equipment failures. However, there are various dimensions one can use depending on the sector one is fortunate enough to come across (problem-solving is, after all, a fantastic addiction to have, as there is no shortage of problems).
The technical equipment failure framework uses categories derived from the Toyota Production System (People, Material, Machine, Methods, Measurements, and Environment).
I prefer using mind maps instead of an actual fishbone diagram, as the fishbone is somewhat restrictive when it needs to be represented electronically. There are numerous mind-mapping applications that allow you to share ideas remotely. A favorite is mindmeister.com, which has great online sharing features and the ability to attach files. Mindly is a mobile application that also has good shareability. The above figure was generated with GitMind.
Let's explore the topics.
1. Material
Material is usually the first suspect when it comes to mechanical failures. Questions to ask when eliminating this topic are:
1.1 Is the failure mode being experienced on other parts produced in the same batch?
If it is the first time the failure manifests, check the same component in other machines.
If possible, track down external users who use the same equipment produced in the same batch. If the failure is localized, chances are the failure mode is not material-driven.
One has to take into consideration whether the design is custom or mass-produced.
1.2 Have there been any redesigns, upgrades, or supplier changes since production or the last repair?
Most material failures are a result of changes that have not been tested well in operation.
Explore logistical issues such as storage and recent stock-outs that could have prompted changes in the usual supply route or emergency purchases of parts. Always be on the lookout for anomalies.
1.3 What are the characteristics of the failure surface?
There is no substitute for a good technical evaluation of mechanical parts.
2. Measurements
Measurements is a very data-focused topic. Tune into all the measurable characteristics, both design and operational, to zone in on the cause.
2.1 What are the process measurables?
Investigate if there was a discrepancy between the actual load and design capacity. Are there any tolerances that are outlined with limit values?
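As a rough illustration of this check, the sketch below compares logged readings against design limit values and lists the ones that fall outside. The `Limit` structure, the measurable name, and all the numbers are made up for the example; real limits would come from the design documentation.

```python
from dataclasses import dataclass


@dataclass
class Limit:
    """Design limit values for one measurable (hypothetical structure)."""
    name: str
    low: float
    high: float


def find_exceedances(readings, limit):
    """Return (index, value) pairs where a reading falls outside the limits."""
    return [
        (i, value)
        for i, value in enumerate(readings)
        if value < limit.low or value > limit.high
    ]


# Illustrative values only, not real equipment data.
shaft_load_kn = Limit(name="shaft load (kN)", low=0.0, high=120.0)
logged_loads = [95.0, 110.5, 131.2, 118.0, 124.7]

exceedances = find_exceedances(logged_loads, shaft_load_kn)
for index, value in exceedances:
    print(f"Reading {index}: {value} is outside "
          f"{shaft_load_kn.low}-{shaft_load_kn.high} for {shaft_load_kn.name}")
```

Any reading that shows up here is an anomaly worth verifying before moving on to the next question.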
2.2 Was the process or equipment measuring correctly?
See if all gauges and sensors are functional. Is there any process calibration applicable to the failed equipment or the tools that work the process? Are there alarms in place that did not work? What measures are in place, and are their trends being observed?
3. Machine
Machine speaks to the design and application of the equipment or component. Pay attention to anomalies and recent changes.
3.1 Was there a change in the application or is the equipment occasionally used outside its designed purpose?
3.2 How long has the machine been used at the operation?
A piece of new equipment creates ripples that affect the whole asset management system, including procurement, tools, training, licensing, procedures, and logistics.
3.3 Have there been any updates, upgrades, and re-designs to the equipment?
This question explores material changes in the design configuration of the unit. Re-designs can solve old problems and create new ones altogether.
4. Methods
The best way to explore this topic is to observe, in person, the process that the failed equipment performs. Compare it with the existing working standard and note all deviations.
4.1 What are the governing procedures related to the equipment?
Note the deviations observed. Note also the sub-steps that the procedure does not cater for, where operators may use their own discretion. Investigate whether the deviations are of any consequence to the failure mode and what gaps exist in the procedure. When was the procedure last reviewed? Is there evidence that the procedures are proactively reviewed, or are they driven more by compliance than by practicality?
5. Environment
The environment has many factors to consider. Again, keep a keen eye out for changes and differences.
5.1 Does location play a role in the failure mode?
Do a clustering analysis by location. Group machines of the same model by their assigned location to observe differences in measurement, performance, and failure modes. If no difference in location data is evident, compare the failures with a completely different operation. If the failure mode persists only in some locations, it is most likely an environmental issue. If the failure mode does not persist in other operations (with an independent asset management system), the cause may include issues with local operational behaviours. If the failure mode exists everywhere, move back to the Machine topic.
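The location comparison can be sketched in a few lines. This is a deliberately simple stand-in for a proper clustering analysis: it groups machines of one model by location, computes a failure rate per location, and maps the result to the next topic to look at. The record layout, the location names, and the "any failure counts" rule are all assumptions made for the example.

```python
from collections import defaultdict


def failure_rate_by_location(records):
    """records: iterable of (location, failed) pairs, one per machine."""
    totals = defaultdict(int)
    failures = defaultdict(int)
    for location, failed in records:
        totals[location] += 1
        if failed:
            failures[location] += 1
    return {loc: failures[loc] / totals[loc] for loc in totals}


def suggest_next_topic(rates):
    """Map the spread of failure rates to the next topic to investigate."""
    affected = [loc for loc, rate in rates.items() if rate > 0]
    if not affected:
        return "no failures recorded"
    if len(affected) == len(rates):
        return "failure everywhere: go back to Machine"
    return "failure in some locations: look at Environment"


# Illustrative records for one machine model (hypothetical data).
records = [
    ("coastal", True), ("coastal", True), ("coastal", False),
    ("inland", False), ("inland", False),
    ("underground", False),
]

rates = failure_rate_by_location(records)
print(rates)
print(suggest_next_topic(rates))
```

With this made-up data only the coastal site shows failures, so the sketch points to Environment. With real data one would want a threshold based on fleet size rather than treating a single failure as a pattern.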
6. People
People should be the last item to check. This is simply because all systems and processes eventually point to people and, at the same time, people's behavior is heavily influenced by technology, environment, and many other factors in the system.
Before investigating the popular "lack of training" explanation, ask high-level questions.
6.1 How long has the operator done this job?
Ensure that there were no changes to the equipment and that no new process was introduced.
6.2 What is the general health and psychological state of the operator?
Explore sub-topics such as: What time did the failure happen? Was it at the start or end of the shift? What factors could point to exhaustion, such as the last time leave was taken? Was the operating session behind on the production schedule?
6.3 Was the task performed with the right level of autonomy?
Is the task usually done with the same number of people, resources, information, and supervision?
Was it done on the normal schedule or overtime?
6.4 Are there any performance measurements on the operator or the artisan who works on the equipment?
Sudden changes in performance are early indicators of future deviations. Analyze the data to see if there are any changes in the trends.
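One simple way to look for such a change, sketched here under assumptions of my own, is to compare each new value with the mean and spread of the few values before it. The window size, the threshold of three standard deviations, and the weekly figures are illustrative choices, not a prescribed method.

```python
from statistics import mean, stdev


def flag_sudden_changes(values, window=5, threshold=3.0):
    """Return indices whose value deviates from the mean of the preceding
    `window` values by more than `threshold` standard deviations."""
    flagged = []
    for i in range(window, len(values)):
        baseline = values[i - window:i]
        spread = stdev(baseline)
        if spread == 0:
            continue  # flat baseline: no scale to judge the deviation by
        if abs(values[i] - mean(baseline)) > threshold * spread:
            flagged.append(i)
    return flagged


# Hypothetical weekly count of jobs completed by one artisan.
weekly_jobs_completed = [20, 21, 19, 20, 22, 21, 20, 12, 13, 12]
print(flag_sudden_changes(weekly_jobs_completed))
```

For this made-up series only index 7 is flagged, the week where output first drops. The later low weeks are not flagged because the drop has already widened the baseline, which is why the first flagged point is the one to investigate.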
The principle of asking eliminating questions is to quickly identify anomalies that can be verified by a quick preliminary investigation. This sets the tone and direction of the full investigation. All anomalies should be summarised in the closing root cause investigation report and proactively used to stop current operational situations that mirror the failure mode.