Define Measure Achieve. Repeat

Archive for the ‘Uncategorized’ Category

Do you measure your incident backlog?

May 5th, 2010

Incident backlog provides an informative KPI that you should consider adding to your reporting repertoire. The KPI should measure the number of incidents outstanding that have missed an SLO or SLA. The KPI should be trended over time and should either be stable or decreasing.
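As a rough illustration of the KPI described above, the sketch below counts incidents that are still open and past their SLA/SLO deadline at a series of month-end snapshots. The record layout (`sla_due`, `closed`) and the sample data are assumptions made for the example, not fields from any particular ITSM tool.

```python
from datetime import datetime

# Hypothetical incident records: the SLA/SLO deadline and the close time
# (None means the incident is still open).
incidents = [
    {"id": "INC-1", "sla_due": datetime(2010, 2, 10), "closed": datetime(2010, 2, 8)},
    {"id": "INC-2", "sla_due": datetime(2010, 2, 20), "closed": datetime(2010, 4, 2)},
    {"id": "INC-3", "sla_due": datetime(2010, 3, 15), "closed": None},
    {"id": "INC-4", "sla_due": datetime(2010, 4, 25), "closed": None},
]


def backlog_at(records, snapshot):
    """Count incidents open at `snapshot` that had already missed their deadline."""
    count = 0
    for inc in records:
        still_open = inc["closed"] is None or inc["closed"] > snapshot
        breached = inc["sla_due"] < snapshot
        if still_open and breached:
            count += 1
    return count


# Trend the KPI over three monthly snapshots.
snapshots = [datetime(2010, 3, 1), datetime(2010, 4, 1), datetime(2010, 5, 1)]
trend = [backlog_at(incidents, s) for s in snapshots]
print(trend)
```

A stable or falling `trend` is the healthy pattern; a rising one is the cue to investigate.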

In the example chart below, we can see the backlog rising over the last three months.

Incident Backlog

In this example the backlog has increased by 50% over three months, which should be investigated. To determine how urgent the issue is, the first step is to break down the backlog by incident priority.

Open Incidents

It would be highly unlikely for a backlog of high-priority issues to persist over a period of time, and the chart above shows that the majority of the backlog is priority 3 and 4. This is fairly common and is often systemically ignored. However, it is worthwhile to examine further.

The reasons for the rising trend could include:

- Second-line support personnel are not closing tickets in your ITSM tool.

- Resourcing levels may not reflect the current volume of tickets being received on the Service Desk.

- The organization’s Change and/or Release practice is causing unscheduled spikes in incident volume.

- Particular workgroups have particularly high backlogs, which could indicate a bottleneck.
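To make the breakdown concrete, here is a minimal sketch that splits a backlog by priority and by workgroup. The field names and sample incidents are hypothetical; the point is only to show the two cuts of the data discussed above.

```python
from collections import Counter

# Hypothetical backlog: open incidents that have already missed their SLO/SLA.
backlog = [
    {"id": "INC-10", "priority": 3, "workgroup": "Desktop"},
    {"id": "INC-11", "priority": 4, "workgroup": "Desktop"},
    {"id": "INC-12", "priority": 3, "workgroup": "Network"},
    {"id": "INC-13", "priority": 2, "workgroup": "Desktop"},
    {"id": "INC-14", "priority": 4, "workgroup": "Server"},
]

by_priority = Counter(inc["priority"] for inc in backlog)
by_workgroup = Counter(inc["workgroup"] for inc in backlog)

# Share of the backlog that is low priority (3 and 4).
low_priority_share = (by_priority[3] + by_priority[4]) / len(backlog)

# The workgroup holding the most backlog items is a candidate bottleneck.
top_group, top_count = by_workgroup.most_common(1)[0]

print(dict(by_priority), low_priority_share, top_group, top_count)
```

A single workgroup holding most of the backlog points to a bottleneck; an even spread points more towards resourcing or ticket-closing habits.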

Charles Cyna Uncategorized

Reporting is the soufflé of the IT Service Desk.

April 21st, 2010

Those of you who are culinarily inclined will know that a soufflé is made with a couple of basic ingredients, a cream sauce and egg whites, and yet the final dish eludes the many who simply don’t pay attention to the details that success requires. Oh, and such success is wonderful to observe - fluffiness contained within a towering cloud of caloric goodness - it is truly an elusive culinary accomplishment.

Figure 1 – The rare objet d’art itself – a light, fluffy soufflé produced at L’Atelier by Joël Robuchon.

Like the fracturable soufflé, good service desk reports are easy to order but more difficult to enjoy. 

Although the ingredients are simple, the execution is questionable and the ultimate result is often unsatisfying. The particular reports I am thinking of are not operational in nature (the wham bam thank you ma’am of reports). The ones I am thinking of are tactical in nature. These require a little more finesse; they are the thinking person’s report, a tactical view of service desk performance that can enable service improvement and actually inform decision making. In other words, reports that provide information that is ‘actionable’. How delicious!

Anyhow, so many of these failed attempts leave me wanting more. Inadequate execution reduces them to merely visually appealing, useless and perhaps even inconsequential. 

So perhaps we should examine the ingredients and execution that can turn a miserable meaningless humble report into something worth consuming.

Ingredient 1: Consistency

Consistency is one of the few things that matter when generating decision support material. Everyone should be saying the same thing when answering the telephone, asking the same questions, and documenting the information received in the same way.

Ingredient 2: Track the right stuff!

Set yourself up for success and build a support model. Beyond the obvious items such as impact and customer information, there are three things that the service desk needs to capture:

#1 - what was the customer’s perception of the failure (i.e. the end-to-end service),
#2 - what was the underlying IT reason for the failure (i.e. the provider service), and
#3 - what infrastructure item was involved in the failure (i.e. the component category).

The figure below shows a breakdown of the critical criteria that should be captured in the incident.

Figure 2 - The essential elements of information capture for an Incident.

These items make information gathering from the customer simple and make escalation of the issue through the IT organization easier to manage.
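One way to picture the three capture points is as fields on the incident record. The sketch below is only an illustration: the class and field names are assumptions, loosely named after the three items listed above, and a real ITSM tool will have its own schema.

```python
from dataclasses import dataclass


@dataclass
class IncidentClassification:
    end_to_end_service: str   # #1: the customer's perception of what failed
    provider_service: str     # #2: the underlying IT reason for the failure
    component_category: str   # #3: the infrastructure item involved


@dataclass
class Incident:
    incident_id: str
    customer: str
    impact: str
    classification: IncidentClassification

    def missing_fields(self):
        """Return the names of classification fields that were left blank."""
        return [
            name
            for name, value in vars(self.classification).items()
            if not value.strip()
        ]


# Hypothetical incident logged without a component category.
example = Incident(
    "INC-20", "J. Smith", "single user",
    IncidentClassification("Email", "Messaging", ""),
)
print(example.missing_fields())
```

A check like `missing_fields()` could feed a consistency report, since reports built on partly classified incidents are hard to trust.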

Ingredient 3: Focus on the WHAT, the WHY and the ACTION.

Generating reporting for reporting’s sake doesn’t work. It sounds obvious, but many of us get into the habit of reviewing the same reports every month and then doing nothing with the information.

If this sounds like you, STOP! 

Ask yourself three things when looking at a report:

Do I care about what this report is telling me?

If your answer is NO, move on and deal with something more important.

If your answer is YES, then you need to figure out WHY the information in the report is occurring.

Once the WHY has been determined, implement a performance tweak or involve the relevant stakeholder group and share the information with them as part of the ACTION.

In my next blog (on Monday), I will explore a real-life example of how this process works.

Charles Cyna Uncategorized

Anatomy of a KPI – Mean Time to Restore Service (MTRS)

March 3rd, 2010

Mean Time to Restore Service is an important KPI that most help desks measure (or at least should). MTRS tells us about the average customer experience a user has when a service interruption is identified.

To calculate the MTRS, divide the total time that incidents were open by the total number of incidents logged in a given time period (normally a month). I would recommend that the KPI cover only the top two tiers of classification, because the way most organizations service their lower priority issues means that including them would probably reduce the usefulness of the KPI.
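The calculation can be sketched in a few lines. The incident records, the field names and the choice of hours as the unit are assumptions for the example; the restriction to priorities 1 and 2 follows the recommendation above.

```python
from datetime import datetime

# Hypothetical incidents logged in one month.
incidents = [
    {"priority": 1, "opened": datetime(2010, 2, 1, 9, 0), "restored": datetime(2010, 2, 1, 11, 0)},
    {"priority": 2, "opened": datetime(2010, 2, 3, 14, 0), "restored": datetime(2010, 2, 3, 18, 0)},
    {"priority": 4, "opened": datetime(2010, 2, 5, 8, 0), "restored": datetime(2010, 2, 12, 8, 0)},
]


def mtrs_hours(records, priorities=(1, 2)):
    """Mean time to restore service, in hours, for the given priorities."""
    selected = [r for r in records if r["priority"] in priorities]
    if not selected:
        return None  # no incidents in scope, so the KPI is undefined
    total_seconds = sum(
        (r["restored"] - r["opened"]).total_seconds() for r in selected
    )
    return total_seconds / 3600 / len(selected)


mtrs_top_tiers = mtrs_hours(incidents)
mtrs_all = mtrs_hours(incidents, priorities=(1, 2, 3, 4))
print(mtrs_top_tiers, mtrs_all)
```

In this sample the single week-long priority 4 incident pulls the all-priority figure far above the top-tier figure, which is why lower priorities are left out of the KPI.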

Now, a KPI is useful as just that: ‘an indicator’. If it is going in the wrong direction (i.e. up) there is no reason to panic – the most important thing is to identify whether there really is an issue and, if so, to be in a position to address it as soon as possible.

The first thing to check with MTRS is whether incident volume has spiked. When incident volume changes unexpectedly, the help desk doesn’t have a chance to adjust resourcing, so the average time to restore service will often rise.
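A simple way to test for a volume spike is to compare the current month against the average of the preceding months. The sketch below does that; the monthly figures and the 20% threshold are hypothetical and would need tuning to your own desk.

```python
from statistics import mean

# Hypothetical monthly incident counts; the last entry is the current month.
monthly_volume = [410, 395, 420, 405, 560]


def volume_spiked(volumes, threshold=1.2):
    """Return (spiked, baseline): whether the latest month exceeds
    `threshold` times the average of the earlier months."""
    *history, current = volumes
    baseline = mean(history)
    return current > baseline * threshold, baseline


spiked, baseline = volume_spiked(monthly_volume)
print(spiked, baseline)
```

If `spiked` is true in the same month that MTRS rose, volume is the first explanation to rule in or out before looking elsewhere.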

The second thing Read more…

Charles Cyna Uncategorized

Why Service Desk Managers have the most Challenging Position in IT

February 26th, 2010

The job itself is demanding: managing the group that supports end users through IT-related service interruptions, while also being the face of IT.

Below are some of the key reasons that make fulfilling this role challenging.

1.  Right Staffing!  How do you staff a team effectively when the volume of communications can fluctuate widely, and when the causes of those fluctuations (e.g. an outage, a change gone bad, a new rollout) are beyond the Service Desk Manager’s control?

2.  Higher Staff Turnover!  Service desk staff, compared to other IT staff, tend to turn over at a faster rate.  The primary reasons include burnout from constantly dealing with frustrated end users, as well as staff using the service desk as a stepping stone to other IT areas.

3.  Unrealistic End User Expectations!  Despite the magnitude and complexity of the technology, end users expect that, regardless of the nature of their communication / situation, the service desk analyst should be able to resolve their issue immediately. Read more…

swaxler Uncategorized

IT Service Delivery is a Journey not a Destination

December 8th, 2009

Organizations increasingly recognize that proven frameworks are key to improvement in IT Service Delivery and aligning IT operations to the needs of the business. Often the recognition that ‘things need to get better’ manifests itself through the purchasing of a new ITSM tool and/or the hiring of a consultant to implement some basic operational processes such as Incident, Problem and Change Management.  Meetings are held; documents are drafted, re-drafted and re-drafted again until everyone is happy with the outcome. The new tool gets implemented, the consultant leaves and those process documents that everyone spent weeks or months building get filed away in some electronic repository often never to be seen again.

If this scenario rings true, you’re not alone. Many IT departments treat Service Management or ITIL process implementation as a project and not a journey. Like losing weight, if you do not have a plan for ongoing success then the weight will come back and all the gains you made with hard work are simply lost.

Conveniently, the answer to help us through this scenario is given new prominence in ITIL’s latest incarnation, version 3. CSI, or Continual Service Improvement, which was merely implied in previous ITIL frameworks, is now thrust into prominence with its own book and its fair share of the limelight. CSI provides the process that drives the value out of your other ITIL practices. The Incident Management process in itself does not generate value – certainly, it would tell you something like how incidents can be escalated to reduce impact to the business. This in itself provides cost savings and improved productivity, but it does not speak to the elements that drive a business; it does not tell us how many incidents we escalated last month, what the areas of improvement are and how we will do better next month. The incident management process assumes that everything remains constant – of course, the business changes.

CSI provides the wrapper to the ITIL processes you have in place. It enables you to baseline where you are today, define where you need to be and drive a path through to the goal. Getting a baseline for your existing performance is key, because it is difficult to get to your next destination if you don’t know where you are today. The good news is that there is a range of knowledge and resources that can be tapped inexpensively to help you through this process. For example, The ITSM Coach from ThinkITSM gives you the ability to assess your end user satisfaction and help desk maturity for free, highlighting the areas most in need of attention. Often we know much of this information informally, but having objective data enables the change process and gets disparate groups on board to adopt positive changes in how work is done.

CSI and quality improvement have revolutionized manufacturing and have given automobile companies such as Toyota and Honda a fundamental competitive advantage that they have translated into market share gains and profitability. It is now time for IT to embrace quality improvement processes and truly drive business value from IT Service Delivery.

Charles Cyna Uncategorized

Keeping IT Real – risks of low maturity in incident, problem & change.

October 22nd, 2009

Risks for Organizations with Maturity less than Level 3

There are risks for organizations that operate in the Level 1-2 maturity range. If there is a plan to develop and mature the practice(s) to level 3 or higher the risks are somewhat mitigated. However, while at a level 0-2 some of the key risks to consider are:

Service Desk and Incident Management

  • Perception of IT as a whole is lowered and considered not customer focused
  • There is a danger of negatively impacting external customers and their perception of the business
  • There are costs (financial, reputational) when the business is interrupted while users and major services are down
  • There is an inefficient use of skilled IT technical resources
  • There is little usable incident reporting data because most of it is inaccurate, and consequently little basis for improvement
  • Many of the same incidents are resolved repeatedly (re-inventing the wheel)
  • There is a risk of high staff burnout and high turnover of support staff

Change Management

  • The infrastructure is very unstable and has long term performance issues
  • There are frequent outages following unauthorized changes
  • Project implementations are delayed because changes cannot be coordinated
  • There are many failed changes that cause incidents
  • The requirement for changes outstrips the capacity to implement them
  • Support for third party applications expires due to inability to stay current

Problem Management

  • Common incidents are resolved repeatedly, lowering customer satisfaction and inflating support costs unnecessarily
  • Re-inventing the wheel when sporadic incidents occur over longer periods of time
  • Frequent interruptions or degradation of service
  • It is difficult to introduce new services when unknown errors may jeopardize the implementation.
  • The change practice gets bogged down due to higher rates of failed changes
  • Due to a lack of workaround information the Service Desk regresses to a call dispatch function.

Maria Ritchie Uncategorized

ITIL maturity - where do you want to be and why.

October 6th, 2009

As a rule, a practice is not considered mature unless it is at a level 3 or higher. This means that the single practice is mature enough to be working as designed and is being used by all relevant stakeholders. It also means that the data it is generating is mature and can be trusted for decision-making.

At Level 3 the practice has control points that give management an indication of when and whether intervention is required. The practice is end-to-end and collaboration across departments has been optimized. For many organizations reaching Level 3 seems to be the end of the journey.

However, the value proposition of an integrated ITSM practice approach is that when practices reach Level 4 they begin to interact with each other. This provides the ability to share common data and provide insight that is not available at Level 3.

By working cooperatively the practices become more efficient and effective, and the whole becomes greater than the sum of the parts. Level 4 is the desired target state for the practices in scope for most organizations.

A maturity of Level 5 is often seen in single departments but rarely across all departments. The effort required to get to this level of maturity is high and costly.

Many organizations do not have the ability or desire to reach Level 5. That is OK: not all organizations will have a business need for this level of maturity.

Maria Ritchie Uncategorized

Service Desk Measurement - Be careful what you wish for!

September 29th, 2009

Mini-series on measurements that can come back to bite you…..

The trouble with performance metrics is that they can actually encourage inefficiency, de-motivate resources and result in misinformed management decisions if not thoughtfully designed and carefully monitored!

Take for example an organization that has a published target for their Service Desk’s ability to resolve inquiries at the first point of contact (FPOC rate).

This measure is expected to positively impact two important facets of the service desk – customer satisfaction and cost.

Customer Satisfaction - The premise behind this measure is that customers’ satisfaction is positively impacted by quick resolutions.  After all, who doesn’t like to have their questions/issues resolved quickly and without being transferred or bounced around an organization?

Cost  - Generally organizations have tiered levels of support with resource costs increasing at each level. It makes sense that the more incidents/inquiries that can be resolved at FPOC, without handoffs to more expensive tiers of technical support, the more money that can be saved.

Also, organizations often use this measure to compare themselves to other service desk organizations and to communicate their service desk’s value proposition.

While on the surface this seems simple, practical and a no-brainer there are potential “gotchas”. Consider the following simple scenario.

An organization’s service desk proudly promotes an FPOC rate of 80% or better consistently month-over-month.  Remarkably the Desk has been able to maintain this for 6 straight months! Agents’ performance reports show that individually they are meeting or exceeding their assigned FPOC targets.  Comparisons to other service desks’ FPOC are favourable and management assumes this is a very positive measure – right?

Maybe not. A high FPOC rate may actually be signalling a lot of repetitive incidents.  Agents at the desk can get very skilled and efficient at resolving the same issues over and over again and the hidden cost can be easily overlooked.

Points to Ponder

While initially customers will be pleased with a timely restoration of the service, their satisfaction will drop quickly if they keep having the same problems over and over again.

Resolving the incident quickly with lower cost resources is, on the surface, efficient; however, if that incident is happening repeatedly what is the total cost of the repetitive incident?

From a value perspective, which desk would you choose?  One that has an 80% FPOC but is continuously resolving the same type of incidents, or one that has a 50% FPOC but continuously reviews incident trends and permanently removes repetitive incidents from their environment?

Simple Tips

1. Don’t look at your FPOC in isolation.  Always look for correlation to satisfaction trends and hidden costs.  

2. Seek out and destroy repetitive incidents. Analyze the types of incidents that are being resolved FPOC and identify repetitive incidents for permanent resolution.

3. Use your incident data to “expose the cost” of repetitive incidents and be sure to report the cost savings/avoidance achieved by removing the incidents right alongside your new, possibly lower FPOC number.

4. Your service desk agents are a great source of information on your top repetitive incidents – tap into their knowledge & experience.  Reward them for identifying repetitive incidents and other improvement opportunities.
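The tips above can be sketched against incident data as follows. The categories, the repetition threshold of three and the per-resolution cost are all hypothetical values chosen for illustration; the idea is simply to show the FPOC rate next to what is being resolved repeatedly and what that repetition costs.

```python
from collections import Counter

# Hypothetical month of incidents; "fpoc" marks resolution at first point of contact.
incidents = [
    {"category": "password reset", "fpoc": True},
    {"category": "password reset", "fpoc": True},
    {"category": "password reset", "fpoc": True},
    {"category": "printer offline", "fpoc": True},
    {"category": "vpn failure", "fpoc": False},
]
COST_PER_FPOC_RESOLUTION = 15.0  # assumed cost per desk resolution
REPEAT_THRESHOLD = 3             # assumed: this many or more counts as repetitive

fpoc_rate = sum(i["fpoc"] for i in incidents) / len(incidents)

# Which categories make up the FPOC resolutions?
fpoc_counts = Counter(i["category"] for i in incidents if i["fpoc"])
repetitive = {cat: n for cat, n in fpoc_counts.items() if n >= REPEAT_THRESHOLD}

# "Expose the cost" of resolving the same thing over and over.
repeat_cost = {cat: n * COST_PER_FPOC_RESOLUTION for cat, n in repetitive.items()}

print(fpoc_rate, repetitive, repeat_cost)
```

Here an 80% FPOC rate is largely the same password reset resolved three times, which is the pattern the scenario above warns about.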

One Final Thought with a Bit of a New Twist on an Old Measure

Many desks set high FPOC rates but then do not give the agents the tools or permissions they need to achieve it!

If you have a lower than desired FPOC resolution rate, you may want to consider measuring “designed for FPOC”.  You may be surprised to find that your service desk is actually not able to achieve an acceptable FPOC because of the way your support delivery is designed!

Too often the service desk is not provided with the necessary information (troubleshooting scripts, configuration information, etc) or the necessary permissions to actually resolve incidents at FPOC.

This very simple measure, “FPOC by Design”, reflects an organization that has designed support specifically for each product/service, and is monitoring how well its service desk is performing against achievable targets.

Low FPOC on a product that has been designed for FPOC provides an important area to analyze agent performance and training opportunities.  Conversely, high incident areas with low FPOC for products/services NOT designed for FPOC may be a great place to review the support model and look for opportunities to empower your service desk further.
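As a minimal sketch of the “FPOC by Design” idea, the example below computes the FPOC rate separately for products that were designed for first-contact resolution and those that were not. The `designed_for_fpoc` flag and the sample data are assumptions; in practice that flag would come from your support model.

```python
# Hypothetical incidents, each tagged with whether the product's support
# model was designed for first-point-of-contact resolution.
incidents = [
    {"product": "email", "designed_for_fpoc": True, "fpoc": True},
    {"product": "email", "designed_for_fpoc": True, "fpoc": False},
    {"product": "erp", "designed_for_fpoc": False, "fpoc": False},
    {"product": "erp", "designed_for_fpoc": False, "fpoc": False},
]


def fpoc_rate(records):
    """Fraction of records resolved at first point of contact (None if empty)."""
    if not records:
        return None
    return sum(r["fpoc"] for r in records) / len(records)


designed = [i for i in incidents if i["designed_for_fpoc"]]
not_designed = [i for i in incidents if not i["designed_for_fpoc"]]

overall_rate = fpoc_rate(incidents)
by_design_rate = fpoc_rate(designed)
not_designed_rate = fpoc_rate(not_designed)

print(overall_rate, by_design_rate, not_designed_rate)
```

The overall rate looks poor, but the desk resolves half of what it was actually equipped to resolve; the remaining volume is a question about the support model rather than about agent performance.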

Measurement is definitely an important part of continuously improving your Service Desk and driving up value and customer satisfaction.  Relentless review of relevant, insightful metrics will keep you at the forefront. Next time let’s look at Mean Time to Restore Service (MTRS)………

Question
Have you come across any measures that have done you more harm than good?  Do you have any really GOOD or really BAD measurement stories to share? I’d love to hear them……

Maria Ritchie Uncategorized

If at first you don’t succeed…improving incident assignment accuracy.

September 17th, 2009

One of the biggest incident management challenges for IT organizations today is ensuring that when an incident needs to be assigned to another support team, this functional escalation is performed accurately. Assignments may be necessary if the help desk is unable to resolve at first point of contact, or if a Tier 2 technician detects a fault or degradation that they are unable to resolve. It is important to be able to measure assignment accuracy as it is one area where process improvements can really pay off, both in terms of improved resolution times and overall customer satisfaction.

One key element of a successful incident support model is the emphasis on the role of the help desk in functional and hierarchical escalations. There is often a tendency for Tier 2 support teams to want to assign incidents to their functional peer groups directly. This is not to be confused with assignment of an incident to individuals within their team, which is anticipated in some situations.

An example of inaccurate assignment would be when a server hosting team determines that the incident was assigned incorrectly to their group, but on inspection of the incident realizes that the network team will likely be able to restore service, and assigns the incident directly to networking. This can often lead to ping pong assignments, where Tier 2 groups pass the hot potato until the incident finally reaches the right support team. And, if your customers receive notification on assignments, a useful and common mechanism to keep them in the loop on resolution progress, they will likely become frustrated and form a negative opinion about the effectiveness of the IT organization.

Why does this happen?

The core reason is that Tier 2 support teams are often composed of specialized, focused resources who have knowledge relative to their functional area. They are typically not equipped with knowledge that would ensure assignment accuracy, and as a result cannot be held accountable for inaccurate assignments. However, there is a perception that sending an incident back to Tier 1 is a step backwards. What Tier 2 may not realize is that if incidents are not returned to the desk, the following direct and indirect impacts can occur:

1) The desk may not be aware it is assigning inaccurately, and long term effects can lead to a more challenging task of changing this learned behaviour when this issue is tackled.

2) The incident may not be correctly re-categorized or classified on reassignment. This can affect reporting and may make it appear that one Tier 2 group is resolving incidents outside of their area of expertise. This makes incident trending for problem management identification challenging to say the least!

3) Escalations due to delayed resolutions may not be initiated unless the incidents are being tracked in real-time. While some enabling technologies perform this activity, they often depend on accurate incident categorization which may not be the case for mis-assigned incidents.

4) Service level treatments may be incorrectly applied, as the initial assignment from Tier 1 may result in the incident being monitored against a different service level from the one that should apply to a correctly re-categorized incident.

It is not uncommon for the culture in an IT organization to resist the model of reassignment back to Tier 1. That is expected, but this challenge can be overcome. Aside from training and communicating the value of a Tier 1 reassignment strategy, you may want to employ a top 10 mis-assignment strategy. Record the number of assignments and compare these to an average assignment count that you feel is representative of an accurately assigned incident. If your enabling technology assigns an incident to the help desk on initial save, or if your organization sends incidents back to Tier 1 so agents can confirm service restoration with the customer, you will need to factor this into your calculation. You should also factor into your analysis the situation where Tier 2 resources detect an outage before customers feel a service impact.

Then, report on the frequency of Tier 2 to Tier 2 assignments, sorted as a top 10 list by support group, where incidents were reassigned more times than your acceptable threshold. Report on this internally to all the support groups on a monthly basis until your top 10 list represents less than 5% of the total incidents assigned. This will provide both visibility and a motivation for support group queue managers to monitor and address inaccurate assignment activities. If the problem persists, you may choose to report on the specific resources, by name, in this ranking. This may encourage those who are resistant to change to avoid having their name in lights.
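The top 10 report described above could be produced along these lines. This is a sketch under stated assumptions: each incident carries an ordered assignment history, the threshold of two assignments is made up, and a Tier 2 to Tier 2 hand-off is attributed to the group that passed the incident on. Your own tool may record this differently.

```python
from collections import Counter

TIER1 = "Service Desk"
ACCEPTABLE_ASSIGNMENTS = 2  # assumed threshold for an accurately assigned incident

# Hypothetical assignment histories: the ordered list of groups that held each incident.
histories = {
    "INC-30": ["Service Desk", "Server", "Network", "Database"],
    "INC-31": ["Service Desk", "Network"],
    "INC-32": ["Service Desk", "Server", "Service Desk", "Network"],
    "INC-33": ["Service Desk", "Database", "Server", "Network", "Storage"],
}


def tier2_handoffs(history):
    """Return the groups that passed an incident directly to another Tier 2 group."""
    return [
        src
        for src, dst in zip(history, history[1:])
        if src != TIER1 and dst != TIER1
    ]


handoffs_by_group = Counter()
for incident_id, history in histories.items():
    assignments = len(history) - 1  # the first entry is the initial owner
    if assignments > ACCEPTABLE_ASSIGNMENTS:
        handoffs_by_group.update(tier2_handoffs(history))

top10 = handoffs_by_group.most_common(10)
print(top10)
```

Note that INC-32 exceeds the threshold but contributes nothing to the list, because the server team sent it back to the desk rather than directly to a peer group, which is the behaviour the report is meant to encourage.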

After introducing this first step, you should begin to see an increase in assignments back to the desk. This can be troubling at first, but it is expected. Ensure that you have established a quality review procedure to address these re-assignments. This can be accomplished through additional training or the updating/introduction of a knowledge base, intended to ensure resources are leveraging assignment diagnostics. This may also result in the help desk requesting more detailed diagnostic information from Tier 2 groups or service owners as a means to enhance assignment accuracy. Include a help desk assignment accuracy statistic after you have implemented the statistics at Tier 2. This will encourage the desk to reach out after Tier 2 to Tier 2 assignments begin to drop. The rationale for this staged approach is that both measures are interdependent. Improving assignment accuracy requires a feedback mechanism to the desk (re-assign to Tier 1), and Tier 1 accuracy improvements require enhanced information from Tier 2 (diagnostic logic).

If this top 10 approach is communicated and sold with a focus on improvement of IT for your customer, this should not be viewed as a negative. Instead, this may foster some competitive spirit between your IT groups. Other metrics can be added over time. You would be surprised how effective sharing statistics can be at changing behaviour.

Assignment accuracy is just one outcome of a successful incident support model. I will provide some additional benefits to support modeling in future posts…

Michael Oas Uncategorized

ITIL Case Study – If you want to Lose Weight, Get on the Scale.

August 25th, 2009

Like a successful weight reduction plan, service desk improvements need to be defined based on a good knowledge of where you are and how far you want/need to go. Taking a little time at the beginning of your improvement planning to baseline your service desk practice and inform your improvement priorities will provide you with a surprisingly valuable set of information!

This blog is a follow-on to the “ITIL – Not a Cure for the Common Cold!” blog where I provided an overview of a large government’s service management journey and outlined their 5-Step Roadmap to improvement.  This article will focus on getting started, using the case study organization as a guide….

Where do you start with no money, no credibility and no time?

The answer is not really all that difficult.  You have to start with a solid understanding of where you are and knowledge of where you want & need to be.  The art is then to pick the combination of outcomes and activities that will generate momentum, produce useful improvements and build credibility. Read more…

Maria Ritchie Uncategorized