Service Operation

Incident Management

Predecessors/Before You Begin

ITIL draws a sharp distinction between an incident and a problem. Keep in mind that the two are different: a problem is the unknown, underlying cause of one or more incidents.

Incident Management

Incident management is arguably the most important ITIL process. Service Strategy, Change Management, and the other processes are all needed, but incident management is the one process that (in my opinion) IT cannot live without. Incident management is the tracking of incidents, which are breaches or potential breaches of service level agreements. In plain English, incident management is about tracking stuff that is currently broken. Without incident management, IT has no way to ensure that users' services get restored when they break.

Virtually all IT organizations have some sort of ticket-tracking system to keep track of contacts and to help IT staff hand off issues. Virtually all IT organizations also have some form of Service Desk that receives calls, records them, and resolves incidents or escalates them.

Incident management tools ideally track service level targets--for example, when a user is having difficulty checking their e-mail, then the time it takes IT to restore service is tracked against the e-mail service level agreement.
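As a rough illustration of tracking restore time against a service level target, here is a minimal sketch. The four-hour e-mail target and the names SLA_TARGETS and sla_status are assumptions made up for this example; they do not come from ITIL or from any particular tool.

```python
from datetime import datetime, timedelta

# Hypothetical restore-time targets per service (assumed values, not from ITIL).
SLA_TARGETS = {"email": timedelta(hours=4)}

def sla_status(service, opened_at, restored_at):
    """Return 'met' or 'breached' by comparing restore time to the target."""
    elapsed = restored_at - opened_at
    return "met" if elapsed <= SLA_TARGETS[service] else "breached"

opened = datetime(2024, 1, 8, 9, 0)
print(sla_status("email", opened, datetime(2024, 1, 8, 11, 30)))  # met
print(sla_status("email", opened, datetime(2024, 1, 8, 14, 0)))   # breached
```

A real tool would probably also account for business hours and paused clocks, which this sketch ignores.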

ITIL recommends creating "incident models" for typical issues. For example, at Wake Forest University, our Service Desk often receives laptops that need to be repaired. ITIL recommends creating an incident model to describe the particular process of repairing a laptop.

ITIL also calls out "major incidents" as a particular form of important incident that may require a Major Incident team to resolve. Major incidents are different from problems--a major incident is still about getting users up and running as quickly as possible, whereas a problem is about finding the underlying cause. A major incident could be kicked off by a system-generated page, or whenever a certain number of users are affected.
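The two triggers mentioned above could be expressed as a simple rule. This is only a sketch: the threshold of 50 users and the function name is_major_incident are assumptions, and your organization would set its own criteria.

```python
# Assumed threshold: how many affected users trigger a major incident.
MAJOR_INCIDENT_USER_THRESHOLD = 50

def is_major_incident(affected_users, system_page=False):
    """Escalate on a system-generated page or a large affected population."""
    return system_page or affected_users >= MAJOR_INCIDENT_USER_THRESHOLD

print(is_major_incident(3))                    # False
print(is_major_incident(3, system_page=True))  # True
print(is_major_incident(120))                  # True
```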

University-specific risks

Incident management tools allow you to track a lot of information about your users. It can be difficult to map users' department information (e.g. a professor may be teaching in two different departments) and status information (e.g. a student may also be a staff member).

Also, ITIL calls out the option for tracking "VIPs" and handling them separately--if your University marks certain people as VIPs, e.g. the Chancellor, consider this a policy decision. Publicize the policy internally and externally so IT staff understand the purpose of VIP status and so users do not think there is secret favoritism going on within IT.

Find the user groups for your incident management tool and ask your salesperson to put you in touch with other higher education institutions that use that tool.

Videos, Photos, and Presentations

Problem Management

Predecessors/Before You Begin

ITIL defines the word "problem" very specifically. Learning about problem management should go hand in hand with learning about incident management.

Problem Management

Problem Management is the process of investigating the underlying causes of incidents. Problem management relies on three key terms:

Problem
The unknown, underlying cause of one or more incidents.
Error
The known, underlying cause of one or more incidents.
Known Error
A problem, plus its root cause and a workaround.

A problem becomes an error when it's understood, and a "known error" when there is also a workaround. What does "understood" mean? One itSMFusion presentation suggested that "understood" means "when you can no longer ask 'why?'"
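The progression between the three terms can be sketched as a small function. The function and parameter names are hypothetical; the logic simply restates the definitions above.

```python
def classify(root_cause_known, workaround_exists):
    """Map the state of an investigation to the term used above."""
    if not root_cause_known:
        return "problem"  # cause still unknown
    # Cause is understood; a workaround promotes it to a known error.
    return "known error" if workaround_exists else "error"

print(classify(root_cause_known=False, workaround_exists=False))  # problem
print(classify(root_cause_known=True, workaround_exists=False))   # error
print(classify(root_cause_known=True, workaround_exists=True))    # known error
```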

How is problem management different from incident management? For one, the motivations are different. Incident management wants to resolve incidents quickly so users can do their work. Problem management wants to address problems thoroughly so the problems never occur again.

Problem management is also much more time-intensive than incident management. For example, if a user can't read the text on a web site because they are running Netscape 1.0, incident management might be able to resolve the incident by asking the user to use another browser. Problem management, on the other hand, would have to spend a lot of time debugging Netscape 1.0 to fix the issue. For this reason, prioritization is very important in problem management.

Problems need to be prioritized and then selectively investigated. Requests for change are then raised with change management so that the errors can be corrected.
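One possible way to prioritize is to rank problems by a rough cost estimate and investigate only the top few. The sketch below assumes cost is the number of related incidents times an estimated cost per incident; the Problem class, its fields, and the sample figures are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Problem:
    name: str
    incident_count: int       # incidents attributed to this problem
    cost_per_incident: float  # assumed estimate, in any consistent unit

    @property
    def estimated_cost(self):
        return self.incident_count * self.cost_per_incident

def prioritize(problems, limit):
    """Return the `limit` most costly problems, most costly first."""
    ranked = sorted(problems, key=lambda p: p.estimated_cost, reverse=True)
    return ranked[:limit]

problems = [
    Problem("wifi drops", 40, 25.0),
    Problem("legacy browser rendering", 2, 10.0),
    Problem("printer queue stalls", 15, 30.0),
]
for p in prioritize(problems, limit=2):
    print(p.name, p.estimated_cost)
```

With these made-up numbers the legacy browser problem falls below the cut, which matches the intuition that a rare, cheaply worked-around issue may not justify a long investigation.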

University-specific risks

Sometimes people "just want it fixed" and are not interested in standing up a process to perform investigation. However, University users, due to their background and varied computing environments, may raise widely different incidents, resulting in a large number of potential problems to investigate and an acute need for problem prioritization.

Problem investigation should ideally be justified by data showing the cost of the problem not being resolved. However, actual cost versus perceived cost can be difficult to account for.

Contacts and Resources

Access Management

Predecessors/Before You Begin

ITIL defines Access Management differently from IT Security Management. IT Security Management occurs in Service Design, creates policies, and informs Service Level Management on the access configuration for each service.

Access Management sits in Service Operation, and manages access as defined by IT Security Management.

"Identity management" is very closely related to ITIL's access management. Identity management can be an especially big subject at Universities, tied to both IT Security Management and access management. Additionally identity management does not necessarily address the rights granted to individuals (authorization), which is the core of ITIL's access management process.

Access Management

Access management maps rights to identities. "Identities" are the people in your organization, who must be able to prove they are who they say they are. "Rights" are the abilities that identities have on various systems, e.g. the right to create new data or the right to delete data.

Access management works closely with Request Fulfillment and/or Incident Management to receive access requests from the Service Desk. These requests follow a standard process:

  • Request access
  • Verify the access
  • Provide the rights
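The three steps above can be sketched as a single function. Everything here is hypothetical: the role-to-rights table, the identity names, and the function handle_access_request are assumptions for illustration. In practice the policy would come from IT Security Management.

```python
# Hypothetical policy: which rights each role may hold (illustrative only).
ALLOWED_RIGHTS = {
    "faculty": {"create_course", "delete_course"},
    "student": {"view_course"},
}

def handle_access_request(granted, identity, role, right):
    """Request -> verify -> provide. Returns True if the right was granted."""
    # Verify: the requested right must be permitted for the requester's role.
    if right not in ALLOWED_RIGHTS.get(role, set()):
        return False
    # Provide: record the right against the identity.
    granted.setdefault(identity, set()).add(right)
    return True

granted = {}
print(handle_access_request(granted, "jdoe", "faculty", "create_course"))   # True
print(handle_access_request(granted, "asmith", "student", "delete_course")) # False
print(granted)  # {'jdoe': {'create_course'}}
```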

Additionally, access management is responsible for "access monitoring and control." Access management should ensure that the access provided continues to be appropriate, for example watching out for potential conflicts of interest.
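A periodic check for conflicts of interest might look something like the sketch below. The pair of conflicting rights and the name find_conflicts are assumptions; which combinations count as a conflict is a policy decision for your institution.

```python
# Hypothetical pairs of rights that one identity should not hold together.
CONFLICTING_RIGHTS = [("submit_expense", "approve_expense")]

def find_conflicts(granted):
    """Return (identity, right_a, right_b) for each conflicting pair held."""
    conflicts = []
    for identity, rights in sorted(granted.items()):
        for a, b in CONFLICTING_RIGHTS:
            if a in rights and b in rights:
                conflicts.append((identity, a, b))
    return conflicts

current_access = {
    "jdoe": {"submit_expense", "approve_expense"},
    "asmith": {"submit_expense"},
}
print(find_conflicts(current_access))
# [('jdoe', 'submit_expense', 'approve_expense')]
```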

Access management should work with Human Resources to coordinate access removal and suspension as people change jobs, are put on leave, or leave the organization.

University-specific risks

Universities may have a large, difficult-to-define population including students, faculty, staff, alumni, parents, and visitors. In this context getting consistent identity information is in itself very difficult.

Additionally, people are more likely to play multiple roles at a University than in a corporation. For example, a student may also be a teaching assistant, an alumnus, and a parent.

Videos, Photos, and Presentations

Google Tech Talk: Introduction to Identity Management

Further Information

Event Management

Predecessors/Before You Begin

Your organization most likely already has systems generating events, such as syslog messages and network traps. Event management is the idea that the events from all these systems could be put together, correlated, and then incidents or other records generated as appropriate.

Event Management

Event management tracks automatically-generated events in your environment that are "significant for the management of the IT Infrastructure" or otherwise relevant for services being delivered. It's up to your institution to decide what's "significant." Event management then filters and classifies the events into one of three categories:

  • Informational: FYI, e.g. "User X just logged into system Y"
  • Warning: Something bad may happen, e.g. "Disk space is at 70% on the file server"
  • Exception: Something bad happened, e.g. "Disk space is full on the file server"
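Using the disk space examples above, a classification rule could be sketched as follows. The thresholds (warning at 70%, exception at 100%) mirror the examples but are otherwise assumptions, as is the function name classify_disk_event.

```python
def classify_disk_event(percent_used, warning_at=70, exception_at=100):
    """Classify a disk-usage reading into one of the three event categories."""
    if percent_used >= exception_at:
        return "exception"
    if percent_used >= warning_at:
        return "warning"
    return "informational"

for reading in (35, 70, 100):
    print(reading, classify_disk_event(reading))
# 35 informational
# 70 warning
# 100 exception
```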

Ideally event management performs correlation--the buzzword here is a "correlation engine." This correlation should help people understand that event #2 (service is back up) is related to event #1 (service went down). Correlation could also work between systems, e.g. recognizing that a server's database and applications are generating exceptions because the underlying server is generating exceptions.
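The simplest form of correlation, pairing a "service went down" event with the matching "service is back up" event, can be sketched in a few lines. This is a toy illustration with an assumed event format of (timestamp, service, state) tuples, not how any particular correlation engine works.

```python
def correlate_outages(events):
    """Pair each 'down' event with the next 'up' event for the same service.

    `events` is a time-ordered list of (timestamp, service, state) tuples.
    Returns a list of (service, down_time, up_time) tuples.
    """
    open_outages = {}  # service -> timestamp of the first unmatched 'down'
    outages = []
    for timestamp, service, state in events:
        if state == "down":
            open_outages.setdefault(service, timestamp)
        elif state == "up" and service in open_outages:
            outages.append((service, open_outages.pop(service), timestamp))
    return outages

events = [(1, "web", "down"), (2, "db", "down"), (5, "web", "up")]
print(correlate_outages(events))  # [('web', 1, 5)]
```

The "db" outage stays unmatched here because no "up" event has arrived for it yet.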

Event management can call incident management: exceptions and warnings could then generate incidents for real people to review.
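As a sketch of that hand-off, the function below turns warning and exception events into incident records and drops informational ones. The dictionary fields and the name incidents_from_events are hypothetical; a real integration would use whatever interface your incident management tool provides.

```python
def incidents_from_events(events):
    """Create one incident record for each warning or exception event."""
    return [
        {"summary": e["message"], "severity": e["category"]}
        for e in events
        if e["category"] in ("warning", "exception")
    ]

sample_events = [
    {"category": "informational", "message": "User X logged into system Y"},
    {"category": "warning", "message": "Disk space is at 70% on the file server"},
    {"category": "exception", "message": "Disk space is full on the file server"},
]
for incident in incidents_from_events(sample_events):
    print(incident["severity"], "-", incident["summary"])
```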

Conversely, other processes can query event management: incident management and problem management could also search the event history for more information about why something broke.

Event management is ideally executed by an automated system. People may design and query the event management tool, but no person reviews every event that is generated.

Event management cannot stand on its own--it needs to work closely with Service Design to ensure systems are designed to generate appropriate events, and it needs to work with Incident Management and perhaps even Change Management to ensure that appropriate events are escalated.

University-specific risks

Vendors want to sell event management tools. These tools need to fit well within your ITSM suite.

Make sure that any event management tool you support can receive events from the systems you already use, including any other systems that already track events.

Request Fulfillment

Predecessors/Before You Begin

"Request fulfillment" is an ITIL v3 concept. In ITIL v2, it was considered part of Incident Management. However now the two are separate: incidents are for things that are breaking or about to break, and service requests are for new things.

Request fulfillment is tightly coupled to service catalog management--request fulfillment is when services are actually ordered from the service catalog.

Request Fulfillment

Request fulfillment is the process of providing "normal" services to users, such as the creation of a listserv or a Blackboard course. Request fulfillment calls these requests "service requests" because they are requests for a particular existing service. Service requests should be predictable, and popular service requests can be optimized and automated and "request models" defined for them.

Request fulfillment also includes answering users' questions, such as "How much email quota do I have?"

Vendors try to sell request fulfillment tools, usually bundled as part of a service catalog solution. Request fulfillment and incident management are particularly linked (e.g. when a user is calling you can't be sure whether they're calling about something that's broken or something they want to order), so it is important to consider the integration needs between your request fulfillment tool and your incident management tool.

University-specific risks

Request fulfillment is often associated with charging--if you provide a "desktop printer request" service, then people will want to order desktop printers and someone will need to pay for them. However, depending on the charging model for a University IT department, it may be more difficult for IT to fund the "goods" supporting a service request. Ironically, by doing a good job of request fulfillment, an IT department's costs can go up.