IT Service Management

10 min read Published on 07/01/2026

Problem Management

Identify root causes and prevent incident recurrence

In brief

Problem management identifies the root causes of recurring incidents. KaliaOps provides a dedicated workflow with investigation phases, workaround documentation, Known Error Database (KEDB) management, and permanent solution implementation tracking.

Overview

A problem is the root cause of one or more incidents.

Incident vs problem

Incident: Symptom, visible impact, quick fix
Problem: Root cause, underlying issue, permanent fix

Problem management goals

According to ITIL, problem management aims to:

Identify the root cause of incidents
Document workarounds to speed up future resolution
Implement permanent solutions to prevent recurrence

Example

Incident: "Application X crashed at 10am"
Problem: "Memory leak in module Y causing crashes under load"
Workaround: Restart service daily
Solution: Deploy fix in next release

Creating a problem

Access the Problems module

Menu ITSM → Problems.

Click "New problem"

Open the creation form.

Describe the problem

Fill in:

Title: Clear summary of the underlying issue
Description: Context, symptoms, observations
Priority: Based on impact and frequency

Link related incidents

Associate incidents that revealed this problem.

Submit

The problem is created with "New" status.

Tip: Create a problem when you see recurring incidents on the same item, or an incident with a workaround that doesn't address the root cause.
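The first trigger in this tip, recurring incidents on the same item, can be checked with a simple count. The following is a minimal Python sketch, not a KaliaOps feature: the `Incident` fields, the `recurring_items` helper, and the threshold of three incidents are assumptions chosen for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Incident:
    """Hypothetical incident record; field names are illustrative."""
    incident_id: str
    item: str  # the affected item (server, application, ...)


def recurring_items(incidents, threshold=3):
    """Return items with at least `threshold` incidents, most affected first."""
    counts = Counter(incident.item for incident in incidents)
    return [item for item, count in counts.most_common() if count >= threshold]


# Illustrative data only.
incidents = [
    Incident("INC-1", "SRV-APP-01"),
    Incident("INC-2", "SRV-DB-01"),
    Incident("INC-3", "SRV-APP-01"),
    Incident("INC-4", "SRV-APP-01"),
]
print(recurring_items(incidents))  # ['SRV-APP-01']
```

Each item returned is a candidate for a new problem record. The right threshold depends on your incident volume.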

Workflow and statuses

Available statuses

Status	Description
New	Problem identified, not yet investigated
Assigned	Assigned for investigation
Under Investigation	Root cause analysis in progress
Root Cause Identified	Cause found, planning solution
Known Error	Documented in KEDB, workaround available
Resolved	Permanent solution implemented
Closed	Problem completed and validated

Standard workflow

NEW → ASSIGNED → UNDER_INVESTIGATION → ROOT_CAUSE_IDENTIFIED
                                              ↓
                                        KNOWN_ERROR (with workaround)
                                              ↓
                                          RESOLVED → CLOSED
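
The diagram above can be read as a small state machine. The Python sketch below encodes exactly the arrows shown and nothing more. It assumes the diagram is exhaustive; KaliaOps may allow other moves (for example reopening a problem or skipping the Known Error step), and the names `Status`, `ALLOWED`, `can_transition`, and `advance` are illustrative rather than part of the product.

```python
from enum import Enum


class Status(Enum):
    NEW = "New"
    ASSIGNED = "Assigned"
    UNDER_INVESTIGATION = "Under Investigation"
    ROOT_CAUSE_IDENTIFIED = "Root Cause Identified"
    KNOWN_ERROR = "Known Error"
    RESOLVED = "Resolved"
    CLOSED = "Closed"


# Assumption: only the transitions drawn in the standard workflow are legal.
ALLOWED = {
    Status.NEW: {Status.ASSIGNED},
    Status.ASSIGNED: {Status.UNDER_INVESTIGATION},
    Status.UNDER_INVESTIGATION: {Status.ROOT_CAUSE_IDENTIFIED},
    Status.ROOT_CAUSE_IDENTIFIED: {Status.KNOWN_ERROR},
    Status.KNOWN_ERROR: {Status.RESOLVED},
    Status.RESOLVED: {Status.CLOSED},
    Status.CLOSED: set(),
}


def can_transition(current, target):
    """Return True if the workflow allows moving from `current` to `target`."""
    return target in ALLOWED[current]


def advance(current, target):
    """Return `target` if the move is legal, otherwise raise ValueError."""
    if not can_transition(current, target):
        raise ValueError(f"Illegal transition: {current.name} -> {target.name}")
    return target


print(can_transition(Status.NEW, Status.ASSIGNED))  # True
print(can_transition(Status.NEW, Status.RESOLVED))  # False
```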

Root cause analysis (RCA)

Root Cause Analysis identifies why incidents occurred.

RCA methods

5 Whys: Ask "why?" repeatedly until you reach the root cause
Fishbone diagram: Categorize potential causes
Timeline analysis: Trace events leading to the incident
Log analysis: Review system logs for evidence

Documenting RCA

In KaliaOps, document:

Root cause description: Clear explanation of the underlying issue
Evidence: Logs, screenshots, test results
Contributing factors: Conditions that enabled the failure

Example

Root cause: Database connection pool exhaustion

Evidence:
- Connection count reached max (100) at 09:58
- First errors logged at 09:59
- Pool configured 5 years ago for lower load

Contributing factors:
- Traffic increased 300% in the last year
- No monitoring on connection pool

Tip: A good root cause is specific, evidence-based, and actionable. "Human error" is rarely a good root cause - dig deeper.
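
To make the three documented fields concrete, here is a minimal Python sketch of an RCA record with a basic quality check. The `RootCauseAnalysis` class and its `issues` method are hypothetical and do not reflect the KaliaOps data model; the check only covers the "evidence-based" and "human error" points from the tip.

```python
from dataclasses import dataclass, field


@dataclass
class RootCauseAnalysis:
    """Hypothetical RCA record mirroring the three documented fields."""
    root_cause: str
    evidence: list[str] = field(default_factory=list)
    contributing_factors: list[str] = field(default_factory=list)

    def issues(self):
        """Return reasons this RCA may need more work (empty list if none)."""
        problems = []
        if not self.root_cause.strip():
            problems.append("root cause description is empty")
        elif "human error" in self.root_cause.lower():
            problems.append('"human error" is rarely a root cause; dig deeper')
        if not self.evidence:
            problems.append("no evidence attached")
        return problems


# The worked example from this section, expressed as a record.
rca = RootCauseAnalysis(
    root_cause="Database connection pool exhaustion",
    evidence=[
        "Connection count reached max (100) at 09:58",
        "First errors logged at 09:59",
        "Pool configured 5 years ago for lower load",
    ],
    contributing_factors=[
        "Traffic increased 300% in the last year",
        "No monitoring on connection pool",
    ],
)
print(rca.issues())  # []
```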

Documenting workarounds

A workaround is a temporary solution that restores service without fixing the root cause.

Why document workarounds?

Faster resolution: Technicians can apply a known workaround immediately
Consistency: Everyone uses the same approach
Service continuity: Users get service restored quickly

Good workaround documentation

Include:

Steps: Clear, numbered instructions
Prerequisites: Required access, tools
Side effects: Any limitations or impacts
Duration: How long the workaround remains effective

Example

Workaround: Restart the application service

Steps:
1. Connect to server SRV-APP-01
2. Run: systemctl restart app-service
3. Verify service is running: systemctl status app-service
4. Monitor for 5 minutes

Side effects:
- 30 seconds of downtime during restart
- Active sessions are terminated

Duration: Restores service for ~24 hours, until the memory leak recurs
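
Steps 2 and 3 of this example could be wrapped in a small script so that every technician runs them the same way. The Python sketch below is one possible approach, not something KaliaOps provides. It assumes it runs on SRV-APP-01 with sufficient privileges to restart services; the function name `restart_and_verify` and the injectable `run` parameter are illustrative. Step 4 (monitoring for 5 minutes) is left to the operator.

```python
import subprocess

SERVICE = "app-service"  # service name taken from the example above


def restart_and_verify(service=SERVICE, run=subprocess.run):
    """Restart `service`, then return True if systemctl reports it as running.

    `run` defaults to subprocess.run; it is injectable so the function
    can be exercised without a real systemd host.
    """
    # Step 2: restart the service. check=True raises CalledProcessError
    # if the restart command itself fails.
    run(["systemctl", "restart", service], check=True)
    # Step 3: `systemctl status` exits with code 0 when the unit is active.
    result = run(["systemctl", "status", service], check=False)
    return result.returncode == 0
```

Calling `restart_and_verify()` on the server returns `True` when the service came back up. The side effects listed above still apply, so the script should only be run when the interruption is acceptable.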

Known Errors (KEDB)

The Known Error Database (KEDB) records problems that have an identified root cause and a documented workaround.

What is a Known Error?

A Known Error is a problem that has:

An identified root cause
A documented workaround
A permanent solution pending (or no fix planned)

Benefits of KEDB

Faster incident resolution: Technicians search KEDB first
Knowledge sharing: Expertise is documented
Onboarding: New team members learn common issues

Marking as Known Error

Complete root cause analysis
Document workaround
Change status to "Known Error"
The problem is now searchable in KEDB

Using KEDB

When handling an incident:

Search KEDB for matching symptoms
If found, apply documented workaround
Link incident to the known error
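
The "search for matching symptoms" step can be pictured as a keyword match between the incident description and each KEDB entry. The Python sketch below is a deliberately naive illustration and says nothing about how KaliaOps search actually works: the `KnownError` fields, the `search_kedb` function, the `min_shared` threshold, and the sample entries are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KnownError:
    """Hypothetical KEDB entry; field names are illustrative."""
    error_id: str
    symptoms: str
    workaround: str


def search_kedb(kedb, description, min_shared=2):
    """Return entries sharing at least `min_shared` words with `description`,
    best match first."""
    query = set(description.lower().split())
    scored = []
    for entry in kedb:
        shared = len(query & set(entry.symptoms.lower().split()))
        if shared >= min_shared:
            scored.append((shared, entry))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [entry for _, entry in scored]


# Illustrative entries only.
kedb = [
    KnownError("KE-1", "application crashes under load memory leak",
               "Restart the application service"),
    KnownError("KE-2", "database connection pool exhausted errors",
               "(placeholder workaround text)"),
]
matches = search_kedb(kedb, "Application X crashes under heavy load")
print([entry.error_id for entry in matches])  # ['KE-1']
```

A real search would normally handle punctuation, word variants, and ranking more carefully. The point here is only the flow: match symptoms, then apply the workaround from the best match.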

Implementing permanent solutions

The permanent solution eliminates the root cause.

Solution documentation

Record:

Solution description: What was done
Implementation date: When it was applied
Change reference: Link to associated change ticket
Validation: How the fix was verified

Typical solutions

Code fix deployed
Configuration change
Infrastructure upgrade
Process improvement
Training provided

Workflow

Develop/plan the solution
Create a change ticket for implementation
Implement the change
Validate the fix
Update problem status to "Resolved"
Document the solution

Tip: Always create a change ticket for permanent solutions. This ensures proper testing, approval, and rollback planning.

Linking associated incidents

Link related incidents to the problem.

Why link incidents?

Scope assessment: How many users were affected?
Pattern detection: When do incidents occur?
Communication: Update all affected users at once
Metrics: Cost/impact of the problem

Creating links

From the problem:

Go to "Related Incidents" section
Click "Link incident"
Search and select incidents

From an incident:

Open the incident
In "Related Problem", select the problem

Automatic detection

KaliaOps can suggest links based on:

Same affected assets
Similar symptoms (keywords)
Time proximity
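
One way to picture how these three criteria might combine is a simple score with one point per criterion. The Python sketch below is purely illustrative: the actual KaliaOps suggestion logic is not described here, and the `Ticket` fields, the `link_score` function, the two-word overlap rule, and the 24-hour window are all assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Ticket:
    """Hypothetical record used for both problems and incidents."""
    ticket_id: str
    asset: str
    summary: str
    opened_at: datetime


def link_score(problem, incident, window=timedelta(hours=24)):
    """Score from 0 to 3: one point per matching criterion."""
    score = 0
    # Criterion 1: same affected asset.
    if problem.asset == incident.asset:
        score += 1
    # Criterion 2: similar symptoms, approximated by shared keywords.
    shared = set(problem.summary.lower().split()) & set(incident.summary.lower().split())
    if len(shared) >= 2:
        score += 1
    # Criterion 3: time proximity.
    if abs(problem.opened_at - incident.opened_at) <= window:
        score += 1
    return score


# Illustrative data only.
problem = Ticket("PRB-1", "SRV-APP-01", "memory leak causing crashes under load",
                 datetime(2026, 1, 5, 10, 0))
incident = Ticket("INC-7", "SRV-APP-01", "application crashes under load",
                  datetime(2026, 1, 5, 9, 30))
unrelated = Ticket("INC-8", "SRV-DB-01", "slow report export",
                   datetime(2026, 1, 20, 14, 0))

print(link_score(problem, incident))   # 3
print(link_score(problem, unrelated))  # 0
```

Incidents with a high score would be shown as suggestions; the decision to link stays with the technician.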

Impact on resolution

When the problem is resolved:

All linked incidents can be updated
Users receive a notification
Statistics reflect the resolution

Key points

Clear distinction: incident (symptom) vs problem (root cause)
Dedicated workflow for investigation and RCA
Reusable Known Error Database (KEDB)
Workaround + permanent solution documentation
Suggested links to related and recurring incidents