Troubleshooting Tasks

This section of the troubleshooting chapter complements the documentation on tasks in the management UI chapter. If you are new to tasks, start reading there. The subsubsection on task attributes is of particular interest when troubleshooting tasks.

Many actions in orcharhino are performed as so-called tasks. Tasks run asynchronously, can be scheduled to run at a specific time, and can perform background actions over an extended period of time.

Tasks are a feature of the foreman-tasks plugin, which is always included with orcharhino and runs as the foreman-tasks background service.

Tasks are of particular importance for troubleshooting, since they are used for the interaction between different system components: tasks allow one service to schedule actions that are performed by another service. Many things can go wrong in such interactions, and the foreman-tasks plugin provides built-in functionality for troubleshooting and recovery.

Resource Locks

Tasks may lock the resources they require access to. This mechanism prevents multiple tasks from making changes to the same resource at the same time. There are two types of resource locks. Exclusive resource locks (indicated by a closed lock icon symbol_lock_exclusive) prevent other tasks from accessing the resource: another task attempting to lock the same resource fails with a relevant error message. Non-exclusive resource locks (indicated by an open lock icon symbol_lock_non_exclusive) do not block access to the resource. They merely indicate a relation between task and resource.

When viewing a task you can see any associated resource locks on the locks tab:

Viewing resource locks
  • Note how this example task blocks both read and write access to a specific product with an exclusive resource lock (symbol_lock_exclusive).

  • The user, provider, and organization are indicated with non-exclusive resource locks (symbol_lock_non_exclusive).

  • Any other task trying to write to the same product at the same time will fail with an error result. If such a task supports resuming, you can resume it once the lock is gone.
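
The difference between the two lock types can be illustrated with a small model. This is a minimal sketch based only on the behaviour described above, not the actual foreman-tasks implementation. The names LockTable, LockConflict, lock_exclusive, link, and release, as well as the task and resource identifiers, are hypothetical and chosen for illustration.

```python
from dataclasses import dataclass, field


class LockConflict(Exception):
    """Raised when a task requests a resource that another task holds exclusively."""


@dataclass
class LockTable:
    # resource name -> id of the task holding the exclusive lock
    exclusive: dict = field(default_factory=dict)
    # resource name -> ids of tasks related to it through non-exclusive locks
    shared: dict = field(default_factory=dict)

    def lock_exclusive(self, task_id, resource):
        """Take an exclusive lock, or fail if another task already holds it."""
        holder = self.exclusive.get(resource)
        if holder is not None and holder != task_id:
            raise LockConflict(f"{resource} is locked by {holder}")
        self.exclusive[resource] = task_id

    def link(self, task_id, resource):
        """Record a non-exclusive lock: a relation only, it never blocks."""
        self.shared.setdefault(resource, set()).add(task_id)

    def release(self, task_id):
        """Drop every lock held by a task, for example once it has finished."""
        self.exclusive = {r: t for r, t in self.exclusive.items() if t != task_id}
        for holders in self.shared.values():
            holders.discard(task_id)


locks = LockTable()
locks.lock_exclusive("task-1", "product/42")
locks.link("task-1", "organization/1")
locks.link("task-2", "organization/1")  # non-exclusive locks can coexist

try:
    locks.lock_exclusive("task-2", "product/42")
except LockConflict as err:
    print("task-2 failed:", err)

locks.release("task-1")
locks.lock_exclusive("task-2", "product/42")  # succeeds once the lock is gone
```

In this model the second task fails while the first one holds the product, and succeeds on a later attempt after the lock has been released, which mirrors the resume scenario in the last bullet point.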

Tasks Reporting Errors

A failed task reports an error result and changes its state to stopped or paused. A stopped task with an error result cannot be resumed. If need be, investigate the errors encountered and then start a new task as appropriate. A paused task with an error result may still be resumed manually.

Either way, investigate the encountered errors first. When viewing a task, you can see any errors encountered on the errors tab:

Errors of a task
  • In this example we attempted to discover a repository of type Docker where no Docker repository is available (1), resulting in orcharhino requesting a resource that does not exist (404 Not Found) (2).

  • In this case we could try again selecting the correct repository type.

  • If the reported error is not illuminating and is preventing an important task from succeeding, it is probably best to contact ATIX Support.
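
The rules from the start of this subsection (stopped versus paused tasks with an error result) can be summarized as a small decision function. This is only an illustrative sketch: the function name next_action and the returned strings are hypothetical, and the state and result values are taken from the text above.

```python
def next_action(state, result):
    """Suggest a follow-up for a task, following the rules described above."""
    if result != "error":
        return "nothing to do"
    if state == "paused":
        # Paused tasks with an error result may still be resumed manually.
        return "investigate, then resume"
    if state == "stopped":
        # Stopped tasks with an error result cannot be resumed.
        return "investigate, then start a new task"
    return "wait for the task to settle"


print(next_action("paused", "error"))
print(next_action("stopped", "error"))
```

Either way, the investigation comes first; the state only determines whether the existing task can be continued or a new one has to be started.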

Tasks That Appear Unresponsive

In rare cases, a task may appear to be stuck in a running state indefinitely, perhaps because it is waiting for a service that has crashed. If you suspect that a task has stalled, start by checking its running steps.

When viewing a task, you can see its running steps on the running steps tab (note that the content on this tab is only available while a task is running):

Running steps of a task
  • In this case we have a bulk action task that is still waiting for its child task to finish.

  • If we did have a task that was genuinely stuck, we could try to restart the relevant background services. Alternatively, we might cancel the task.

Resuming a Paused Task

As mentioned already, some tasks support resuming after they have encountered an error. To find such tasks, filter the tasks page using the filter key state = paused.
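
The same filter can in principle be used outside the management UI. The following sketch only builds a search URL and does not contact any server. It assumes that the foreman-tasks plugin exposes its task list under the path /foreman_tasks/api/tasks and accepts the filter in a search parameter; verify both against the API documentation of your installation. The host name orcharhino.example.com and the function name task_search_url are placeholders.

```python
from urllib.parse import urlencode


def task_search_url(base_url, search):
    """Build a task search URL for the given filter expression."""
    # ASSUMPTION: the API path and the "search" parameter should be
    # checked against the API documentation of your orcharhino version.
    query = urlencode({"search": search})
    return f"{base_url.rstrip('/')}/foreman_tasks/api/tasks?{query}"


print(task_search_url("https://orcharhino.example.com", "state = paused"))
# https://orcharhino.example.com/foreman_tasks/api/tasks?search=state+%3D+paused
```

The filter expression is passed through unchanged, so any filter key that works on the tasks page can be URL-encoded in the same way.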

When viewing a paused task, a Resume button will be available as follows:

Resuming a paused task
  • It only makes sense to resume the task if you have reason to believe it may now finish successfully, for example because the task previously encountered a temporary network error.

  • If you do resume a task, and it does complete successfully, it will change its state to stopped and its result to warning.

Cancelling a Task

Some tasks support manual cancellation. This may be important to free up the resource locks of a task that has stalled.

When viewing a task that supports manual cancellation, a Cancel button will be available as follows:

Cancelling a task

Whether manual cancellation is supported depends on the task. If the Cancel button is unavailable, the task cannot be cancelled manually.

Manually cancelling a task can potentially cause an inconsistent system state. Cancel a task manually only as a last resort.