Failover

Manage failover

Failover and revert (redundancy)

Failover mechanisms ensure service continuity. If a whole server fails, or if the processing application detects critical alarms, services can be moved to a backup server. When automatic mode is configured, the Controller manages failover automatically. You can also fail over manually, or revert once the primary server is repaired. Both server-level and service-level failover are supported.

  • For server-level failover, when an alarm that is configured to trigger failover is received:
    • All services on the affected server are moved to one of the configured backups.
    • That backup server is no longer available for further failovers.
  • For service-level failover, when an alarm that is configured to trigger failover is received:
    • If it is a server-level alarm: all services on the affected server are moved to one of the configured backups (as for a server-level failover).
    • If it is a service-level alarm: only the affected service is moved.
    • The backup server in use is reserved for further failovers from the same primary.
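
The rules above can be sketched as a small model. This is an illustration only, not the Controller's implementation or API: the Server class, the in_use_by field, the function names, and the "first free backup" selection rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Server:
    name: str
    services: List[str] = field(default_factory=list)
    in_use_by: Optional[str] = None  # hypothetical: primary this backup currently serves


def pick_backup(primary: Server, backups: List[Server], level: str) -> Optional[Server]:
    """Return a usable backup for `primary`, or None if there is none."""
    if level == "service":
        # A backup already in use stays reserved for the same primary.
        for backup in backups:
            if backup.in_use_by == primary.name:
                return backup
    # Assumption: otherwise take the first backup that is not in use.
    for backup in backups:
        if backup.in_use_by is None:
            return backup
    return None


def handle_trigger(level: str, alarm_scope: str, primary: Server,
                   backups: List[Server], service: Optional[str] = None) -> List[str]:
    """Apply a failover-triggering alarm and return the services that moved."""
    if level == "service" and alarm_scope == "service":
        # Service-level group, service-level alarm: only the affected service moves.
        to_move = [service] if service in primary.services else []
    else:
        # Server-level group, or a server-level alarm: everything moves.
        to_move = list(primary.services)
    if not to_move:
        return []
    backup = pick_backup(primary, backups, level)
    if backup is None:
        return []
    for name in to_move:
        primary.services.remove(name)
        backup.services.append(name)
    backup.in_use_by = primary.name
    return to_move


primary = Server("encoder-1", ["news", "sports"])
backups = [Server("backup-1")]
print(handle_trigger("service", "service", primary, backups, "news"))  # ['news']
print(handle_trigger("service", "server", primary, backups))           # ['sports']
```

In this sketch a second primary cannot use backup-1 once it is in use, which mirrors both the "no longer available" rule for server-level groups and the "reserved for the same primary" rule for service-level groups.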

Packager redundancy is managed by the Packager application (cluster of servers). The Controller is not involved in Packager redundancy.

Timestamps

  • All timestamps are in Universal Time (UTC).
  • Failover servers are selected and grouped according to processing type.
  • User permissions are required to edit failover groups, or to failover or revert.
  • If the server has no services assigned, then default user rights for failover apply.

Triggers

  • Automatic failovers are triggered by alarms that are marked as “triggers failover”.
  • By default, all critical alarms will trigger failovers. See Alarm overrides for more details.
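
As a rough sketch of the trigger rule: an explicit per-alarm override wins, and otherwise only critical alarms trigger a failover. The function name, the alarm names, and the dictionary used to hold overrides are hypothetical; the sketch also assumes that the severity passed in is the one the alarm is actually raised with.

```python
from typing import Dict


def triggers_failover(alarm_name: str, severity: str, overrides: Dict[str, bool]) -> bool:
    """Decide whether an alarm should trigger an automatic failover."""
    if alarm_name in overrides:
        # An explicit "triggers failover" choice for this alarm takes precedence.
        return overrides[alarm_name]
    # Default behaviour: every critical alarm triggers a failover.
    return severity == "critical"


# Hypothetical overrides: one non-critical alarm opted in, one critical alarm opted out.
overrides = {"Input loss": True, "License expiring": False}

print(triggers_failover("Input loss", "major", overrides))           # True
print(triggers_failover("License expiring", "critical", overrides))  # False
print(triggers_failover("Disk failure", "critical", overrides))      # True
print(triggers_failover("Fan speed", "minor", overrides))            # False
```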

Display failover groups

  1. Click Failover from the left-side menu pane. Failover groups display in the table.

💡

The alarm icon displays the alarm status for the highest severity alarm based on the servers in the failover group.

  2. Click the icon in the Actions column to display details.

All timestamps are in Universal Time (UTC).

Display a failover summary

The failover summary table shows progress and status for failover procedures.

  1. Click Failover from the left-side menu pane. Failover groups display.

  2. Scroll to the bottom of the page to display the Failover summary table.

All timestamps are in Universal Time (UTC).

  3. Click the icon in the Actions column to display details.

Create a failover group

  1. Display failover groups.

  2. Click Add group...

  3. Enter a Group name.

  4. Select a Failover level.

  5. Select a Group processing type for the servers in the group.

  6. Select a Group failover mode.

The default Group failover mode is Automatic. You can manually trigger a failover, or revert, even in Automatic group failover mode.

  7. Select available servers, and use the arrows to define the list of primary and backup servers. The new failover group displays in the list.

You can use the search option to display and select specific servers (for example, per processing type).

Edit a failover group

  1. Display failover groups.

  2. Click the edit button for the group. The failover setup page displays.

Edit options are only available when Server settings are modifiable. If there is no button, check user rights or contact an administrator.

  3. Edit the group settings.
    • Failover level can only be changed if there is no active failover for this group (no services assigned to a backup).
    • Group processing type cannot be changed.

  4. Edit the server selection for primary and backup servers, then click Save and exit. The updated group displays in the list.
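
The two edit restrictions can be expressed as a simple validation check. This is a minimal sketch under stated assumptions: the dictionary keys, the function name, and the services_on_backups count are invented for illustration and do not describe the Controller's data model.

```python
from typing import Dict, List


def validate_group_edit(current: Dict[str, str], proposed: Dict[str, str],
                        services_on_backups: int) -> List[str]:
    """Return the reasons an edit would be rejected (an empty list means it is allowed)."""
    errors = []
    if proposed["processing_type"] != current["processing_type"]:
        errors.append("Group processing type cannot be changed.")
    level_changed = proposed["failover_level"] != current["failover_level"]
    if level_changed and services_on_backups > 0:
        # An active failover means at least one service is assigned to a backup.
        errors.append("Failover level cannot be changed during an active failover.")
    return errors


current = {"processing_type": "encoding", "failover_level": "server"}
proposed = {"processing_type": "encoding", "failover_level": "service"}

print(validate_group_edit(current, proposed, services_on_backups=0))  # []
print(validate_group_edit(current, proposed, services_on_backups=2))
```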

Request a failover or a revert between servers manually

Before you start: make sure that at least one failover group exists.

⚠️

User permissions are required to request a failover or a revert. If the server has no services assigned, then default user rights for failover apply.

  1. Display failover groups.

  2. Click the icon in the Actions column to display details.

  3. Click the failover icon to fail over to backup servers, or the revert icon to revert to primary servers. You are prompted to continue.

  4. A modal pop-up is displayed, showing progress and success/failure of this operation.

  5. When complete, check the Services page to confirm that the assigned servers are correct.

For a service-level group, manual failover and revert at group level behave exactly as for a server-level group. A failover moves all services on the primary server to the backup. A revert moves all services on the backup server back to the configured primary, whether they were moved by a manual failover or moved automatically in response to an alarm.

If the failover group is service-level, manual failover and revert can also be performed for individual services from the Services page.

Manual service failover/revert is only available for services that are assigned to a single server.

  1. From the Services page, click the icon that shows more actions for the service that you want to fail over or revert.

  2. Click the failover or revert icon to perform the action. Only the icon for the available action is displayed.

  3. Click Yes to confirm the action.

  4. A modal pop-up is displayed, showing progress and success/failure of this operation.

If multiple services are selected, the failover/revert action is only available if all selected services are on the same server.
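
The availability rules for manual service failover and revert can be sketched as a single check: every selected service must be assigned to exactly one server, and all of them must share that server. The function name and the mapping from service name to assigned servers are assumptions made for this example.

```python
from typing import Dict, List


def action_available(selected: List[str], assignments: Dict[str, List[str]]) -> bool:
    """Return True if failover/revert can be offered for the selected services."""
    if not selected:
        return False
    servers = set()
    for service in selected:
        assigned = assignments.get(service, [])
        # Only services assigned to a single server qualify.
        if len(assigned) != 1:
            return False
        servers.add(assigned[0])
    # A multi-selection qualifies only if every service is on the same server.
    return len(servers) == 1


# Hypothetical assignments: service name -> servers it is assigned to.
assignments = {
    "news": ["encoder-1"],
    "sports": ["encoder-1"],
    "movies": ["encoder-2"],
    "radio": ["encoder-1", "encoder-2"],
}

print(action_available(["news", "sports"], assignments))  # True
print(action_available(["news", "movies"], assignments))  # False
print(action_available(["radio"], assignments))           # False
```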

Select failover trigger settings

By default, a failover is triggered for any critical alarm. You can select the failover trigger option for any alarm. This also applies to alarms that you have changed from the default alarm severity.

⚠️

Changes to severity are only applied to future occurrences. Any current or previously existing alarms are unchanged.

  1. Click Alarms from the left-side menu pane.

  2. Click Alarm overrides to display the override options.

  3. Select a response for Trigger failover.