Synoptix Disaster Recovery Plan (DRP)

1. Purpose

The purpose of this Disaster Recovery Plan (DRP) is to define Synoptix’s procedures for restoring IT systems, data, and operations after a disruptive event (data-center outage, major hardware failure, ransomware/data corruption, environmental event, etc.). The DRP focuses on technical recovery steps, verification, and returning services to a secure, operational state in line with the Recovery Time Objectives (RTOs) and Recovery Point Objectives (RPOs) defined by Synoptix.

2. Scope

This DRP covers Synoptix-managed infrastructure and services that support customer-facing applications and critical internal systems, including:

  • Synoptix Cloud production systems and databases (when customers host with Synoptix)
  • Configuration and application stacks required for service delivery (API endpoints, integration services, middleware)
  • Backup repositories and off-site/third-party backup media
  • Supporting infrastructure (DNS, VPN, authentication/AD, jump boxes, monitoring)
  • Recovery actions that require coordination with third-party providers (cloud/data-center vendors, off-site backup storage vendors, couriers)

It does not cover customer-managed environments unless Synoptix has explicit contractual responsibility for recovery.

3. Objectives & Recovery Targets

3.1 Objectives

  • Restore critical systems to an operational state while preserving data integrity and confidentiality.
  • Minimize customer impact by meeting published RTO/RPO targets where contractually required.
  • Ensure recovery steps maintain compliance with Synoptix security controls (encryption, access controls, logging).
  • Provide clear, timely communication during recovery events.

3.2 RTO / RPO (Target)

System Category | RTO | RPO
Critical production systems (transactional DBs, customer-facing APIs) | 4 hours | 1 hour
Support & incident triage (ticketing, communications) | 4 hours | ≤4 hours
Non-critical systems (internal tools) | 24 hours | 24 hours
Backups & archived configurations | Restore within 24 hours | Nightly snapshots

Note: Customer contracts may specify different RTO/RPOs. Always check the applicable DPA/SLA.

4. Roles & Responsibilities

Role | Primary DR Responsibilities
Executive Sponsor (CEO) | Approve DR activation, allocate resources, approve public/customer communications.
Incident Commander / IRT Lead | Own the decision to trigger DR procedures; coordinate technical and business recovery teams.
Business Continuity Coordinator (BCC) | Logistics, alternate site coordination, vendor communications.
Infrastructure / DBA Team | Execute data restores, validate backups, perform failover/failback.
DevOps / Application Owners | Redeploy application stacks, apply configuration changes, validate functionality.
Support & Communications | Draft and send customer/internal communications; manage support triage.
InfoSec Program Lead | Ensure recovery actions maintain security posture; approve credential use from the KMS; oversee forensic preservation where required.
Legal / Compliance | Advise on regulatory notification requirements.
All Employees | Follow direction for DR tasks; assist as assigned.

5. DR Activation & Escalation

5.1 Activation Criteria

Activate the DR plan when one or more of the following is true:

  • Production services unavailable and outage expected to exceed SLA thresholds.
  • Confirmed data loss or corruption affecting production databases.
  • Primary cloud region or data center unreachable for extended periods.
  • Confirmed ransomware or severe compromise that impacts availability or data integrity.
  • Any event the Incident Commander and Executive Sponsor determine requires DR activation.

5.2 Activation Procedure

  1. Incident detected (internal log review, customer report, or staff observation).
  2. IRT Lead performs initial impact assessment and recommends DR activation.
  3. Executive Sponsor authorizes formal DR activation.
  4. Incident Commander opens War Room channel, creates DR incident ticket, and notifies key stakeholders per Appendix A contact list (see the notification sketch after this list).
  5. Notify customers as required (see Communication section). Note: Synoptix typically notifies affected customers within 48 hours of confirmation per the Security Incident Response Program; DR communications may be faster for availability incidents.
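
The sketch below illustrates the step-4 stakeholder notification, assuming the War Room channel accepts an incoming-webhook POST; the webhook URL, ticket ID, and message format are illustrative placeholders, not live Synoptix values.

```python
# Illustrative step-4 notification; the webhook URL and message content are
# placeholders, not live Synoptix values.
from datetime import datetime, timezone

import requests  # assumes the requests package is available

WAR_ROOM_WEBHOOK = "https://chat.example.com/hooks/REPLACE_ME"  # placeholder

def announce_dr_activation(ticket_id: str, summary: str) -> None:
    """Post the DR activation notice to the War Room channel."""
    message = {
        "text": (
            f"DR ACTIVATED {datetime.now(timezone.utc).isoformat()}\n"
            f"Incident ticket: {ticket_id}\n"
            f"Summary: {summary}\n"
            "Stakeholders: see Appendix A contact list."
        )
    }
    response = requests.post(WAR_ROOM_WEBHOOK, json=message, timeout=10)
    response.raise_for_status()

# Example: announce_dr_activation("DR-2025-001", "Primary region unreachable")
```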

6. Recovery Strategies & Playbooks

For each major scenario below, the runbook steps are: Detect → Activate DR → Contain → Recover → Validate → Communicate → Lessons Learned.

6.1 Cloud Region / Data-Center Outage (Primary Region Failure)

Detect: Outage reported by cloud provider or internal failure metrics; services unreachable.

Immediate (0–1 hour):

  • Incident Commander verifies impact and scope, notifies Exec Sponsor and BCC.
  • Lock down any administrative changes to avoid compounding the outage.

Contain / Failover (1–4 hours):

  • If automatic multi-region failover exists: initiate failover (DNS or provider failover).
  • If manual: provision standby infrastructure in secondary region from golden images; restore latest available snapshot to DR region.
  • Update DNS TTLs as required; coordinate with DNS provider (see the DNS repoint sketch after this list).
  • Retrieve encrypted keys and credentials from KMS (dual control if required) for restored services.
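
The following is a minimal sketch of the manual DNS repoint, assuming DNS is hosted in AWS Route 53 and managed via boto3; the hosted zone ID, record name, and DR endpoint are placeholders, and other DNS providers expose different APIs.

```python
# Illustrative DNS repoint for manual failover, assuming AWS Route 53 via boto3.
# The hosted zone ID, record name, and DR endpoint below are placeholders.
import boto3

route53 = boto3.client("route53")

def repoint_to_dr(zone_id: str, record_name: str, dr_endpoint: str, ttl: int = 60) -> str:
    """UPSERT the service CNAME to the DR endpoint with a short TTL."""
    response = route53.change_resource_record_sets(
        HostedZoneId=zone_id,
        ChangeBatch={
            "Comment": "DR failover to secondary region",
            "Changes": [{
                "Action": "UPSERT",
                "ResourceRecordSet": {
                    "Name": record_name,
                    "Type": "CNAME",
                    "TTL": ttl,
                    "ResourceRecords": [{"Value": dr_endpoint}],
                },
            }],
        },
    )
    return response["ChangeInfo"]["Id"]

# Example: repoint_to_dr("Z0000000EXAMPLE", "api.example.com.", "api-dr.example.com.")
```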

Recovery & Validation (4–24 hours):

  • Run application smoke tests and data-integrity checks (checksum comparisons, test transactions); see the validation sketch below.
  • Monitor logs and metrics for anomalous behavior.
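
A minimal validation sketch for the steps above, combining a file-level checksum comparison with an HTTP smoke test; the export paths and health-check URL are placeholders.

```python
# Illustrative post-failover validation: file checksum comparison plus an HTTP
# smoke test. The paths and health-check URL are placeholders.
import hashlib

import requests

def sha256_of(path: str) -> str:
    """Stream a file and return its SHA-256 hex digest."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(1024 * 1024), b""):
            digest.update(chunk)
    return digest.hexdigest()

def validate_restore(source_export: str, restored_export: str, health_url: str) -> bool:
    """True only if the restored export matches the source and the service answers."""
    checksums_match = sha256_of(source_export) == sha256_of(restored_export)
    health_ok = requests.get(health_url, timeout=10).status_code == 200
    return checksums_match and health_ok

# Example: validate_restore("exports/orders_src.csv", "exports/orders_dr.csv",
#                           "https://api-dr.example.com/healthz")
```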

Communicate: Send initial customer notification (impact scope and expected cadence) and follow-ups every 4–8 hours.

Failback: Once primary region is available and validated, plan failback: resynchronize data, test, schedule cutover during maintenance window, and update customers.

6.2 Ransomware / Data Corruption

Detect: Data encryption detections, ransom note, abnormal file behavior, corruption noticed in DB.

Immediate (0–1 hour):

  • Isolate infected systems; disable network access to prevent spread.
  • Preserve forensic evidence (system images, volatile memory) per IR playbook. Coordinate with InfoSec before restoration.
  • Suspend backup rotations temporarily to protect backups from being overwritten.

Contain & Clean (1–24 hours):

  • Identify clean backup snapshot (pre-compromise); see the snapshot-selection sketch after this list.
  • Rebuild affected machines from clean images.
  • Rotate credentials (especially admin and service accounts) and revoke tokens.
  • Engage third-party incident response consultants if severity warrants.
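
A sketch of the clean-snapshot selection step, assuming the affected database runs on AWS RDS and snapshots are listed via boto3; the instance identifier is a placeholder and the compromise timestamp comes from the InfoSec forensic timeline.

```python
# Illustrative selection of the newest snapshot taken before the compromise,
# assuming the database runs on AWS RDS; the instance name is a placeholder.
from datetime import datetime, timezone

import boto3

rds = boto3.client("rds")

def latest_clean_snapshot(db_instance_id: str, compromise_time: datetime):
    """Return the most recent snapshot created before compromise_time, or None."""
    snapshots = rds.describe_db_snapshots(DBInstanceIdentifier=db_instance_id)["DBSnapshots"]
    clean = [s for s in snapshots
             if s.get("SnapshotCreateTime") and s["SnapshotCreateTime"] < compromise_time]
    return max(clean, key=lambda s: s["SnapshotCreateTime"]) if clean else None

# Example (compromise time supplied by InfoSec from the forensic timeline):
# latest_clean_snapshot("prod-db", datetime(2025, 9, 8, 3, 0, tzinfo=timezone.utc))
```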

Recovery & Validation (24–72 hours):

  • Restore databases from pre-compromise backups to DR environment; validate integrity.
  • Perform extended monitoring for persistence or reinfection.

Communicate: Notify affected customers per IR Program (within 48 hours of confirmation), include guidance (password rotations, log review). Coordinate with Legal.

Post-Recovery: Enhance immutable/air-gapped backup strategy, increase backup retention/segregation.

6.3 Database Failure / Corruption (Non-malicious)

Detect: Failed restores, integrity check failures, abnormal transaction logs.

Immediate (0–2 hours):

  • Stop services that may write to affected DB to prevent further corruption.
  • Identify most recent consistent backup/snapshot.

Recovery (2–8 hours):

  • Restore DB to staging environment; run consistency checks and sample queries to confirm correctness.
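
A sketch of the staging consistency checks, assuming a PostgreSQL database accessed via psycopg2; the connection string, table names, and queries are placeholders to be replaced with service-specific checks.

```python
# Illustrative consistency checks against the restored staging database,
# assuming PostgreSQL via psycopg2; connection details and tables are placeholders.
import psycopg2

CHECKS = {
    "orders_rowcount": "SELECT COUNT(*) FROM orders;",
    "latest_txn_time": "SELECT MAX(created_at) FROM transactions;",
}

def run_consistency_checks(dsn: str) -> dict:
    """Run sample queries and return results for comparison with known-good values."""
    results = {}
    with psycopg2.connect(dsn) as conn:
        with conn.cursor() as cur:
            for name, sql in CHECKS.items():
                cur.execute(sql)
                results[name] = cur.fetchone()[0]
    return results

# Example: run_consistency_checks("host=staging-db dbname=synoptix user=dr_restore")
```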

Validation (8–24 hours):

  • Run application-level tests; compare reconciliation totals with customer data if applicable.

Communicate: Target customer notification within 24–48 hours if their data or service is impacted.

6.4 Application Stack Failure (Code / Config)

Detect: Deployment causes failures, high error rates.

Immediate (0–2 hours):

  • Roll back to last known-good deployment (version-controlled artifacts); see the rollback sketch below.
  • Disable any faulty feature flags.
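
As a sketch of the rollback step above, assuming the application stack is deployed on Kubernetes; the deployment name and namespace are placeholders, and stacks deployed differently will use their own rollback mechanism.

```python
# Illustrative rollback to the last known-good deployment, assuming the stack
# runs on Kubernetes; the deployment name and namespace are placeholders.
import subprocess

def rollback(deployment: str, namespace: str = "production") -> None:
    """Undo the latest rollout and wait for the deployment to report healthy."""
    subprocess.run(
        ["kubectl", "rollout", "undo", f"deployment/{deployment}", "-n", namespace],
        check=True,
    )
    subprocess.run(
        ["kubectl", "rollout", "status", f"deployment/{deployment}", "-n", namespace,
         "--timeout=300s"],
        check=True,
    )

# Example: rollback("customer-api")
```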

Recovery (2–8 hours):

  • Re-deploy stable artifact to production or DR environment.
  • Fix pipeline or configuration issue, test, and re-deploy.

Validation: End-to-end tests and sanity checks.

6.5 Hardware Failure (Host / Storage)

Detect: Host hardware errors, degraded RAID arrays, storage controller faults.

Immediate (0–2 hours):

  • Failover to redundant hardware (if HA exists) or provision replacement hosts.
  • If physical disk failed, request replacement via provider or hardware vendor.

Recovery (2–24 hours):

  • Rebuild arrays from parity or restore from backups; validate data integrity.

7. Backup Management & Procedures

7.1 Backup Strategy

  • Nightly full/incremental backups for production (retention: 30 days for DB snapshots; configuration/source code 90 days).
  • Backups are encrypted at rest (AES-256) and keys managed in the KMS (see the encryption sketch below).
  • Immutable snapshots or offline copies are maintained where feasible to mitigate ransomware.
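
An illustrative sketch of AES-256 encryption at rest using AES-GCM from the cryptography package; in production the 32-byte key is retrieved from the KMS, never generated or stored locally as in the demonstration comment.

```python
# Illustrative AES-256-GCM encryption of a backup artifact using the
# `cryptography` package. For real backups the 32-byte key comes from the KMS.
import os

from cryptography.hazmat.primitives.ciphers.aead import AESGCM

def encrypt_backup(plaintext_path: str, ciphertext_path: str, key: bytes) -> None:
    """Encrypt a backup file with AES-256-GCM; the nonce is prepended to the output."""
    nonce = os.urandom(12)
    with open(plaintext_path, "rb") as fh:
        data = fh.read()
    ciphertext = AESGCM(key).encrypt(nonce, data, None)
    with open(ciphertext_path, "wb") as fh:
        fh.write(nonce + ciphertext)

# Demonstration only -- never generate or hard-code production keys locally:
# key = AESGCM.generate_key(bit_length=256)
# encrypt_backup("db_dump.sql", "db_dump.sql.enc", key)
```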

7.2 Backup Storage & Chain-of-Custody

  • Backups stored in primary and off-site locations; off-site physical media transported under chain-of-custody controls.
  • Infrastructure & DBA Team maintains a Backup Inventory (media ID, creation date, retention, location); see the inventory sketch below.
  • Any handoff to couriers is logged with seals and tracking numbers.
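
A minimal sketch of appending one Backup Inventory entry using the Appendix C columns; the inventory file path is a placeholder for wherever the inventory is actually maintained.

```python
# Illustrative append of one entry to the Backup Inventory CSV, using the
# Appendix C columns; the inventory path is a placeholder.
import csv
from dataclasses import dataclass, astuple

@dataclass
class BackupRecord:
    media_id: str
    backup_type: str         # "full" or "incremental"
    creation_timestamp: str  # ISO 8601
    retention: str           # e.g. "30 days"
    location: str
    encrypted: str           # "Y" / "N"
    owner: str
    restoration_notes: str

def log_backup(record: BackupRecord, inventory_path: str = "backup_inventory.csv") -> None:
    """Append the record as one CSV row in Appendix C column order."""
    with open(inventory_path, "a", newline="") as fh:
        csv.writer(fh).writerow(astuple(record))
```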

7.3 Backup Restoration

  • Follow the documented Restore Procedure: identify clean snapshot → provision staging environment → restore snapshot → run DB integrity checks → perform application smoke tests → promote to production or route traffic to DR environment.
  • Record start/finish times, checksums, and validation steps in DR incident ticket.
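
A sketch of the step logging above, recording start/finish times per restore step in a JSON file that can be attached to the DR incident ticket; step names and the output path are placeholders.

```python
# Illustrative step logger for the restore procedure: records start/finish times
# per step and writes them to a JSON file for the DR incident ticket.
import json
from contextlib import contextmanager
from datetime import datetime, timezone

RESTORE_LOG = []

@contextmanager
def restore_step(name: str):
    """Time one restore step and record its outcome."""
    entry = {"step": name, "start": datetime.now(timezone.utc).isoformat()}
    try:
        yield
        entry["status"] = "ok"
    except Exception as exc:
        entry["status"] = f"failed: {exc}"
        raise
    finally:
        entry["finish"] = datetime.now(timezone.utc).isoformat()
        RESTORE_LOG.append(entry)

# Example usage inside the restore runbook driver:
# with restore_step("restore snapshot to staging"):
#     ...  # call the actual restore
# with open("dr_restore_log.json", "w") as fh:
#     json.dump(RESTORE_LOG, fh, indent=2)
```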

8. Alternate / Temporary Operations

8.1 Alternate Site Options

  • Remote work is the primary alternate workspace; critical staff must have company-issued, encrypted devices and VPN access.
  • Maintain vendor contracts for temporary co-working space or secondary office for extended outages.

8.2 Minimal Viable Operations

  • If full restoration cannot be completed within RTO, implement a minimal viable operations (MVP) mode: essential APIs and read-only reporting to preserve critical customer functionality (document specifics per service).

9. Security Considerations During Recovery

  • Retrieve credentials and secrets from the centralized KMS; follow dual-control procedures for privileged retrieval (see the retrieval sketch below).
  • Enforce MFA and RBAC for all accounts used during recovery; do not use personal accounts.
  • Preserve forensic evidence when a security incident is the cause; coordinate with InfoSec before wiping or reimaging systems.
  • Ensure logs are archived (immutable where possible) for post-incident analysis.
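
A sketch of privileged credential retrieval during recovery, assuming secrets are held in AWS Secrets Manager (KMS-encrypted); the secret name and region are placeholders, and dual-control/MFA gating is enforced outside this snippet.

```python
# Illustrative privileged-credential retrieval during recovery, assuming secrets
# are stored in AWS Secrets Manager (KMS-encrypted); the secret name is a placeholder.
import boto3

def fetch_recovery_secret(secret_id: str, region: str = "us-east-1") -> str:
    """Retrieve a secret value; access is expected to be MFA/RBAC-gated and logged."""
    client = boto3.client("secretsmanager", region_name=region)
    return client.get_secret_value(SecretId=secret_id)["SecretString"]

# Example: fetch_recovery_secret("dr/prod-db-admin")
```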

10. Testing, Exercises & Maintenance

10.1 Testing Schedule

  • Quarterly: Restore from backups to a staging environment (DBA) – verify data integrity and restore time.
  • Semi-Annual: Tabletop technical exercise for DR scenarios (technical leads, support, communications).
  • Annual: Full DR test simulating primary-region outage and performing complete restore/failover; include customer-notification simulation.

10.2 Test Procedures

  • Tests are planned with scope, objectives, success criteria, rollback plan, and communications plan.
  • Results documented in a Test Report including timings (time-to-restore), issues, and remediation owners.

10.3 Metrics & KPIs

  • Mean time to restore (per system), restore success rate, backup integrity failure rate, and time to detect compromised or corrupted backups. Report quarterly to Executive Leadership.
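
A minimal sketch of the KPI roll-up from DR test records; the record fields mirror the Appendix E template and the sample values are illustrative.

```python
# Illustrative KPI roll-up from DR test records: mean time to restore (minutes)
# and restore success rate. The record fields mirror Appendix E.
from statistics import mean

def dr_kpis(test_records: list[dict]) -> dict:
    """Each record needs 'restore_minutes' (float) and 'success' (bool)."""
    return {
        "mean_time_to_restore_min": mean(r["restore_minutes"] for r in test_records),
        "restore_success_rate": sum(r["success"] for r in test_records) / len(test_records),
    }

# Example: dr_kpis([{"restore_minutes": 95, "success": True},
#                   {"restore_minutes": 180, "success": False}])
```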

11. Communication & Notification

11.1 Communication Principles

  • Timely, factual, coordinated with Legal and Exec Sponsor. Use pre-approved templates.
  • The Incident Commander / Support team is the primary sender of customer updates; Legal reviews public statements.

11.2 Notification Cadence

  • Initial internal notification: immediately upon DR activation.
  • Customer initial status (if impacted): within 4–8 hours for availability incidents; for confirmed security incidents, follow IR Program notification timelines (noting Synoptix typically notifies within 48 hours of confirmation).
  • Follow-ups: at least every 4–8 hours, or as decided by Exec Sponsor.

12. Post-Incident: Lessons Learned & Improvements

  • Hold a Post-Incident Review within 10 business days including IRT, DevOps, DBA, Support, Legal, and Executive Sponsor. Document root cause, timeline, recovery effectiveness, communication effectiveness, and action items.
  • Prioritize remediation (technical, process, training) with owners and deadlines; track until closed.
  • Update DR runbooks, backup retention, and any automation to reduce recovery time.

13. DR Documentation & Version Control

  • Store authoritative DR documents, runbooks, and checklists in the internal policy portal and backup copies in the DR repository.
  • Sensitive items (DNS credentials, KMS access procedures) are stored in KMS and only accessible to authorized personnel.
  • Maintain revision history and require Exec Sponsor sign-off for material changes.

14. Dependencies & Vendor Management

  • Maintain an up-to-date vendor contact list (cloud provider, DNS, backup storage, courier, hardware vendor) in Appendix A.
  • For high-risk vendors, keep copies of SOC/Security reports where available and ensure SLAs include DR support.
  • Include contractual expectations in vendor agreements for notification and escalations.

15. Appendices

Appendix A — Contacts & Escalation Template

(Populate with live contacts; store sensitive contacts in KMS)

  • Executive Sponsor (CTO/CEO): David Andersen | 801-815-2877 | dandersen@synoptixsoftware.com
  • Incident Commander / IRT Lead: Dan Weatbrook | 801-918-1676 | dweatbrook@synoptixsoftware.com
  • Business Continuity Coordinator: Robby Hilder | 801-554-1416 | rhilder@synoptixsoftware.com
  • Support Lead: Pete Alberico | 801-201-3202 | support@synoptixsoftware.com
  • Infrastructure/DBA Lead: Denver Campbell | 801-608-4880 | dcampbell@synoptixsoftware.com | infra@synoptix.com
  • Legal: Mike Black | 801-898-0341 | mblack@mbmlawyers.com | legal@synoptix.com
  • Primary Cloud Provider Rep: company | rep name | emergency phone | account id
  • Backup Vendor: name | phone | chain-of-custody contact
  • DNS Provider / Registrar: name | phone | login admin
  • Third-Party IR Consultant: name | phone | email

Appendix B — Quick DR Activation Checklist (for Incident Commander)

  • Confirm detection & impact scope.
  • Recommend DR activation; obtain Exec Sponsor authorization.
  • Open War Room channel and DR incident ticket.
  • Identify critical systems and owners.
  • Assign recovery teams (Infra/DBA, DevOps, Support, Communications).
  • Verify availability of clean backup snapshot (identify timestamp).
  • Retrieve required keys from KMS (dual control if required).
  • Start restore/failover; log all actions/times.
  • Execute validation tests; report results to Exec Sponsor.
  • Communicate status to customers per cadence.

Appendix C — Backup Inventory Template (example columns)

  • Media ID | Backup Type (full/incremental) | Creation Timestamp | Retention | Location | Encrypted (Y/N) | Owner | Restoration Notes

Appendix D — Restore Runbook (DBA)

  1. Identify latest clean backup snapshot (pre-incident snapshot).
  2. Provision staging instance in DR region.
  3. Restore DB snapshot to staging.
  4. Run DB consistency and checksum validation.
  5. Run application-level smoke tests against restored DB.
  6. If validated, schedule promotion to production or re-route traffic to DR endpoint.
  7. Document start/end times, validation results, and issues.
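
A sketch of runbook steps 3–4, assuming a PostgreSQL custom-format dump restored with pg_restore; host, database, dump path, and the sample query are placeholders.

```python
# Illustrative runbook steps 3-4, assuming a PostgreSQL custom-format dump:
# restore into staging with pg_restore, then run a basic consistency probe.
# Connection settings come from PG* environment variables or the DSN shown.
import subprocess

import psycopg2

def restore_and_check(dump_path: str, staging_dsn: str, staging_db: str) -> int:
    # Step 3: restore the dump into the staging database.
    subprocess.run(
        ["pg_restore", "--clean", "--if-exists", "-d", staging_db, dump_path],
        check=True,
    )
    # Step 4: simple consistency probe (extend with per-table checksums as needed).
    with psycopg2.connect(staging_dsn) as conn, conn.cursor() as cur:
        cur.execute("SELECT COUNT(*) FROM orders;")
        return cur.fetchone()[0]

# Example: restore_and_check("/backups/prod_2025-09-08.dump",
#                            "host=staging-db dbname=synoptix_staging", "synoptix_staging")
```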

Appendix E — Test Report Template

  • Test name | Date | Systems included | Objective | Start time | End time | RTO met? (Y/N) | Issues found | Action items & owners | Sign-off

16. Revision History

Version | Date | Changes | Author
1.0 | September 9, 2025 | Initial DRP based on Synoptix operations and InfoSec controls. | Synoptix Infrastructure / InfoSec Lead