Feature Request: Plugin Shutdown Hooks for Graceful Resource Cleanup #7250

@jmacelroy

Problem Statement

Currently, Tyk Gateway's graceful shutdown mechanism (graceful_shutdown_timeout_duration) only coordinates the gateway's own resources, but provides no hooks for Go plugins to perform cleanup during shutdown. This can lead to resource leaks and lost work for plugins that maintain:

  • Worker pools and background goroutines
  • Database connections and caches
  • Queued events or buffered data
  • External service connections

Perhaps plugins are not meant to spawn their own goroutines or channels, but it seems ideal for a plugin to be able to do work in the background, so that it blocks the response only for the minimum time needed to collect the data it requires.

Current Behavior

When Tyk receives SIGINT/SIGTERM:

  1. Gateway starts graceful shutdown with configured timeout
  2. Gateway waits for active requests to complete
  3. Gateway terminates process immediately when ready (potentially before timeout)
  4. Plugin resources are forcibly killed with no cleanup opportunity

Real-World Impact

We encountered this while building an eventing plugin: a response plugin with a worker pool that asynchronously sends an event, containing data from the request and response, to a message broker.

  // Plugin maintains background workers that process queued events
  type WorkerPool struct {
      workers   int
      queue     *WorkerQueue  // Unbounded queue of log events
      wg        sync.WaitGroup
  }

  // When Tyk shuts down, workers are killed mid-processing
  // Queued events are lost, connections leak

Even with plugin-side signal handling, there's no coordination - Tyk may exit before plugin cleanup completes.

Proposed Solution

Add plugin lifecycle hooks to the Go plugin interface:

  // Optional interface plugins can implement
  type PluginLifecycle interface {
      // Called during gateway shutdown before process termination
      // timeout indicates remaining time before force kill
      Shutdown(ctx context.Context, timeout time.Duration) error
  }

  // In plugin:
  func (p *MyPlugin) Shutdown(ctx context.Context, timeout time.Duration) error {
      log.Info("Plugin cleanup starting...")

      // Drain worker queues
      p.workerPool.Stop()

      // Close connections
      p.db.Close()

      select {
      case <-p.cleanupDone:
          log.Info("Plugin cleanup completed")
          return nil
      case <-ctx.Done():
          log.Warn("Plugin cleanup timeout")
          return ctx.Err()
      }
  }

Alternative Approaches

  1. Registry Pattern: Plugin registers cleanup functions during init()
  2. Context Propagation: Provide shutdown context to plugin functions
  3. Signal Coordination: Delay process exit until plugin cleanup signals completion
  4. Minimum Shutdown Delay: Keep the process alive for a fixed duration before exiting, regardless of whether the business logic is ready to shut down.

Benefits

  • Prevents Resource Leaks: Proper cleanup of connections, goroutines
  • Data Integrity: Allows plugins to flush buffers, complete transactions
  • Graceful Degradation: Plugins can save state for restart recovery
  • Production Reliability: No more lost work during deployments

Backward Compatibility

The PluginLifecycle interface would be optional - existing plugins continue working unchanged. Only plugins implementing the interface would receive shutdown notifications.
