Problem Statement
Currently, Tyk Gateway's graceful shutdown mechanism (graceful_shutdown_timeout_duration) only coordinates the gateway's own resources, but provides no hooks for Go plugins to perform cleanup during shutdown. This can lead to resource leaks and lost work for plugins that maintain:
- Worker pools and background goroutines
- Database connections and caches
- Queued events or buffered data
- External service connections
Perhaps plugins are not meant to run their own goroutines or channels, but it seems ideal for a plugin to be able to do work in the background so that it blocks the response only for the minimum time needed to capture the request and response data.
Current Behavior
When Tyk receives SIGINT/SIGTERM:
- Gateway starts graceful shutdown with configured timeout
- Gateway waits for active requests to complete
- Gateway terminates process immediately when ready (potentially before timeout)
- Plugin resources are forcibly killed with no cleanup opportunity
Real-World Impact
We encountered this while building an eventing plugin: a response plugin that uses a worker pool to asynchronously send an event, carrying data from the request and response, to a message broker.
// Plugin maintains background workers that process queued events
type WorkerPool struct {
    workers int
    queue   *WorkerQueue // Unbounded queue of log events
    wg      sync.WaitGroup
}

// When Tyk shuts down, workers are killed mid-processing
// Queued events are lost, connections leak
Even with plugin-side signal handling there is no coordination: Tyk may exit before the plugin's cleanup completes.
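To make the cleanup in question concrete, here is a minimal sketch of a drainable version of such a pool. It is an illustration under assumptions, not our actual plugin code: a buffered channel stands in for the unbounded WorkerQueue, and `Event`, `NewWorkerPool`, `Submit`, `Stop`, and `drainDemo` are hypothetical names.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// Event stands in for whatever the plugin queues (hypothetical type).
type Event struct{ ID int }

// WorkerPool is a simplified, drainable variant of the pool above.
// Assumption: a buffered channel replaces the unbounded WorkerQueue.
type WorkerPool struct {
	queue     chan Event
	wg        sync.WaitGroup
	processed atomic.Int64
}

// NewWorkerPool starts the given number of background workers.
func NewWorkerPool(workers, capacity int) *WorkerPool {
	p := &WorkerPool{queue: make(chan Event, capacity)}
	for i := 0; i < workers; i++ {
		p.wg.Add(1)
		go func() {
			defer p.wg.Done()
			// The loop ends only once the channel is closed and empty,
			// so every queued event is handled before the worker returns.
			for range p.queue {
				time.Sleep(time.Millisecond) // simulate publishing to a broker
				p.processed.Add(1)
			}
		}()
	}
	return p
}

// Submit enqueues an event. It must not be called after Stop.
func (p *WorkerPool) Submit(e Event) { p.queue <- e }

// Stop closes the queue and waits for the workers to drain it,
// or returns ctx.Err() if the context ends first.
func (p *WorkerPool) Stop(ctx context.Context) error {
	close(p.queue)
	done := make(chan struct{})
	go func() { p.wg.Wait(); close(done) }()
	select {
	case <-done:
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// drainDemo queues n events, stops the pool, and reports how many were
// processed (-1 if draining did not finish in time).
func drainDemo(n int) int64 {
	p := NewWorkerPool(4, n)
	for i := 0; i < n; i++ {
		p.Submit(Event{ID: i})
	}
	ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
	defer cancel()
	if err := p.Stop(ctx); err != nil {
		return -1
	}
	return p.processed.Load()
}

func main() {
	fmt.Println("processed:", drainDemo(20))
}
```

A `Stop` like this only helps if something calls it and the process stays alive until it returns, which is exactly the coordination the gateway does not offer today.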
Proposed Solution
Add plugin lifecycle hooks to the Go plugin interface:
// Optional interface plugins can implement
type PluginLifecycle interface {
    // Called during gateway shutdown before process termination
    // timeout indicates remaining time before force kill
    Shutdown(ctx context.Context, timeout time.Duration) error
}

// In plugin:
func (p *MyPlugin) Shutdown(ctx context.Context, timeout time.Duration) error {
    log.Info("Plugin cleanup starting...")

    // Drain worker queues
    p.workerPool.Stop()

    // Close connections
    p.db.Close()

    select {
    case <-p.cleanupDone:
        log.Info("Plugin cleanup completed")
        return nil
    case <-ctx.Done():
        log.Warn("Plugin cleanup timeout")
        return ctx.Err()
    }
}
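On the gateway side, the hook could be invoked roughly as follows. This is a sketch under assumptions and does not reflect Tyk's actual internals: `shutdownPlugins`, `legacyPlugin`, `fastPlugin`, `slowPlugin`, and `demo` are hypothetical names, and plugins are called one after another for simplicity. Implementers are found with a type assertion, so plugins without a `Shutdown` method are simply skipped.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// PluginLifecycle mirrors the optional interface proposed above.
type PluginLifecycle interface {
	Shutdown(ctx context.Context, timeout time.Duration) error
}

// legacyPlugin has no Shutdown method (hypothetical existing plugin).
type legacyPlugin struct{}

// fastPlugin finishes its cleanup immediately (hypothetical).
type fastPlugin struct{ cleaned bool }

func (p *fastPlugin) Shutdown(ctx context.Context, timeout time.Duration) error {
	p.cleaned = true
	return nil
}

// slowPlugin never finishes on its own, so it returns only when ctx expires.
type slowPlugin struct{}

func (p *slowPlugin) Shutdown(ctx context.Context, timeout time.Duration) error {
	<-ctx.Done()
	return ctx.Err()
}

// shutdownPlugins is a hypothetical gateway-side helper. All plugins share
// one deadline; each call receives the time remaining at that point.
func shutdownPlugins(plugins []any, timeout time.Duration) (int, []error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	called := 0
	var errs []error
	for _, p := range plugins {
		lc, ok := p.(PluginLifecycle)
		if !ok {
			continue // plugin does not implement the hook: nothing to do
		}
		called++
		deadline, _ := ctx.Deadline()
		if err := lc.Shutdown(ctx, time.Until(deadline)); err != nil {
			errs = append(errs, err)
		}
	}
	return called, errs
}

// demo runs the helper over one plugin of each kind.
func demo() (int, bool, bool) {
	fast := &fastPlugin{}
	plugins := []any{legacyPlugin{}, fast, &slowPlugin{}}
	called, errs := shutdownPlugins(plugins, 50*time.Millisecond)
	timedOut := len(errs) == 1 && errors.Is(errs[0], context.DeadlineExceeded)
	return called, timedOut, fast.cleaned
}

func main() {
	called, timedOut, cleaned := demo()
	fmt.Println(called, timedOut, cleaned)
}
```

Because the calls are sequential, one slow plugin can use up the whole budget. Running the hooks concurrently under the same context would avoid that, at the cost of a little more bookkeeping.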
Alternative Approaches
- Registry Pattern: Plugin registers cleanup functions during init()
- Context Propagation: Provide shutdown context to plugin functions
- Signal Coordination: Delay process exit until plugin cleanup signals completion
- Minimum Shutdown Delay: Add a timer so the gateway does not shut down until a set duration has elapsed, regardless of whether the business logic is ready to shut down.
Benefits
- Prevents Resource Leaks: Proper cleanup of connections, goroutines
- Data Integrity: Allows plugins to flush buffers, complete transactions
- Graceful Degradation: Plugins can save state for restart recovery
- Production Reliability: No more lost work during deployments
Backward Compatibility
The PluginLifecycle interface would be optional, so existing plugins would continue to work unchanged. Only plugins that implement the interface would receive shutdown notifications.