Unreleased Changes in The Mainline

Breaking Changes

  • Rocksdb-backed spool store() and remove() calls now time out after 30 seconds of backpressure rather than blocking indefinitely. The timeout is tunable via the new store_deadline field of rocks_params.

  • The resolve-shaping-domain script's default output has changed. Pass --json-config to restore the previous byte-for-byte pretty-JSON output of the path config. Run resolve-shaping-domain --help for the full flag list.

  • DNS resolver configuration is now defined by a kumomta-owned schema rather than forwarding hickory option names. Existing valid configs continue to parse, but unknown fields in options are now a configure-time error rather than being silently ignored, and the simple 'IP:PORT' form of a name server entry now configures both UDP and TCP for that server (previously UDP only). See configure_resolver and the Resolver Options reference for the supported fields.
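
The spool timeout change in the first item above amounts to replacing an unbounded wait with a deadline-bounded one. The following Python sketch illustrates that idea only; it is not KumoMTA's implementation. The names store_with_deadline and StoreDeadlineExceeded are hypothetical, and a bounded in-memory queue stands in for a spool that is applying backpressure.

```python
import queue


class StoreDeadlineExceeded(Exception):
    """Hypothetical error: the write could not be accepted before the deadline."""


def store_with_deadline(pending: queue.Queue, item, deadline_seconds: float) -> None:
    """Try to hand `item` to a bounded queue, waiting at most `deadline_seconds`.

    A full queue models backpressure. Instead of blocking forever, the caller
    gets an error it can act on (for example, by rejecting the message).
    """
    try:
        # put() blocks while the queue is full, but only up to the timeout.
        pending.put(item, timeout=deadline_seconds)
    except queue.Full:
        raise StoreDeadlineExceeded(
            f"store did not complete within {deadline_seconds} seconds"
        ) from None
```

The practical consequence for callers is that store() and remove() can now fail with a timeout, so code that previously assumed these calls would eventually succeed needs an error path.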

Other Changes and Enhancements

  • Upgraded the embedded hickory-resolver to 0.26 and libunbound to 1.25.1. New kumo.dns.load_resolv_conf reads a resolv.conf-format file into a mutable resolver config table, so you can start from the system upstream list and layer your own options on top before calling configure_resolver.

  • Egress sources can now be configured to auto-suspend when their local bind address appears unplumbed (suspend_when_unplumbed) or when their configured proxy server appears unreachable (suspend_when_proxy_unhealthy). A suspended source is skipped during pool selection until the configured duration elapses. The trigger uses the same Immediate / Threshold("N/period") shape as TSA shaping rules.

  • ha_proxy_server and socks5_proxy_server now accept a DNS host name in addition to an IP literal. The name is resolved at connection time and each returned address is tried in turn, sharing the connect_timeout budget.

  • KumoMTA now proactively detects when the rocksdb-backed spool has reached a state that requires operator intervention (a missing or corrupt SST surfaced through a foreground read/write, or sustained background-error accumulation from compactions or flushes) and transitions into a load-shedding state. While the spool is unhealthy, the SMTP banner returns 421, HTTP injection and /api/check-liveness/v1 return 503, and delivery is paused. Pausing delivery limits the window in which a successful SMTP transaction could be followed by a failed spool remove(), which would otherwise cause that message to be redelivered. The diagnostic log records each transition that drives this: when the rocksdb background-errors counter grows, when a foreground read or write returns a fatal IOError or Corruption, when the load-shedding gate latches, and (where applicable) when the gate later auto-clears after sustained recovery. Each record names the spool path and points at the rocksdb LOG file in that directory for the underlying cause. The delivery pause itself can be toggled with the new kumo.suspend_delivery_when_spool_unhealthy policy function (default: enabled). Several new metrics expose the underlying state to monitoring: rocks_spool_load_shed_active, rocks_spool_background_errors, rocks_spool_write_stopped, rocks_spool_compaction_pending, rocks_spool_num_running_compactions, rocks_spool_estimate_pending_compaction_bytes, and rocks_spool_actual_delayed_write_rate.

  • New kcli spool-compact command (and matching /api/admin/spool-compact/v1 endpoint) forces a flush and full-keyspace compaction on a named rocksdb spool. Primarily a diagnostic and operational helper; surfaces underlying storage errors to the caller.

  • Ready queues now run a per-dispatcher progress watchdog that aborts dispatcher tasks that have stopped making forward progress, catching wedges that escape the normal SMTP timeouts. The threshold is configurable via dispatcher_progress_watchdog_timeout and aborts are surfaced via the dispatcher_watchdog_aborted_total metric. #539

  • Added the kcli inspect-ready-q command and corresponding admin/inspect-ready-q/v1 HTTP endpoint, which return a snapshot of a ready queue's state, effective configuration, the dispatcher tasks currently handling its connections, and the steady-state throughput ceilings implied by the egress path config.

  • Added the kcli abort-ready-q-conn command and corresponding admin/abort-ready-q-conn/v1 HTTP endpoint, which abort a specific dispatcher task by session_id, as shown by the inspect-ready-q output.

  • Added the kcli resolve-egress-path command and corresponding admin/resolve-egress-path/v1 HTTP endpoint, which report the effective egress path config, scheduled-queue config, MX resolution, ready-queue name and the throughput ceilings derived from both configs for a destination domain and egress source. Equivalent to running resolve-shaping-domain against the live runtime instead of a static policy file.

  • Added new Lua functions: kumo.compute_egress_path_config_constraints, kumo.compute_queue_config_constraints, kumo.format_egress_path_config_constraints, kumo.format_egress_path_config_toml, kumo.serde.toml_encode_pretty_compact.

  • resolve-shaping-domain now shows the resolved configuration as pretty-printed TOML, including both the scheduled-queue config and the egress path config, and shows the same throughput-ceiling diagnostic as kcli inspect-ready-q with constraints from both configs folded in. A new --json-queue-config flag emits the queue config as pretty JSON.
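
The proxy host name item above says that each resolved address is tried in turn while sharing the connect_timeout budget. A minimal Python sketch of that pattern follows. It is an illustration under stated assumptions, not KumoMTA code: connect_with_shared_budget and its parameters are hypothetical, and the connect callable is injected so the logic can be exercised without a network.

```python
import time
from typing import Callable, Iterable, Optional, Tuple, TypeVar

Address = Tuple[str, int]
Conn = TypeVar("Conn")


def connect_with_shared_budget(
    addresses: Iterable[Address],
    connect: Callable[[Address, float], Conn],
    total_timeout: float,
    clock: Callable[[], float] = time.monotonic,
) -> Conn:
    """Try each address in order; all attempts share one overall time budget.

    `connect(addr, timeout)` is a caller-supplied function that either returns
    a connection or raises OSError. Each attempt is given only the time that
    remains from `total_timeout`, so a slow first address reduces what is
    left for the later ones.
    """
    deadline = clock() + total_timeout
    last_error: Optional[OSError] = None
    for addr in addresses:
        remaining = deadline - clock()
        if remaining <= 0:
            break  # budget exhausted; do not start another attempt
        try:
            return connect(addr, remaining)
        except OSError as err:
            last_error = err  # remember the failure and move on
    raise ConnectionError(
        "no address could be reached within the shared budget"
    ) from last_error
```

The design choice worth noting is that the timeout applies to the whole sequence rather than to each address, so a host name that resolves to many addresses cannot multiply the total connection time.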

Fixes

  • Message::save_to was silently discarding errors returned from the data and meta spool store() operations: the per-spool dirty flags were cleared regardless of success, so a message that failed to persist was still treated by the SMTP ingress path as accepted. Errors now propagate so the ingress path can reject (and the client retries) instead of producing a silent loss.
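
The bug class described in the fix above, clearing a dirty flag regardless of whether the write succeeded, can be illustrated with a short Python sketch. This is not the actual Message::save_to code; the Message, MemorySpool, and SpoolError names below are hypothetical stand-ins chosen for the example.

```python
class SpoolError(Exception):
    """Hypothetical error raised when a spool write fails."""


class MemorySpool:
    """Toy spool: stores payloads in a dict and can be told to fail."""

    def __init__(self, fail: bool = False):
        self.items = {}
        self.fail = fail

    def store(self, msg_id: str, payload: bytes) -> None:
        if self.fail:
            raise SpoolError(f"could not persist {msg_id}")
        self.items[msg_id] = payload


class Message:
    def __init__(self, msg_id: str, data: bytes, meta: bytes):
        self.msg_id = msg_id
        self.data = data
        self.meta = meta
        # Both parts start out unsaved.
        self.data_dirty = True
        self.meta_dirty = True

    def save_to(self, data_spool: MemorySpool, meta_spool: MemorySpool) -> None:
        """Persist dirty parts; a flag is cleared only after its store() succeeds.

        Any SpoolError propagates to the caller, which can then reject the
        message instead of reporting it as accepted.
        """
        if self.data_dirty:
            data_spool.store(self.msg_id, self.data)  # may raise
            self.data_dirty = False
        if self.meta_dirty:
            meta_spool.store(self.msg_id, self.meta)  # may raise
            self.meta_dirty = False
```

In this sketch a failed meta write leaves meta_dirty set and raises to the caller, which is the behavior the fix describes: the ingress path sees the error and can reject the message so that the client retries.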