Frequently Asked Questions¶
Security & Gateway Proxy¶
Q: What does the agent do by default?¶
By default, the PipeOps agent provides secure admin access only:
- Establishes WebSocket tunnel to PipeOps control plane
- Enables secure cluster management without inbound firewall rules
- Does NOT expose your cluster externally
- Does NOT install monitoring components (when installed via manifest)
- Does NOT sync ingresses or register routes
Q: Is ingress sync (Gateway Proxy) enabled by default?¶
NO. For security reasons, the PipeOps Gateway Proxy feature is DISABLED by default (as of v1.x).
The agent will NOT expose your cluster externally unless you explicitly enable it with:
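(The enable_ingress_sync key is the documented switch; where it lives in your configuration file depends on how you deployed the agent, so treat this as a sketch.)

```yaml
# Opt in to ingress sync / gateway proxy (disabled by default)
enable_ingress_sync: true
```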
When disabled, the agent logs an "Ingress sync disabled" message at startup (see "How to check if ingress sync is enabled" under Advanced Configuration).
When enabled, the agent automatically:
- Detects if the cluster is private (no public LoadBalancer on ingress-nginx)
- Starts the ingress watcher
- Syncs all existing ingresses to the control plane
- Watches for new/updated/deleted ingresses and syncs them in real-time
You'll see:
{"level":"info","msg":"Ingress sync enabled - monitoring ingresses for gateway proxy"}
{"level":"info","msg":"Initializing gateway proxy detection..."}
{"level":"info","msg":"Private cluster detected - using tunnel routing"}
{"level":"info","msg":"Starting ingress watcher for gateway proxy"}
{"cluster_uuid":"...","ingress_count":4,"msg":"Syncing ingresses with controller"}
Q: When should I enable Gateway Proxy?¶
Enable the PipeOps Gateway Proxy only if you want to:
- Expose services in private clusters without VPN
- Use custom domains for cluster services
- Provide external access to applications via PipeOps gateway
For secure admin access only, keep it disabled (default).
Q: How does cluster detection work?¶
When gateway proxy is enabled, the agent checks the ingress-nginx-controller service:
- LoadBalancer with an External IP → public cluster → uses direct routing
- NodePort or no External IP → private cluster → uses tunnel routing
Note: Cluster detection only happens when enable_ingress_sync: true is set.
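To preview what the agent will decide, you can inspect the same service yourself (the ingress-nginx namespace is an assumption; adjust if your controller lives elsewhere):

```bash
# The TYPE and EXTERNAL-IP columns are what the detection logic keys on
kubectl get svc ingress-nginx-controller -n ingress-nginx
```

A TYPE of LoadBalancer with a populated EXTERNAL-IP means the agent treats the cluster as public and uses direct routing; NodePort or a missing external IP means private and tunnel routing.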
Installation & Component Auto-Install¶
Q: Why aren't monitoring components installed when I use Helm?¶
The agent's auto-installation feature is disabled by default for Helm and Kubernetes manifest deployments. This is intentional because we assume you're deploying to an existing cluster that may already have monitoring tools like Prometheus, Grafana, or Loki installed.
To enable auto-installation with Helm:
```bash
helm install pipeops-agent ./helm/pipeops-agent \
  --set agent.pipeops.token="your-token" \
  --set agent.cluster.name="my-cluster" \
  --set agent.autoInstallComponents=true   # Enable auto-install
```
Why this design?
- Fresh installations (bash script): auto-install is enabled for quick setup
- Existing clusters (Helm/K8s manifests): auto-install is disabled to prevent conflicts
- You keep full control based on your environment
See the Component Installation Behavior section for more details.
Q: What happens after ingress sync (when enabled)?¶
Routes are registered with the control plane:
- Private clusters: traffic routes through the WebSocket tunnel to the agent
- Public clusters: traffic routes directly to the LoadBalancer IP (3-5x faster)
The agent sends:
- routing_mode: "tunnel" or "direct"
- public_endpoint: LoadBalancer IP (if available) or empty
- cluster_uuid: Cluster identifier
- Ingress rules (host, path, service, port, TLS, annotations)
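For illustration, the registration data might look roughly like this when rendered as YAML; the nesting and any field names not listed above are assumptions:

```yaml
# Sketch only - structure and field names beyond those documented above are assumptions
cluster_uuid: "example-cluster-uuid"   # cluster identifier
routing_mode: "tunnel"                 # or "direct" for public clusters
public_endpoint: ""                    # LoadBalancer IP if available, otherwise empty
ingresses:                             # assumed wrapper for the synced ingress rules
  - host: app.example.com
    path: /
    service: my-app
    port: 80
    tls: true
    annotations:
      nginx.ingress.kubernetes.io/ssl-redirect: "true"
```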
Agent Health & Heartbeat¶
Q: Why don't I see heartbeat/ping logs?¶
Heartbeats ARE running - they just log at different levels:
- Success: DEBUG level (not visible in INFO logs)
- Failure: WARN/ERROR level (visible)
You'll see heartbeat failures like:
{"error":"failed to send heartbeat: WebSocket not connected","level":"warning","msg":"Heartbeat failed, retrying with backoff..."}
But successful heartbeats are silent at INFO level.
Q: How can I confirm the agent is healthy?¶
Check for these INFO-level logs:
- Registration
- Connection state
- Ingress sync (if enabled)
- Prometheus metrics (if monitoring enabled)
If you see these periodically, the agent is healthy.
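A quick way to spot-check these from the command line (the pipeops namespace and pipeops-agent deployment name are assumptions; adjust to your install):

```bash
# Look for the key INFO-level events in the last hour of agent logs
kubectl logs -n pipeops deployment/pipeops-agent --since=1h \
  | grep -Ei 'register|connect|ingress sync|prometheus'
```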
Q: How often does the agent send heartbeats?¶
Every 30 seconds to match control plane expectations.
If heartbeat fails, it retries with exponential backoff (5s, 10s, 30s) up to 3 attempts.
Monitoring Stack¶
Q: Why do I keep seeing "Discovered Prometheus service" every 30 seconds?¶
This is normal and expected when the monitoring stack is enabled. The agent:
- Sends a heartbeat to the control plane every 30 seconds
- Each heartbeat includes monitoring information (Prometheus URL, credentials, etc.)
- To get this information, the agent discovers the Prometheus service dynamically
- Logs at INFO level when successfully discovered
Why dynamic discovery? Different Kubernetes distributions (K3s, managed clusters, vanilla K8s) deploy Prometheus with different service names. The agent detects the actual service name and port automatically.
Other services (Grafana, Loki) are discovered once at startup because they don't need to be included in heartbeat messages.
Note: This only happens if you've installed the monitoring stack. If you haven't enabled monitoring, you won't see these logs.
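To see which Prometheus service the agent would discover, list the services in the monitoring namespace (pipeops-monitoring is the namespace referenced elsewhere in this FAQ; yours may differ):

```bash
# Service names vary by distribution; the agent matches whatever is actually deployed
kubectl get svc -n pipeops-monitoring | grep -i prometheus
```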
Q: Can I disable these periodic logs?¶
Not directly, but you can:
- Reduce log level to WARN (hides INFO logs)
- Filter logs in your monitoring system (Loki/Grafana)
The logs themselves are harmless and indicate healthy monitoring.
Q: Why is only Prometheus logged repeatedly?¶
Because only Prometheus information is sent with each heartbeat (every 30 seconds) to the control plane. This allows the control plane to:
- Access Prometheus metrics via the tunnel or directly
- Monitor cluster health without polling
- Get real-time access credentials
Other services:
- Grafana: accessed via ingress proxy (discovered once at startup)
- Loki: logs forwarded by Promtail (no agent involvement)
Technical detail: The log appears in internal/components/manager.go::discoverPrometheusService() which is called by GetMonitoringInfo() on every heartbeat cycle.
Region Detection¶
Q: How does the agent detect region?¶
Detection order:
1. Node labels (most reliable): topology.kubernetes.io/region and provider-specific labels
2. Provider ID: aws://, gce://, azure://, etc.
3. Metadata service: AWS IMDSv2, GCP metadata, Azure IMDS
4. Local environment detection: K3s, kind, minikube, Docker Desktop
5. GeoIP detection: for bare-metal/on-premises clusters
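To see what the first two steps have to work with, you can inspect your nodes directly:

```bash
# Region label, if any node carries one
kubectl get nodes --show-labels | grep -o 'topology.kubernetes.io/region=[^,]*' | sort -u

# Provider ID prefixes (aws://, gce://, azure://, or empty on bare metal)
kubectl get nodes -o jsonpath='{range .items[*]}{.spec.providerID}{"\n"}{end}'
```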
Q: What if region can't be detected?¶
Defaults:
- Provider: "bare-metal" or "on-premises"
- Region: "on-premises" or "agent-managed"
- Registry Region: "us" (unless GeoIP detects Europe)
Q: How is registry region determined?¶
For cloud providers:
- EU regions (eu-west-1, eu-central-1, etc.) → "eu"
- All other regions → "us"
For bare-metal/on-premises:
- GeoIP: Europe + Africa → "eu"
- GeoIP: Other continents → "us"
- No GeoIP: "us" (default)
Troubleshooting¶
Agent not connecting to control plane¶
Check the agent logs for WebSocket connection or authentication errors.
Solutions:
1. Verify PIPEOPS_API_URL is correct
2. Check AGENT_TOKEN is valid
3. Ensure network connectivity to control plane
4. Check firewall rules (WebSocket requires outbound HTTPS)
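For a quick outbound connectivity test from inside the cluster (curlimages/curl is just a convenient public image; replace the placeholder with your actual PIPEOPS_API_URL):

```bash
# Verifies that pods can reach the control plane over outbound HTTPS
kubectl run pipeops-net-test --rm -it --restart=Never \
  --image=curlimages/curl --command -- \
  curl -sv https://<your-pipeops-api-url>
```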
Ingress sync not working¶
Check that enable_ingress_sync: true is set, then look for the "Ingress sync enabled" and ingress watcher messages shown earlier in this FAQ.
If the logs instead indicate a public cluster, your cluster has a public LoadBalancer and ingress sync is disabled (direct routing is used instead).
Monitoring stack not starting¶
Check:
1. Storage class available: kubectl get storageclass
2. Helm installed: helm version
3. CRDs installed: kubectl get crd | grep monitoring.coreos.com
4. Namespace exists: kubectl get ns pipeops-monitoring
WebSocket disconnections¶
```
{
  "error": "websocket: close 1006 (abnormal closure): unexpected EOF"
}
```

```
{
  "msg": "Attempting to reconnect to WebSocket",
  "base_delay": "4s",
  "jitter": "892ms",
  "total_delay": "4.892s",
  "next_delay": "8s"
}
```
This is normal - network hiccups, control plane restarts, etc. The agent auto-reconnects and re-registers.
Reconnection Behavior:
- Uses exponential backoff with jitter (±25%)
- Maximum retry delay: 15 seconds (caps at 15s after 6 failures)
- Typical reconnection time: 15-45 seconds for control plane outages
- Brief network blips (<5s): reconnects in ~1 second
Advanced Configuration¶
Enable verbose heartbeat logging¶
Set the agent log level to DEBUG, either in your configuration file or via an environment variable:
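The exact key and variable names depend on how the agent was deployed, so treat this as a sketch only:

```yaml
# configuration file - the key name here is an assumption
log_level: debug
```

```bash
# environment variable - the name here is an assumption; check your chart values or manifest
export LOG_LEVEL=debug
```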
How to enable ingress sync¶
Add to your configuration file:
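```yaml
# Opt in to ingress sync / gateway proxy (disabled by default)
enable_ingress_sync: true
```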
Or set via environment variable:
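(The variable name below is an assumption; check your deployment manifest for the exact name.)

```bash
export ENABLE_INGRESS_SYNC=true   # variable name is an assumption
```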
Then restart the agent:
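(Namespace and deployment name are assumptions; adjust to your install.)

```bash
kubectl rollout restart deployment/pipeops-agent -n pipeops
```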
How to check if ingress sync is enabled¶
Check agent logs on startup:
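(Namespace and deployment name are assumptions; adjust to your install.)

```bash
kubectl logs -n pipeops deployment/pipeops-agent | grep -i "ingress sync"
```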
Output examples:
- "Ingress sync disabled" = NOT exposing cluster
- "Ingress sync enabled" = Monitoring and exposing ingresses
Disable gateway proxy (force direct routing)¶
When ingress sync is enabled, the gateway proxy automatically detects your cluster type: private clusters use tunnel routing, while public clusters with LoadBalancers use direct routing. To turn the feature off entirely, set enable_ingress_sync: false (the default).
Custom Prometheus discovery interval¶
Not currently configurable - hard-coded to 30 seconds. However, discovery results are now cached for 5 minutes to reduce log frequency.
Log Levels Guide¶
| Level | What you see |
|---|---|
| ERROR | Critical failures only |
| WARN | Failures with retry, missing configs |
| INFO | Startup, registration, sync, discoveries (default) |
| DEBUG | Heartbeats, WebSocket messages, detailed flow |
| TRACE | Raw HTTP/WebSocket traffic, internal state |
Recommended: INFO (default) for production, DEBUG for troubleshooting.