7 Ways WmiAxon Improves System Monitoring

Troubleshooting Common WmiAxon Issues and Fixes

Symptom: Agent shows offline in dashboard or fails to register.
Quick fixes:
1. Network: Verify agent can reach server IP/hostname and required port (use ping/telnet).
2. Firewall: Open agent outbound port and server inbound port.
3. DNS: Confirm hostname resolves correctly.
4. Time sync: Ensure system clock/NTP is correct (certificate failures often follow drift).
5. Logs: Check agent logs for TLS/auth errors and rotate credentials if expired.

Symptom: Monitoring causes resource spikes or alerts about resource exhaustion.
Quick fixes:
1. Sampling rate: Lower polling frequency for heavy metrics.
2. Disable unused checks: Turn off nonessential plugins/collectors.
3. Batching: Enable metric batching or increase collection intervals.
4. Upgrade agent: Ensure latest agent with performance improvements is installed.
5. Profile: Use system profiler to find hot threads in the agent process.

Symptom: Expected metrics absent or show gaps/inconsistent values.
Quick fixes:
1. Collector health: Verify each collector/plugin is enabled and healthy.
2. Permissions: Ensure agent has permission to access required system resources (e.g., WMI, files, APIs).
3. Network drops: Check for packet loss between agent and server; enable retries.
4. Metric names/versions: Confirm metric names didn’t change after upgrades; update dashboards/queries.
5. Log inspection: Review agent/plugin logs for errors or timeouts.

Symptom: Noisy alerts or repeated notifications for the same condition.
Quick fixes:
1. Threshold tuning: Increase thresholds or add sustained-duration requirements (e.g., 5 min).
2. Aggregation: Aggregate metrics over a window before alert evaluation.
3. Suppression/maintenance windows: Configure suppression during known maintenance periods.
4. Dependencies: Use alert dependencies to avoid duplicate alerts from downstream services.
5. Flapping detection: Enable flapping detection or add cooldown periods.

Symptom: Dashboards load slowly or queries time out.
Quick fixes:
1. Query scope: Narrow time ranges and reduce high-cardinality group-bys.
2. Downsampling: Use pre-aggregated or downsampled metrics for long-range views.
3. Indexing: Ensure backend indexes are healthy and retention policies are appropriate.
4. Panel limits: Reduce number of panels or concurrent queries per dashboard.
5. Backend scaling: Scale query nodes or increase resources if chronic.

Symptom: Users cannot log in or access resources; agent auth failures.
Quick fixes:
1. Credentials: Confirm API keys, tokens, or service accounts are valid and unexpired.
2. Role mapping: Verify RBAC roles permit required actions.
3. SSO/SAML: Check identity provider connectivity and certificate validity.
4. Audit logs: Inspect auth logs for denied requests and trace causes.

Symptom: Connection refused, handshake failures, or “certificate expired” errors.
Quick fixes:
1. Certificate validity: Check expiry and renew as needed.
2. Chain and CA: Ensure full chain is presented and trusted by clients.
3. Hostname mismatch: Confirm cert SANs include server hostnames.
4. Protocol support: Ensure both sides support common TLS versions and ciphers.

Symptom: After upgrade, features break or agents stop reporting.
Quick fixes:
1. Compatibility matrix: Verify agent/server versions are compatible before upgrading.
2. Rollback plan: Keep backups/config exports and an easy rollback path.
3. Staged rollout: Upgrade a subset first to validate.
4. Migration notes: Follow vendor migration docs for schema or API changes.

Collect logs: Agent + server + system logs for the timeframe.
Reproduce: Try to reproduce the issue in a test environment.
Isolate: Disable nonessential plugins or components to narrow cause.
Confirm environment: Check network, DNS, time, permissions, and certificates.
Escalate: If unresolved, gather timestamps, logs, config files and open a support ticket.