-
-
Notifications
You must be signed in to change notification settings - Fork 7.1k
feat: system service (aka systemd/ windows service) monitor #6488
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Need discussion on how to design it properly, directly execute a command, will end up becoming the recent Next.js RCE issue.
Please create 2 & 3 in separate pull requests. |
This adds a new monitor type to check local services by executing a shell command. It also includes fixes for Prometheus errors when adding new tags and for the UI not updating when tags are changed.
07f47b6 to
d76ce4e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall, looks like a promising start.
There is some unrelated code which needs moving to other PRs, but that should be easy to remove.
I have two requests:
- could you add a very simple testcase that only executes in CI, so we all are sure that this code works in the future (even if new functionality gets added)
- could you make this a non-docker-only monitor (calling it in docker does not quite make senese). I think tailscale was similar.
- I think this will only work on linux right, so we need to also restrict to this.
| // systemctl is-active exits with 0 if the service is active, | ||
| // and a non-zero code if it is inactive, failed, or not found. | ||
| if (error) { | ||
| heartbeat.status = DOWN; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
otherwise retrys don't work
| heartbeat.status = DOWN; |
| // This is the name of the service to check e.g. "nginx.service" | ||
| const serviceName = monitor.local_service_name; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: please inline serviceName, as this way I think the code is more clear.
server/prometheus.js
Outdated
| * @returns {Promise<void>} | ||
| */ | ||
| static async init() { | ||
| PrometheusClient.register.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not seem related
| PrometheusClient.register.clear(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This looks like a leftover from my local debugging to prevent duplicate registration errors during hot-reloads. It is definitely not needed for production and I will remove it
server/server.js
Outdated
| await Prometheus.init(); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not seem related
| await Prometheus.init(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed. This was likely leftover debug code intended to force a metric refresh, but it is too aggressive/destructive for production as it clears the global register. I have removed it
| await server.sendUpdateMonitorIntoList(socket, monitorID); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does not seem related
| await server.sendUpdateMonitorIntoList(socket, monitorID); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ditto
src/pages/EditMonitor.vue
Outdated
| </option> | ||
|
|
||
| <option value="local-service"> | ||
| {{ $t("Local Service") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please name the monitor Systemd Service instead as it is a bit unclear what a local service is otherwise.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am seriously considering to extend this monitor to support Windows Services as well (using PowerShell's 'Get-Service'). Because of this hybrid support, 'Systemd Service' would be too specific. I believe that 'Local Service' or 'System Service' fits best as it implies checking the underlying OS service manager (systemd for Linux, SCM for Windows).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"System service" if you add windows support, otherwise "Systemd service" 👍🏻
src/pages/EditMonitor.vue
Outdated
| <label for="local-service-name" class="form-label">{{ $t("Service Name") }}</label> | ||
| <input id="local-service-name" v-model="monitor.local_service_name" type="text" class="form-control" required placeholder="nginx.service"> | ||
| <div class="form-text"> | ||
| {{ $t("localServiceDescription") }} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lets add here (via i18n-t) how to debug this monitor.
This way it is much clearer what this monitor does.
|
I am not so familiar with Linux distros, but I have seen some Linux distros don't have systemctl/systemd. May need a better name instead of "Local Service". |
Systemd is used by Ubuntu, Debian, CentOS, RedHat, Fedora, SUSE to name the most known. Those systems covers the vast majority of the server OS market. To keep the PR clean and maintainable, I propose we stick to Systemd for Linux and later PowerShell for Windows. Chosing 'Local Service' or 'System Service' is a matter or taste. My personal preference would be 'System Service'. |
| // 1. Capture the raw output (prioritize stderr for errors) | ||
| let output = (stderr || stdout || "").toString().trim(); | ||
|
|
||
| // 2. Truncate if too long to ~200 chars | ||
| if (output.length > 200) { | ||
| output = output.substring(0, 200) + "..."; | ||
| } | ||
|
|
||
| if (error) { | ||
| // stderr often contains useful info like "service not found" | ||
| // Use the truncated output, or a default fallback if empty | ||
| heartbeat.msg = stderr || stdout || `Service '${monitor.local_service_name}' is not running.`; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| // 1. Capture the raw output (prioritize stderr for errors) | |
| let output = (stderr || stdout || "").toString().trim(); | |
| // 2. Truncate if too long to ~200 chars | |
| if (output.length > 200) { | |
| output = output.substring(0, 200) + "..."; | |
| } | |
| if (error) { | |
| // stderr often contains useful info like "service not found" | |
| // Use the truncated output, or a default fallback if empty | |
| heartbeat.msg = stderr || stdout || `Service '${monitor.local_service_name}' is not running.`; | |
| // stderr often contains useful info like "service not found" | |
| let output = (stderr || stdout || "").toString().trim(); | |
| if (output.length > 200) { | |
| output = output.substring(0, 200) + "..."; | |
| } | |
| if (error) { | |
| heartbeat.msg = output || `Service '${monitor.local_service_name}' is not running.`; |
|
PR Update Notice Based on the feedback and the cross-platform nature of this feature, I have expanded the scope and finalized the implementation: Renamed to "System Service": Changed the monitor name from "Local Service" to "System Service" to be OS-agnostic (supports both systemd and Windows Service Manager). Added Windows Support: The monitor now auto-detects the OS. On Windows, it uses powershell (via Get-Service) to check status, while maintaining systemctl for Linux. Added debug instructions with platform-specific commands. Verification: Verified manually on Linux I don't have Windows. Hence, I need someone to voluntarily test on a Windows platform |
|
Would writing a testcase for this be possible? |
❗ Important Announcements
Click here for more details:
🚫 Please Avoid Unnecessary Pinging of Maintainers
We kindly ask you to refrain from pinging maintainers unless absolutely necessary. Pings are for critical/urgent pull requests that require immediate attention.
📋 Overview
What problem does this pull request address?
systemdservices #1677), but previous proposals for dedicated monitors (like a systemd monitor) were turned down due to concerns about complexity and "clutter".What features or functionality does this pull request introduce or enhance?
New 'Local Service' Monitor Type:
Bug Fix for Prometheus Errors:
Bug Fix for Tag UI:
systemdservices #1677🛠️ Type of change
Disclaimer
📄 Checklist
📷 Screenshots or Visual Changes
UPDOWN