Checking Processes, Ports, and Intranet Domain Names with Prometheus and Traditional Methods

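Background
Company A needs to monitor its IT infrastructure, including the status of ports, processes, and intranet domain names on both Windows and Linux. As the business grows, keeping the systems stable and secure becomes increasingly important. Traditional monitoring methods can fall short on timeliness and flexibility, so the company adopted Prometheus to collect system status and performance metrics more efficiently.
Goals
  1. Cross-platform monitoring: choose monitoring tools that can be installed and run on both Windows and Linux.
  2. Real-time port and process monitoring: monitor the ports and processes of key services in real time to ensure availability, and alert promptly.
  3. Intranet domain name checks: periodically check that intranet domain names resolve and are reachable, so that internal services keep running normally.
  4. SSL certificate monitoring: automatically check certificate validity and expiry dates so that certificates are renewed in time.
  5. Visualization and alerting: display monitoring data with a visualization tool such as Grafana, and configure alerts that notify the responsible operations and development staff.
Workflow
  1. Tool selection and deployment
    1. Choose Prometheus as the monitoring system and use exporters (such as node_exporter, process_exporter, and a custom port collector) to cover the different monitoring needs.
    2. Deploy Prometheus and the related exporters on the Windows and Linux servers.
  2. Port and process monitoring
    1. Use node_exporter, together with the process and port exporters above, to monitor port and process status.
    2. Configure Prometheus to scrape the metrics the exporters expose, so that specific ports and processes can be monitored.
  3. Alerting
    1. Use the exporters to check process and port status, and configure HTTP probes to verify service availability.
    2. Connect the Prometheus data source to Grafana and build dashboards to visualize the monitoring data.
  4. Testing and tuning
    1. Test after rollout to confirm that the monitoring system captures every metric accurately.
    2. Tune the monitoring strategy and alert rules based on feedback to keep the system running efficiently.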

Prometheus process and port configuration

① Dockerfile build configuration

cat Dockerfile
FROM python
ENV LANG=C.UTF-8
ENV TZ=Asia/Shanghai
RUN pip install pyyaml --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install requests --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install prometheus_client -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install Flask -i https://pypi.tuna.tsinghua.edu.cn/simple
# asyncio ships with Python 3, so it is not installed from PyPI here
COPY host_port_monitor.py /opt/
CMD ["sleep","999"]

# Build the image and push it to the registry
docker build -t harbor.export.cn/ops/service_status_monitor_export_port:v2 .
docker push harbor.export.cn/ops/service_status_monitor_export_port:v2

② Configure the port collector and normalize label names

cat host_port_monitor.py
# -*- coding:utf-8 -*-
import asyncio
import os
import re
import socket

import prometheus_client
import yaml
from flask import Flask, Response
from prometheus_client import Gauge
from prometheus_client.core import CollectorRegistry

app = Flask(__name__)

# Config keys that control the probe itself and are not exported as labels
RESERVED_KEYS = ("host", "port", "requirement", "protocol")


def get_config_dic():
    """Load the YAML config file and return it as a dictionary."""
    pro_path = os.path.dirname(os.path.realpath(__file__))
    yaml_path = os.path.join(pro_path, "host_port_conf.yaml")
    with open(yaml_path, "r", encoding="utf-8") as f:
        return yaml.safe_load(f)


class UDPProbe(asyncio.DatagramProtocol):
    """Send one test datagram as soon as the endpoint is ready."""

    def connection_made(self, transport):
        self.transport = transport
        self.transport.sendto(b"test")

    def datagram_received(self, data, addr):
        self.transport.close()

    def error_received(self, exc):
        pass

    def connection_lost(self, exc):
        pass


async def explore_udp_port(ip, port):
    """Return 1 if a datagram could be sent to ip:port, else 0.

    UDP is connectionless, so this only shows that the datagram was sent,
    not that a service answered.
    """
    try:
        loop = asyncio.get_running_loop()
        transport, _ = await loop.create_datagram_endpoint(
            lambda: UDPProbe(), remote_addr=(ip, int(port))
        )
        transport.close()
        return 1
    except Exception:
        return 0


def explore_tcp_port(ip, port):
    """Return 1 if the TCP port accepts a connection within 0.5 s, else 0."""
    try:
        # The timeout must be set before connecting; the socket is closed on exit.
        with socket.create_connection((ip, int(port)), timeout=0.5):
            return 1
    except OSError:
        return 0


def is_valid_label_name(label_name):
    """Check whether the label name follows Prometheus naming rules."""
    return re.match(r"^[a-zA-Z_][a-zA-Z0-9_]*$", label_name) is not None


def format_label_name(label_name):
    """Replace characters that are not allowed in a label name with '_'."""
    return re.sub(r"[^a-zA-Z0-9_]", "_", label_name)


def check_port():
    """Probe every configured service and apply its 'requirement' policy."""
    sdic = get_config_dic()
    result_list = []
    for sertype, config in sdic.items():
        iplist = config.get("host") or []
        portlist = config.get("port") or []
        requirement = config.get("requirement")
        protocol_list = config.get("protocol", ["tcp"])
        # The sample config gives protocol as a plain string; wrap it so the
        # loop below does not iterate over its characters.
        if isinstance(protocol_list, str):
            protocol_list = [protocol_list]

        # Every other key becomes a label, provided its name is valid
        valid_labels = {
            format_label_name(key): value
            for key, value in config.items()
            if key not in RESERVED_KEYS and is_valid_label_name(key)
        }

        service_results = []
        for ip in iplist:
            for port in portlist:
                for protocol in protocol_list:
                    if protocol == "udp":
                        status = asyncio.run(explore_udp_port(ip, port))
                    else:
                        status = explore_tcp_port(ip, port)
                    result_dic = {"sertype": sertype, "host": ip, "port": str(port), "status": status}
                    result_dic.update(valid_labels)
                    service_results.append(result_dic)

        # "all": every endpoint must be up; "any": one reachable endpoint is enough.
        # Any other value (for example "check") keeps the per-endpoint result.
        statuses = [r["status"] for r in service_results]
        if requirement == "all":
            overall = int(all(statuses))
        elif requirement == "any":
            overall = int(any(statuses))
        else:
            overall = None
        if overall is not None:
            for r in service_results:
                r["status"] = overall
        result_list.extend(service_results)
    return result_list


@app.route("/metrics")
def api_response():
    """Generate Prometheus metrics from the port check results."""
    checkport = check_port()
    registry = CollectorRegistry(auto_describe=False)

    # Every key outside the base labels (including "status") becomes a dynamic label
    base_labels = ["sertype", "host", "port"]
    dynamic_labels = set()
    for datas in checkport:
        dynamic_labels.update(datas.keys())
    dynamic_labels = sorted(dynamic_labels.difference(base_labels))

    mux_status = Gauge("server_port_up", "Port check status (1 = up, 0 = down)",
                       base_labels + dynamic_labels, registry=registry)
    for datas in checkport:
        label_values = [datas.get("sertype"), datas.get("host"), datas.get("port")]
        label_values += [datas.get(label, "unknown") for label in dynamic_labels]
        mux_status.labels(*label_values).set(datas.get("status"))
    return Response(prometheus_client.generate_latest(registry), mimetype="text/plain")


if __name__ == "__main__":
    app.run(host="0.0.0.0", port=8080)
# Ports, labels, protocol and so on are configured in YAML, which makes the alerts easier to read.
cat host_port_conf.yaml
# Prometheus monitor server port config.
pw:
  env: "prod"
  applicationowner: "zhangsan"
  applicationname: "zhangsan"
  vendor: "export"
  techowner: "zhangsan"
  service: "pass"
  host:
    - "10.10.10.123"
    - "10.10.10.124"
  port:
    - 3389
  requirement: "check"   # report each endpoint as probed: up is up, down is down
  protocol: "tcp"
ack:
  env: "prod"
  applicationowner: "zhangsan"
  applicationname: "zhangsan"
  vendor: "export"
  techowner: "zhangsan"
  service: "good"
  host:
    - "10.10.10.128"
    - "10.10.10.129"
  port:
    - 1858
  requirement: "any"     # one reachable endpoint is enough to report all as up
  protocol: "tcp"

# Test the service locally and inspect the status returned for the monitored ports
curl -s -k http://ip:8080/metrics
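The `requirement` field decides how the individual probe results of one service are combined. The following is a minimal sketch of that policy in isolation, assuming that any value other than "all" or "any" (such as "check") leaves each result untouched. The function name `apply_requirement` is hypothetical and is not part of the collector script.

```python
def apply_requirement(statuses, requirement):
    """Combine the 0/1 probe results of one service.

    statuses: list of 0/1 results, one per host/port/protocol probe.
    requirement: "all", "any", or any other value (treated like "check").
    """
    if requirement == "all":
        # Every endpoint must be up, otherwise the whole service is down.
        return [int(all(statuses))] * len(statuses)
    if requirement == "any":
        # A single reachable endpoint marks the whole service as up.
        return [int(any(statuses))] * len(statuses)
    # "check": report each endpoint exactly as it was probed.
    return list(statuses)


if __name__ == "__main__":
    probes = [1, 0]  # first host reachable, second host not
    for policy in ("check", "any", "all"):
        print(policy, apply_requirement(probes, policy))
```

With one host up and one down, "check" keeps the mixed result, "any" reports both as up, and "all" reports both as down.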

③ Kubernetes Deployment configuration

cat prometheus-service-monitor-export-port-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: service-monitor-export-port
  namespace: monitor
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-service-monitor-export-port
  template:
    metadata:
      labels:
        app: prometheus-service-monitor-export-port
    spec:
      containers:
        - name: service-monitor-export-port
          image: harbor.export.cn/ops/service_status_monitor_export_port:v2
          imagePullPolicy: Always
          command: ["sh","-c"]
          args: ["python /opt/host_port_monitor.py"]
          resources:
            requests:
              cpu: 500m
              memory: 500Mi
            limits:
              cpu: 4000m
              memory: 4000Mi
          ports:
            - containerPort: 8080
          volumeMounts:
            - name: host-port-conf-volume
              mountPath: /opt/host_port_conf.yaml
              subPath: host_port_conf.yaml
      volumes:
        - name: host-port-conf-volume
          configMap:
            name: host-port-conf
---
apiVersion: v1
kind: Service
metadata:
  name: service-monitor-export-port
  namespace: monitor
spec:
  selector:
    app: prometheus-service-monitor-export-port
  ports:
    - name: http
      port: 80
      targetPort: 8080
  type: ClusterIP

cat prometheus-service-monitor-export-port-configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: host-port-conf
  namespace: monitor
data:
  host_port_conf.yaml: |
    # Prometheus monitor server port config.
    pw:
      env: "prod"
      applicationowner: "zhangsan"
      applicationname: "zhangsan"
      vendor: "export"
      techowner: "zhangsan"
      service: "pass"
      host:
        - "10.10.10.123"
        - "10.10.10.124"
      port:
        - 3389
      requirement: "check"   # report each endpoint as probed: up is up, down is down
      protocol: "tcp"
    crm:
      env: "prod"
      applicationowner: "wangwu"
      applicationname: "wangwu"
      vendor: "export"
      techowner: "wangwu"
      service: "good"
      host:
        - "10.10.10.128"
        - "10.10.10.129"
      port:
        - 1858
      requirement: "any"     # one reachable endpoint is enough to report all as up
      protocol: "tcp"

# Apply the manifests to the ACK cluster to run the service
kubectl apply -f prometheus-service-monitor-export-port-deploy.yaml
kubectl apply -f prometheus-service-monitor-export-port-configmap.yaml
# Configure the scrape job for the custom collector
- job_name: service-status-monitor-export-port
  scrape_interval: 30s
  scrape_timeout: 30s
  scheme: http
  metrics_path: /metrics
  static_configs:
    - targets: ['service-monitor-export-port.monitor.svc:80']

# PromQL for ports that are not up
sum by (host, port, sertype, env, applicationname, applicationowner, techowner, vendor, service, status) (server_port_up{instance="service-monitor-export-port.monitor.svc:80"}) != 1
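To check what the alert expression would match without going through Prometheus, the `/metrics` output can be filtered directly. The sketch below is only an approximation of the `!= 1` filter (it does not perform the `sum by` aggregation), and the sample text is hypothetical output written in the Prometheus text exposition format rather than a capture from a real run. It uses only the standard library.

```python
import re

# One sample line: metric_name{label="value",...} value
SAMPLE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$'
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def find_down_ports(metrics_text, metric="server_port_up"):
    """Return the label sets of all samples of `metric` whose value is not 1."""
    down = []
    for line in metrics_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        match = SAMPLE_RE.match(line)
        if not match or match.group("name") != metric:
            continue
        if float(match.group("value")) != 1:
            down.append(dict(LABEL_RE.findall(match.group("labels"))))
    return down


# Hypothetical exposition text for illustration only
SAMPLE_TEXT = """\
# HELP server_port_up Port check status (1 = up, 0 = down)
# TYPE server_port_up gauge
server_port_up{sertype="pw",host="10.10.10.123",port="3389",env="prod"} 1.0
server_port_up{sertype="pw",host="10.10.10.124",port="3389",env="prod"} 0.0
"""

if __name__ == "__main__":
    for labels in find_down_ports(SAMPLE_TEXT):
        print("DOWN:", labels["host"], labels["port"])
```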

④ Process configuration

The query for Windows process (service) status is as follows:
windows_service_start_mode{instanceIp=~"10.10.10.76|10.10.10.252",start_mode="auto",name=~"mysqld|redis"}

⑤ Firewall port policy

If a Windows target such as http://10.10.10.22:9400/metrics is reported as down, check whether the Windows firewall allows port 9400: open cmd and run WF.msc to open Windows Defender Firewall.
Linux firewall configuration
# List the iptables rules together with their line numbers
iptables -L --line-numbers -n
# Allow port 9400 in the INPUT and OUTPUT chains
iptables -I INPUT 1 -p tcp --dport 9400 -m comment --comment "Allow arms prometheus - 9400 TCP" -j ACCEPT
iptables -I OUTPUT 2 -p tcp --dport 9400 -m comment --comment "Allow arms prometheus - 9400 TCP" -j ACCEPT
# Save the firewall configuration
service iptables save
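After changing the firewall rules, it can help to confirm from the Prometheus side that the exporter port is reachable. The following is a minimal sketch using only the standard library; the function name `tcp_port_open` and the 2-second timeout are assumptions for illustration.

```python
import socket


def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection applies the timeout to the connect attempt itself
        with socket.create_connection((host, int(port)), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts
        return False
```

For example, `tcp_port_open("10.10.10.22", 9400)` run from the Prometheus host should return True once the firewall allows the port. A False result only says the connection failed. It does not distinguish a firewall block from an exporter that is not running.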
Traditional process and port configuration

① Linux platform configuration

The script can be pushed to the servers and run with Alibaba Cloud Cloud Assistant or with Ansible.
# Create the script directory and the log directory
mkdir -p /opt/prot-monitor
mkdir -p /var/log/prot-monitor
Example: monitor the process sshd and port 22
cat port-monitor.sh
#!/bin/bash
# Services to monitor
declare -A MONITORS
# Format: MONITORS["service name"]="process1,process2:port1,port2"
MONITORS["ssh"]="sshd:"        # monitor a process
MONITORS["cyberark"]=":22"     # monitor a port

# Log file paths
LOG_DIR="/var/log/prot-monitor"
DATE=$(date +"%Y-%m-%d")
PORT_LOG="${LOG_DIR}/port-${DATE}.log"
PROCESS_LOG="${LOG_DIR}/process-${DATE}.log"

# Current time
TIMESTAMP=$(date +"%Y-%m-%d %H:%M:%S")

# Host name and IP address (first address only)
HOSTNAME=$(hostname)
IP_ADDRESS=$(hostname -I | awk '{print $1}')

# Remove logs older than 30 days
find "$LOG_DIR" -name "*.log" -type f -mtime +30 -exec rm -f {} \;

# Check whether a port is listening
check_port_status() {
    local port=$1
    local port_status="UP"
    local port_list_output
    port_list_output=$(netstat -tunlp | grep -v '@pts\|cpus\|master' | awk '{sub(/.*:/,"",$4);sub(/[0-9]*\//,"",$7);print $4}' | sort -n | uniq | grep -Ew "$port")
    if [[ -z "$port_list_output" ]]; then
        port_status="DOWN"
    fi
    echo "$port_status"
}

# Check whether a process is running
check_process_status() {
    local process=$1
    local process_status="UP"
    if ! ps aux | grep -v grep | grep -q "$process"; then
        process_status="DOWN"
    fi
    echo "$process_status"
}

# Check the processes and ports of every service
for SERVICE in "${!MONITORS[@]}"; do
    IFS=':' read -r PROCESSES PORTS <<< "${MONITORS[$SERVICE]}"

    # Process status
    if [[ -n $PROCESSES ]]; then
        IFS=',' read -r -a PROCESS_ARRAY <<< "$PROCESSES"
        for PROCESS in "${PROCESS_ARRAY[@]}"; do
            PROCESS_STATUS=$(check_process_status "$PROCESS")
            echo "{\"timestamp\": \"$TIMESTAMP\", \"hostname\": \"$HOSTNAME\", \"ip_address\": \"$IP_ADDRESS\", \"service\": \"$SERVICE\", \"process\": \"$PROCESS\", \"process_status\": \"$PROCESS_STATUS\"}" >> "$PROCESS_LOG"
        done
    fi

    # Port status
    if [[ -n $PORTS ]]; then
        IFS=',' read -r -a PORT_ARRAY <<< "$PORTS"
        for PORT in "${PORT_ARRAY[@]}"; do
            PORT_STATUS=$(check_port_status "$PORT")
            echo "{\"timestamp\": \"$TIMESTAMP\", \"hostname\": \"$HOSTNAME\", \"ip_address\": \"$IP_ADDRESS\", \"service\": \"$SERVICE\", \"port_status\": \"$PORT_STATUS\", \"port\": \"$PORT\"}" >> "$PORT_LOG"
        done
    fi
done
Crontab scheduled task
# Run port-monitor.sh every minute
crontab -l
*/1 * * * * /bin/bash /opt/prot-monitor/port-monitor.sh

# Run the script manually and check the log output
bash /opt/prot-monitor/port-monitor.sh

# Port log (JSON output)
cat /var/log/prot-monitor/port-2024-11-18.log
{"timestamp": "2024-11-18 13:51:01", "hostname": "iZufXXXXXXXXXXXXXXX4Z", "ip_address": "10.10.10.80", "service": "sshd", "port_status": "UP", "port": "22"}

# Process log (JSON output)
cat /var/log/prot-monitor/process-2024-11-18.log
{"timestamp": "2024-11-18 13:50:01", "hostname": "iZufxxxxxxxxxxxxxx4Z", "ip_address": "10.10.10.80", "service": "sshd", "process": "sshd", "process_status": "UP"}
# Both the process log and the port log should report UP for the service

# Start crond and check its status
systemctl start crond
systemctl status crond
# Check whether crond is enabled at boot
systemctl list-unit-files -t service | grep cron
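Because each log line is a standalone JSON object, the logs can also be checked locally before they reach SLS. The sketch below assumes the field names shown in the log samples above (`port_status`, `process_status`); the function name `down_entries` is hypothetical.

```python
import json


def down_entries(log_lines):
    """Return the parsed records whose port or process status is DOWN."""
    down = []
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        status = record.get("port_status") or record.get("process_status")
        if status == "DOWN":
            down.append(record)
    return down


if __name__ == "__main__":
    # Illustrative lines in the same shape as the script output
    lines = [
        '{"timestamp": "2024-11-18 13:51:01", "service": "sshd", "port_status": "UP", "port": "22"}',
        '{"timestamp": "2024-11-18 13:52:01", "service": "sshd", "port_status": "DOWN", "port": "22"}',
    ]
    for record in down_entries(lines):
        print(record["timestamp"], record["service"], "DOWN")
```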
Alibaba Cloud SLS collects the logs from the server log path
# Download Logtail on the server
wget http://logtail-release-cn-hangzhou.oss-cn-hangzhou.aliyuncs.com/linux64/logtail.sh -O logtail.sh; chmod +x logtail.sh
# Install and start the Logtail service
./logtail.sh install auto
sudo /etc/init.d/ilogtaild start
sudo /etc/init.d/ilogtaild status
# Check whether the service is enabled at boot
systemctl list-unit-files -t service | grep ilo

② Windows platform configuration

Example: monitor the processes PM.exe, CA.exe, and rdpclip.exe, and port 3389
# Create the directory
md C:\monitor-logs
cat C:\monitor-logs\port-monitor.ps1
# Services to monitor. Format: "process1,process2:port1,port2"
$MONITORS = @{
    "cyberark" = "PM.exe,CA.exe,rdpclip.exe:"   # monitor processes
    "rdp"      = ":3389"                        # monitor a port
}

# Log file paths
$LOG_DIR = "C:\monitor-logs"
if (!(Test-Path $LOG_DIR)) {
    New-Item -Path $LOG_DIR -ItemType Directory | Out-Null
}
$DATE = Get-Date -Format "yyyy-MM-dd"
$PORT_LOG = Join-Path $LOG_DIR "port-$DATE.log"
$PROCESS_LOG = Join-Path $LOG_DIR "process-$DATE.log"

# Current time
$TIMESTAMP = Get-Date -Format "yyyy-MM-dd HH:mm:ss"

# Host name and IP address
$HOSTNAME = $env:COMPUTERNAME
$IP_ADDRESS = (Get-NetIPAddress | Where-Object { $_.AddressFamily -eq 'IPv4' -and $_.InterfaceAlias -ne 'Loopback Pseudo-Interface 1' }).IPAddress

# Remove logs older than 30 days
Get-ChildItem -Path $LOG_DIR -Filter "*.log" | Where-Object { $_.LastWriteTime -lt (Get-Date).AddDays(-30) } | Remove-Item

# Check whether a port is in use
function Check-PortStatus {
    param (
        [int]$port
    )
    $port_status = "UP"
    if (-not (Get-NetTCPConnection -LocalPort $port -ErrorAction SilentlyContinue)) {
        $port_status = "DOWN"
    }
    return $port_status
}

# Check whether a process is running, using tasklist and findstr
function Check-ProcessStatus {
    param (
        [string]$processNameToCheck
    )
    Write-Host "Checking process: $processNameToCheck"
    $process_status = "UP"
    # Run tasklist and filter its output with findstr
    $tasklist_command = "tasklist /FO CSV /NH | findstr /I `"$processNameToCheck`""
    $process_found = Invoke-Expression $tasklist_command
    if (-not $process_found) {
        Write-Host "Process $processNameToCheck not found."
        $process_status = "DOWN"
    } else {
        Write-Host "Process found: $processNameToCheck"
    }
    return $process_status
}

# Check the processes and ports of every service
foreach ($SERVICE in $MONITORS.Keys) {
    $entry = $MONITORS[$SERVICE]
    $parts = $entry -split ':'
    $processes = $parts[0]
    $ports = $parts[1]

    # Process status
    if ($processes) {
        $process_array = $processes -split ',' | Where-Object { $_ -ne "" }
        foreach ($process in $process_array) {
            $process_status = Check-ProcessStatus -processNameToCheck $process
            $log_entry = @{
                timestamp      = $TIMESTAMP
                hostname       = $HOSTNAME
                ip_address     = $IP_ADDRESS
                service        = $SERVICE
                process        = $process
                process_status = $process_status
            }
            ($log_entry | ConvertTo-Json -Compress) | Out-File -Append -FilePath $PROCESS_LOG
        }
    }

    # Port status
    if ($ports) {
        $port_array = $ports -split ',' | Where-Object { $_ -ne "" }
        foreach ($port in $port_array) {
            $port_status = Check-PortStatus -port $port
            $log_entry = @{
                timestamp   = $TIMESTAMP
                hostname    = $HOSTNAME
                ip_address  = $IP_ADDRESS
                service     = $SERVICE
                port_status = $port_status
                port        = $port
            }
            ($log_entry | ConvertTo-Json -Compress) | Out-File -Append -FilePath $PORT_LOG
        }
    }
}

# Look up specific processes
tasklist /FO CSV /NH | findstr /I "PM.exe"
tasklist /FO CSV /NH | findstr /I "CA.exe"

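③ Scheduled task

From cmd, run taskschd.msc to open Task Scheduler.
Create a task that runs the process and port check every minute with the following command: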
powershell -WindowStyle Hidden -File "C:\monitor-logs\port-monitor.ps1"
④ SLS alert configuration for processes and ports
# Port query
* | select timestamp,ip_address,hostname,service,port_status,port from log where port_status is not null and port is not null ORDER BY timestamp DESC limit 1000
# Process query
* | select timestamp,ip_address,hostname,service,process_status from log where process_status is not null ORDER BY timestamp DESC limit 1000
⑤ Grafana dashboard display
Intranet domain name status check

① Dockerfile configuration

FROM python
RUN ln -sf /usr/share/zoneinfo/Asia/Shanghai /etc/localtime
RUN pip install pyyaml --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
RUN pip install requests --upgrade -i https://pypi.tuna.tsinghua.edu.cn/simple
COPY domain_return_code.py /opt/
# Environment variables
ENV TZ=Asia/Shanghai
CMD ["sleep","999"]

② Debugging the check script

#!/usr/bin/env python3
# Fetch the HTTP status code, response time, and certificate expiry date of each site
import datetime
import json
import socket
import ssl

import requests
import yaml


def get_cert_expiration_date(host, port):
    """Return the certificate expiry time of the URL's host as a Unix timestamp."""
    try:
        # Strip the scheme and path: "https://admin.export.cn/x" -> "admin.export.cn"
        host = host.split("//")[1].split("/")[0]
        context = ssl.create_default_context()
        with socket.create_connection((host, port)) as sock:
            with context.wrap_socket(sock, server_hostname=host) as sslsock:
                cert = sslsock.getpeercert()
                return ssl.cert_time_to_seconds(cert['notAfter'])
    except Exception as e:
        # print("Failed to get the certificate expiry time:", e)
        # If the certificate cannot be verified, return a placeholder date (2008-08-08)
        return 1218168000


def get_status_code(url, timeout):
    """Return (status code, response time in seconds, certificate expiry date) or None."""
    try:
        requests.packages.urllib3.disable_warnings(requests.packages.urllib3.exceptions.InsecureRequestWarning)
        response = requests.get(url, timeout=timeout, verify=False)
        response_time = response.elapsed.total_seconds()
        # Get the SSL certificate expiry time
        cert_expiration_date = get_cert_expiration_date(url, 443)
        # Convert the timestamp to year-month-day
        date_array = datetime.datetime.fromtimestamp(cert_expiration_date, datetime.timezone.utc)
        cert_expiration_date_format = date_array.strftime("%Y-%m-%d")
        return response.status_code, response_time, cert_expiration_date_format
    except requests.exceptions.RequestException as e:
        # print("Request failed:", e)
        return None


def get_domain_returncode(url, timeout):
    status_code = get_status_code(url, timeout=timeout)
    if status_code is not None:
        return status_code[0], status_code[1], status_code[2]
    # The request failed: report 504 with placeholder values
    return 504, 100, '2008-08-08'


def read_yaml(file_path):
    with open(file_path, 'r') as file:
        try:
            return yaml.safe_load(file)
        except yaml.YAMLError as e:
            print("Error while reading the YAML file:", e)
            return None


if __name__ == "__main__":
    file_path = "/opt/domain_returncode_conf.yaml"
    read_config = read_yaml(file_path)
    config = read_config["domain"]
    for i in config:
        code = get_domain_returncode(i["Name"], timeout=i["timeout"])
        # Copy every config field into the record, then add the check results
        domain_status_dict = dict(i)
        domain_status_dict["return_code"] = code[0]
        domain_status_dict["response_code"] = code[1]   # holds the response time in seconds
        domain_status_dict["EndDate"] = code[2]
        print(json.dumps(domain_status_dict))

cat /opt/domain_returncode_conf.yaml
domain:
  - Name: https://admin.export.cn
    timeout: 15
    Network: intranet
  - Name: https://api.export.cn
    timeout: 15
    Network: intranet

③ CronJob configuration

cat aliyun_domain_code.yaml
apiVersion: batch/v1beta1   # use batch/v1 on Kubernetes 1.21+; v1beta1 was removed in 1.25
kind: CronJob
metadata:
  name: aliyun-domaincode-monitor-server
  namespace: monitoring     # must match the ConfigMap namespace below
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 300
  jobTemplate:
    spec:
      parallelism: 1
      completions: 1
      backoffLimit: 3
      activeDeadlineSeconds: 60
      ttlSecondsAfterFinished: 600
      template:
        spec:
          volumes:
            - name: domain-returncode-config
              configMap:
                name: domain-returncode
          containers:
            - name: aliyun-domaincode-monitor
              image: harbor.export.cn/ops/domain-return-code:v1
              #imagePullSecrets:
              imagePullPolicy: Always
              command:
                - /bin/sh
                - -c
                - python3 /opt/domain_return_code.py
              volumeMounts:
                - name: domain-returncode-config
                  mountPath: /opt/domain_returncode_conf.yaml
                  subPath: domain_returncode_conf.yaml
          restartPolicy: OnFailure
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: domain-returncode
  namespace: monitoring
data:
  domain_returncode_conf.yaml: |
    domain:
      - Name: https://admin.export.cn
        timeout: 15
        Network: internet+intranet
        environment: prod
        vendor: zhangsan
        application_name: admin
        application_owner: zhangsan
        tech_owner: zhangsan
        Assignment_group: admin Operations
      - Name: https://api.test.cn
        timeout: 15
        Network: internet
        environment: prod
        vendor: wangwu
        application_name: api
        application_owner: test
        tech_owner: test
        Assignment_group: api Support
④ Log output and SLS SQL queries
# Log output
{"Name": "https://admin.export.cn", "return_code": 200, "response_code": 5.087571, "EndDate": "2025-01-21", "timeout": 15, "Network": "internet+intranet", "environment": "prod", "vendor": "zhangsan", "application_name": "admin", "application_owner": "zhangsan", "tech_owner": "zhangsan", "Assignment_group": "admin Operations"}
SLS queries used for alerting
# Domains whose certificate expires within the next 60 days
(EndDate: * and _namespace_ : monitoring and _container_name_: aliyun-domaincode-monitor) | select DISTINCT Name,return_code,EndDate,date_diff('day', date_parse(split(_time_, 'T')[1], '%Y-%m-%d'), date_parse(EndDate, '%Y-%m-%d')) as days having days > 0 AND days <= 60 ORDER BY days ASC LIMIT 1000
# Domains that returned a 5XX status code
(((return_code : 5?? ))) not name:"https://view.export.cn" | SELECT DISTINCT Name,return_code,content from log LIMIT 10000
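The certificate query above keeps domains whose remaining validity is more than 0 and at most 60 days. The same window can be sketched in Python from a certificate's `notAfter` string, which is the form `ssl.getpeercert()` returns. The helper names `days_until_expiry` and `needs_renewal` are hypothetical, and the current time is passed in explicitly so that the result is reproducible.

```python
import ssl

SECONDS_PER_DAY = 86400


def days_until_expiry(not_after, now_ts):
    """Whole days between now_ts (Unix seconds) and the certificate's notAfter time."""
    expires_ts = ssl.cert_time_to_seconds(not_after)
    return int((expires_ts - now_ts) // SECONDS_PER_DAY)


def needs_renewal(not_after, now_ts, window_days=60):
    """True when the certificate is still valid but expires within window_days."""
    days = days_until_expiry(not_after, now_ts)
    return 0 < days <= window_days


if __name__ == "__main__":
    # Fixed reference time for illustration; a real check would use time.time()
    now = ssl.cert_time_to_seconds("Jan 01 00:00:00 2025 GMT")
    print(days_until_expiry("Jan 21 00:00:00 2025 GMT", now))
    print(needs_renewal("Jan 21 00:00:00 2025 GMT", now))
```

Working on timestamps avoids the string splitting that the SLS query needs, at the cost of having to fetch the certificate from within the script.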
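In summary, by adopting Prometheus as its monitoring tool, Company A monitors the status of ports, processes, and intranet domain names on both Windows and Linux. The workflow covers tool selection and deployment, port and process monitoring, alerting, and data visualization and tuning, and it helps keep the IT infrastructure stable and secure. With Grafana dashboards and alert rules in place, the responsible operations and development staff are notified promptly, which improves availability and response time. Together, these measures let Company A manage and maintain its business-critical services and support continued growth.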