OpenTracing Specification
Why OpenTracing Is Needed
What Is a Trace


A Typical Trace Example


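SkyWalking
Features
- Multiple monitoring approaches. Monitoring data can be collected through language probes and through a service mesh.
- Automatic probes for multiple languages, including Java, .NET Core, and Node.js.
- Lightweight and efficient. No big-data platform or large pool of server resources is required.
- Modular. Several mechanisms are available for the UI, storage, and cluster management.
- Alarm support.
- Strong visualization.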
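Overall Architecture

- Probes differ depending on the data source, but they all do the same job: collect data and convert it into a format SkyWalking can use.
- The platform backend supports cluster mode. It aggregates and analyzes data and drives the data flow from the probes to the user interface. It also provides pluggable capabilities, such as formatting data from other sources (for example Zipkin), different storage systems, and cluster management. You can even use the Observability Analysis Language for custom aggregation and analysis.
- Storage is open. You can pick an existing storage system, such as ElasticSearch, H2, or a MySQL cluster (managed by Sharding-Sphere), or implement your own. Contributions of new storage implementations are welcome.
- The user interface is polished and powerful for SkyWalking end users. It can also be customized to match an existing backend.
Tracing, Logging, and Metrics

Integrating .NET 6 with SkyWalking
Deploying the SkyWalking Environment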
version: '3.3'
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:7.5.0
    container_name: elasticsearch
    restart: always
    ports:
      - 9200:9200
    environment:
      - discovery.type=single-node
      - bootstrap.memory_lock=true
      - "ES_JAVA_OPTS=-Xms256m -Xmx256m"
    ulimits:
      memlock:
        soft: -1
        hard: -1
  oap:
    image: apache/skywalking-oap-server:6.6.0-es7
    container_name: oap
    depends_on:
      - elasticsearch
    links:
      - elasticsearch
    restart: always
    ports:
      - 11800:11800
      - 12800:12800
    environment:
      SW_STORAGE: elasticsearch
      SW_STORAGE_ES_CLUSTER_NODES: elasticsearch:9200
  ui:
    image: apache/skywalking-ui:6.6.0
    container_name: ui
    depends_on:
      - oap
    links:
      - oap
    restart: always
    ports:
      - 8080:8080
    environment:
      SW_OAP_ADDRESS: http://oap:12800
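Once the containers are running, the SkyWalking UI home page is available at http://<server IP>:8080
Integrating a .NET 6 Application
Add the Dependency
<ItemGroup>
  <PackageReference Include="SkyAPM.Agent.AspNetCore" Version="1.3.0" />
</ItemGroup>
Edit the SkyWalking Configuration File skyapm.json
Writing skyapm.json by Hand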
{
  "SkyWalking": {
    "ServiceName": "MySkyWalkingDemoTest",
    "Namespace": "",
    "HeaderVersions": [
      "sw8"
    ],
    "Sampling": {
      "SamplePer3Secs": -1,
      "Percentage": -1.0
    },
    "Logging": {
      "Level": "Information",
      "FilePath": "logs\\skyapm-{Date}.log"
    },
    "Transport": {
      "Interval": 3000,
      "ProtocolVersion": "v8",
      "QueueSize": 30000,
      "BatchSize": 3000,
      "gRPC": {
        "Servers": "192.168.3.245:11800",
        "Timeout": 10000,
        "ConnectTimeout": 10000,
        "ReportTimeout": 600000,
        "Authentication": ""
      }
    }
  }
}
Generating skyapm.json Automatically
Install the CLI (SkyAPM.DotNet.CLI)
dotnet tool install -g SkyAPM.DotNet.CLI
Generate the skyapm.json File
dotnet skyapm config [service name] [server]:11800
# e.g. dotnet skyapm config MySkyWalking_OrderService 192.168.3.245:11800
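SkyAPM Configuration Reference
- ServiceName: the service name.
- Sampling: sampling settings.
  - SamplePer3Secs: number of samples taken every 3 seconds.
  - Percentage: sampling percentage; for example, set it to 10 for 10% sampling.
- Logging: logging settings.
  - Level: log level.
  - FilePath: path where log files are stored.
- Transport: transport settings.
  - Interval: flush interval, in milliseconds.
  - gRPC: gRPC settings.
    - Servers: gRPC server address; separate multiple addresses with a comma (",").
    - Timeout: timeout for establishing the gRPC connection, in milliseconds.
    - ConnectTimeout: maximum gRPC connection time, in milliseconds.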
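A hand-written skyapm.json is easy to get subtly wrong, so it can help to read the settings back before starting the service. The following is a minimal sketch using System.Text.Json; it is not part of SkyAPM. The inline JSON is a shortened copy of the configuration shown earlier, and the SkyApmConfigCheck class and its Describe helper are hypothetical names introduced only for illustration.

```csharp
using System;
using System.Text.Json;

public static class SkyApmConfigCheck
{
    // Hypothetical helper: summarises the fields described in the reference above.
    public static string Describe(string json)
    {
        using JsonDocument doc = JsonDocument.Parse(json);
        JsonElement sw = doc.RootElement.GetProperty("SkyWalking");

        string? service = sw.GetProperty("ServiceName").GetString();
        double percentage = sw.GetProperty("Sampling").GetProperty("Percentage").GetDouble();
        string? servers = sw.GetProperty("Transport").GetProperty("gRPC").GetProperty("Servers").GetString();

        // Servers may hold several comma-separated addresses.
        string[] addresses = (servers ?? "").Split(',', StringSplitOptions.RemoveEmptyEntries);

        return $"{service}: sampling={percentage}, collectors={addresses.Length}";
    }

    public static void Main()
    {
        // Shortened copy of the skyapm.json shown earlier.
        const string json = @"{
          ""SkyWalking"": {
            ""ServiceName"": ""MySkyWalkingDemoTest"",
            ""Sampling"": { ""SamplePer3Secs"": -1, ""Percentage"": -1.0 },
            ""Transport"": { ""gRPC"": { ""Servers"": ""192.168.3.245:11800"" } }
          }
        }";
        Console.WriteLine(Describe(json));
    }
}
```

GetProperty throws a KeyNotFoundException when a key is missing, so a typo in a section name surfaces immediately instead of being silently ignored.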
Configure SkyWalking in launchSettings.json
"profiles": { // project profiles
  "IIS Express": { // IIS Express launch profile
    "commandName": "IISExpress",
    "launchBrowser": true,
    "launchUrl": "weatherforecast",
    "environmentVariables": {
      "ASPNETCORE_ENVIRONMENT": "Development",
      "ASPNETCORE_HOSTINGSTARTUPASSEMBLIES": "SkyAPM.Agent.AspNetCore",
      "SKYWALKING__SERVICENAME": "MySkyWalkingDemoTest"
    }
  },
  "SkyWalkingDemo": { // Kestrel launch profile
    "commandName": "Project",
    "launchBrowser": true,
    "launchUrl": "weatherforecast",
    "applicationUrl": "http://localhost:5000",
    "environmentVariables": {
      "ASPNETCORE_ENVIRONMENT": "Development",
      "ASPNETCORE_HOSTINGSTARTUPASSEMBLIES": "SkyAPM.Agent.AspNetCore", // required
      "SKYWALKING__SERVICENAME": "MySkyWalkingDemoTest" // required; identifies the service in SkyWalking
    }
  }
}
Add the Following in Startup.cs
public void ConfigureServices(IServiceCollection services)
{
    // Register the SkyWalking (SkyAPM) extensions
    services.AddSkyApmExtensions();
    services.AddControllers();
    services.AddHttpClient();
}
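Projects created from the .NET 6 templates usually have no Startup.cs; services are registered in Program.cs instead. The following is a minimal sketch of the equivalent registration. It assumes the SkyAPM.Agent.AspNetCore package referenced above, the default implicit usings of the web SDK, and that AddSkyApmExtensions lives in the SkyApm.Utilities.DependencyInjection namespace; check that namespace against the package version you install.

```csharp
// Program.cs (.NET 6 minimal hosting model)
// Assumption: AddSkyApmExtensions is exposed from this namespace.
using SkyApm.Utilities.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

// Same registrations as the ConfigureServices method above
builder.Services.AddSkyApmExtensions();
builder.Services.AddControllers();
builder.Services.AddHttpClient();

var app = builder.Build();

app.MapControllers();
app.Run();
```

The ASPNETCORE_HOSTINGSTARTUPASSEMBLIES and SKYWALKING__SERVICENAME environment variables from launchSettings.json are still required with this layout.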
Getting the traceId
private readonly IEntrySegmentContextAccessor _segContext;

public SkywalkingController(IEntrySegmentContextAccessor segContext)
{
    _segContext = segContext;
}

/// <summary>
/// Gets the trace ID of the current request
/// </summary>
/// <returns>The SkyWalking trace ID</returns>
[HttpGet("traceId")]
public string GetSkywalkingTraceId()
{
    return _segContext.Context.TraceId;
}
Adding Custom Information to the Call Chain
[HttpGet]
public async Task<IActionResult> SkywalkingTest()
{
    // Get the SkyWalking trace ID of the current request
    var traceId = _segContext.Context.TraceId;
    Console.WriteLine($"TraceId={traceId}");

    // Attach a custom log entry to the current span
    _segContext.Context.Span.AddLog(LogEvent.Message($"SkywalkingTest---Worker running at: {DateTime.Now}"));

    // Simulate one second of work without blocking the request thread
    await Task.Delay(1000);

    _segContext.Context.Span.AddLog(LogEvent.Message($"SkywalkingTest---Worker running at--end: {DateTime.Now}"));

    return Ok($"Ok,SkywalkingTest-TraceId={traceId} ");
}
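Integrating the Microservice Gateway and Backend Microservices
Gateway Integration
Add the Dependency
<ItemGroup>
  <PackageReference Include="SkyAPM.Agent.AspNetCore" Version="1.3.0" />
</ItemGroup>
Copy the Configuration File and Adjust It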
{
  "SkyWalking": {
    // Only the service name needs to change
    "ServiceName": "MySkyWalking_Gateway",
    "Namespace": "",
    "HeaderVersions": [
      "sw8"
    ],
    "Sampling": {
      "SamplePer3Secs": -1,
      "Percentage": -1.0
    },
    "Logging": {
      "Level": "Debug",
      "FilePath": "logs\\skyapm-{Date}.log"
    },
    "Transport": {
      "Interval": 3000,
      "ProtocolVersion": "v8",
      "QueueSize": 30000,
      "BatchSize": 3000,
      "gRPC": {
        "Servers": "192.168.3.245:11800",
        "Timeout": 10000,
        "ConnectTimeout": 10000,
        "ReportTimeout": 600000,
        "Authentication": ""
      }
    }
  }
}
Add Environment Variables in launchSettings.json
"profiles": {
  "Zhaoxi.MicroService.GatewayCenter": {
    "commandName": "Project",
    "dotnetRunMessages": true,
    "launchBrowser": true,
    "launchUrl": "swagger",
    "applicationUrl": "https://localhost:7141;http://localhost:5141",
    "environmentVariables": {
      "ASPNETCORE_ENVIRONMENT": "Development",
      "ASPNETCORE_HOSTINGSTARTUPASSEMBLIES": "SkyAPM.Agent.AspNetCore", // added: hosting startup assembly
      "SKYWALKING__SERVICENAME": "MySkyWalking_Gateway" // added: service name
    }
  },
  "IIS Express": {
    "commandName": "IISExpress",
    "launchBrowser": true,
    "launchUrl": "swagger",
    "environmentVariables": {
      "ASPNETCORE_ENVIRONMENT": "Development",
      "ASPNETCORE_HOSTINGSTARTUPASSEMBLIES": "SkyAPM.Agent.AspNetCore",
      "SKYWALKING__SERVICENAME": "MySkyWalking_Gateway"
    }
  }
}
Edit the Gateway Configuration File and Add a Route for the OrderServiceInstance Microservice
{
  "DownstreamPathTemplate": "/api/{url}", // service address; {url} is a variable
  "DownstreamScheme": "http",
  "UpstreamPathTemplate": "/microservice/{url}", // gateway address; {url} is a variable
  "UpstreamHttpMethod": [ "Get", "Post" ],
  "UseServiceDiscovery": true,
  "ServiceName": "OrderService", // service name registered in Consul
  "LoadBalancerOptions": {
    "Type": "RoundRobin" // round-robin load balancing
  }
}
Start the Gateway
dotnet run --urls=http://*:6299
Order Microservice Integration
Add the Dependency
<ItemGroup>
  <PackageReference Include="SkyAPM.Agent.AspNetCore" Version="1.3.0" />
</ItemGroup>
Copy the Configuration File and Adjust It
{
  "SkyWalking": {
    "ServiceName": "MySkyWalking_OrderService",
    "Namespace": "",
    "HeaderVersions": [
      "sw8"
    ],
    "Sampling": {
      "SamplePer3Secs": -1,
      "Percentage": -1.0
    },
    "Logging": {
      "Level": "Debug",
      "FilePath": "logs\\skyapm-{Date}.log"
    },
    "Transport": {
      "Interval": 3000,
      "ProtocolVersion": "v8",
      "QueueSize": 30000,
      "BatchSize": 3000,
      "gRPC": {
        "Servers": "192.168.3.245:11800",
        "Timeout": 10000,
        "ConnectTimeout": 10000,
        "ReportTimeout": 600000,
        "Authentication": ""
      }
    }
  }
}
Add Environment Variables in launchSettings.json
"profiles": {
  "Zhaoxi.MicroService.OrderServiceInstance": {
    "commandName": "Project",
    "dotnetRunMessages": true,
    "launchBrowser": true,
    "launchUrl": "swagger",
    "applicationUrl": "http://192.168.3.105:7900",
    "environmentVariables": {
      "ASPNETCORE_ENVIRONMENT": "Development",
      "ASPNETCORE_HOSTINGSTARTUPASSEMBLIES": "SkyAPM.Agent.AspNetCore",
      "SKYWALKING__SERVICENAME": "MySkyWalking_OrderService"
    }
  },
  "IIS Express": {
    "commandName": "IISExpress",
    "launchBrowser": true,
    "launchUrl": "swagger",
    "environmentVariables": {
      "ASPNETCORE_ENVIRONMENT": "Development"
    }
  }
}
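Start the Order Microservice
dotnet run
User Microservice Integration
Configuring SkyWalking Alarms
Configure Alarm Rules
Open a shell in the SkyWalking container (12f053748e85 is the container ID in this setup) and list the files:
docker exec -it 12f053748e85 /bin/sh
ls -l

Reviewing the Rule File and What the Rules Mean
cat alarm-settings.yml
To copy the file out of the container instead, run the following on the host:
docker cp 12f053748e85:/skywalking/config/alarm-settings.yml .
The file contents are as follows: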
# Sample alarm rules.
rules:
  # Rule unique name, must be ended with `_rule`.
  service_resp_time_rule:
    metrics-name: service_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 3
    silence-period: 5
    message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
  service_sla_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_sla
    op: "<"
    threshold: 8000
    # The length of time to evaluate the metrics
    period: 10
    # How many times after the metrics match the condition, will trigger alarm
    count: 2
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 3
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
  service_resp_time_percentile_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_percentile
    op: ">"
    threshold: 1000,1000,1000,1000,1000
    period: 10
    count: 3
    silence-period: 5
    message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
  service_instance_resp_time_rule:
    metrics-name: service_instance_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 2
    silence-period: 5
    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
  database_access_resp_time_rule:
    metrics-name: database_access_resp_time
    threshold: 1000
    op: ">"
    period: 10
    count: 2
    message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
  endpoint_relation_resp_time_rule:
    metrics-name: endpoint_relation_resp_time
    threshold: 1000
    op: ">"
    period: 10
    count: 2
    message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
#  Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm.
#  Because the number of endpoint is much more than service and instance.
#
#  endpoint_avg_rule:
#    metrics-name: endpoint_avg
#    op: ">"
#    threshold: 1000
#    period: 10
#    count: 2
#    silence-period: 5
#    message: Response time of endpoint {name} is more than 1000ms in 2 minutes of last 10 minutes
webhooks:
#  - http://127.0.0.1/notify/
#  - http://127.0.0.1/go-wechat/
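Each rule is made up of the following fields:
- rule name: the unique name of the rule; it must end with _rule.
- metrics-name: the metric the rule evaluates; its value must be a long, double, or int.
- exclude-names: entity names that are excluded from the rule.
- threshold: the value the metric is compared against.
- op: the comparison operator, such as ">" or "<".
- period: the length of time over which the metric is evaluated.
- count: how many times the metric must match the condition within the period before an alarm is triggered.
- silence-period: how long the alarm stays silent after it has been triggered; defaults to the same value as period.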
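The interaction between period, count, and silence-period is easier to follow with a small simulation. The following is a simplified model written for this article, not SkyWalking's actual implementation: it assumes one metric value per minute, and the AlarmRuleSketch class and its Evaluate method are hypothetical names. The sample values follow the service_sla_rule above, where a threshold of 8000 stands for 80%.

```csharp
using System;
using System.Collections.Generic;

public static class AlarmRuleSketch
{
    // Simplified model of a single rule (assumption: one metric value per minute).
    // Returns the minutes at which an alarm would fire.
    public static List<int> Evaluate(IReadOnlyList<long> values, Func<long, bool> matches,
                                     int period, int count, int silencePeriod)
    {
        var fired = new List<int>();
        int silentUntil = -1; // last minute (inclusive) during which alarms are suppressed

        for (int minute = 0; minute < values.Count; minute++)
        {
            // Count matches in the last `period` minutes, including the current one.
            int start = Math.Max(0, minute - period + 1);
            int hits = 0;
            for (int i = start; i <= minute; i++)
            {
                if (matches(values[i])) hits++;
            }

            if (hits >= count && minute > silentUntil)
            {
                fired.Add(minute);
                silentUntil = minute + silencePeriod;
            }
        }
        return fired;
    }

    public static void Main()
    {
        // Made-up SLA samples; 8000 corresponds to 80%.
        long[] sla = { 10000, 7500, 7000, 9900, 9900, 9900, 6000, 9900 };
        List<int> fired = Evaluate(sla, v => v < 8000, period: 10, count: 2, silencePeriod: 3);
        Console.WriteLine(string.Join(",", fired)); // prints "2,6"
    }
}
```

In this run the second low value arrives at minute 2 and triggers the alarm. Minutes 3 to 5 are silenced even though the condition still holds inside the window, and the next alarm fires at minute 6.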
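Modifying the Alarm Rules
rules:
  service_test_sal_rule:
    # Name of the metric to evaluate
    metrics-name: service_test_sal
    # Less than
    op: "<"
    # Threshold
    threshold: 8000
    # Evaluate the rule over a 2-minute period
    period: 2
    # Raise the alarm as soon as the rule matches once
    count: 1
    # Do not repeat the same alarm within 3 minutes
    silence-period: 3
    # Alarm message
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes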
Writing the Alarm API
public class AlarmMsg
{
    public int scopeId { get; set; }
    public string? scope { get; set; }
    public string? name { get; set; }
    public string? id0 { get; set; }
    public string? id1 { get; set; }
    public string? ruleName { get; set; }
    public string? alarmMessage { get; set; }
}

/// <summary>
/// Alarm API called by the SkyWalking webhook
/// </summary>
/// <param name="msgs">Alarm messages posted by SkyWalking</param>
[HttpPost("AlarmMsg")]
public void AlarmMsg(List<AlarmMsg> msgs)
{
    string msg = "Alarm triggered: ";
    msg += msgs.FirstOrDefault()?.alarmMessage;
    Console.WriteLine(msg);
    // SendMail is the author's own helper; its implementation is not shown here
    SendMail(msg);
}
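To try the model binding without waiting for a real alarm, a sample request body can be deserialized into the same AlarmMsg shape. The following is a minimal sketch using System.Text.Json. The payload values are made up for illustration, and the AlarmPayloadDemo class and its Parse helper are hypothetical. The field types follow the AlarmMsg class above; some SkyWalking versions may send id0 and id1 as numbers rather than strings, in which case the property types would need adjusting.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Text.Json;

// Same shape as the AlarmMsg class above, repeated so the sketch is self-contained.
public class AlarmMsg
{
    public int scopeId { get; set; }
    public string? scope { get; set; }
    public string? name { get; set; }
    public string? id0 { get; set; }
    public string? id1 { get; set; }
    public string? ruleName { get; set; }
    public string? alarmMessage { get; set; }
}

public static class AlarmPayloadDemo
{
    // Hypothetical helper: turns a webhook request body into a list of alarms.
    public static List<AlarmMsg> Parse(string body) =>
        JsonSerializer.Deserialize<List<AlarmMsg>>(body) ?? new List<AlarmMsg>();

    public static void Main()
    {
        // Illustrative payload; the values are invented for this sketch.
        const string body = @"[{
          ""scopeId"": 1,
          ""scope"": ""SERVICE"",
          ""name"": ""MySkyWalking_OrderService"",
          ""id0"": ""12"",
          ""id1"": """",
          ""ruleName"": ""service_sla_rule"",
          ""alarmMessage"": ""Successful rate of service MySkyWalking_OrderService is lower than 80%""
        }]";

        foreach (AlarmMsg m in Parse(body))
        {
            Console.WriteLine($"[{m.ruleName}] {m.name}: {m.alarmMessage}");
        }
    }
}
```

The property names are lowercase on purpose: System.Text.Json matches names case-sensitively by default, so they have to mirror the JSON keys exactly.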
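The alarm API is reachable at the following address, which is registered under webhooks in alarm-settings.yml:
http://192.168.3.105:7900/api/Skywalking/AlarmMsg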
# Sample alarm rules.
rules:
  # Rule unique name, must be ended with `_rule`.
  service_resp_time_rule:
    metrics-name: service_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 3
    silence-period: 5
    message: Response time of service {name} is more than 1000ms in 3 minutes of last 10 minutes.
  service_sla_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_sla
    op: "<"
    threshold: 8000
    # The length of time to evaluate the metrics
    period: 10
    # How many times after the metrics match the condition, will trigger alarm
    count: 2
    # How many times of checks, the alarm keeps silence after alarm triggered, default as same as period.
    silence-period: 3
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
  service_resp_time_percentile_rule:
    # Metrics value need to be long, double or int
    metrics-name: service_percentile
    op: ">"
    threshold: 1000,1000,1000,1000,1000
    period: 10
    count: 3
    silence-period: 5
    message: Percentile response time of service {name} alarm in 3 minutes of last 10 minutes, due to more than one condition of p50 > 1000, p75 > 1000, p90 > 1000, p95 > 1000, p99 > 1000
  service_instance_resp_time_rule:
    metrics-name: service_instance_resp_time
    op: ">"
    threshold: 1000
    period: 10
    count: 2
    silence-period: 5
    message: Response time of service instance {name} is more than 1000ms in 2 minutes of last 10 minutes
  database_access_resp_time_rule:
    metrics-name: database_access_resp_time
    threshold: 1000
    op: ">"
    period: 10
    count: 2
    message: Response time of database access {name} is more than 1000ms in 2 minutes of last 10 minutes
  endpoint_relation_resp_time_rule:
    metrics-name: endpoint_relation_resp_time
    threshold: 1000
    op: ">"
    period: 10
    count: 2
    message: Response time of endpoint relation {name} is more than 1000ms in 2 minutes of last 10 minutes
#  Active endpoint related metrics alarm will cost more memory than service and service instance metrics alarm.
#  Because the number of endpoint is much more than service and instance.
#
#  endpoint_avg_rule:
#    metrics-name: endpoint_avg
#    op: ">"
#    threshold: 1000
#    period: 10
#    count: 2
#    silence-period: 5
#    message: Response time of endpoint {name} is more than 1000ms in 2 minutes of last 10 minutes
webhooks:
  - http://192.168.3.105:7900/api/Skywalking/AlarmMsg
#  - http://127.0.0.1/go-wechat/
rules:
  # Alarm rule name; must be unique and end with _rule
  service_sla_rule:
    # Name of the metric to evaluate
    metrics-name: service_sla
    # Less than
    op: "<"
    # Threshold
    threshold: 8000
    # Evaluate the rule over a 10-minute period
    period: 10
    # Raise the alarm once the rule has matched 2 times
    count: 2
    # Do not repeat the same alarm within 3 minutes
    silence-period: 3
    # Alarm message
    message: Successful rate of service {name} is lower than 80% in 2 minutes of last 10 minutes
webhooks:
  - http://192.168.3.105:7900/api/Skywalking/AlarmMsg
