Actuator Health & Nacos
Add Actuator
Add the spring-boot-starter-actuator dependency, then request the health endpoint:
GET http://localhost:8080/actuator/health
cache-control: no-cache
---
{"status":"UP"}
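The same request can be issued from code, which is roughly what an external prober does. The following is a minimal sketch, not part of the original article: the class name HealthProbe, its method names, and the two-second timeouts are hypothetical. It assumes Java 11 or later, and that the top-level "status" field appears first in the body, as in the responses shown in this article. By default Spring Boot answers with HTTP 503 when the status is DOWN, so the sketch checks the status code as well as the body.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
class HealthProbe {

    // Matches the first "status" field; assumed to be the top-level one.
    private static final Pattern STATUS =
            Pattern.compile("\"status\"\\s*:\\s*\"([A-Z_]+)\"");

    /** Returns the first status value in the body, or "UNKNOWN" if none is found. */
    public static String topLevelStatus(String body) {
        Matcher m = STATUS.matcher(body);
        return m.find() ? m.group(1) : "UNKNOWN";
    }

    /** True only if the endpoint answers 200 and reports UP. */
    public static boolean isHealthy(String baseUrl) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(2))
                .build();
        HttpRequest request = HttpRequest
                .newBuilder(URI.create(baseUrl + "/actuator/health"))
                .timeout(Duration.ofSeconds(2))
                .GET()
                .build();
        try {
            HttpResponse<String> response =
                    client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200
                    && "UP".equals(topLevelStatus(response.body()));
        } catch (IOException e) {
            // Connection refused, timeout, and similar failures count as unhealthy.
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isHealthy("http://localhost:8080"));
    }
}
```

A regular expression keeps the sketch dependency-free; a real client would more likely parse the body with a JSON library.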
Showing /health details
Add the following configuration so that /health returns the details of each component.
management:
  endpoint:
    health:
      show-details: always
Default Health Indicators
DataSourceHealthIndicator
MongoHealthIndicator
Neo4jHealthIndicator
CassandraHealthIndicator
RedisHealthIndicator
RabbitHealthIndicator
CouchbaseHealthIndicator
DiskSpaceHealthIndicator
ElasticsearchHealthIndicator
InfluxDbHealthIndicator
JmsHealthIndicator
MailHealthIndicator
SolrHealthIndicator
Custom Health Indicator
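import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

@Component
public class ServiceAHealthIndicator implements HealthIndicator {

    // Key under which the detail appears in the /health response
    private static final String MESSAGE_KEY = "Service A";

    @Override
    public Health health() {
        if (!isRunningServiceA()) {
            return Health.down().withDetail(MESSAGE_KEY, "Not Available").build();
        }
        return Health.up().withDetail(MESSAGE_KEY, "Available").build();
    }

    private boolean isRunningServiceA() {
        // Logic skipped: replace with a real check of Service A
        boolean isRunning = true;
        return isRunning;
    }
}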
Custom Health Indicator: smoke test of a cache, a database, and a service URL
// Method of a HealthIndicator implementation; cacheManager, dataSource,
// resUrl and LOG are fields of the enclosing class (not shown).
@Override
public Health health() {
    // check cache is available
    Cache cache = cacheManager.getCache("mycache");
    if (cache == null) {
        LOG.warn("Cache not available");
        return Health.down().withDetail("smoke test", "cache not available").build();
    }
    // check db available
    try (Connection connection = dataSource.getConnection()) {
        // obtaining a connection is the whole check
    } catch (SQLException e) {
        LOG.warn("DB not available");
        return Health.down().withDetail("smoke test", e.getMessage()).build();
    }
    // check some service url is reachable
    try {
        URL url = new URL(resUrl);
        int port = url.getPort();
        if (port == -1) {
            port = url.getDefaultPort();
        }
        try (Socket socket = new Socket(url.getHost(), port)) {
            // opening the socket is the whole check
        } catch (IOException e) {
            LOG.warn("Failed to open socket to " + resUrl);
            return Health.down().withDetail("smoke test", e.getMessage()).build();
        }
    } catch (MalformedURLException e1) {
        LOG.warn("Malformed URL: " + resUrl);
        return Health.down().withDetail("smoke test", e1.getMessage()).build();
    }
    return Health.up().build();
}
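One detail of the smoke test above: new Socket(host, port) connects without an explicit timeout, so a health request can block for as long as the operating system keeps trying when the target host silently drops packets. The following is a minimal sketch of the same reachability check with a bounded connect timeout. The class name ReachabilityCheck, the method name, and the timeout parameter are hypothetical and not part of the original code.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

// Hypothetical helper, for illustration only.
class ReachabilityCheck {

    /**
     * Returns true if a TCP connection to the URL's host and port can be
     * opened within timeoutMillis; false on any failure, including a
     * malformed URL.
     */
    public static boolean isReachable(String urlString, int timeoutMillis) {
        try {
            URL url = new URI(urlString).toURL();
            // Fall back to the protocol's default port (80 for http, 443 for https).
            int port = url.getPort() != -1 ? url.getPort() : url.getDefaultPort();
            try (Socket socket = new Socket()) {
                // Unlike new Socket(host, port), connect() accepts a timeout.
                socket.connect(new InetSocketAddress(url.getHost(), port), timeoutMillis);
                return true;
            }
        } catch (URISyntaxException | IOException | IllegalArgumentException e) {
            // Bad URL, refused connection, unknown host, or timeout.
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isReachable("http://localhost:8080", 1000));
    }
}
```

Inside a health() method, a false result would map to Health.down() as in the original code. The timeout should stay well below the timeout of whatever probes the /health endpoint.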
NacosConfigHealthIndicator
Once the Nacos config center is added, a health check for the nacosConfig component is registered automatically by default.
package com.alibaba.cloud.nacos.endpoint;

import com.alibaba.nacos.api.config.ConfigService;
import org.springframework.boot.actuate.health.AbstractHealthIndicator;
import org.springframework.boot.actuate.health.Health;

/**
 * @author xiaojing
 */
public class NacosConfigHealthIndicator extends AbstractHealthIndicator {

    private final ConfigService configService;

    public NacosConfigHealthIndicator(ConfigService configService) {
        this.configService = configService;
    }

    @Override
    protected void doHealthCheck(Health.Builder builder) throws Exception {
        builder.up();
        String status = configService.getServerStatus();
        builder.status(status);
    }
}
When Nacos is reachable:
{
  "status": "UP",
  "components": {
    "nacosConfig": {
      "status": "UP"
    }
  }
}
When Nacos is not reachable:
{
  "status": "DOWN",
  "components": {
    "nacosConfig": {
      "status": "DOWN"
    }
  }
}
When the Nacos config server goes down, the long-polling connection in the service fails and triggers retries:
com.alibaba.nacos.client.config.http.ServerHttpAgent [NACOS ConnectException httpPost] currentServerAddr: http://127.0.0.1:8001, err : Connection refused: connect
Once the maximum number of retries is reached, the following error is reported and the health check turns DOWN.
com.alibaba.nacos.client.config.impl.ClientWorker longPolling error :
java.net.ConnectException: [NACOS HTTP-POST] The maximum number of tolerable server reconnection errors has been reached
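K8S uses the /health endpoint to verify that the service is alive. I had assumed that a Nacos outage would only affect service containers created afterwards. With /health as the health check, however, the container's health check fails during the outage and the service is restarted automatically. On startup the service cannot connect to the Nacos server, so startup fails as well, and the cycle repeats.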
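The restart loop follows from how the overall status is derived: a single DOWN component is enough to make the whole /health response DOWN. The following is a simplified, dependency-free sketch of that aggregation, not Spring Boot's actual code. The class name StatusAggregation and its method are hypothetical, and the severity order (DOWN, OUT_OF_SERVICE, UP, UNKNOWN) is assumed to match Spring Boot's default.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of status aggregation, for illustration only.
class StatusAggregation {

    // Most severe first; assumed to mirror Spring Boot's default order.
    private static final List<String> ORDER =
            List.of("DOWN", "OUT_OF_SERVICE", "UP", "UNKNOWN");

    /** Returns the most severe known status, or UNKNOWN if there is none. */
    public static String aggregate(Collection<String> statuses) {
        return statuses.stream()
                .filter(ORDER::contains)                      // ignore unrecognized statuses
                .min(Comparator.comparingInt(ORDER::indexOf)) // lowest index = most severe
                .orElse("UNKNOWN");
    }

    public static void main(String[] args) {
        Map<String, String> components = new LinkedHashMap<>();
        components.put("diskSpace", "UP");
        components.put("nacosConfig", "DOWN");

        // All components: nacosConfig drags the overall status down.
        System.out.println(aggregate(components.values()));

        // Same service with nacosConfig left out of the aggregated set.
        components.remove("nacosConfig");
        System.out.println(aggregate(components.values()));
    }
}
```

The second call suggests one way to break the loop: let the liveness check aggregate only components whose failure a restart can actually fix. Assuming Spring Boot 2.3 or later, health groups and the /actuator/health/liveness probe endpoint are intended for this kind of separation. Whether that suits a given deployment depends on how the service should behave while Nacos is unavailable.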