
How do I set the retention time (expiration time) of Trace records in SkyWalking?

🏷️ SkyWalking

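When I first configured SkyWalking I overlooked this setting, so the details of a request could no longer be found shortly after they were recorded (in under 2 hours).

The TTL section of the official documentation describes it as follows: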

TTL

In SkyWalking, there are two types of observability data, besides metadata.

  1. Record, including trace and alarm. Maybe log in the future.
  2. Metric, including such as p99/p95/p90/p75/p50, heatmap, success rate, cpm(rpm) etc.
    Metric is separated in minute/hour/day/month dimensions in storage, different indexes or tables.

You have following settings for different types.

yaml
# Set a timeout on metrics data. After the timeout has expired, the metrics data will automatically be deleted.
enableDataKeeperExecutor: ${SW_CORE_ENABLE_DATA_KEEPER_EXECUTOR:true} # Turn it off then automatically metrics data delete will be close.
recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:90} # Unit is minute
minuteMetricsDataTTL: ${SW_CORE_MINUTE_METRIC_DATA_TTL:90} # Unit is minute
hourMetricsDataTTL: ${SW_CORE_HOUR_METRIC_DATA_TTL:36} # Unit is hour
dayMetricsDataTTL: ${SW_CORE_DAY_METRIC_DATA_TTL:45} # Unit is day
monthMetricsDataTTL: ${SW_CORE_MONTH_METRIC_DATA_TTL:18} # Unit is month

  • recordDataTTL affects Record data.
  • minuteMetricsDataTTL, hourMetricsDataTTL, dayMetricsDataTTL and monthMetricsDataTTL affects metrics data in minute/hour/day/month dimensions.
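
As a rough sketch of what a change here might look like: the `${NAME:default}` form in the settings above is read as "use the environment variable if it is set, otherwise fall back to the default". The value 1440 below is only an illustrative choice (1 day expressed in minutes), not a recommendation from the documentation.

```yaml
# Illustrative value only: 1 day = 24 * 60 = 1440 minutes.
# If SW_CORE_RECORD_DATA_TTL is set in the environment, it is used instead of 1440.
recordDataTTL: ${SW_CORE_RECORD_DATA_TTL:1440} # Unit is minute
```

Alternatively, the default could be left untouched and only the environment variable set before starting the backend.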

ElasticSearch 6 storage TTL

Specifically:
Because of the feature of ElasticSearch, it rebuilds the index after executing delete by query command.
That is a heavy operation, it will hang up the ElasticSearch server for a few seconds each time. The fact is there are above hundred indexes which may cause ElasticSearch out of service unexpected.
So, we create the index by day to avoid execute delete by query operation,
then delete the index directly, this is a high performance operation, say goodbye to hung up.

You have following settings in Elasticsearch storage.

yaml
# Those data TTL settings will override the same settings in core module.
recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
otherMetricsDataTTL: ${SW_STORAGE_ES_OTHER_METRIC_DATA_TTL:45} # Unit is day
monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month

  • recordDataTTL affects Record data.
  • otherMetricsDataTTL affects minute/hour/day dimensions of metrics. minuteMetricsDataTTL, hourMetricsDataTTL and dayMetricsDataTTL are still there, but the Unit of them changed to DAY too. If you want to set them manually, please remove otherMetricsDataTTL.
  • monthMetricsDataTTL affects month dimension of metrics.
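
The "set them manually" case in the second bullet might look roughly like the fragment below. This is only a sketch: the numbers are made up for illustration, and placing the three per-dimension settings next to the other Elasticsearch storage settings is an assumption on my part, since the quoted text does not say where they live.

```yaml
# Sketch only. otherMetricsDataTTL is removed, as the documentation asks,
# so that the per-dimension values are used instead.
recordDataTTL: ${SW_STORAGE_ES_RECORD_DATA_TTL:7} # Unit is day
minuteMetricsDataTTL: 2 # Unit is day here (illustrative value; location is an assumption)
hourMetricsDataTTL: 7 # Unit is day here (illustrative value)
dayMetricsDataTTL: 45 # Unit is day
monthMetricsDataTTL: ${SW_STORAGE_ES_MONTH_METRIC_DATA_TTL:18} # Unit is month
```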

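Solution

Probably because I am on a different version (the one that uses ElasticSearch 5), the setting that controls Trace expiration is traceDataTTL, not the recordDataTTL described above.

Its default value is 90 minutes, which is why Trace records older than an hour and a half could no longer be found.
Change it to the value you want and restart SkyWalking, and the Trace retention time changes accordingly.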

yaml
# Set a timeout on metric data. After the timeout has expired, the metric data will automatically be deleted.
traceDataTTL: 90 # Unit is minute
minuteMetricDataTTL: 90 # Unit is minute
hourMetricDataTTL: 36 # Unit is hour
dayMetricDataTTL: 45 # Unit is day
monthMetricDataTTL: 18 # Unit is month
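
For example, to keep Trace records for 7 days, and assuming the unit really is minutes as the comment states, the value works out to 7 × 24 × 60 = 10080. The 7-day figure is just an illustration; pick whatever your storage can afford.

```yaml
# Illustrative value: keep Trace records for 7 days.
# 7 days * 24 hours * 60 minutes = 10080 minutes
traceDataTTL: 10080 # Unit is minute
```

A longer retention means more trace data kept in ElasticSearch, so disk usage grows with this value.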