<!-- # [Elasticsearch 5.5] Chinese word segmentation with the analysis-smartcn plugin -->
<!-- elasticsearch-55-chinese-analyzer-smartcn -->

By default, [Elasticsearch](/blog?t=elasticsearch) uses the standard analyzer, which is designed for English-like text. As a result, Chinese text is split into individual characters rather than into words.

To segment Chinese text into words, you need a Chinese analyzer plugin. This article uses the *analysis-smartcn* plugin.

Install the *analysis-smartcn* plugin:

```bat
bin\elasticsearch-plugin install analysis-smartcn
```

**Note**: All commands in this article were run on Windows. On Linux the commands differ slightly, for example `bin/elasticsearch-plugin install analysis-smartcn` with forward slashes.

The plugin must be installed on every node in the cluster, and each node must be restarted after installation.

Remove the *analysis-smartcn* plugin (stop the node first):

```bat
bin\elasticsearch-plugin remove analysis-smartcn
```

A field that uses the *smartcn* analyzer is mapped as follows:

```json
"description": {
  "analyzer": "smartcn",
  "type": "text"
}
```

For the NEST code sample and the complete index structure, see *Appendix 1. Creating the index with NEST (.NET Core)* and *Appendix 2. Full index structure* below.

Take *学然后知不足* as an example. With *smartcn*, it is automatically split into the following four tokens. The default analyzer would instead produce one token per Chinese character.

- 学
- 然后
- 知
- 不足

Because only these whole tokens are stored in the index, `match`, `term`, and `wildcard` queries must use a whole token (such as *然后*). Searching for *然* or *后* on its own returns no results.

**Word segmentation may return fewer results, but those results are more precise.**

**Appendix 1. Creating the index with NEST (.NET Core)**

Install the NEST package:

```powershell
Install-Package NEST -Version 5.5.0
```

*Program.cs*

```csharp
using Nest;
using System;

namespace NestSample
{
    class Program
    {
        static void Main(string[] args)
        {
            // Connect to the local node; documents go to the "people" index by default.
            var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
                .DefaultIndex("people");
            var client = new ElasticClient(settings);

            // Create the index. AutoMap() builds the mapping from Person's properties
            // and attributes, so Description is mapped with the smartcn analyzer.
            var createIndexResponse = client.CreateIndex("people", c => c
                .Mappings(ms => ms
                    .Map<Person>(m => m.AutoMap())
                )
            );
            Console.WriteLine($"CreateIndex valid: {createIndexResponse.IsValid}");

            // Index a sample document; its Description is analyzed by smartcn.
            var person = new Person
            {
                Id = 1,
                FirstName = "佳佳",
                LastName = "刘",
                Description = "学然后知不足",
            };
            var indexResponse = client.Index(person);
            Console.WriteLine($"Index valid: {indexResponse.IsValid}");
        }
    }
}
```

*Person.cs*

```csharp
using Nest;

namespace NestSample
{
    // Mapped as the "person" type.
    [ElasticsearchType(Name = "person")]
    public class Person
    {
        public int Id { get; set; }

        public string FirstName { get; set; }

        public string LastName { get; set; }

        // Analyze this text field with smartcn instead of the default analyzer.
        [Text(Analyzer = "smartcn")]
        public string Description { get; set; }
    }
}
```

**Appendix 2. Full index structure**

```json
{
  "state": "open",
  "settings": {
    "index": {
      "creation_date": "1570518070722",
      "number_of_shards": "5",
      "number_of_replicas": "1",
      "uuid": "bt-ghp4UTPqb1b6mrfEkoQ",
      "version": {
        "created": "5050099"
      },
      "provided_name": "people"
    }
  },
  "mappings": {
    "person": {
      "properties": {
        "firstName": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "lastName": {
          "type": "text",
          "fields": {
            "keyword": {
              "ignore_above": 256,
              "type": "keyword"
            }
          }
        },
        "description": {
          "analyzer": "smartcn",
          "type": "text"
        },
        "id": {
          "type": "long"
        }
      }
    }
  },
  "aliases": [],
  "primary_terms": {
    "0": 1,
    "1": 1,
    "2": 1,
    "3": 1,
    "4": 1
  },
  "in_sync_allocations": {
    "0": [
      "c_GjRliYSuyqunST3L2fLw"
    ],
    "1": [
      "lPnj8fxhSyy97oSDEnlapA"
    ],
    "2": [
      "FAeXWNHdRjWoAeAbUQJeCA"
    ],
    "3": [
      "PAfwYeabTSKf06EWYfWqtA"
    ],
    "4": [
      "UbCDcsqTQC-gUInZQacMyg"
    ]
  }
}
```

Copyright notice: This is an original article by the blogger 「佳佳」, licensed under CC 4.0 BY-NC-SA. When reposting, please include a link to the original article and this notice.

Original link: https://www.liujiajia.me/2019/10/8/elasticsearch-55-chinese-analyzer-smartcn
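The tokenization of *学然后知不足* and the whole-token query behavior described above can be checked from the same .NET Core setup. One way is to call NEST's Analyze API and then run a `match` query. The following is a minimal sketch with several assumptions: a local Elasticsearch 5.5 node at `http://localhost:9200` with *analysis-smartcn* installed, NEST 5.5.0, and a `people` index already created and populated by the *Program.cs* from Appendix 1. The `SmartcnCheck` namespace is a hypothetical name. The `Person` class is repeated only so the file compiles on its own, so put it in a separate console project rather than next to *Person.cs* in `NestSample`.

```csharp
using Nest;
using System;

namespace SmartcnCheck // hypothetical project/namespace name
{
    // Copy of the Person type from Appendix 1, repeated so this file compiles on its own.
    [ElasticsearchType(Name = "person")]
    public class Person
    {
        public int Id { get; set; }
        public string FirstName { get; set; }
        public string LastName { get; set; }

        [Text(Analyzer = "smartcn")]
        public string Description { get; set; }
    }

    class Program
    {
        static void Main(string[] args)
        {
            // Let the Windows console display Chinese characters.
            Console.OutputEncoding = System.Text.Encoding.UTF8;

            // Assumption: local Elasticsearch 5.5 node with analysis-smartcn installed,
            // and the "people" index already created and populated by Appendix 1.
            var settings = new ConnectionSettings(new Uri("http://localhost:9200"))
                .DefaultIndex("people");
            var client = new ElasticClient(settings);

            // Ask the smartcn analyzer how it splits the sample sentence.
            var analyzeResponse = client.Analyze(a => a
                .Analyzer("smartcn")
                .Text("学然后知不足"));
            if (!analyzeResponse.IsValid)
            {
                Console.WriteLine("Analyze request failed; is analysis-smartcn installed?");
                return;
            }
            foreach (var token in analyzeResponse.Tokens)
            {
                Console.WriteLine(token.Token);
            }

            // Refresh so a freshly indexed document is visible to search.
            client.Refresh("people");

            // Compare a whole token with a single character taken from that token.
            foreach (var text in new[] { "然后", "然" })
            {
                var searchResponse = client.Search<Person>(s => s
                    .Query(q => q
                        .Match(m => m
                            .Field(p => p.Description)
                            .Query(text))));
                Console.WriteLine($"match \"{text}\": {searchResponse.Total} hit(s)");
            }
        }
    }
}
```

According to the article, the analyzer should print the four tokens listed earlier. The query for the whole token *然后* should find the sample document, and the query for the single character *然* should find nothing.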