Objects and Nested Objects

杨浩成 · filed under elastickSearch

2022-06-18 · about 998 characters · about 2 minutes to read

Normalized database design

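The main goal of normalized design is to "reduce unnecessary updates"
Side effect: a fully normalized database often runs into "slow queries"
- The more normalized the database, the more tables a query has to join
Normalization saves storage space, but storage keeps getting cheaper
Normalization simplifies updates, but reads may require more work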

Denormalized design

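Avoid relationships where possible and store redundant copies of the data in each document instead
Pros: no joins to handle, so read performance is good
- Elasticsearch compresses the _source field to reduce disk overhead
Cons: not suitable for data that changes frequently
- A change to one piece of data may require updating many documents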
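To make the last point concrete, here is a minimal Python sketch. It models a denormalized index as a plain list of dicts (a hypothetical stand-in, not an Elasticsearch client): every blog post embeds its own copy of the author, so renaming one author means rewriting every post that carries that copy. The function `rename_user` and the sample data are illustrative only.

```python
def rename_user(posts, user_id, new_name):
    """Rewrite the embedded author copy in every post by this user.
    Returns how many post documents had to be updated."""
    updated = 0
    for post in posts:
        if post["user"]["userid"] == user_id:
            post["user"]["username"] = new_name
            updated += 1
    return updated

# Hypothetical sample data: three posts by Jack, one post by another user.
posts = [
    {"content": f"post {i}", "user": {"userid": 1, "username": "Jack", "city": "Shanghai"}}
    for i in range(3)
]
posts.append({"content": "another post", "user": {"userid": 2, "username": "Rose", "city": "Beijing"}})

# A normalized design would change a single user record;
# here one rename touches every document that embeds Jack.
print(rename_user(posts, 1, "Jackson"))  # 3
```

The trade-off is the one listed above: reads never need a join, but write amplification grows with the number of copies.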

Handling relationships in Elasticsearch

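Relational databases usually use a normalized design; in Elasticsearch, a denormalized design is usually the first choice
- Benefits of denormalization: no table joins, faster reads, no row locks
Elasticsearch is not good at handling relationships. There are generally four ways to model them:
- Object type
- Nested object
- Parent/child relationship
- Application-side joins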

Example 1: blog posts and author information

# Index a blog post whose author is stored as an embedded object
PUT blog/_doc/1
{
  "content":"I like Elasticsearch",
  "time":"2019-01-01T00:00:00",
  "user":{
    "userid":1,
    "username":"Jack",
    "city":"Shanghai"
  }
}

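# Search on the post content and on a field of the embedded user object.
# Each post has exactly one user object, so the object type returns the expected result here
POST blog/_search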
{
  "query": {
    "bool": {
      "must": [
        {"match": {"content": "Elasticsearch"}},
        {"match": {"user.username": "Jack"}}
      ]
    }
  }
}

Example 2: movies and actors

# Index a movie document ("actors" uses the default object mapping)
POST my_movies/_doc/1
{
  "title":"Speed",
  "actors":[
    {
      "first_name":"Keanu",
      "last_name":"Reeves"
    },

    {
      "first_name":"Dennis",
      "last_name":"Hopper"
    }

  ]
}

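# Search for movies. With the default object mapping this returns the movie,
# even though no single actor is named Keanu Hopper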
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"actors.first_name": "Keanu"}},
        {"match": {"actors.last_name": "Hopper"}}
      ]
    }
  }
}

Why don't we get the expected result?

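When the document is stored, the boundaries between the inner objects are not preserved: the JSON is flattened into a structure of key-value pairs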

"title": "Speed"
"actors.first_name": ["Keanu", "Dennis"]
"actors.last_name": ["Reeves", "Hopper"]

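So when a query combines several fields of the objects, it returns unexpected results: "Keanu" matches one actor and "Hopper" matches another
The nested data type solves this problem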
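The flattening described above can be sketched in a few lines of Python. This is a simplified model, not Elasticsearch code: `flatten` turns inner-object fields into dotted keys and merges values from every array element into one list per key, and `matches_all` checks each condition independently with exact equality (real `match` queries also analyze text). Both helper names are hypothetical.

```python
def flatten(doc):
    """Flatten a JSON-like document the way the object type does:
    inner-object fields become dotted keys, and values from all array
    elements are merged into a single list per key."""
    flat = {}

    def walk(value, path):
        if isinstance(value, dict):
            for key, child in value.items():
                walk(child, f"{path}.{key}" if path else key)
        elif isinstance(value, list):
            for item in value:
                walk(item, path)  # array elements share the same path
        else:
            flat.setdefault(path, []).append(value)

    walk(doc, "")
    return flat

def matches_all(flat, conditions):
    """Each {field: value} condition is checked on its own against the
    flattened value lists, so values from different objects can combine."""
    return all(value in flat.get(field, []) for field, value in conditions.items())

movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

flat = flatten(movie)
print(flat)
# {'title': ['Speed'], 'actors.first_name': ['Keanu', 'Dennis'], 'actors.last_name': ['Reeves', 'Hopper']}
print(matches_all(flat, {"actors.first_name": "Keanu", "actors.last_name": "Hopper"}))  # True
```

Once the values sit in separate lists, nothing records that "Keanu" belongs with "Reeves", which is exactly why the bool query above matches.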

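What is the nested data type

It allows each object in an array of objects to be indexed independently
Internally, each nested object is stored as a separate hidden Lucene document, and these documents are joined back to their parent document at query time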
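A rough Python model of this behavior, under simplifying assumptions (exact equality instead of analyzed matching, field names given relative to the nested path): `split_nested` separates each object under the path into its own hidden document, and `nested_matches` accepts the root document only if one hidden document satisfies every condition by itself. Both functions are hypothetical illustrations, not Elasticsearch APIs.

```python
def split_nested(doc, path):
    """Model of nested storage: every object under `path` becomes its own
    hidden document; the remaining fields stay in the root document."""
    hidden_docs = [dict(obj) for obj in doc.get(path, [])]
    root_doc = {key: value for key, value in doc.items() if key != path}
    return hidden_docs, root_doc

def nested_matches(doc, path, conditions):
    """Simplified nested query: the root document matches only if at least one
    hidden document satisfies *all* conditions on its own."""
    hidden_docs, _ = split_nested(doc, path)
    return any(
        all(hidden.get(field) == value for field, value in conditions.items())
        for hidden in hidden_docs
    )

movie = {
    "title": "Speed",
    "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ],
}

print(nested_matches(movie, "actors", {"first_name": "Keanu", "last_name": "Hopper"}))  # False
print(nested_matches(movie, "actors", {"first_name": "Keanu", "last_name": "Reeves"}))  # True
```

In this model the "Speed" document becomes three stored documents (two actors plus the root), which is also why nested mappings cost more at index time.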

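# Delete the existing index first: a field already mapped as an object cannot be changed to nested
DELETE my_movies

# Create the index with "actors" mapped as nested
PUT my_movies
{
  "mappings": {
    "properties": {
      "actors": {
        "type": "nested",
        "properties": {
          "first_name": {"type": "keyword"},
          "last_name": {"type": "keyword"}
        }
      },
      "title": {
        "type": "text",
        "fields": {"keyword": {"type": "keyword", "ignore_above": 256}}
      }
    }
  }
}

# Then re-run the POST my_movies/_doc/1 request above to index the movie again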

Nested query

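# Nested query: both conditions must match within the same actor object.
# With the nested mapping, this returns no hits because no single actor is named Keanu Hopper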
POST my_movies/_search
{
  "query": {
    "bool": {
      "must": [
        {"match": {"title": "Speed"}},
        {
          "nested": {
            "path": "actors",
            "query": {
              "bool": {
                "must": [
                  {"match": {
                    "actors.first_name": "Keanu"
                  }},

                  {"match": {
                    "actors.last_name": "Hopper"
                  }}
                ]
              }
            }
          }
        }
      ]
    }
  }
}

Nested aggregation

Specify the nested path first, then put the sub-aggregation inside it

# Nested Aggregation
POST my_movies/_search
{
  "size": 0,
  "aggs": {
    "actors": {
      "nested": {
        "path": "actors"
      },
      "aggs": {
        "actor_name": {
          "terms": {
            "field": "actors.first_name",
            "size": 10
          }
        }
      }
    }
  }
}
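The two levels of this request can be modeled in plain Python as a sketch: the `nested` step switches from root documents to the hidden actor objects, and the `terms` step counts `first_name` values across them. The function `nested_terms_agg`, the simplified response shape, and the second sample movie are assumptions for illustration. The ordering (count descending, ties by key ascending) mirrors the usual default for terms aggregations. Since each actor object holds one `first_name`, counting values equals counting nested documents here.

```python
from collections import Counter

def nested_terms_agg(docs, path, field, size=10):
    """Simplified model of a `nested` aggregation wrapping a `terms` aggregation."""
    # Step 1 (nested agg): move from root documents to the hidden nested objects.
    nested_objs = [obj for doc in docs for obj in doc.get(path, [])]
    # Step 2 (terms agg): count values of `field` across those nested objects.
    counts = Counter(obj[field] for obj in nested_objs if field in obj)
    # Sort by count descending, ties by key ascending, keep the top `size` buckets.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))[:size]
    return {
        "doc_count": len(nested_objs),
        "buckets": [{"key": key, "doc_count": count} for key, count in ranked],
    }

movies = [
    {"title": "Speed", "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
        {"first_name": "Dennis", "last_name": "Hopper"},
    ]},
    # Hypothetical second document, added only to make the counts differ.
    {"title": "John Wick", "actors": [
        {"first_name": "Keanu", "last_name": "Reeves"},
    ]},
]

print(nested_terms_agg(movies, "actors", "first_name"))
# {'doc_count': 3, 'buckets': [{'key': 'Keanu', 'doc_count': 2}, {'key': 'Dennis', 'doc_count': 1}]}
```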

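Summary

When modeling data in Elasticsearch, relationships are usually handled with a denormalized design
For one-to-many relationships, the object type flattens the fields of all objects in an array into multi-valued fields and loses which values belong together, which can produce incorrect search results
When documents are not updated frequently, the nested type solves this problem (updating any nested object reindexes the whole document together with all of its nested objects)
Nested queries and aggregations must specify path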

Original documentation

https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-nested-query.html
