Elasticsearch queries: paginating large result sets

1. The problem to solve

When from + size exceeds 10000, Elasticsearch rejects the search with the following error:

{
  "error": {
    "root_cause": [
      {
        "type": "query_phase_execution_exception",
        "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
      }
    ],
    "type": "search_phase_execution_exception",
    "reason": "all shards failed",
    "phase": "query",
    "grouped": true,
    "failed_shards": [
      {
        "shard": 0,
        "index": "shirts",
        "node": "OBkTpZcTQJ25kmlNZ6xyLg",
        "reason": {
          "type": "query_phase_execution_exception",
          "reason": "Result window is too large, from + size must be less than or equal to: [10000] but was [10001]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting."
        }
      }
    ]
  },
  "status": 500
}
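The check behind this error can be sketched in a few lines of Python. The function name is made up for illustration; only the limit of 10000 and the index.max_result_window setting name come from the error message itself.

```python
def exceeds_result_window(from_: int, size: int, max_result_window: int = 10000) -> bool:
    """Return True when from + size is larger than the allowed result window.

    max_result_window mirrors the index.max_result_window index setting,
    whose value in the error message is 10000.
    """
    return from_ + size > max_result_window


# The failing request asked for from + size = 10001.
rejected = exceeds_result_window(10000, 1)
# A request that ends exactly at the limit is still allowed.
allowed = not exceeds_result_window(9990, 10)
print(rejected, allowed)
```

A client can run a check like this before sending a request and switch to search after or scroll when it fails.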

2. Pagination methods supported by Elasticsearch

Elasticsearch provides three pagination methods to cover different query scenarios:

from + size

search after

scroll

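The tests below use Elasticsearch 6.8.

3. from + size pagination

from + size is the most widely used search pagination scheme.

from: the offset of the first record to return; the default is 0.

size: the maximum number of records this search returns; the default is 10.

For example, we can start at the 11th record and return at most 10 records: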

GET my_store_index/_search
{
  "query": {
    "match": {
      "name": "bj"
    }
  },
  "from": 10,
  "size": 10
}
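User interfaces usually work with page numbers rather than offsets. A minimal sketch of the conversion, assuming 1-based page numbers; the helper name from_size_body is hypothetical and not part of any Elasticsearch client:

```python
def from_size_body(page: int, page_size: int, query: dict) -> dict:
    """Build a search body for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    return {
        "query": query,
        # Page 1 starts at offset 0, page 2 at offset page_size, and so on.
        "from": (page - 1) * page_size,
        "size": page_size,
    }


# Page 2 with 10 records per page gives the same body as the request above.
body = from_size_body(2, 10, {"match": {"name": "bj"}})
print(body)
```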

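Characteristics of this pagination method

4. search after

Even if we raise max_result_window to a larger value, from + size pagination still performs poorly when the query matches a large number of results.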
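A rough cost model shows why deep pages are expensive. It assumes every shard holds at least from + size matching documents, in which case each shard has to collect its top from + size hits and the coordinating node has to merge the candidates from all shards. The function name is illustrative only.

```python
def deep_page_cost(from_: int, size: int, shards: int) -> dict:
    """Estimate how many sorted entries one from + size request touches."""
    per_shard = from_ + size  # each shard collects its top from + size hits
    return {
        "per_shard": per_shard,
        "coordinator": per_shard * shards,  # candidates merged on the coordinating node
        "returned": size,  # only this many records reach the client
    }


# 5 shards, as in the responses in this article, for records 10001 to 10010.
cost = deep_page_cost(10000, 10, 5)
print(cost)
```

Under this model the work grows with from, while the number of records returned stays at size.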

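Elasticsearch's search after helps solve this problem: it uses information about the previous page, carried in the request, to fetch the next page of records.

The initial search looks like this. It must include a sort clause, and here we use id as the sort field. The sort field must hold a different value for every document, so that the sort order is unique.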

GET my_store_index/_search
{
  "_source": false,
  "query": {
    "match": {
      "name": "bj"
    }
  },
  "size": 10,
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}

The response includes a sort field for each hit, containing that record's sort values:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 25,
    "max_score" : null,
    "hits" : [
      {
        "_index" : "my_store_index",
        "_type" : "_doc",
        "_id" : "f64cf9f4-db2b-4059-bf97-315fe95f233c",
        "_score" : null,
        "sort" : [
          "f64cf9f4-db2b-4059-bf97-315fe95f233c"
        ]
      }
    ]
  }
}

We pass the sort values of the last hit in the previous response as the search_after parameter of the next request to fetch the next page:

GET my_store_index/_search
{
  "_source": false,
  "query": {
    "match": {
      "name": "bj"
    }
  },
  "size": 10,
  "search_after": ["f64cf9f4-db2b-4059-bf97-315fe95f233c"],
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ]
}
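Repeating this step until a page comes back empty walks the whole result set. Below is a minimal Python sketch of that loop. The search argument is a hypothetical callable that sends a body to the _search endpoint and returns the parsed JSON response; fake_search is an in-memory stand-in so the example runs without a cluster, and is not Elasticsearch code.

```python
from typing import Callable, Iterator


def iter_search_after(search: Callable[[dict], dict], body: dict) -> Iterator[dict]:
    """Yield every hit, page by page, using search_after.

    body must contain "size" and a "sort" on a field that is unique per document.
    """
    request = dict(body)  # shallow copy so the caller's body is left untouched
    while True:
        hits = search(request)["hits"]["hits"]
        if not hits:
            return
        yield from hits
        # The sort values of the last hit mark where the next page starts.
        request["search_after"] = hits[-1]["sort"]


def fake_search(request: dict) -> dict:
    """Stand-in for a cluster: 25 documents sorted by id in descending order."""
    ids = sorted((f"doc-{i:02d}" for i in range(25)), reverse=True)
    after = request.get("search_after")
    if after is not None:
        ids = [doc_id for doc_id in ids if doc_id < after[0]]
    page = ids[: request["size"]]
    return {"hits": {"hits": [{"_id": doc_id, "sort": [doc_id]} for doc_id in page]}}


body = {"size": 10, "sort": [{"id": {"order": "desc"}}]}
all_ids = [hit["_id"] for hit in iter_search_after(fake_search, body)]
print(len(all_ids))
```

Because each request carries only the last sort values, no page needs a from offset, so the result window limit on from + size does not come into play.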

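Characteristics of this pagination method

We fetched the second batch of 10000 records with from + size and with search after and compared the execution times: search after was nearly 2 s faster (took 6320 ms versus 8212 ms).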

GET my_store_index/_search
{
  "_source": ["id"],
  "query": {
    "match_phrase_prefix": {
      "deviceData.content": "us"
    }
  },
  "size": 10000,
  "from": 10000
}

{
  "took": 8212,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 29908,
    "max_score": 97.09149
  }
}

GET my_store_index/_search
{
  "_source": ["id"],
  "query": {
    "match_phrase_prefix": {
      "deviceData.content": "us"
    }
  },
  "size": 10000,
  "sort": [
    {
      "id": {
        "order": "desc"
      }
    }
  ],
  "search_after": ["aa877c87-bb08-4fbd-8a51-ed4ebaa57251"]
}

{
  "took": 6320,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 29908,
    "max_score": null
  }
}

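5. scroll

Elasticsearch's scroll lets a single search return all matching records, which we then fetch in a way similar to a cursor in a relational database. scroll is not meant for real-time search requests; it is intended for processing large amounts of data, and is especially suitable for reindexing the contents of an index.

To use scroll, we pass the scroll parameter in the URL to tell Elasticsearch how long it should keep the search context alive: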

GET my_store_index/_search?scroll=1m
{
  "_source": false,
  "query": {
    "match": {
      "name": "bj"
    }
  },
  "size": 2
}

{
  "_scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAb0Fk9Ca1RwWmNUUUoyNWttbE5aNnh5TGcAAAAAAAAG8xZPQmtUcFpjVFFKMjVrbWxOWjZ4eUxnAAAAAAAABvUWT0JrVHBaY1RRSjI1a21sTlo2eHlMZwAAAAAAAAb2Fk9Ca1RwWmNUUUoyNWttbE5aNnh5TGcAAAAAAAAG9xZPQmtUcFpjVFFKMjVrbWxOWjZ4eUxn",
  "took" : 2,
  "timed_out" : false,
  "_shards" : {
    "total" : 5,
    "successful" : 5,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : 25,
    "max_score" : 3.5134304,
    "hits" : [ ]
  }
}

We use the _scroll_id returned by the request above to fetch the next page of data:

GET _search/scroll
{
  "scroll" : "1m",
  "scroll_id" : "DnF1ZXJ5VGhlbkZldGNoBQAAAAAAAAdlFk9Ca1RwWmNUUUoyNWttbE5aNnh5TGcAAAAAAAAHZhZPQmtUcFpjVFFKMjVrbWxOWjZ4eUxnAAAAAAAAB2cWT0JrVHBaY1RRSjI1a21sTlo2eHlMZwAAAAAAAAdoFk9Ca1RwWmNUUUoyNWttbE5aNnh5TGcAAAAAAAAHaRZPQmtUcFpjVFFKMjVrbWxOWjZ4eUxn"
}
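The full scroll cycle can be sketched in Python as a loop that keeps requesting pages until one comes back empty. The three callables are hypothetical wrappers, not a real client API: start would send the initial _search?scroll=1m request, next_page would send GET _search/scroll with the given id, and clear would release the search context (Elasticsearch offers DELETE _search/scroll for this, which the article does not cover). FakeScroll is an in-memory stand-in so the example runs without a cluster.

```python
from typing import Callable, Iterator


def iter_scroll(
    start: Callable[[], dict],
    next_page: Callable[[str], dict],
    clear: Callable[[str], None],
) -> Iterator[dict]:
    """Yield every hit of a scroll search, then release the scroll context."""
    response = start()
    scroll_id = response["_scroll_id"]
    try:
        while response["hits"]["hits"]:
            yield from response["hits"]["hits"]
            response = next_page(scroll_id)
            # The id may change between pages, so always keep the latest one.
            scroll_id = response["_scroll_id"]
    finally:
        clear(scroll_id)


class FakeScroll:
    """Stand-in for a cluster that serves `total` documents, `size` per page."""

    def __init__(self, total: int, size: int) -> None:
        self.docs = [{"_id": str(i)} for i in range(total)]
        self.size = size
        self.pos = 0
        self.cleared = False

    def _page(self) -> dict:
        page = self.docs[self.pos : self.pos + self.size]
        self.pos += self.size
        return {"_scroll_id": "fake-id", "hits": {"hits": page}}

    def start(self) -> dict:
        return self._page()

    def next_page(self, scroll_id: str) -> dict:
        return self._page()

    def clear(self, scroll_id: str) -> None:
        self.cleared = True


fake = FakeScroll(total=25, size=2)
scrolled = list(iter_scroll(fake.start, fake.next_page, fake.clear))
print(len(scrolled), fake.cleared)
```

Clearing in a finally block is a design choice in this sketch: it releases the context even when the caller stops iterating early, instead of waiting for the keep-alive time to expire.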

Characteristics of this pagination method


Original article: https://www.cnblogs.com/wufengtinghai/p/15870021.html
