Source:
https://dzone.com/articles/23-useful-elasticsearch-example-queries
Tim Ojo breaks down 23 different Elasticsearch example queries.
Don't forget to bookmark this article for quick reference when you need it!
To illustrate the different query types in Elasticsearch, we will be searching a collection of book documents with the following fields: title, authors, summary, publish date, number of reviews, and publisher.
But first, let’s create a new index and index some documents using the bulk API:
PUT /bookdb_index
{ "settings": { "number_of_shards": 1 }}
POST /bookdb_index/book/_bulk
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
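The bulk API expects newline-delimited JSON: one action line, then one document line, each on a single line, with a trailing newline at the end of the body. The following is a minimal Python sketch of how such a body could be assembled; the helper name build_bulk_body and the trimmed-down sample documents are assumptions made for illustration, not part of the original article.

```python
import json


def build_bulk_body(docs):
    """Serialize (doc_id, source) pairs as newline-delimited JSON for _bulk.

    Hypothetical helper: each document becomes an action line followed by
    a source line, and the body ends with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_id": doc_id}}))
        lines.append(json.dumps(source))
    return "\n".join(lines) + "\n"


# Shortened sample documents (only two fields each, for brevity).
books = [
    (1, {"title": "Elasticsearch: The Definitive Guide", "publisher": "oreilly"}),
    (4, {"title": "Solr in Action", "publisher": "manning"}),
]

bulk_body = build_bulk_body(books)
print(bulk_body, end="")
```

Because json.dumps never emits raw newlines inside a string, every action and document stays on its own line, which is the property the hand-written request above has to preserve.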
Examples
Basic Match Query
There are two ways of executing a basic full-text (match) query: using the Search Lite API, which expects all the search parameters to be passed in as part of the URL, or using the full JSON request body, which allows you to use the full Elasticsearch DSL.
Here is a basic match query that searches for the string “guide” in all the fields:
GET /bookdb_index/book/_search?q=guide
[Results]
"hits": [
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "4",
    "_score": 1.3278645,
    "_source": {
      "title": "Solr in Action",
      "authors": ["trey grainger", "timothy potter"],
      "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
      "publish_date": "2014-04-05",
      "num_reviews": 23,
      "publisher": "manning"
    }
  },
  {
    "_index": "bookdb_index",
    "_type": "book",
    "_id": "1",
    "_score": 1.2871116,
    "_source": {
      "title": "Elasticsearch: The Definitive Guide",
      "authors": ["clinton gormley", "zachary tong"],
      "summary": "A distibuted real-time search and analytics engine",
      "publish_date": "2015-02-07",
      "num_reviews": 20,
      "publisher": "oreilly"
    }
  }
]
The full body version of this query is shown below and produces the same results as the Search Lite query above.
{
  "query": {
    "multi_match": {
      "query": "guide",
      "fields": ["title", "authors", "summary", "publish_date", "num_reviews", "publisher"]
    }
  }
}
The multi_match keyword
is used in place of the match keyword as a convenient shorthand way of
running the same query against multiple fields. The fields property
specifies what fields to query against and, in this case, we want to query
against all the fields in the document.
Note: Prior to Elasticsearch 6 you could use the "_all" field to find a match in all the fields instead of having to specify each field. The "_all" field works by concatenating all the fields into one big field, using space as a delimiter, and then analyzing and indexing that field. In ES6, this functionality has been deprecated and disabled by default. ES6 provides the "copy_to" parameter if you are interested in creating a custom "_all" field. See the Elasticsearch guide for more info.
The Search Lite API also allows you to specify what fields you want to search on. For example, to search for books with the words “in Action” in the title field:
GET /bookdb_index/book/_search?q=title:in action
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"3",
"_score":
1.6323128,
"_source": {
"title":
"Elasticsearch in Action",
"authors": [
"radu
gheorge",
"matthew lee
hinman",
"roy
russo"
],
"summary":
"build scalable search applications using Elasticsearch without having to
do complex low-level programming or understand advanced data science
algorithms",
"publish_date": "2015-12-03",
"num_reviews": 18,
"publisher":
"manning"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
1.6323128,
"_source": {
"title": "Solr
in Action",
"authors": [
"trey
grainger",
"timothy
potter"
],
"summary":
"Comprehensive guide to implementing a scalable search engine using Apache
Solr",
"publish_date": "2014-04-05",
"num_reviews": 23,
"publisher":
"manning"
}
}
]
However, the full body
DSL gives you more flexibility in creating more complicated queries (as we will
see later) and in specifying how you want the results back. In the example
below, we specify the number of results we want back, the offset to start from
(useful for pagination), the document fields we want to be returned, and term
highlighting. Note that we use a "match" query instead of
a "multi_match" query because we only care about
searching in the title field.
POST /bookdb_index/book/_search
{
  "query": {
    "match": {
      "title": "in action"
    }
  },
  "size": 2,
  "from": 0,
  "_source": ["title", "summary", "publish_date"],
  "highlight": {
    "fields": {
      "title": {}
    }
  }
}
[Results]
"hits": {
  "total": 2,
  "max_score": 1.6323128,
  "hits": [
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "3",
      "_score": 1.6323128,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      },
      "highlight": {
        "title": [
          "Elasticsearch <em>in</em> <em>Action</em>"
        ]
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "4",
      "_score": 1.6323128,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      },
      "highlight": {
        "title": [
          "Solr <em>in</em> <em>Action</em>"
        ]
      }
    }
  ]
}
Note: For multi-word
queries, the match query lets you specify whether to use
the and operator instead of the default or operator. You can
also specify the minimum_should_match option to tweak the relevance of the
returned results. Details can be found in the Elasticsearch guide.
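As a rough illustration of the options mentioned so far, the sketch below builds a title search body in Python with pagination ("from" and "size") and the match query's long form, which accepts "operator" and "minimum_should_match". The function name title_search and its page-based arguments are assumptions introduced here for the example.

```python
import json


def title_search(text, page=1, page_size=10, operator="or",
                 minimum_should_match=None):
    """Build a request body for a paginated match query on the title field.

    Hypothetical helper: pages are 1-based and translated to the
    zero-based "from" offset that the search API expects.
    """
    match_options = {"query": text, "operator": operator}
    if minimum_should_match is not None:
        match_options["minimum_should_match"] = minimum_should_match
    return {
        "query": {"match": {"title": match_options}},
        "from": (page - 1) * page_size,
        "size": page_size,
    }


# Third page of two results each, requiring every query term to match.
body = title_search("in action", page=3, page_size=2, operator="and")
print(json.dumps(body, indent=2))
```

With operator set to "and", a title must contain both "in" and "action" to match, whereas the default "or" accepts either term.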
Boosting
Since we are searching
across multiple fields, we may want to boost the scores in a certain field. In
the contrived example below, we boost scores from the summary field by a factor
of 3 in order to increase the importance of the summary field, which will, in
turn, increase the relevance of document _id 4.
POST /bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "elasticsearch guide",
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": {
  "total": 3,
  "max_score": 3.9835935,
  "hits": [
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "4",
      "_score": 3.9835935,
      "_source": {
        "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
        "title": "Solr in Action",
        "publish_date": "2014-04-05"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "3",
      "_score": 3.1001682,
      "_source": {
        "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
        "title": "Elasticsearch in Action",
        "publish_date": "2015-12-03"
      }
    },
    {
      "_index": "bookdb_index",
      "_type": "book",
      "_id": "1",
      "_score": 2.0281231,
      "_source": {
        "summary": "A distibuted real-time search and analytics engine",
        "title": "Elasticsearch: The Definitive Guide",
        "publish_date": "2015-02-07"
      }
    }
  ]
}
Note: Boosting does not
merely imply that the calculated score gets multiplied by the boost factor. The
actual boost value that is applied goes through normalization and some internal
optimization. More information on how boosting works can be found in the Elasticsearch
guide.
Bool Query
The AND/OR/NOT operators can be used to fine-tune our search queries in order to provide more relevant or specific results. This is implemented in the search API as a bool query. The bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to search for a book with the word “Elasticsearch” OR “Solr” in the title that is authored by “clinton gormley” but NOT authored by “radu gheorge”:
POST /bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": {
        "bool": {
          "should": [
            { "match": { "title": "Elasticsearch" }},
            { "match": { "title": "Solr" }}
          ],
          "must": { "match": { "authors": "clinton gormley" }}
        }
      },
      "must_not": { "match": { "authors": "radu gheorge" }}
    }
  }
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
2.0749094,
"_source": {
"title":
"Elasticsearch: The Definitive Guide",
"authors": [
"clinton
gormley",
"zachary
tong"
],
"summary": "A distibuted
real-time search and analytics engine",
"publish_date": "2015-02-07",
"num_reviews": 20,
"publisher":
"oreilly"
}
}
]
Note: As you can see, a
bool query can wrap any other query type including other bool queries to create
arbitrarily complex or deeply nested queries.
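One detail worth keeping in mind: when a bool query contains a must (or filter) clause, its should clauses are optional by default and only influence scoring. To make "at least one of these titles" a hard requirement, minimum_should_match can be set explicitly. The Python sketch below builds such a body; the function name and argument names are assumptions for illustration.

```python
import json


def title_any_author_not(titles, author, excluded_author):
    """Build a bool query: any of `titles`, by `author`, not by `excluded_author`.

    Hypothetical helper. minimum_should_match is set to 1 so that at least
    one should clause has to match even though a must clause is present.
    """
    return {
        "query": {
            "bool": {
                "must": [{"match": {"authors": author}}],
                "should": [{"match": {"title": title}} for title in titles],
                "minimum_should_match": 1,
                "must_not": [{"match": {"authors": excluded_author}}],
            }
        }
    }


bool_body = title_any_author_not(
    ["Elasticsearch", "Solr"], "clinton gormley", "radu gheorge"
)
print(json.dumps(bool_body, indent=2))
```

This flattens the nested bool from the example above into a single level, which tends to be easier to read when the clauses do not need separate grouping.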
Fuzzy Queries
Fuzzy matching can be enabled on Match and Multi-Match queries to catch spelling errors. The degree of fuzziness is specified based on the Levenshtein distance from the original word, i.e. the number of one-character changes that need to be made to one string to make it the same as another string.
POST /bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "comprihensiv guide",
      "fields": ["title", "summary"],
      "fuzziness": "AUTO"
    }
  },
  "_source": ["title", "summary", "publish_date"],
  "size": 1
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
2.4344182,
"_source": {
"summary":
"Comprehensive guide to implementing a scalable search engine using Apache
Solr",
"title":
"Solr in Action",
"publish_date": "2014-04-05"
}
}
]
Note: Instead of specifying "AUTO" you can specify the numbers 0, 1, or 2 to indicate the maximum number of edits that can be made to the string to find a match. The benefit of using "AUTO" is that it takes into account the length of the string: terms of one or two characters must match exactly, terms of three to five characters allow one edit, and longer terms allow two. For strings that are only 3 characters long, allowing a fuzziness of 2 will match far too many terms and hurt both relevance and performance. Therefore it's recommended to stick to "AUTO" in most cases.
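To make the edit-distance idea concrete, here is a small Python sketch of plain Levenshtein distance together with the length thresholds that "AUTO" applies by default. It is an illustration only: Elasticsearch computes this internally and, by default, also counts a transposition of two adjacent characters as a single edit, which this plain version does not.

```python
def levenshtein(a, b):
    """Number of single-character insertions, deletions, or substitutions
    needed to turn string a into string b (classic dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or keep)
            ))
        previous = current
    return previous[-1]


def auto_fuzziness(term):
    """Maximum edits allowed for a term under the default "AUTO" setting."""
    if len(term) <= 2:
        return 0
    if len(term) <= 5:
        return 1
    return 2


query_term = "comprihensiv"
indexed_term = "comprehensive"
distance = levenshtein(query_term, indexed_term)
print(distance, auto_fuzziness(query_term))
```

The misspelled query term is two edits away from the indexed term (one substitution and one insertion), and at twelve characters it is allowed two edits under "AUTO", which is why the fuzzy query above still finds the book.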
Wildcard Query
Wildcard queries allow you to specify a pattern to match instead of the entire term. ? matches any single character and * matches zero or more characters. For example, to find all records that have an author whose name begins with the letter ‘t’:
POST /bookdb_index/book/_search
{
  "query": {
    "wildcard": {
      "authors": "t*"
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}
[Results]
"hits": [
{
"_index": "bookdb_index",
"_type":
"book",
"_id":
"1",
"_score": 1,
"_source": {
"title":
"Elasticsearch: The Definitive Guide",
"authors": [
"clinton
gormley",
"zachary
tong"
]
},
"highlight": {
"authors": [
"zachary
<em>tong</em>"
]
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"2",
"_score": 1,
"_source": {
"title":
"Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant ingersoll",
"thomas
morton",
"drew
farris"
]
},
"highlight": {
"authors": [
"<em>thomas</em> morton"
]
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score": 1,
"_source": {
"title":
"Solr in Action",
"authors": [
"trey
grainger",
"timothy
potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
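The highlights above show that the pattern is applied to individual indexed terms ("tong", "thomas") rather than to the whole author string. The Python sketch below imitates that behavior with the standard fnmatch module. It assumes the authors field is analyzed into lowercase, whitespace-separated terms, which is a simplification of the real analyzer, and fnmatch additionally understands [seq] patterns that the wildcard query does not.

```python
from fnmatch import fnmatchcase


def matching_terms(authors, pattern):
    """Return the individual name terms that match a ?/* wildcard pattern.

    Assumption: each author string is split on whitespace and lowercased,
    as a rough stand-in for the analyzed terms in the index.
    """
    terms = [term for name in authors for term in name.lower().split()]
    return [term for term in terms if fnmatchcase(term, pattern)]


authors_by_id = {
    1: ["clinton gormley", "zachary tong"],
    2: ["grant ingersoll", "thomas morton", "drew farris"],
    3: ["radu gheorge", "matthew lee hinman", "roy russo"],
    4: ["trey grainger", "timothy potter"],
}

wildcard_hits = {
    doc_id: matching_terms(authors, "t*")
    for doc_id, authors in authors_by_id.items()
}
print(wildcard_hits)
```

Document 3 has no term starting with "t", which is consistent with it being absent from the results.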
Regexp Query
Regexp queries allow you
to specify more complex patterns than wildcard queries.
POST /bookdb_index/book/_search
{
  "query": {
    "regexp": {
      "authors": "t[a-z]*y"
    }
  },
  "_source": ["title", "authors"],
  "highlight": {
    "fields": {
      "authors": {}
    }
  }
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score": 1,
"_source": {
"title":
"Solr in Action",
"authors": [
"trey
grainger",
"timothy
potter"
]
},
"highlight": {
"authors": [
"<em>trey</em> grainger",
"<em>timothy</em> potter"
]
}
}
]
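A regexp query pattern has to match an entire indexed term, not just part of it. The Python sketch below approximates this with re.fullmatch over lowercase, whitespace-separated name terms. Both the tokenization and the use of Python's regular expression dialect are assumptions for illustration; Elasticsearch uses Lucene's regular expression syntax, which differs from Python's in several operators.

```python
import re


def regexp_terms(authors, pattern):
    """Return the name terms that the pattern matches in full.

    Assumption: terms are produced by lowercasing and splitting on
    whitespace, a simplified stand-in for the analyzed index terms.
    """
    compiled = re.compile(pattern)
    terms = [term for name in authors for term in name.lower().split()]
    return [term for term in terms if compiled.fullmatch(term)]


regexp_doc4 = regexp_terms(["trey grainger", "timothy potter"], "t[a-z]*y")
regexp_doc2 = regexp_terms(
    ["grant ingersoll", "thomas morton", "drew farris"], "t[a-z]*y"
)
print(regexp_doc4, regexp_doc2)
```

"thomas" starts with "t" but does not end in "y", so document 2 matches the earlier wildcard query and not this one.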
Match Phrase Query
The match phrase query
requires that all the terms in the query string be present in the document, be
in the order specified in the query string and be close to each other. By
default, the terms are required to be exactly beside each other but you can
specify the slop value which indicates how far apart terms
are allowed to be while still considering the document a match.
POST /bookdb_index/book/_search
{
  "query": {
    "multi_match": {
      "query": "search engine",
      "fields": ["title", "summary"],
      "type": "phrase",
      "slop": 3
    }
  },
  "_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
0.22327082,
"_source": {
"summary": "Comprehensive guide to implementing a
scalable search engine using Apache Solr",
"title":
"Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
0.16113183,
"_source":
{
"summary": "A distibuted real-time search and analytics
engine",
"title":
"Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
Note: In the example above, for a non-phrase type query, document _id 1 would normally have a higher score and appear ahead of document _id 4 because its field length is shorter. However, as a phrase query the proximity of the terms is factored in, so document _id 4 scores better.
Note: If the slop parameter were reduced to 1, document _id 1 would no longer appear in the result set.
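The effect of slop can be checked by hand for this two-term phrase. For terms that appear in the same order as in the query, the slop needed is the number of positions between them. The Python sketch below counts that gap; the tokenizer is a rough approximation of the default analyzer, and the function covers only the simple in-order, two-term case, not the general position-edit computation Lucene performs.

```python
import re


def tokenize(text):
    """Lowercase and split on non-alphanumerics (approximate analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_in_order_gap(tokens, first, second):
    """Smallest number of tokens between `first` and a later `second`.

    Returns None when the two terms never occur in that order.
    """
    best = None
    for i, token in enumerate(tokens):
        if token != first:
            continue
        for j in range(i + 1, len(tokens)):
            if tokens[j] == second:
                gap = j - i - 1
                if best is None or gap < best:
                    best = gap
                break
    return best


summary_doc4 = "Comprehensive guide to implementing a scalable search engine using Apache Solr"
summary_doc1 = "A distibuted real-time search and analytics engine"

gap_doc4 = min_in_order_gap(tokenize(summary_doc4), "search", "engine")
gap_doc1 = min_in_order_gap(tokenize(summary_doc1), "search", "engine")
print(gap_doc4, gap_doc1)
```

Document 4 has the terms adjacent (gap 0), while document 1 has "and analytics" between them (gap 2), so it matches with a slop of 3 or 2 but not with a slop of 1.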
Match Phrase Prefix
Match phrase prefix queries provide search-as-you-type or a poor man’s version of autocomplete at query time without needing to prepare your data in any way. Like the match_phrase query, it accepts a slop parameter to make the word order and relative positions somewhat less rigid. It also accepts the max_expansions parameter to limit the number of terms matched in order to reduce resource intensity.
POST /bookdb_index/book/_search
{
  "query": {
    "match_phrase_prefix": {
      "summary": {
        "query": "search en",
        "slop": 3,
        "max_expansions": 10
      }
    }
  },
  "_source": ["title", "summary", "publish_date"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
0.5161346,
"_source":
{
"summary": "Comprehensive guide to implementing a
scalable search engine using Apache Solr",
"title":
"Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
0.37248808,
"_source":
{
"summary": "A distibuted real-time search and analytics
engine",
"title":
"Elasticsearch: The Definitive Guide",
"publish_date":
"2015-02-07"
}
}
]
Note: Query-time
search-as-you-type has a performance cost. A better solution is index-time
search-as-you-type. Check out the Completion
Suggester API or the use of Edge-Ngram
filters for more information.
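The idea behind the edge n-gram approach is to index every leading prefix of a term ahead of time, so that a partial word typed by the user becomes an ordinary exact term lookup. The Python sketch below only illustrates what those prefixes look like; the function and its parameter names mirror the min_gram and max_gram settings of the filter but are otherwise an assumption, and in practice the filter is configured in the index analysis settings rather than written by hand.

```python
def edge_ngrams(token, min_gram=1, max_gram=10):
    """Return the leading prefixes of `token` between min_gram and max_gram
    characters long, shortest first."""
    upper = min(max_gram, len(token))
    return [token[:length] for length in range(min_gram, upper + 1)]


prefixes = edge_ngrams("engine", min_gram=2, max_gram=4)
print(prefixes)
```

With these prefixes in the index, the partial input "en" from the query above would match the term directly at search time instead of being expanded into candidate terms.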
Query String
The query_string query provides a means of executing multi_match queries,
bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries in
a concise shorthand syntax. In the following example, we execute a fuzzy search
for the terms “search algorithm” in which one of the book authors is “grant
ingersoll” or “tom morton.” We search all fields but apply a boost of 2 to the
summary field.
POST /bookdb_index/book/_search
{
  "query": {
    "query_string": {
      "query": "(saerch~1 algorithm~1) AND (grant ingersoll) OR (tom morton)",
      "fields": ["title", "authors", "summary^2"]
    }
  },
  "_source": ["title", "summary", "authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"2",
"_score":
3.571021,
"_source": {
"summary":
"organize text using approaches such as full-text search, proper name
recognition, clustering, tagging, information extraction, and
summarization",
"title":
"Taming Text: How to Find, Organize, and Manipulate It",
"authors": [
"grant
ingersoll",
"thomas
morton",
"drew
farris"
]
},
"highlight": {
"summary": [
"organize text
using approaches such as full-text <em>search</em>, proper name
recognition, clustering, tagging"
]
}
}
]
Simple Query String
The simple_query_string query is a version of the query_string query that is more suitable for use in a single search box
that is exposed to users because it replaces the use of AND/OR/NOT with +/|/-,
respectively, and it discards invalid parts of a query instead of throwing an
exception if a user makes a mistake.
POST /bookdb_index/book/_search
{
  "query": {
    "simple_query_string": {
      "query": "(saerch~1 algorithm~1) + (grant ingersoll) | (tom morton)",
      "fields": ["title", "authors", "summary^2"]
    }
  },
  "_source": ["title", "summary", "authors"],
  "highlight": {
    "fields": {
      "summary": {}
    }
  }
}
Term/Terms Query
The above examples have
been examples of full-text search. Sometimes we are more interested in a
structured search in which we want to find an exact match and return the
results. The term and terms queries help us
here. In the below example, we are searching for all books in our index
published by Manning Publications.
POST /bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": "manning"
    }
  },
  "_source": ["title", "publish_date", "publisher"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"2",
"_score":
1.2231436,
"_source":
{
"publisher": "manning",
"title":
"Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"3",
"_score":
1.2231436,
"_source":
{
"publisher": "manning",
"title":
"Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
1.2231436,
"_source":
{
"publisher": "manning",
"title":
"Solr in Action",
"publish_date": "2014-04-05"
}
}
]
Multiple terms can be
specified by using the terms keyword instead and passing in an array of
search terms.
{
  "query": {
    "terms": {
      "publisher": ["oreilly", "packt"]
    }
  }
}
Term Query - Sorted
Term query results (like any other query results) can easily be sorted. Multi-level sorting is also allowed.
POST /bookdb_index/book/_search
{
  "query": {
    "term": {
      "publisher": "manning"
    }
  },
  "_source": ["title", "publish_date", "publisher"],
  "sort": [
    { "publish_date": { "order": "desc" }}
  ]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"3",
"_score":
null,
"_source": {
"publisher":
"manning",
"title": "Elasticsearch in
Action",
"publish_date": "2015-12-03"
},
"sort": [
1449100800000
]
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
null,
"_source": {
"publisher":
"manning",
"title":
"Solr in Action",
"publish_date": "2014-04-05"
},
"sort": [
1396656000000
]
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"2",
"_score":
null,
"_source": {
"publisher":
"manning",
"title":
"Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
},
"sort": [
1358985600000
]
}
]
Note: In ES6, to sort or aggregate by a text field, like a title, for example, you would need to enable fielddata on that field. More details on this can be found in the Elasticsearch guide.
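The numbers in the "sort" arrays of the results are the publish dates expressed as milliseconds since the Unix epoch, which is how date values are compared when sorting. The short Python sketch below converts the three dates to check this; the helper name is an assumption, and it treats each date as midnight UTC.

```python
from datetime import datetime, timezone


def to_epoch_millis(date_string):
    """Convert a yyyy-MM-dd date (taken as midnight UTC) to epoch milliseconds."""
    moment = datetime.strptime(date_string, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(moment.timestamp() * 1000)


sort_values = [
    to_epoch_millis(day)
    for day in ("2015-12-03", "2014-04-05", "2013-01-24")
]
print(sort_values)
```

The three values come out in descending order, matching the "desc" sort in the request.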
Range Query
Another structured query
example is the range query. In this example, we search for books published in
2015.
POST /bookdb_index/book/_search
{
  "query": {
    "range": {
      "publish_date": {
        "gte": "2015-01-01",
        "lte": "2015-12-31"
      }
    }
  },
  "_source": ["title", "publish_date", "publisher"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
1,
"_source":
{
"publisher": "oreilly",
"title": "Elasticsearch:
The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"3",
"_score":
1,
"_source":
{
"publisher":
"manning",
"title":
"Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
Note: Range queries work
on date, number, and string type fields.
Filtered Bool Query
When using a bool query,
you can use a filter clause to filter down the results of a query. For our
example, we are querying for books with the term “Elasticsearch” in the title
or summary but we want to filter our results to only those with 20 or more reviews.
POST /bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "elasticsearch",
          "fields": ["title", "summary"]
        }
      },
      "filter": {
        "range": {
          "num_reviews": {
            "gte": 20
          }
        }
      }
    }
  },
  "_source": ["title", "summary", "publisher", "num_reviews"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
0.5955761,
"_source":
{
"summary": "A distibuted real-time search and analytics
engine",
"publisher":
"oreilly",
"num_reviews": 20,
"title":
"Elasticsearch: The Definitive Guide"
}
}
]
Multiple filters can be combined by nesting a bool query inside the filter clause. In the next example, the filter determines that the returned results must have at least 20 reviews, must not be published before 2015 and should be published by O'Reilly.
POST /bookdb_index/book/_search
{
  "query": {
    "bool": {
      "must": {
        "multi_match": {
          "query": "elasticsearch",
          "fields": ["title", "summary"]
        }
      },
      "filter": {
        "bool": {
          "must": {
            "range": { "num_reviews": { "gte": 20 } }
          },
          "must_not": {
            "range": { "publish_date": { "lte": "2014-12-31" } }
          },
          "should": {
            "term": { "publisher": "oreilly" }
          }
        }
      }
    }
  },
  "_source": ["title", "summary", "publisher", "num_reviews", "publish_date"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
0.5955761,
"_source":
{
"summary": "A distibuted real-time search and analytics
engine",
"publisher": "oreilly",
"num_reviews": 20,
"title":
"Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
}
]
Function Score: Field Value Factor
There may be a case where you want to factor in the value of a particular field in your document into the calculation of the relevance score. This is typical in scenarios where you want to boost the relevance of a document based on its popularity. In our example, we would like the more popular books (as judged by the number of reviews) to be boosted. This is possible using the field_value_factor function score.
POST /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "field_value_factor": {
        "field": "num_reviews",
        "modifier": "log1p",
        "factor": 2
      }
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
0.44831306,
"_source":
{
"summary": "A distibuted real-time search and analytics
engine",
"num_reviews": 20,
"title":
"Elasticsearch: The Definitive Guide",
"publish_date": "2015-02-07"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
0.3718407,
"_source":
{
"summary": "Comprehensive guide to implementing a
scalable search engine using Apache Solr",
"num_reviews": 23,
"title":
"Solr in Action",
"publish_date":
"2014-04-05"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"3",
"_score":
0.046479136,
"_source":
{
"summary": "build scalable search applications using Elasticsearch
without having to do complex low-level programming or understand advanced data
science algorithms",
"num_reviews": 18,
"title":
"Elasticsearch in Action",
"publish_date": "2015-12-03"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"2",
"_score":
0.041432835,
"_source":
{
"summary": "organize text using approaches such as
full-text search, proper name recognition, clustering, tagging, information
extraction, and summarization",
"num_reviews": 12,
"title":
"Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
}
]
Note 1: We could have
just run a regular multi_match query and sorted by the num_reviews field
but then we lose the benefits of having relevance scoring.
Note 2: There are a
number of additional parameters that tweak the extent of the boosting effect on
the original relevance score such as “modifier”, “factor”, “boost_mode”, etc.
These are explored in detail in the Elasticsearch
guide.
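For the settings used in the example, the function value is the base-10 logarithm of one plus the field value multiplied by the factor, and by default that value is multiplied into the query's relevance score. The Python sketch below computes it for two of the books; the function covers only the "none" and "log1p" modifiers and is an illustration, not a reimplementation of the full feature.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Compute the field_value_factor function value for two modifiers.

    "none" returns factor * value; "log1p" returns log10(1 + factor * value).
    """
    scaled = factor * value
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(scaled + 1)
    raise ValueError(f"modifier not covered by this sketch: {modifier}")


boost_20_reviews = field_value_factor(20, factor=2, modifier="log1p")
boost_23_reviews = field_value_factor(23, factor=2, modifier="log1p")
print(round(boost_20_reviews, 4), round(boost_23_reviews, 4))
```

The logarithm keeps the boost gentle: going from 20 to 23 reviews changes the multiplier only slightly, so a stronger text match can still outrank a book with a few more reviews, as happens with documents 1 and 4 above.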
Function Score: Decay Functions
Suppose that instead of
wanting to boost incrementally by the value of a field, you have an ideal
value you want to target and you want the boost factor to decay the further
away you move from the value. This is typically useful in boosts based on
lat/long, numeric fields like price, or dates. In our contrived example, we are
searching for books on “search engines” ideally published around June 2014.
POST /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "exp": {
            "publish_date": {
              "origin": "2014-06-15",
              "offset": "7d",
              "scale": "30d"
            }
          }
        }
      ],
      "boost_mode": "replace"
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": [
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
0.27420625,
"_source":
{
"summary": "Comprehensive guide to implementing a
scalable search engine using Apache Solr",
"num_reviews": 23,
"title":
"Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"1",
"_score":
0.005920768,
"_source":
{
"summary": "A distibuted real-time search and analytics
engine",
"num_reviews": 20,
"title":
"Elasticsearch: The Definitive Guide",
"publish_date":
"2015-02-07"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"2",
"_score":
0.000011564,
"_source":
{
"summary": "organize text using approaches such as full-text
search, proper name recognition, clustering, tagging, information extraction,
and summarization",
"num_reviews": 12,
"title":
"Taming Text: How to Find, Organize, and Manipulate It",
"publish_date": "2013-01-24"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"3",
"_score":
0.0000059171475,
"_source":
{
"summary": "build scalable search applications using
Elasticsearch without having to do complex low-level programming or understand
advanced data science algorithms",
"num_reviews": 18,
"title":
"Elasticsearch in Action",
"publish_date": "2015-12-03"
}
}
]
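The shape of the exponential decay can be sketched in a few lines of Python. As documented for the exp function, the score is exp(lambda * max(0, |value - origin| - offset)) with lambda = ln(decay) / scale, where decay defaults to 0.5. The sketch works at whole-day granularity and is meant to show the curve, not to reproduce the exact scores listed above.

```python
import math
from datetime import date, timedelta


def exp_decay(value, origin, offset_days, scale_days, decay=0.5):
    """Exponential decay score for a date, at whole-day granularity.

    Dates within `offset_days` of the origin score 1.0; at a distance of
    offset + scale the score has dropped to `decay`.
    """
    distance = max(0, abs((value - origin).days) - offset_days)
    decay_rate = math.log(decay) / scale_days
    return math.exp(decay_rate * distance)


origin = date(2014, 6, 15)
score_at_origin = exp_decay(origin, origin, offset_days=7, scale_days=30)
score_inside_offset = exp_decay(origin + timedelta(days=5), origin, 7, 30)
score_at_scale = exp_decay(origin + timedelta(days=37), origin, 7, 30)
print(score_at_origin, score_inside_offset, round(score_at_scale, 3))
```

Because the request uses "boost_mode": "replace", the decay value replaces the text relevance score entirely, which is why books far from June 2014 end up with scores close to zero.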
Function Score: Script Scoring
In the case where the
built-in scoring functions do not meet your needs, there is the option to
specify a Groovy script to use for scoring. In our example, we want to specify
a script that takes into consideration the publish_date before deciding how much to factor in the number of reviews.
Newer books may not have as many reviews yet so they should not be penalized
for that.
The scoring script looks
like this:
publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score
To use a scoring script
dynamically, we use the script_score parameter:
POST /bookdb_index/book/_search
{
  "query": {
    "function_score": {
      "query": {
        "multi_match": {
          "query": "search engine",
          "fields": ["title", "summary"]
        }
      },
      "functions": [
        {
          "script_score": {
            "params": {
              "threshold": "2015-07-30"
            },
            "script": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
          }
        }
      ]
    }
  },
  "_source": ["title", "summary", "publish_date", "num_reviews"]
}
[Results]
"hits": {
"total": 4,
"max_score":
0.8463001,
"hits": [
{
"_index":
"bookdb_index",
"_type": "book",
"_id":
"1",
"_score":
0.8463001,
"_source":
{
"summary": "A distibuted real-time search and analytics
engine",
"num_reviews": 20,
"title":
"Elasticsearch: The Definitive Guide",
"publish_date":
"2015-02-07"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"4",
"_score":
0.7067348,
"_source":
{
"summary": "Comprehensive guide to implementing a scalable
search engine using Apache Solr",
"num_reviews": 23,
"title":
"Solr in Action",
"publish_date": "2014-04-05"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"3",
"_score":
0.08952084,
"_source":
{
"summary": "build scalable search applications using
Elasticsearch without having to do complex low-level programming or understand
advanced data science algorithms",
"num_reviews": 18,
"title": "Elasticsearch in
Action",
"publish_date": "2015-12-03"
}
},
{
"_index":
"bookdb_index",
"_type":
"book",
"_id":
"2",
"_score":
0.07602123,
"_source":
{
"summary": "organize text using approaches such as
full-text search, proper name recognition, clustering, tagging, information
extraction, and summarization",
"num_reviews": 12,
"title":
"Taming Text: How to Find, Organize, and Manipulate It",
"publish_date":
"2013-01-24"
}
}
]
}
Note 1: To use dynamic scripting, it must be enabled for your Elasticsearch instance in the config/elasticsearch.yml file. It’s also possible to use scripts that have been stored on the Elasticsearch server. Groovy scripting was removed in Elasticsearch 6.0 in favor of Painless, so on ES6 and later the script above would need to be rewritten in Painless. Check out the Elasticsearch reference docs for more information.
Note 2: JSON cannot include embedded newline characters so the semicolon is used to separate statements.
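To see what the scoring script computes without running it on a cluster, its logic can be mirrored in plain Python. The sketch below is only a port for illustration (the function name is an assumption); it returns the function value, which by default is then multiplied with the query's relevance score.

```python
import math
from datetime import date


def review_score(publish_date, num_reviews, threshold):
    """Mirror of the scoring script: books published after the threshold
    get a slightly larger constant inside the natural logarithm."""
    if publish_date > threshold:
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)


threshold = date(2015, 7, 30)
newer_book = review_score(date(2015, 12, 3), 18, threshold)  # document 3
older_book = review_score(date(2015, 2, 7), 20, threshold)   # document 1
print(round(newer_book, 4), round(older_book, 4))
```

The adjustment narrows the gap between the newer book with 18 reviews and the older one with 20, but it does not close it, so the text relevance score still dominates the final ordering.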