bool queries

In ElasticSearch, you can think of bool as meaning a combination of queries.

ElasticSearch uses the name bool for what it calls a Boolean query. The name can be misleading, because for most people boolean means true or false, and that is not exactly what a bool query does in ElasticSearch.

For example, ElasticSearch does not simply use AND or OR. Instead, bool accepts the following operators.

  • must
  • should
  • filter
  • must_not

To look at this from the traditional definition of boolean, what matters is that each sub-query (which you could also call a clause) in a must array has to resolve to true for a document to match. Looked at that way, the clauses in must are connected with the logical operator AND. Each single clause inside the bool can itself be another bool query, so it could be a NOT, AND, or OR statement.
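One way to see how AND, OR, and NOT map onto bool is to build the request body in code. The sketch below is illustrative only: the helper names and_, or_, and not_ are hypothetical (they are not part of any ElasticSearch client), the values Delaware and Lease are made up for the example, and the helpers simply return plain dictionaries in the query DSL shape.

```python
import json

def and_(*clauses):
    """Logical AND: every clause goes into must."""
    return {"bool": {"must": list(clauses)}}

def or_(*clauses):
    """Logical OR: clauses go into should, and at least one has to match."""
    return {"bool": {"should": list(clauses), "minimum_should_match": 1}}

def not_(clause):
    """Logical NOT: the clause goes into must_not."""
    return {"bool": {"must_not": [clause]}}

# (jurisdiction is Nevada OR Delaware) AND NOT (category is Lease)
# The field values here are assumptions chosen for illustration.
query = and_(
    or_(
        {"match": {"jurisdiction.name": "Nevada"}},
        {"match": {"jurisdiction.name": "Delaware"}},
    ),
    not_({"match": {"category.name": "Lease"}}),
)

# Print the nested bool query as a JSON request body.
print(json.dumps({"query": query}, indent=2))
```

Because each helper returns a bool query, the results nest freely, which is how a single clause inside a bool can stand for a whole AND, OR, or NOT expression.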

bool example

Here is an example. This query returns contracts that contain the term Nevada in any of the fields listed in the simple_query_string clause and contain the phrase THIS OPERATING AGREEMENT in the snippet field.

{
    "query": {
        "bool": {
            "must": [
                {
                    "simple_query_string": {
                        "default_operator": "or",
                        "fields": [
                            "name",
                            "snippet",
                            "body",
                            "company.name",
                            "jurisdiction.name"
                        ],
                        "query": "\"Nevada\""
                    }
                },
                {
                    "match_phrase": {
                        "snippet": {
                            "query": "THIS OPERATING AGREEMENT"
                        }
                    }
                }
            ]
        }
    },
    "post_filter": {
        "term": {
            "_index": "contract"
        }
    },
    "size": 1,
    "profile": false,
    "explain": false,
    "timeout": "15000ms",
    "_source": [
        "name",
        "snippet",
        "category.name",
        "category.value",
        "company.name",
        "company.value",
        "jurisdiction.name",
        "jurisdiction.value",
        "industry.name",
        "industry.value",
        "filing_date",
        "group_id",
        "group_size"
    ],
    "highlight": {
        "fields": {
            "name": {}
        },
        "order": "score",
        "post_tags": [
            "</b>"
        ],
        "pre_tags": [
            "<b>"
        ]
    }
}

Results

{
  "took" : 63,
  "timed_out" : false,
  "_shards" : {
    "total" : 42,
    "successful" : 42,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 174,
      "relation" : "eq"
    },
    "max_score" : 13.150608,
    "hits" : [
      {
        "_index" : "contract",
        "_type" : "_doc",
        "_id" : "en/keA2k2UCl11",
        "_score" : 13.150608,
        "_source" : {
          "snippet" : "This OPERATING AGREEMENT (as amended from time to time, this “Agreement”) of Nevada Marketing, LLC (the “Company”) is made by Harrah’s Operating Company, Inc. (the “Member”) effective as of August 3, 2006.",
          "group_size" : 10,
          "group_id" : "21UShOdvC7G",
          "filing_date" : "2008-10-29",
          "jurisdiction" : {
            "name" : "Nevada",
            "value" : "nevada-us"
          },
          "name" : "OPERATING AGREEMENT OF NEVADA MARKETING, LLC a Nevada Limited Liability Company",
          "industry" : {
            "name" : "Services-miscellaneous amusement & recreation",
            "value" : "services-miscellaneous-amusement-recreation"
          },
          "company" : {
            "name" : "New Gaming Capital Partnership",
            "value" : "1447472"
          },
          "category" : {
            "name" : "Operating Agreement",
            "value" : "operating-agreement"
          }
        },
        "highlight" : {
          "name" : [
            "OPERATING AGREEMENT OF <b>NEVADA</b> MARKETING, LLC a <b>Nevada</b> Limited Liability Company"
          ]
        }
      }
    ]
  }
}

Parameters and Attributes

bool and must

A bool query is one that has at least one of the following:

  • must
  • should
  • filter
  • must_not

Let's simplify this by giving some definitions and examples.

We can say that:

  • must and filter are the same, except that filter clauses do not contribute to the calculated document score.
  • must_not is pretty clear and needs little explanation. It excludes documents that match its criteria.
  • should is the most confusing, because the word should is not exactly a term in logic. So let's give it a clear definition. A should query is an array of clauses, any of which could be true. So it's like an OR query, but it's not exactly OR. ElasticSearch provides a parameter, minimum_should_match. To explain it, suppose minimum_should_match = 2 and we have 3 clauses in our should array. ElasticSearch will then return any document for which at least 2 of those 3 clauses match. Writing minimum_should_match = 1 is what we mean in regular logic when we use the term or. That is, in regular boolean logic, given three statements a, b, and c, we say that a or b or c is true when any one of a, b, or c is true. If minimum_should_match is omitted, it defaults to 1 when the bool has no must or filter clauses, and to 0 otherwise.
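The definitions above can be sketched as a small matching function. This is a toy model written for this explanation, not ElasticSearch code: the function bool_matches and the predicate names are hypothetical, clauses are plain Python functions rather than real queries, scoring is ignored entirely, and only integer values of minimum_should_match are modeled.

```python
def bool_matches(doc, must=(), should=(), filter_=(), must_not=(),
                 minimum_should_match=None):
    """Return True if doc satisfies a bool query built from predicates."""
    # Default: 1 when there are should clauses but no must or filter
    # clauses, otherwise 0.
    if minimum_should_match is None:
        minimum_should_match = 1 if should and not (must or filter_) else 0

    # Count how many should clauses match this document.
    should_hits = sum(1 for clause in should if clause(doc))

    return (
        all(clause(doc) for clause in must)             # every must matches
        and all(clause(doc) for clause in filter_)      # every filter matches
        and not any(clause(doc) for clause in must_not) # no must_not matches
        and should_hits >= minimum_should_match         # enough should match
    )

# A made-up document and three made-up clauses; two of the three match.
doc = {"jurisdiction": "Nevada", "category": "Operating Agreement"}

def is_nevada(d):
    return d["jurisdiction"] == "Nevada"

def is_lease(d):
    return d["category"] == "Lease"

def is_operating(d):
    return d["category"] == "Operating Agreement"

clauses = [is_nevada, is_lease, is_operating]

print(bool_matches(doc, should=clauses, minimum_should_match=1))     # True
print(bool_matches(doc, should=clauses, minimum_should_match=2))     # True
print(bool_matches(doc, should=clauses, minimum_should_match=3))     # False
print(bool_matches(doc, must=[is_nevada], must_not=[is_operating]))  # False
```

In this model must and filter behave identically, which reflects the point that they differ only in scoring, not in which documents match.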

should explained

Suppose we have this document:

{
  "_index" : "contract",
  "_type" : "_doc",
  "_id" : "en/6ZKwL7AdR8G",
  "_score" : 14.342296,
  "_source" : {
    "jurisdiction" : {
      "name" : "Nevada",
      "value" : "nevada-us"
    },
    "category" : {
      "name" : "Operating Agreement"
    }
  }
}

The query below matches that document: all three should clauses match it, which satisfies minimum_should_match = 3.

Notice that minimum_should_match is not placed inside the should array. Instead it sits beside it, as a direct child of bool.

curl -X POST "https://www.lawinsider.com/api/v1alpha/search?token=$token&pretty" -H 'Content-Type: application/json' -d'{
    "query": {
        "bool": {
            "minimum_should_match": 3,
            "should": [
                {
                    "simple_query_string": {
                        "fields": [
                            "jurisdiction.name"
                        ],
                        "query": "Nevada"
                    }
                },
                {
                    "simple_query_string": {
                        "fields": [
                            "jurisdiction.value"
                        ],
                        "query": "nevada-us"
                    }
                },
                {
                    "simple_query_string": {
                        "fields": [
                            "category.name"
                        ],
                        "query": "Operating Agreement"
                    }
                }
            ]
        }
    },
    "post_filter": {
        "term": {
            "_index": "contract"
        }
    },
    "size": 1,
    "_source": [
        "category.name",
        "jurisdiction.name",
        "jurisdiction.value"
    ]
}'
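The three should clauses in that request differ only in the field and the query text, so the body can also be generated from a list of pairs. The following is a rough sketch, assuming you want to experiment with different minimum_should_match values; the function name build_body is hypothetical, and the code only builds and prints the JSON body without sending it anywhere.

```python
import json

# (field, query text) pairs taken from the request above.
criteria = [
    ("jurisdiction.name", "Nevada"),
    ("jurisdiction.value", "nevada-us"),
    ("category.name", "Operating Agreement"),
]

def build_body(criteria, minimum_should_match):
    """Build a search body with one simple_query_string clause per pair."""
    should = [
        {"simple_query_string": {"fields": [field], "query": text}}
        for field, text in criteria
    ]
    return {
        "query": {
            "bool": {
                # minimum_should_match sits beside should, inside bool.
                "minimum_should_match": minimum_should_match,
                "should": should,
            }
        },
        "post_filter": {"term": {"_index": "contract"}},
        "size": 1,
        "_source": [field for field, _ in criteria],
    }

body = build_body(criteria, minimum_should_match=3)
print(json.dumps(body, indent=4))
```

Lowering the second argument to 2 or 1 loosens the query: a document then needs to match only two, or one, of the three clauses.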

match nothing

Here is another example. This query returns nothing, because must_not excludes every document that matches match_all, and match_all matches every document:

{
    "query": {
        "bool": {
            "must_not": [
                {
                    "match_all": {}
                }
            ]
        }
    }
}

every document excluding Nevada

This returns every document whose jurisdiction.name does not match Nevada.

{
    "query": {
        "bool": {
            "must_not": [
                {
                    "simple_query_string": {
                        "fields": [
                            "jurisdiction.name"
                        ],
                        "query": "Nevada"
                    }
                }
            ]
        }
    }
}