Nested Queries
First, a brief review of JSON: any attribute that is not at the top (root) level of a JSON document is called nested. For example, here the attribute bottom is nested under top.
{
"top": {
"bottom": "value"
}
}
You would refer to bottom by writing top.bottom.
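To make the dotted notation concrete, here is a minimal Python sketch that resolves a path such as top.bottom against a nested document. The helper name get_by_path and the placeholder value are assumptions made for illustration; they are not part of the Law Insider API.

```python
from functools import reduce


def get_by_path(document, dotted_path):
    """Follow a dotted path such as "top.bottom" through nested dicts."""
    # Each path segment selects one level deeper in the document.
    return reduce(lambda node, key: node[key], dotted_path.split("."), document)


# Placeholder document: "some value" is only an illustrative value.
document = {"top": {"bottom": "some value"}}
print(get_by_path(document, "top.bottom"))  # prints: some value
```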
Here is an example from Law Insider. You could use a simple filter query with the terms operator to check whether sections.type equals definition. But that will not work in all cases, as explained below.
"filter": [
{
"terms": {
"sections.type": [
"definition"
]
}
}
]
The filter operator works when the only condition is "sections.type" = "definition". But once you add a second condition, like "sections.name.raw": "Person", you need to use a nested query. This is because searching is like a two-step process: the nested query first (1) searches the documents at the top level and then (2) searches through the documents found in step (1) to get the final results. Inside a nested query, all conditions are checked against one section at a time, so both must match the same section.
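As a rough sketch of the two-condition case described above, the following Python snippet builds a nested clause that combines both conditions under the sections path. The function name nested_sections_filter is hypothetical, and using term for sections.name.raw is an assumption; the exact clause the API expects may differ.

```python
import json


def nested_sections_filter(section_type, section_name):
    """Build a nested clause whose conditions apply to the same section."""
    return {
        "nested": {
            "path": "sections",
            "query": {
                "bool": {
                    "must": [
                        # First condition, as in the simple filter example.
                        {"terms": {"sections.type": [section_type]}},
                        # Second condition (assumed clause form).
                        {"term": {"sections.name.raw": section_name}},
                    ]
                }
            },
        }
    }


query = {
    "query": {
        "bool": {"filter": [nested_sections_filter("definition", "Person")]}
    }
}
print(json.dumps(query, indent=2))
```

Placing both conditions inside the same nested clause is what ties them to a single section rather than to the document as a whole.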
Nested Query Example
The example below wraps the sections.type condition in a nested query with path set to sections.
curl -X POST "https://www.lawinsider.com/api/v1alpha/search?token=$token&pretty" -H 'Content-Type: application/json' -d'{
"query": {
"bool": {
"filter": [
{
"nested": {
"path": "sections",
"query": {
"bool": {
"must": [
{
"terms": {
"sections.type": [
"definition"
]
}
}
]
}
}
}
}
]
}
},
"size": 1,
"profile": false,
"explain": false,
"timeout": "15000ms",
"_source": [
"name",
"snippet",
"category.name",
"category.value",
"company.name",
"company.value",
"jurisdiction.name",
"jurisdiction.value",
"industry.name",
"industry.value",
"filing_date",
"group_id",
"group_size"
],
"highlight": {
"fields": {
"body": {},
"company.name": {},
"jurisdiction.name": {}
},
"order": "score",
"post_tags": [
"</b>"
],
"pre_tags": [
"<b>"
]
}
}'
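For readers who prefer a script to curl, here is a hedged Python sketch that prepares the same POST request with the standard library. The endpoint URL and headers come from the curl command above; the environment variable name LAWINSIDER_TOKEN and the helper names are assumptions, and the request body is shortened to the query and size for brevity.

```python
import json
import os
import urllib.parse
import urllib.request

SEARCH_URL = "https://www.lawinsider.com/api/v1alpha/search"


def build_search_request(token, body):
    """Build (but do not send) the POST request that the curl command issues."""
    url = f"{SEARCH_URL}?token={urllib.parse.quote(token, safe='')}&pretty"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_search(request, timeout_seconds=15):
    """Send the request and decode the JSON response (needs network access)."""
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.load(response)


body = {
    "query": {"bool": {"filter": [{"nested": {
        "path": "sections",
        "query": {"bool": {"must": [
            {"terms": {"sections.type": ["definition"]}}
        ]}},
    }}]}},
    "size": 1,
}
# LAWINSIDER_TOKEN is a hypothetical variable name for your API token.
request = build_search_request(os.environ.get("LAWINSIDER_TOKEN", "YOUR_TOKEN"), body)
print(request.get_method(), SEARCH_URL)
```

Calling run_search(request) would send the query; it is kept separate so the request can be inspected first.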
JSON Query Results
Only one matching record is listed because the query sets "size": 1 to keep the list short.
{
"took" : 10,
"timed_out" : false,
"_shards" : {
"total" : 42,
"successful" : 42,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10000,
"relation" : "gte"
},
"max_score" : 0.0,
"hits" : [
{
"_index" : "contract",
"_type" : "_doc",
"_id" : "en/5WVGGzpNQEK",
"_score" : 0.0,
"_source" : {
"snippet" : "THIS FIFTH AMENDMENT TO SUBLEASE (this “Fifth Amendment”) is made and entered into as of July 10, 2013, by and between LUDLOW TECHNICAL PRODUCTS CORPORATION, a New York corporation (“Sublandlord”), and SYNACOR, INC., a Delaware corporation (“Subtenant”).",
"group_size" : 1,
"group_id" : "5WVGGzpNQEK",
"filing_date" : "2016-03-22",
"jurisdiction" : {
"name" : "New York",
"value" : "new-york-us"
},
"name" : "FIFTH AMENDMENT TO SUBLEASE",
"industry" : {
"name" : "Services-computer programming, data processing, etc.",
"value" : "services-computer-programming-data-processing-etc"
},
"company" : {
"name" : "Synacor, Inc.",
"value" : "1408278"
},
"category" : {
"name" : "Sublease",
"value" : "sublease"
}
}
}
]
}
}
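A response like the one above can be summarized with a few lines of Python. This is a minimal sketch that uses a trimmed copy of the response shown above as sample input; the function name summarize is an assumption for illustration.

```python
import json

# Trimmed copy of the response above, used as sample input.
response_text = """
{
  "hits": {
    "total": {"value": 10000, "relation": "gte"},
    "hits": [
      {
        "_id": "en/5WVGGzpNQEK",
        "_source": {
          "name": "FIFTH AMENDMENT TO SUBLEASE",
          "filing_date": "2016-03-22",
          "category": {"name": "Sublease", "value": "sublease"}
        }
      }
    ]
  }
}
"""


def summarize(response):
    """Return a match-count string and one (id, name, filing_date) row per hit."""
    total = response["hits"]["total"]
    # "gte" means the reported value is a lower bound, not an exact count.
    prefix = "at least " if total["relation"] == "gte" else ""
    rows = [
        (hit["_id"], hit["_source"].get("name"), hit["_source"].get("filing_date"))
        for hit in response["hits"]["hits"]
    ]
    return f"{prefix}{total['value']} matches", rows


summary, rows = summarize(json.loads(response_text))
print(summary)  # prints: at least 10000 matches
print(rows[0])
```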
Parameters and Attributes
Some of the parameters are already covered in the match_phrase query section, so look there as well. Below are some comments on the other parameters and attributes.
Hits
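This shows how many documents matched. The results list only one of them because we limited the output using size. When relation is eq, value is the exact count; when relation is gte, value is a lower bound, meaning at least that many documents matched.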
"hits" : {
"total" : {
"value" : 5,
"relation" : "eq"
}
}
path
path tells the nested query where in the nested document to start searching, that is, which nested object to look in. In Law Insider there is only one path you can use: sections. [The reason for this is technical. In Elasticsearch terms, you can only use path with attributes that have an index mapping of type nested. The index mapping topic is beyond the scope of this documentation, because you cannot modify how indexes are configured in Law Insider.]
Each section has these attributes:
- name
- type
- value
- snippet
For example, here is one section from a document:
{
"snippet" : "shall mean this Operating Agreement as amended from time to time.",
"name" : "Agreement",
"type" : "definition",
"value" : "agreement"
}
inner_hits
inner_hits is useful for debugging. It shows you which parts of your nested query matched the documents by listing an inner_hits section.
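As a sketch of how this could look in practice, the Python snippet below adds an empty inner_hits object to an existing nested clause, which is the form Elasticsearch accepts. Whether the Law Insider endpoint returns inner hits for every query is an assumption, and the helper name with_inner_hits is hypothetical.

```python
import copy


def with_inner_hits(nested_clause):
    """Return a copy of a nested clause that also requests inner_hits."""
    clause = copy.deepcopy(nested_clause)  # leave the caller's clause untouched
    clause["nested"]["inner_hits"] = {}    # empty object requests default inner hits
    return clause


nested_clause = {
    "nested": {
        "path": "sections",
        "query": {
            "bool": {
                "must": [{"terms": {"sections.type": ["definition"]}}]
            }
        },
    }
}

debug_clause = with_inner_hits(nested_clause)
print(sorted(debug_clause["nested"]))  # prints: ['inner_hits', 'path', 'query']
```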
terms
terms is a type of text-searching operator, closely related to term. With term the field must match the text exactly. term expects a single string. terms expects an array of strings and matches if the field equals any of them, like:
"post_filter": {
"terms": {
"_index": ["contract", "clause"]
}
}
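The difference between the two operators can be captured in a small Python helper that picks term for a single value and terms for a list. This is only an illustrative sketch; the function name term_or_terms is an assumption and not part of the API.

```python
def term_or_terms(field, value):
    """Build a term clause for one value or a terms clause for several."""
    if isinstance(value, (list, tuple)):
        # terms takes an array and matches any of the listed values.
        return {"terms": {field: list(value)}}
    # term takes a single value that must match exactly.
    return {"term": {field: value}}


print(term_or_terms("sections.type", "definition"))
print(term_or_terms("_index", ["contract", "clause"]))
```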