Law Insider Quick Start Tutorial

On this page we walk you through writing ElasticSearch queries. We give examples of the most common search types and explain each parameter in detail.

You search documents using ElasticSearch Query DSL commands. These are written as JSON documents that you POST to the Law Insider ElasticSearch API.

ElasticSearch Basic Concepts

ElasticSearch gives you fine-grained control over search. This means you can write simple queries or complex ones. Complex queries take advantage of some of the advanced ElasticSearch features to fine-tune results to a very high degree.

Note that ElasticSearch search parameters are sent to the API with a POST, not a GET. To most programmers that will seem backwards, since you normally use a GET to read information and a POST to write it.

The reason is that a request body on an HTTP GET is not reliably supported: many clients and proxies drop or reject it. ElasticSearch queries are JSON documents that can be very long, so they are sent in the body of a POST.

ℹ️ Note for ElasticSearch Developers

If you are an experienced ElasticSearch developer, you will find that you don't have the full access you are used to. For example, you cannot:

  • obtain an index mapping
  • bind to the ElasticSearch instance using an SDK like elasticsearch-py.

This is because the traditional ways of logging into ElasticSearch are not allowed by Law Insider. Instead you gain access using the token URI query parameter.

Basic Search Command

ℹ️ curl

In this documentation we use curl for queries. We use it because it's the simplest and shortest way to send commands to an API endpoint. It doesn't require any programming.

curl ships with most operating systems (on older versions of Windows you will need to install it). To use it, open a command prompt.

Of course you don't have to use curl. You can use Postman, or write Python or JavaScript code, as you try out these queries. Just copy the JSON from our examples and use it there.

All searches have the same basic format. There are query URI parameters and a JSON body:

curl -X POST \
  "https://www.lawinsider.com/api/v1alpha/search?token=$token&pretty" \
  -H 'Content-Type: application/json' \
  -d '{
  "json": "body"
}'

Here is an explanation of each item in the curl command.

  • URL: The Law Insider API endpoint is always the same. It does not vary by the type of document you are searching for.
  • -X POST: Run an HTTP POST command.
  • search?: Law Insider always requires that search be here.
  • token: The authentication token you got from Law Insider. These are your credentials. Here we wrote $token, with a dollar sign ($) in front, to pull it from the environment variable token. You can do that instead of hard-coding it.
  • pretty: Pretty-print the JSON response, that is, add line feeds and indents to make it easier to read.
  • -H 'Content-Type: application/json': Used in the full example later on this page. It tells the server that the request body is JSON.
  • -d '{}': The JSON body. Because the JSON spans more than one line, you need to delimit it with single quotes (').
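
For readers who prefer Python to curl, the same request can be sketched with the standard library alone. This is a minimal sketch, not an official client: the function names build_search_request and search are our own. The endpoint, the token and pretty parameters, the token environment variable, and the POST with a JSON body are taken from the curl example above.

```python
import json
import os
import urllib.parse
import urllib.request

# Endpoint copied from the curl example.
SEARCH_URL = "https://www.lawinsider.com/api/v1alpha/search"


def build_search_request(body, token, pretty=True):
    """Build (but do not send) the POST request that the curl example sends."""
    url = SEARCH_URL + "?" + urllib.parse.urlencode({"token": token})
    if pretty:
        url += "&pretty"  # flag parameter with no value, as in the curl example
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),  # the JSON body that curl passes with -d
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def search(body, token=None):
    """Send the query and return the decoded JSON response (needs network access)."""
    token = token or os.environ["token"]  # same variable that $token reads in curl
    request = build_search_request(body, token)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.load(response)
```

Calling search({"query": {"match_all": {}}, "size": 1}) would then send a query and return the response as a Python dictionary.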

Indexes

The top-level container for documents in ElasticSearch is the index. Think of it like a file folder. Documents under one index are not under any other index. So you need to filter your searches by index so that you don't mix two different document types. For example, don't mix contracts and clauses.

There are many ways to filter by index. One way is to use post_filter.


```json
"post_filter": {
    "term": {
        "_index": "clause"
    }
}
```

Here are the indexes. As you can see, they roughly correspond with the different document types.

  • contract
  • category
  • definition
  • clause
  • company
  • jurisdiction
  • country
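
If you build query bodies in code, a small helper can add the index filter and catch a misspelled index name before the request is sent. This is only a sketch: the helper name with_index_filter is hypothetical, and the set of names is our reading of the index list above, so adjust it if Law Insider documents other indexes.

```python
# Index names as read from the list above (an assumption, not an API guarantee).
KNOWN_INDEXES = {
    "contract", "category", "definition", "clause",
    "company", "jurisdiction", "country",
}


def with_index_filter(body, index):
    """Return a copy of a query body restricted to one index via post_filter."""
    if index not in KNOWN_INDEXES:
        raise ValueError(f"unknown index: {index!r}")
    filtered = dict(body)  # shallow copy so the caller's dict is not modified
    filtered["post_filter"] = {"term": {"_index": index}}
    return filtered
```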

ElasticSearch Concepts

First, we give a sample ElasticSearch query, a dictionary search.

Dictionary Search

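This search returns a single document. Note that its post_filter selects the contract index, so the sample response that follows is a contract.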

curl -X POST "https://www.lawinsider.com/api/v1alpha/search?token=<token>&pretty" -H 'Content-Type: application/json' -d'{
    "query": {
        "bool": {
            "should": [
                {
                    "match_all": {}
                },
                {
                    "rank_feature": {
                        "boost": 50,
                        "field": "weight_rank",
                        "saturation": {
                            "pivot": 16
                        }
                    }
                }
            ]
        }
    },
    "post_filter": {
        "term": {
            "_index": "contract"
        }
    },
    "size": 1,
    "profile": false,
    "explain": false,
    "timeout": "15000ms",
    "_source": [
        "name",
        "snippet",
        "category.name",
        "category.value",
        "company.name",
        "company.value",
        "jurisdiction.name",
        "jurisdiction.value",
        "industry.name",
        "industry.value",
        "filing_date",
        "group_id",
        "group_size"
    ],
    "highlight": {
        "fields": {
            "body": {},
            "company.name": {},
            "jurisdiction.name": {}
        },
        "order": "score",
        "post_tags": [
            "</b>"
        ],
        "pre_tags": [
            "<b>"
        ]
    }
}'


Produces:

{
  "took" : 287,
  "timed_out" : false,
  "_shards" : {
    "total" : 42,
    "successful" : 42,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 10000,
      "relation" : "gte"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "contract",
        "_type" : "_doc",
        "_id" : "en/hSFq9usP1aO",
        "_score" : 1.0,
        "_source" : {
          "snippet" : "THIS OPERATING AGREEMENT (this “Agreement”) is made effective as of the day of February 12, 2014, by AVATAR PROPERTIES INC., a Florida corporation (“Avatar Properties”).",
          "group_size" : 3,
          "group_id" : "i7io9hWv9jO",
          "filing_date" : "2015-03-17",
          "name" : "OPERATING AGREEMENT OF AVH ACQUISITION, LLC",
          "industry" : {
            "name" : "Operative builders",
            "value" : "operative-builders"
          },
          "company" : {
            "name" : "AVH Carolinas, LLC",
            "value" : "1636447"
          },
          "category" : {
            "name" : "Operating Agreement",
            "value" : "operating-agreement"
          }
        }
      }
    ]
  }
}
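
Once the response is parsed into a dictionary, the interesting parts are hits.total and the hits.hits list. The sketch below pulls a few fields out of a response shaped like the one above. The function name summarize_hits and the choice of fields are our own. In ElasticSearch, a relation of "gte" means the true count is at least the reported value, while "eq" means the count is exact.

```python
def summarize_hits(response):
    """Return (total_text, rows) from a search response shaped like the one above."""
    total = response["hits"]["total"]
    # "gte": the real number of matches is at least `value`; "eq": it is exact.
    prefix = "at least " if total["relation"] == "gte" else ""
    total_text = f"{prefix}{total['value']}"

    rows = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        rows.append({
            "index": hit["_index"],
            "id": hit["_id"],
            "score": hit["_score"],
            "name": source.get("name"),  # None if the field was not returned
            "filing_date": source.get("filing_date"),
        })
    return total_text, rows
```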

Query and Results Explained

We break down this search to explain each section.

post_filter

This snippet says which index to return results from. Remember that the index is the top-level grouping for all ElasticSearch documents; for example, clause and contract are indexes. A post_filter is applied to the hits after the query has run, so it narrows the results to the index you name.

 "post_filter": {
        "term": {
            "_index": "contract"
        }
    }

bool

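bool combines other queries using four kinds of clause: must, should, must_not, and filter. Every must and filter clause has to match for a document to be returned, and no must_not clause may match. should clauses are optional when a must or filter clause is present; a document that matches them gets a higher score. When a bool has only should clauses, as in this example, at least one of them must match.

See the ElasticSearch documentation on the bool query for a complete explanation of boolean queries.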

  {
        "bool": {
            "should": [
                {
                    "match_all": {}
                },
                {
                    "rank_feature": {
                        "boost": 50,
                        "field": "weight_rank",
                        "saturation": {
                            "pivot": 16
                        }
                    }
                }
            ]
        }
  }

match_all

match_all means match all documents. Normally you would not want to list all documents; you would use some kind of text match instead. This is just an example of how to search. In the JSON above, the size attribute is set to 1, which means only one document is returned.
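
To show what a text match in place of match_all might look like, the sketch below builds a body that requires a simple_query_string match (the same query type used in the highlight example later on this page) and keeps the rank_feature clause as an optional boost. The function name text_search_body and the choice of fields are assumptions made for illustration; check which fields exist in the index you search.

```python
def text_search_body(text, index, size=10):
    """Build a query body that matches `text` instead of matching everything."""
    return {
        "query": {
            "bool": {
                # must: a document is returned only if the text matches.
                "must": [
                    {
                        "simple_query_string": {
                            "query": text,
                            "fields": ["name", "snippet", "body"],
                            "default_operator": "or",
                        }
                    }
                ],
                # should: optional here; it only raises the score of matching documents.
                "should": [
                    {
                        "rank_feature": {
                            "field": "weight_rank",
                            "boost": 50,
                            "saturation": {"pivot": 16},
                        }
                    }
                ],
            }
        },
        "post_filter": {"term": {"_index": index}},
        "size": size,
    }
```

For example, text_search_body('"South Carolina"', "contract", size=1) searches contracts for the exact phrase and returns the best match.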

score

score (returned as _score in each hit) is a non-negative floating-point number. You can roughly think of it as relevance: the higher the score, the more relevant the document is to the search.

score has many uses.

For example, results are sorted by score by default, highest first, so the most relevant documents come up first. You can think of that as something like a Google search, which is built so that the most important pages come up first.

You can use different boost operations to push the most relevant documents to the top by modifying their score. That is what the rank_feature clause shown here does. It's not necessary to understand this code at this point; the ElasticSearch documentation covers it in detail. Just know that Law Insider stores an attribute called weight_rank with its documents, and the rank_feature query uses it to refine search results.

 "rank_feature": {
                        "boost": 50,
                        "field": "weight_rank",
                        "saturation": {
                            "pivot": 16
                        }
                    }
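
For the curious, the arithmetic behind this clause is short. ElasticSearch's saturation function scores a rank feature value S as S / (S + pivot), and boost multiplies the result. The sketch below reproduces that calculation with the boost and pivot from the snippet above. The weight_rank values used in the comments are hypothetical, since the values Law Insider actually stores are not documented here.

```python
def saturation(value, pivot):
    """Saturation function: value / (value + pivot), always between 0 and 1."""
    return value / (value + pivot)


def rank_feature_contribution(weight_rank, boost=50, pivot=16):
    """Score added by the rank_feature clause for a given weight_rank value."""
    return boost * saturation(weight_rank, pivot)


# Hypothetical weight_rank values, chosen only to show the shape of the curve:
#   weight_rank = 4  -> 50 * 4 / 20  = 10.0
#   weight_rank = 16 -> 50 * 16 / 32 = 25.0  (a value equal to pivot earns half the boost)
#   weight_rank = 48 -> 50 * 48 / 64 = 37.5
```

The contribution grows with weight_rank but never reaches the full boost of 50, which keeps very large values from dominating the text match.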

Various factors affect the score, such as how many times the text you search for appears in the field you are searching. A match in a very long field also scores lower than the same match in a short one, since the longer the field, the more likely it is to contain any given word by chance.

_source

_source holds the fields that were stored when the document was indexed. It does not necessarily include every field in the document. When you do a search, the _source fields are the only ones you can return. By default all of them are returned. However, common practice is to return just a subset: list the fields you want in _source, as in this example, or use fields or highlight.

 "_source": [
        "name",
        "snippet",
        "category.name",
        "category.value",
        "company.name",
        "company.value",
        "jurisdiction.name",
        "jurisdiction.value",
        "industry.name",
        "industry.value",
        "filing_date",
        "group_id",
        "group_size"
    ],
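
Note that the request names nested fields with dots (company.name), while the response returns them as nested objects (a company object with a name key). A small helper, sketched below under the hypothetical name get_field, lets you read a hit's _source using the same dotted names you put in the request.

```python
def get_field(source, dotted_name, default=None):
    """Read a dotted field name such as "company.name" from a nested _source dict."""
    current = source
    for key in dotted_name.split("."):
        # Stop early if the path leaves the nested dictionaries or a key is missing.
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current
```

Fields that were requested but are absent from a document, such as jurisdiction.name in the sample response earlier on this page, simply come back as the default.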

highlight

Here again, Google is a useful comparison. highlight wraps matching text in bold, italics, or whatever HTML tags you choose, to show you which parts matched your query. It puts emphasis on what you are looking for within a longer string of text.

For example, suppose we have a request containing this simple_query_string query.

"query": {
  "bool": {
      "filter": [],
      "must": [
          {
              "simple_query_string": {
                  "default_operator": "or",
                  "fields": [
                      "name",
                      "snippet",
                      "body",
                      "company.name",
                      "jurisdiction.name"
                  ],
                  "query": "\"South Carolina\""
              }
          }
      ]
    }
  },

With this highlight section:

"highlight": {
        "fields": {
            "body": {},
            "company.name": {},
            "jurisdiction.name": {}
        },
        "order": "score",
        "post_tags": [
            "</b>"
        ],
        "pre_tags": [
            "<b>"
        ]
    }

It would put the <b> and </b> tags around matching text to highlight it in bold:

"highlight" : {
          "name" : [
            "WINDSTREAM <b>SOUTH</b> <b>CAROLINA</b>, LLC A <b>South</b> <b>Carolina</b> Limited Liability Company OPERATING AGREEMENT"
          ]
        }
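
If you are not rendering HTML, you may want the matched terms on their own or the fragment without tags. The sketch below does both for a highlight fragment like the one above. The function names are our own, and the default tags match the pre_tags and post_tags used in this example.

```python
import re


def highlighted_terms(fragment, pre_tag="<b>", post_tag="</b>"):
    """Return the pieces of text that the highlighter wrapped in tags, in order."""
    pattern = re.escape(pre_tag) + r"(.*?)" + re.escape(post_tag)
    return re.findall(pattern, fragment)


def strip_tags(fragment, pre_tag="<b>", post_tag="</b>"):
    """Return the fragment as plain text, with the highlight tags removed."""
    return fragment.replace(pre_tag, "").replace(post_tag, "")
```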