Table Schema

Law Insider includes two types of data:

  1. SEC Repository data
  2. contracts you upload yourself

To accommodate both, the table schema includes columns that exist in SEC data but not in uploaded contracts, and vice versa. For example, sec_exhibit_id applies only to SEC data. See below for details.

ℹ️ Sample Data ℹ️

You can find sample data here in Google Sheets.

Note that saving query results to Google Sheets is a common practice with BigQuery.

ℹ️ Annotation ℹ️

The asterisk (*) marks fields that are generated with machine learning. For example, ML can distinguish between an employment agreement and a sale contract using natural language processing and other techniques.

Fields that begin with sec_ are populated only for SEC documents.
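The prefix rule can be applied programmatically when you need to separate SEC-only columns from the rest. The following is a minimal sketch, not part of Law Insider's tooling: the helper name split_columns is hypothetical, and the column list is a hand-copied subset of the schema.

```python
from typing import Iterable, List, Tuple

# Prefix that marks SEC-only columns, per the naming rule above.
SEC_PREFIX = "sec_"


def split_columns(columns: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Split column names into (SEC-only, other) based on the sec_ prefix."""
    sec_only: List[str] = []
    other: List[str] = []
    for name in columns:
        if name.startswith(SEC_PREFIX):
            sec_only.append(name)
        else:
            other.append(name)
    return sec_only, other


# Hand-copied subset of the schema, used here only for illustration.
columns = [
    "repository_id",
    "doc_id",
    "doc_category",
    "sec_exhibit_id",
    "sec_filing_date",
    "sec_company_cik",
]

sec_only, other = split_columns(columns)
print("SEC-only:", sec_only)
print("Other:", other)
```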

| Column name | Data type | Value |
| --- | --- | --- |
| repository_id | STRING | Name of the repository, which is also the name of the dataset in BigQuery. |
| doc_id | STRING | Unique hash assigned to the document. It is also used to build the URL that opens the document in a browser. |
| doc_category* | STRING | Document category as classified by machine learning, e.g., employment agreement. |
| doc_is_contract* | STRING | Set to TRUE when ML has determined that the document is a contract or agreement. |
| doc_filename | STRING | Original document filename at time of processing, e.g., my_agreement.pdf. |
| doc_language | STRING | Document language as classified by machine learning, e.g., en, fr. |
| doc_source_url | STRING | The URL of the document. |
| doc_head | STRING | The first 124 characters of the document. |
| doc_body | STRING | Full text of the document as HTML. |
| definitions* | RECORD | Definition and title of each defined term. Numbered by order of occurrence within the document. |
| clauses* | RECORD | Title and clause snippet. Numbered by order of occurrence within the document. |
| paragraphs | RECORD | Full text of the document split into paragraphs. Numbered by order of occurrence. |
| sec_exhibit_id | STRING | SEC exhibit ID, e.g., EX-10. |
| sec_filing_date | STRING | Date the filing was submitted to the SEC. |
| sec_filing_id | STRING | Filing ID. |
| sec_filing_type | STRING | Version number of SEC public forms. |
| sec_company_cik | INTEGER | The Central Index Key (CIK), a ten-digit number assigned by the SEC to each entity that submits filings. |
| sec_company_name | STRING | Name of the legal entity registered under the Investment Company Act of 1940 (entered by the filer). |
| sec_company_sic | INTEGER | The Standard Industrial Classification (SIC) code that appears in a company's disseminated EDGAR filings and indicates the company's type of business. |
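Two of the data types above may need light post-processing in client code: sec_company_cik is stored as an INTEGER, so the leading zeros of the ten-digit CIK are not preserved, and doc_is_contract is a STRING rather than a boolean. The sketch below shows one possible way to normalize a row after it has been fetched. The function names are hypothetical, the sample row is made up, and the comparison against the literal text "TRUE" is an assumption based on the column description.

```python
from typing import Any, Dict, Optional


def format_cik(cik: Optional[int]) -> Optional[str]:
    """Return the CIK as a ten-character, zero-padded string, or None if absent."""
    if cik is None:
        return None
    return f"{cik:010d}"


def is_contract(flag: Optional[str]) -> bool:
    """Interpret the STRING flag; assumes the text 'TRUE' marks a contract."""
    return flag is not None and flag.strip().upper() == "TRUE"


def normalize_row(row: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the row with the CIK padded and the contract flag as a bool."""
    out = dict(row)
    out["doc_is_contract"] = is_contract(row.get("doc_is_contract"))
    out["sec_company_cik"] = format_cik(row.get("sec_company_cik"))
    return out


# Made-up row for illustration; not taken from the real dataset.
sample = {"doc_id": "abc123", "doc_is_contract": "TRUE", "sec_company_cik": 12345}
normalized = normalize_row(sample)
print(normalized["sec_company_cik"])  # 0000012345
print(normalized["doc_is_contract"])  # True
```

For uploaded contracts, where the sec_ fields are presumably empty, format_cik returns None instead of raising an error.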

Show Schema in Google BigQuery

The schema looks like this in Google BigQuery: