Law Insider includes two types of data:
- SEC Repository data
- contracts you upload yourself
To accommodate both, the table schema includes columns that apply to SEC data but not to uploaded contracts, and vice versa. For example, sec_exhibit_id is populated only for SEC documents. See below for details.
ℹ️ Sample Data ℹ️
You can find sample data here in Google Sheets.
Note that saving query results to Google Sheets is a common practice with BigQuery.
ℹ️ Annotation ℹ️
The asterisk (*) marks fields that are generated with machine learning. For example, ML can distinguish between an employment agreement and a sale contract using natural language processing and other techniques.
Fields that begin with sec_ are populated only for SEC documents.
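The sec_ prefix convention makes it easy to separate SEC-only columns from the rest in code. The following is a minimal Python sketch, not part of Law Insider's tooling; the column names come from the schema table in this document, and the helper name split_columns is hypothetical.

```python
# Column names as listed in the schema table of this document.
COLUMNS = [
    "repository_id", "doc_id", "doc_category", "doc_is_contract",
    "doc_filename", "doc_language", "doc_source_url", "doc_head", "doc_body",
    "definitions", "clauses", "paragraphs",
    "sec_exhibit_id", "sec_filing_date", "sec_filing_type",
    "sec_company_cik", "sec_company_name", "sec_company_sic",
]


def split_columns(columns):
    """Return (shared, sec_only) column lists based on the sec_ prefix.

    split_columns is a hypothetical helper written for illustration.
    """
    sec_only = [c for c in columns if c.startswith("sec_")]
    shared = [c for c in columns if not c.startswith("sec_")]
    return shared, sec_only


shared, sec_only = split_columns(COLUMNS)
print("Shared columns:", shared)
print("SEC-only columns:", sec_only)
```

When querying a repository of uploaded contracts, selecting only the shared columns avoids returning columns that are expected to be empty.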
|Column name||Data type||Value|
|repository_id||STRING||Name of the repository, which is also the name of the dataset in BigQuery.|
|doc_id||STRING||Unique hash assigned to the document. It is also used to build the URL for opening the document in a browser.|
|doc_category*||STRING||Machine learning classified document category, e.g., employment agreement.|
|doc_is_contract*||STRING||Set to TRUE when ML has determined that the document is a contract or agreement.|
|doc_filename||STRING||Original document filename at time of processing, e.g., |
|doc_language||STRING||Machine learning classified document language, e.g., en, fr.|
|doc_source_url||STRING||The URL address of the document.|
|doc_head||STRING||The first 124 characters in the document.|
|doc_body||STRING||HTML of the full text document.|
|definitions*||RECORD||Definition and title of a defined term. Numbered by order of occurrence within document.|
|clauses*||RECORD||Title and clause snippet. Numbered by order of occurrence within document.|
|paragraphs||RECORD||Full text document split into paragraphs. Numbered by order of occurrence.|
|sec_exhibit_id||STRING||SEC exhibit id, e.g., EX-10.|
|sec_filing_date||STRING||Date filing was submitted to the SEC.|
|sec_filing_type||STRING||Version number of SEC public forms.|
|sec_company_cik||INTEGER||The Central Index Key (CIK) is a ten-digit number assigned by the SEC to each entity that submits filings.|
|sec_company_name||STRING||This corresponds to the name of the legal entity registered under the Investment Company Act of 1940. (Entered by Filer).|
|sec_company_sic||INTEGER||The Standard Industrial Classification Codes that appear in a company's disseminated EDGAR filings indicate the company's type of business.|
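Because sec_company_cik is stored as INTEGER, any leading zeros of the ten-digit CIK are not preserved in the column. If a zero-padded form is needed, for example to match identifiers copied from EDGAR, it can be restored after the query. The following Python sketch assumes the value has already been read into an int; format_cik is a hypothetical helper name.

```python
def format_cik(cik: int) -> str:
    """Zero-pad an integer CIK to the ten-digit string form.

    format_cik is a hypothetical helper written for illustration.
    """
    if cik < 0 or cik > 9_999_999_999:
        raise ValueError(f"CIK out of range for ten digits: {cik}")
    # The 010d format spec pads with leading zeros to a width of ten.
    return f"{cik:010d}"


print(format_cik(320193))  # 0000320193
```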
Show Schema in Google BigQuery
The same schema can be viewed in the Google BigQuery console.
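A typical first query against this schema filters to documents that ML has classified as contracts. The Python sketch below only builds the SQL text; it does not connect to BigQuery. The project, dataset, and table names are placeholders, and the function name contracts_query is hypothetical. Since doc_is_contract is a STRING column, the comparison is made against the text 'TRUE'; the exact casing of the stored value is an assumption, so the sketch normalizes it with UPPER.

```python
def contracts_query(project: str, repository_id: str, table: str, limit: int = 10) -> str:
    """Build a BigQuery Standard SQL query for ML-classified contracts.

    The dataset name is the repository_id, as described in the schema table.
    project and table are placeholders supplied by the caller.
    """
    table_ref = f"`{project}.{repository_id}.{table}`"
    return (
        "SELECT doc_id, doc_category, doc_language\n"
        f"FROM {table_ref}\n"
        # doc_is_contract is STRING, so compare against text, not a boolean.
        "WHERE UPPER(doc_is_contract) = 'TRUE'\n"
        f"LIMIT {int(limit)}"
    )


# Placeholder names for illustration only.
sql = contracts_query("my-project", "my_repository", "documents", limit=5)
print(sql)
```

The resulting string can be pasted into the BigQuery console or passed to a BigQuery client library.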