Table Schema
Law Insider includes two types of data:
- SEC Repository data
- contracts you upload yourself
To accommodate both, the table schema includes columns that apply to SEC data but not to uploaded contracts, and vice versa. For example, sec_exhibit_id is populated only for SEC documents. See below for details.
ℹ️ Sample Data ℹ️
You can find sample data here in Google Sheets.
Note that saving query results to Google Sheets is a common practice with BigQuery.
ℹ️ Annotation ℹ️
An asterisk (*) marks fields that are generated with machine learning. For example, ML can distinguish between an employment agreement and a sale contract using natural language processing and other techniques.
Fields that begin with sec_ are populated only for SEC documents.
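The sec_ prefix makes it straightforward to separate SEC-only columns from the shared ones in client code. The following is a minimal sketch, not part of the Law Insider tooling: it assumes each row has already been fetched as a dict keyed by column name, and that uploaded contracts leave every sec_ column empty (None). The helper names and the sample rows are hypothetical.

```python
SEC_PREFIX = "sec_"


def split_sec_fields(row):
    """Split one row (a dict keyed by column name) into SEC-only and shared columns."""
    sec = {k: v for k, v in row.items() if k.startswith(SEC_PREFIX)}
    common = {k: v for k, v in row.items() if not k.startswith(SEC_PREFIX)}
    return sec, common


def is_sec_document(row):
    """Return True if any sec_ column carries a value.

    Assumption: uploaded contracts leave all sec_ columns as None.
    """
    return any(
        v is not None for k, v in row.items() if k.startswith(SEC_PREFIX)
    )


# Hypothetical rows for illustration only; the values are made up.
uploaded = {"doc_id": "abc123", "doc_filename": "my_agreement.pdf", "sec_exhibit_id": None}
sec_doc = {"doc_id": "def456", "doc_filename": None, "sec_exhibit_id": "EX-10"}

sec_columns, shared_columns = split_sec_fields(sec_doc)
```

Filtering on the prefix rather than on a hard-coded column list keeps the helper valid if further sec_ columns are added to the schema.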
Column name | Data type | Value |
---|---|---|
repository_id | STRING | Name of the repository, which is also the name of the dataset in BigQuery. |
doc_id | STRING | Unique hash assigned to the document. It is also used to build the URL that opens the document in a browser. |
doc_category* | STRING | Machine learning classified document category, e.g., employment agreement. |
doc_is_contract* | STRING | Set to TRUE when ML has determined that the document is a contract or agreement. |
doc_filename | STRING | Original document filename at time of processing, e.g., my_agreement.pdf . |
doc_language | STRING | Machine learning classified document language, e.g., en, fr. |
doc_source_url | STRING | The URL address of the document. |
doc_head | STRING | The first 124 characters in the document. |
doc_body | STRING | HTML of the full text document. |
definitions* | RECORD | Definition and title of a defined term. Numbered by order of occurrence within document. |
clauses* | RECORD | Title and clause snippet. Numbered by order of occurrence within document. |
paragraphs | RECORD | Full text document split into paragraphs. Numbered by order of occurrence. |
sec_exhibit_id | STRING | SEC exhibit id, e.g., EX-10. |
sec_filing_date | STRING | Date filing was submitted to the SEC. |
sec_filing_id | STRING | Filing ID. |
sec_filing_type | STRING | Version number of SEC public forms. |
sec_company_cik | INTEGER | The Central Index Key (CIK) is a ten-digit number assigned by the SEC to each entity that submits filings. |
sec_company_name | STRING | This corresponds to the name of the legal entity registered under the Investment Company Act of 1940. (Entered by Filer). |
sec_company_sic | INTEGER | The Standard Industrial Classification Codes that appear in a company's disseminated EDGAR filings indicate the company's type of business. |
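Two of the column types above need a little care when read in client code: sec_company_cik is an INTEGER, so the leading zeros of the ten-digit CIK are not stored, and doc_is_contract is a STRING rather than a boolean. The sketch below shows one way to normalize both. The function names are hypothetical, and it assumes the doc_is_contract column holds the text TRUE or FALSE as the table describes.

```python
def format_cik(cik: int) -> str:
    """Zero-pad an integer CIK to its ten-digit form."""
    return f"{cik:010d}"


def is_contract(value) -> bool:
    """Interpret the doc_is_contract STRING column as a boolean.

    Assumption: the column holds "TRUE" or "FALSE"; anything else,
    including None, is treated as not a contract.
    """
    return isinstance(value, str) and value.strip().upper() == "TRUE"


# Example values for illustration only.
padded = format_cik(320193)   # "0000320193"
flag = is_contract("TRUE")    # True
```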
Show Schema in Google BigQuery
The schema looks like this in Google BigQuery: