Patent and patent application Oath Signature data (JSON and PNG)

The USPTO receives millions of patent applications and supporting documents each year. During the application process, inventors sometimes submit documents using alternative versions of their names. As raised in the USPTO Director's blog (dated September 8, 2021), this can limit the office’s ability to accurately certify the number of applications from a specific inventor and to determine whether inventors are following application fee rules and regulations. Certifications submitted with errors often mean longer wait times for all applicants.

Prior to this sample USPTO dataset, identifying patent application discrepancies required manually reviewing millions of documents to match names and signatures. Patent documents come in different formats and languages and can list multiple inventors for each application. Signatures appear in various locations within a document, which makes matching signatures to applicant names challenging.

This sample USPTO research dataset provides images of signatures extracted from inventor oath documents. It could be used to validate micro entity certifications or for other research purposes. It includes 883,811 applications and their oath document signature images. The dataset totals 40.5 GB and is split into 8 zip files covering the following patent application series:

Application Series    Applications    Signature Counts
12                    160,116         292,354
13                    156,284         282,303
14                    159,067         304,182
15                    154,964         305,029
16                    134,728         260,884
17                    58,718          112,406
29                    58,125          84,123
35                    1,809           1,984
Total                 883,811         1,643,265
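
As a quick check on the figures in the table, the following sketch recomputes the totals and derives the average number of signatures per application for each series. The counts are copied from the table. The names `SERIES_COUNTS` and `signatures_per_application` are illustrative only and are not part of the dataset.

```python
# Counts copied from the table above: series -> (applications, signatures).
SERIES_COUNTS = {
    "12": (160_116, 292_354),
    "13": (156_284, 282_303),
    "14": (159_067, 304_182),
    "15": (154_964, 305_029),
    "16": (134_728, 260_884),
    "17": (58_718, 112_406),
    "29": (58_125, 84_123),
    "35": (1_809, 1_984),
}


def signatures_per_application(counts):
    """Return the average number of signatures per application for each series."""
    return {series: sigs / apps for series, (apps, sigs) in counts.items()}


# Recompute the "Total" row from the per-series rows.
total_apps = sum(apps for apps, _ in SERIES_COUNTS.values())
total_sigs = sum(sigs for _, sigs in SERIES_COUNTS.values())

for series, ratio in signatures_per_application(SERIES_COUNTS).items():
    print(f"Series {series}: {ratio:.2f} signatures per application")
print(f"Total: {total_apps:,} applications, {total_sigs:,} signatures")
```

The recomputed totals match the "Total" row, and most series average a little under two signatures per application.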

 

Each of these zip files contains a folder for each application number in the given series. Each application folder contains the oath document identifier, which includes the image(s) of the signature(s) as PNG files and a JSON file with the application number, the inventor name(s), and the confidence level of the signature extraction algorithm.
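
The layout described above can be explored with a short script. This is a minimal sketch that makes two assumptions: the first path component of each archive member is the application number folder, and each JSON file parses as a single JSON document. The exact folder nesting and the JSON key names are not documented here, so the code makes no assumption about keys. The function name `summarize_series_zip` and the file name in the usage comment are hypothetical.

```python
import json
import zipfile
from collections import defaultdict
from pathlib import PurePosixPath


def summarize_series_zip(zip_path):
    """Group PNG and JSON members of a series zip by their top-level folder.

    Assumption: the first path component is the application number folder.
    Returns {folder: {"png": [member names], "json": [parsed JSON objects]}}.
    """
    summary = defaultdict(lambda: {"png": [], "json": []})
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member = PurePosixPath(info.filename)
            suffix = member.suffix.lower()
            if suffix not in (".png", ".json"):
                continue  # ignore anything that is not a signature image or metadata
            app_folder = member.parts[0]
            if suffix == ".json":
                # Parse the metadata without assuming any particular key names.
                with archive.open(info) as handle:
                    summary[app_folder]["json"].append(json.load(handle))
            else:
                # Record the image path only; the image bytes are not read.
                summary[app_folder]["png"].append(info.filename)
    return dict(summary)


# Example usage (hypothetical file name):
# summary = summarize_series_zip("series_35.zip")
# for folder, files in summary.items():
#     print(folder, len(files["png"]), "signature image(s)")
```

Reading only the member list and the small JSON files keeps memory use low, which matters given the size of the archives. If the real archives wrap the application folders in an extra series-level directory, the index into `member.parts` would need to be adjusted.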

Fun Fact: This research data set includes a few celebrity signature images such as Elon Musk and Lori Greiner. See if you can identify the others!

Updated: 2022-09-30

  Download (6.3 GB)
Dates Available Sep 30, 2022 – Sep 30, 2022

Patent litigation data (Stata (.dta) and MS Excel (.csv))

Contains detailed U.S. district court patent litigation data on 81,350 unique court cases filed from 1963 through 2020. All content was collected from Public Access to Court Electronic Records (PACER) and RECAP. The final output datasets, provided in five files, include information on the litigating parties and their attorneys; the cause of action; the court location; important dates in the litigation history; and descriptions of all documents submitted in a given case, covering more than 5 million document-level records from the docket reports.
Updated: 2024-03-27

  Download (467.26 MB)
Dates Available Dec 29, 2016 – Mar 27, 2024

Patent examination research dataset (Stata (.dta) and MS Excel (.csv))

Contains detailed information on more than 13 million publicly viewable patent applications filed with the USPTO along with more than 1 million PCT applications through June 2023. The data files include information on each application's characteristics, prosecution history, continuation history, claims of foreign priority, patent term adjustment history, publication history, and correspondence address information.
Updated: 2023-09-26

  Download (79.39 MB)
Dates Available Dec 02, 2015 – Sep 26, 2023

Patent grant single-page TIFF images

Contains the images of each patent grant issued weekly (Tuesdays) from July 31, 1790 to December 26, 2023 in Tagged Image File Format (TIFF) Revision 6.0 with CCITT Group 4 Compression (single-page TIFFs). Includes a separate weekly Certificates-of-Correction (C-of-C) file and a daily Certificates file.
Updated: 2023-12-26

  Download (13.87 GB)
Dates Available Jul 31, 1790 – Dec 26, 2023

Certificates

Certificates include post-issuance documents, e.g., ex parte and inter partes reexamination documents. These were issued weekly and have been issued daily since October 2012.
Updated: 2023-12-27

  Download (82 KB)
Dates Available Oct 04, 2011 – Dec 27, 2023

Patent grant bibliographic data/SGML

Contains the bibliographic text (front page) of each patent grant issued weekly (Tuesdays) in calendar year 2001 (January 2001 through December 2001). Excludes images/drawings and reexaminations. The file format is Standard Generalized Markup Language (SGML) in accordance with the U.S. Patent Grant Version 2.4 Document Type Definition (DTD).
Updated: 2001-12-25

  Download (2.31 MB)
Dates Available Jan 02, 2001 – Dec 25, 2001

Patent assignment annual XML (backfile)

Contains the backfile (August 1980 through December 2023) of patent assignment text (no drawings/images) derived from patent assignment recordations made at the USPTO. The file format is eXtensible Markup Language (XML) in accordance with the Patent Assignment Daily XML (PADX) Version 0.3 Document Type Definition (DTD).
Updated: 1980-01-01

  Download (109.86 MB)
Dates Available Jan 01, 1980 – Dec 31, 2023

MCF patent application (patent application sequence)

Contains current U.S. classification information for all patent applications (non-provisional utility and plant) published by the USPTO from March 15, 2001 to present. The classification system has approximately 450 main divisions of technology, called classifications or classes, which are broken into approximately 150,000 subdivisions, called subclassifications or subclasses. The data is provided in published patent application number sequence, with the current U.S. original classification/subclassification and any cross-reference classifications/subclassifications, in ASCII text format.
Updated: 2021-03-22

  Download (35.63 MB)
Dates Available Mar 04, 2020 – Mar 22, 2021