The HTRC Extracted Features Dataset (1.0) contains page-level features for 13.7 million public-domain and in-copyright volumes, including
Find out more at the HTRC Extracted Features Dataset webpage, which includes full documentation, a sample dataset, and links for downloading the data.
The HathiTrust Research Center links to additional tools, datasets, and information about workshops.
The HathiTrust Research Center's Bookworm tool charts trends in word use from 1500 to 2015 across hundreds of thousands of texts in HathiTrust. Filters are available for subject classification, fiction/non-fiction, genre, language, format, page and word counts, and publication information. Controls let you choose date ranges, metrics, and case sensitivity.
Bookworms based on other text collections are available at http://bookworm.culturomics.org/.