Vectorspace AI Datasets


Dataset augmentation


We provide data augmentation services in the form of static and real-time, context-controlled, correlation matrix datasets based on Natural Language Processing (NLP) and Natural Language Understanding (NLU). Our datasets can be applied in all industries to generate new interpretations, hypotheses and discoveries.

Real-time insights


Through our API, customers can access near real-time (NRT) datasets that update as often as once per minute (1,440 API calls per day). Correlation scores and insights can be generated in isolation or used to augment any external or internal dataset.
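
To make the augmentation idea concrete, here is a minimal sketch of attaching correlation scores to the rows of an external dataset. The data layout, the field names, and the `augment` helper are all hypothetical and chosen for illustration; they do not describe the actual response format of the Vectorspace API.

```python
# One refresh per minute works out to 24 * 60 = 1440 calls per day.
CALLS_PER_DAY = 24 * 60

def augment(rows, scores, anchor, key="symbol", default=None):
    """Return copies of `rows` with a correlation score to `anchor` added.

    `scores` is a hypothetical mapping from (entity_a, entity_b) pairs to a
    correlation score; pairs are looked up in either order.
    """
    out = []
    for row in rows:
        pair = (anchor, row[key])
        score = scores.get(pair, scores.get(pair[::-1], default))
        out.append({**row, f"corr_to_{anchor}": score})
    return out

# Illustrative (made-up) external dataset and score table.
external = [
    {"symbol": "AAA", "price": 10.0},
    {"symbol": "BBB", "price": 20.0},
    {"symbol": "CCC", "price": 30.0},
]
scores = {("XYZ", "AAA"): 0.42, ("BBB", "XYZ"): 0.17}

augmented = augment(external, scores, anchor="XYZ")
```

Rows without a matching score keep a `None` placeholder, so the external dataset is never filtered or reordered by the augmentation step.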

Built-in data provenance


For advanced users, we offer optional data provenance solutions via our Data Provenance Pipeline (DPP). The DPP rigorously controls data lineage, ensuring that you always know exactly where your data originated and how it was processed. This is a must-have for bioscience and financial institutions that rely on our datasets to make billion-dollar decisions every day.
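
As an illustration of what tracking data lineage can look like, the sketch below chains processing steps together with SHA-256 hashes so that each step records both its input data and the step before it. This is a generic hash-chain technique, not a description of how the DPP is implemented; the `lineage_step` and `verify` helpers and the record fields are hypothetical.

```python
import hashlib
import json

def lineage_step(parent_hash, step_name, payload):
    """Build a lineage record for one processing step and return it with its hash."""
    record = {
        "parent": parent_hash,  # hash of the previous record, or None at the origin
        "step": step_name,
        "payload_sha256": hashlib.sha256(payload).hexdigest(),
    }
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return record, hashlib.sha256(encoded).hexdigest()

def verify(record, record_hash):
    """Check that a record has not been altered since its hash was computed."""
    encoded = json.dumps(record, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest() == record_hash

# Two illustrative steps: ingest raw text, then normalize it.
raw = b"Unstructured Source Text"
ingest_record, ingest_hash = lineage_step(None, "ingest", raw)
normalize_record, normalize_hash = lineage_step(ingest_hash, "normalize", raw.lower())
```

Because each record embeds the hash of its parent, changing any earlier step changes every hash after it, which makes silent edits to the processing history detectable.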

State-of-the-art data pipeline


The Vectorspace data engineering pipeline takes unstructured text from any data source and applies state-of-the-art machine learning techniques based on unsupervised learning and NLP/NLU to find hidden relationships between entities (like genes or stocks) that can provide a valuable “signal” for our customers.
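
The general idea of scoring relationships between entities from unstructured text can be sketched with a toy example. The code below uses simple word co-occurrence counts and cosine similarity as a stand-in; it is not the Vectorspace pipeline, and the function names, entity labels, and sample sentences are made up for illustration.

```python
import math
from collections import Counter

def context_vectors(docs, entities):
    """Count the words that appear in the same document as each entity."""
    vectors = {entity: Counter() for entity in entities}
    for doc in docs:
        tokens = doc.lower().split()
        for entity in entities:
            name = entity.lower()
            if name in tokens:
                vectors[entity].update(t for t in tokens if t != name)
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0

def similarity_matrix(docs, entities):
    """Pairwise similarity scores for all entities, keyed by (a, b)."""
    vectors = context_vectors(docs, entities)
    return {(a, b): cosine(vectors[a], vectors[b]) for a in entities for b in entities}

# Made-up sentences: two entities share a context, the third does not.
docs = [
    "GENE1 repairs damaged dna",
    "GENE2 repairs damaged dna",
    "ACME ships widgets",
]
matrix = similarity_matrix(docs, ["GENE1", "GENE2", "ACME"])
```

In this toy corpus, `GENE1` and `GENE2` score close to 1 because they occur in identical contexts, while either gene paired with `ACME` scores 0. A production system would use far richer representations than raw word counts.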

While our NLP/NLU datasets are endlessly customizable and can be leveraged by any industry, we provide access to many high-quality data sources out of the box.

Sign up to learn more about our highly customizable product offerings.



Real-time peer-reviewed published scientific papers, public company news, and social information:














Data object types processed:



Data acquisition & model training sources:



Encyclopedias: