COVID-19 Dataset Builder

Context-controlled correlation matrix datasets for unsupervised learning, clustering & hidden relationship networks

  • Create unique datasets for drugs, genes, proteins & infectious diseases
  • Choose custom feature vectors or column labels
  • Generate clusters or visualize hidden relationship networks
Every 24 hours we analyze thousands of bioscientific research papers published around the world through the National Library of Medicine (NLM) and other sources. These sources include the COVID-19 Open Research Dataset (CORD-19), composed of scientific literature directly related to COVID-19, SARS-CoV-2, and the Coronavirus group, and LitCovid, a curated literature hub for tracking up-to-date scientific information about the 2019 novel Coronavirus.

Enter 1 to 5 COVID-19 related concepts or keywords

example: Batteries, Bioengineering, Graphene, Blockchain, Machine Learning

What kind of things can be done with custom concept columns & features?

  • Create unique clusters based on concepts and hidden relationships
  • Determine whether price correlations are matched by similar concept or keyword correlations
  • Dataset Augmentation: Detach custom columns and append them to other proprietary in-house datasets
  • Select a Data Context (e.g. Biological, Chemical, Geophysical) to derive different signals
  • Use drug, gene, protein or infectious disease names as custom concept column labels
  • Create features using global events, e.g. terms trending anywhere on the internet
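
The Dataset Augmentation item above could look roughly like the following minimal Python sketch. The field names (`drug`, `trial_count`), the concept labels, the helper `append_concept_columns`, and every value shown are hypothetical placeholders chosen for illustration; they are not the builder's actual export format.

```python
# Hypothetical in-house rows keyed by drug name (illustrative values only).
inhouse_rows = [
    {"drug": "remdesivir", "trial_count": 12},
    {"drug": "dexamethasone", "trial_count": 30},
    {"drug": "ivermectin", "trial_count": 7},
]

# Hypothetical custom concept columns detached from a generated dataset.
# Each score is assumed to lie in the 0 to 1 range.
concept_columns = {
    "remdesivir": {"Graphene": 0.05, "Machine Learning": 0.40},
    "dexamethasone": {"Graphene": 0.10, "Machine Learning": 0.25},
}


def append_concept_columns(rows, concepts, key, labels, missing=None):
    """Return new rows with one extra column per concept label.

    Rows whose key has no concept scores get `missing` in each new column.
    """
    augmented = []
    for row in rows:
        scores = concepts.get(row[key], {})
        new_row = dict(row)  # copy so the in-house data is left untouched
        for label in labels:
            new_row[label] = scores.get(label, missing)
        augmented.append(new_row)
    return augmented


augmented_rows = append_concept_columns(
    inhouse_rows,
    concept_columns,
    key="drug",
    labels=["Graphene", "Machine Learning"],
)
```

Copying each row keeps the original in-house dataset unchanged, and entries without concept scores are filled with an explicit `missing` value so that gaps stay visible instead of being silently treated as a score of 0.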

    Scores range from 0 to 1 and represent the strength of known and hidden relationships between a concept and a stock, option or ETF. Each score is calculated by a series of algorithms that monitor data surrounding the company associated with the underlying security, and is then combined with scores from human curation teams. These concepts can then be factored or parameterized for exploring new signals or building new models.
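
As one way to explore 0 to 1 scores like those described above, the sketch below builds a concept-by-concept Pearson correlation matrix from a small score table and keeps only the strongly correlated pairs as edges of a relationship network. The entity names, score values, the 0.8 threshold, and the helper functions are all assumptions made for illustration; this is not the scoring algorithm itself, which the page does not specify.

```python
import math

# Hypothetical score table: rows are entities, columns are concepts.
# Values are illustrative and assumed to lie in the 0 to 1 range.
scores = {
    "entity_a": {"Bioengineering": 0.9, "Graphene": 0.1, "Machine Learning": 0.8},
    "entity_b": {"Bioengineering": 0.7, "Graphene": 0.3, "Machine Learning": 0.6},
    "entity_c": {"Bioengineering": 0.2, "Graphene": 0.8, "Machine Learning": 0.3},
    "entity_d": {"Bioengineering": 0.1, "Graphene": 0.9, "Machine Learning": 0.2},
}


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(table):
    """Correlate every pair of concept columns across all entities."""
    concepts = sorted(next(iter(table.values())))
    columns = {c: [row[c] for row in table.values()] for c in concepts}
    return {a: {b: pearson(columns[a], columns[b]) for b in concepts}
            for a in concepts}


def relationship_edges(matrix, threshold=0.8):
    """Keep concept pairs whose absolute correlation meets the threshold."""
    names = sorted(matrix)
    return [(a, b, matrix[a][b])
            for i, a in enumerate(names)
            for b in names[i + 1:]
            if abs(matrix[a][b]) >= threshold]


matrix = correlation_matrix(scores)
edges = relationship_edges(matrix)
```

The absolute value is used in the threshold so that strongly negative relationships are kept alongside strongly positive ones; the resulting edge list can be passed to any clustering or graph-drawing tool.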

  • Data requests:

    Dataset & feature requests: