Dataset Builder

Create unique cryptocurrency, ETF, Nasdaq, NYSE & OTC datasets with custom feature vectors for boosting alpha


GOOG 0.000 0.000 0.000
AAPL 0.000 0.000 0.000
GBTC 0.000 0.000 0.000
... ... ... ...

Data & Feature Engineering Pipeline Overview:


Rows contain stock symbols. Columns contain cryptocurrencies. Datasets are generated from public & private databases and labeled datasets, combined with triangulation & human curation by market researchers.
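The row/column layout described above can be read with a few lines of standard-library Python. This is only a sketch under assumptions: the header name `symbol`, the column names `BTC`, `ETH` and `LTC`, and the helper `load_scores` are hypothetical placeholders, and the sample text stands in for whatever the CSV or TSV endpoint actually returns.

```python
import csv
import io

# Hypothetical sample in the layout described above: one row per stock symbol,
# one column per cryptocurrency. Header names and values are placeholders.
SAMPLE_CSV = """symbol,BTC,ETH,LTC
GOOG,0.000,0.000,0.000
AAPL,0.000,0.000,0.000
GBTC,0.000,0.000,0.000
"""


def load_scores(text, delimiter=","):
    """Parse delimited text into {symbol: {column: score}}."""
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    scores = {}
    for row in reader:
        symbol = row.pop("symbol")  # assumed name of the first column
        scores[symbol] = {name: float(value) for name, value in row.items()}
    return scores


scores = load_scores(SAMPLE_CSV)
print(sorted(scores))
print(scores["GBTC"]["BTC"])
```

Passing `delimiter="\t"` handles tab-separated text in the same way.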

1,002 curated cryptocurrencies by 10,286 NYSE, Nasdaq & OTC Stocks:

CSV endpoint:

TSV endpoint:

The Periodic Table of Elements by 10,286 NYSE, Nasdaq & OTC Stocks:

CSV endpoint:

TSV endpoint:

Generate custom feature attributes:

Enter 1 to 5 concepts:

example: Batteries, Music, Graphene, Helium, Health

What kind of things can be done with custom concept columns & features?

  • Create unique sectors or clusters based on concepts and hidden relationships and compare their gains to the S&P 500 (see below)
  • Determine whether stocks with correlated prices also have correlated concept or keyword scores
  • Examine symbiotic, parasitic and sympathetic relationships between equities
  • Automatically create baskets of stocks based on concepts and/or keywords
  • Detach the custom columns and append them to other proprietary in-house datasets
  • Select a Data Context (e.g. Biological, Chemical, Geophysical and others) to derive different signals
  • Use stock symbols as custom concept column labels and model cross-correlations between equities
  • Create features using trending terms anywhere on the internet
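As a rough illustration of the "detach and append" idea in the list above, the sketch below pulls selected concept columns out of a score table and joins them onto an in-house table keyed by symbol. Everything here is assumed for the example: the tickers `AAA`, `BBB` and `CCC`, the `pe_ratio` field, the score values, and both helper functions are hypothetical.

```python
def detach_columns(scores, columns):
    """Return {symbol: {column: score}} restricted to the chosen columns."""
    return {sym: {c: row[c] for c in columns} for sym, row in scores.items()}


def append_columns(inhouse, extra):
    """Left-join extra columns onto an in-house table keyed by symbol.

    Symbols that are missing from `extra` keep only their original fields.
    """
    return {sym: {**fields, **extra.get(sym, {})} for sym, fields in inhouse.items()}


# Illustrative values only; not real scores or fundamentals.
concept_scores = {
    "AAA": {"Helium": 0.8, "Coffee": 0.1},
    "BBB": {"Helium": 0.2, "Coffee": 0.9},
}
inhouse = {"AAA": {"pe_ratio": 12.5}, "CCC": {"pe_ratio": 30.0}}

custom = detach_columns(concept_scores, ["Helium"])
merged = append_columns(inhouse, custom)
print(merged)
```

A left join keeps the in-house table as the source of truth for which symbols exist; an inner or outer join may suit other workflows better.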

    How do the concepts & trends correlate with crypto, stocks or ETFs?

    Scores range from 0 to 1 and represent the strength of known and hidden relationships between a concept and a stock, option or ETF. Each score is calculated by a series of algorithms that monitor data surrounding the company associated with the underlying security, and is then combined with scores from human curation teams. These concepts can then be factored or parameterized to explore new signals or build new models. [Ref: Equity Correlations - J.P. Morgan]
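Because scores fall between 0 and 1, one simple way to turn a concept column into a basket is to keep every symbol whose score meets a cutoff. The sketch below assumes that approach; the 0.5 default, the tickers, the score values, and the function `build_basket` are hypothetical and are not described by the source.

```python
def build_basket(scores, concept, threshold=0.5):
    """Return symbols scoring at least `threshold` for `concept`, strongest first.

    Symbols that have no score for the concept are skipped.
    """
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must lie in [0, 1], like the scores")
    picked = [
        (sym, row[concept])
        for sym, row in scores.items()
        if concept in row and row[concept] >= threshold
    ]
    # Sort by descending score, then by symbol so ties are deterministic.
    picked.sort(key=lambda item: (-item[1], item[0]))
    return [sym for sym, _ in picked]


# Illustrative values only; not real concept scores.
toy_scores = {
    "AAA": {"Helium": 0.82},
    "BBB": {"Helium": 0.15},
    "CCC": {"Helium": 0.64},
    "DDD": {"Coffee": 0.90},
}
print(build_basket(toy_scores, "Helium"))  # ['AAA', 'CCC']
```

Raising or lowering the threshold trades basket size against the strength of the concept relationship.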


    Returns for PlayStation, Helium, Korea, Shampoo & Coffee in comparison to the S&P 500:
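A comparison of this kind can be approximated by computing an equal-weighted basket return and subtracting the benchmark return over the same period. The sketch below assumes equal weighting and simple start-to-end price returns, which the source does not specify; the tickers, prices, and function names are hypothetical.

```python
def period_return(start_price, end_price):
    """Simple return over one period, e.g. 0.10 for a 10% gain."""
    return end_price / start_price - 1.0


def basket_return(prices):
    """Equal-weighted mean return over {symbol: (start_price, end_price)}."""
    returns = [period_return(start, end) for start, end in prices.values()]
    return sum(returns) / len(returns)


def excess_return(basket_prices, benchmark_prices):
    """Basket return minus benchmark return over the same period."""
    return basket_return(basket_prices) - period_return(*benchmark_prices)


# Illustrative prices only; not market data.
concept_basket = {"AAA": (100.0, 110.0), "BBB": (50.0, 55.0)}
benchmark = (4000.0, 4200.0)
print(f"{excess_return(concept_basket, benchmark):.2%}")  # 5.00%
```

Equal weighting is the simplest choice; weighting each symbol by its concept score is another option worth testing.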

  • Dataset & feature requests: