Dataset Builder



Create unique Cryptocurrency, ETF, Nasdaq, NYSE & OTC datasets
with custom feature vectors for boosting alpha






EquityType: NYSE/Nasdaq   Feature: 'Helium'   Feature: 'Blockchain'   Feature: 'Graphene'
GOOG                      0.000               0.000                   0.000
AAPL                      0.000               0.000                   0.000
GBTC                      0.000               0.000                   0.000
...                       ...                 ...                     ...




Data & Feature Engineering Pipeline Overview:








Rows contain stock symbols. Columns contain fundamental data plus either 1 year (2008 - 2014) of minute-by-minute price data for S&P 500 stocks or end-of-day (EOD) data for all equities, along with up to 5 columns you define, each represented by a keyword or concept you choose. Datasets are generated from public and private databases, an army of human curators and data triangulation.

Dataset Production Platforms:

  • Nasdaq, NYSE & OTCBB + General Data
  • Nasdaq, NYSE & OTCBB + Genomic & Molecular Biology Data
  • Nasdaq, NYSE & OTCBB + Chemical Data
  • Nasdaq, NYSE & OTCBB + Geophysical Data
  • S&P 500 + General Data
  • S&P 500 + Genomic & Molecular Biology Data
  • S&P 500 + Chemical Data
  • S&P 500 + Geophysical Data
  • [ Request Custom Dataset ]


    What kind of things can be done with custom concept columns & features?

  • Create unique sectors or clusters based on concepts and hidden relationships and compare their gains to the S&P (see below)
  • Determine whether price correlations are mirrored by concept or keyword correlations
  • Examine symbiotic, parasitic and sympathetic relationships between equities
  • Automatically create baskets of stocks based on concepts and/or keywords
  • Detach the custom columns and append them to other proprietary in-house datasets
  • Select a Data Context (e.g. Biological, Chemical, Geophysical and others) to derive different signals
  • Use stock symbols as custom concept column labels and model cross-correlations between equities
  • Create features using trending terms anywhere on the internet
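
    As a rough illustration of two of the ideas above (building baskets from concept scores, and detaching custom columns for use elsewhere), here is a minimal Python sketch. The symbols, scores, the 0.5 threshold and the function names build_baskets and detach_columns are all hypothetical placeholders chosen for this example; they are not part of the product or its output.

```python
# Hypothetical concept-score table: symbol -> {concept: score in [0, 1]}.
# Symbols and scores are made-up placeholders, not real dataset output.
scores = {
    "AAA": {"Helium": 0.82, "Graphene": 0.10},
    "BBB": {"Helium": 0.05, "Graphene": 0.67},
    "CCC": {"Helium": 0.55, "Graphene": 0.71},
}


def build_baskets(scores, threshold=0.5):
    """Group symbols into one basket per concept where score >= threshold.

    The threshold is an assumption for illustration only.
    """
    baskets = {}
    for symbol, concept_scores in scores.items():
        for concept, score in concept_scores.items():
            if score >= threshold:
                baskets.setdefault(concept, []).append(symbol)
    # Sort each basket so the result is deterministic.
    return {concept: sorted(symbols) for concept, symbols in baskets.items()}


def detach_columns(scores, concepts):
    """Keep only the chosen concept columns, keyed by symbol, for joining elsewhere."""
    return {
        symbol: {c: row[c] for c in concepts if c in row}
        for symbol, row in scores.items()
    }


baskets = build_baskets(scores)
helium_only = detach_columns(scores, ["Helium"])
```

    The detached columns stay keyed by symbol, so they can be joined onto another dataset that uses the same symbols as its row index.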


    How do the concepts & trends correlate with crypto, stocks or ETFs?

    Scores range from 0 to 1 and represent the strength of known and hidden relationships between a concept and a stock, option or ETF. Each score is calculated by a series of algorithms that monitor data surrounding the company associated with the underlying security, and is then combined with scores from human curation teams. These concepts can then be factored or parameterized to explore new signals or build new models. [Ref: Equity Correlations - J.P. Morgan]
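
    The exact way algorithmic and curator scores are combined is not described here. Purely as an assumption, the sketch below blends the two with a weighted average and keeps the result in the 0 to 1 range; the function name combine_scores and the curator_weight parameter are hypothetical.

```python
def combine_scores(algorithmic, curated, curator_weight=0.5):
    """Blend an algorithmic score with a human-curated score.

    ASSUMPTION: a simple weighted average. The actual combination rule
    is not documented, so this is only one plausible sketch.
    """
    if not 0.0 <= curator_weight <= 1.0:
        raise ValueError("curator_weight must be between 0 and 1")
    for name, value in (("algorithmic", algorithmic), ("curated", curated)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} score must be between 0 and 1")
    blended = (1.0 - curator_weight) * algorithmic + curator_weight * curated
    # Guard against floating-point drift outside [0, 1].
    return min(1.0, max(0.0, blended))


example = combine_scores(0.2, 0.6)  # equal weighting of both inputs
```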


    Example use case:

    Returns for PlayStation, Helium, Korea, Shampoo & Coffee compared with the S&P 500 (interactive chart):
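
    A comparison like this can be approximated by computing an equal-weighted return for a concept basket and subtracting the benchmark return over the same period. The sketch below assumes equal weighting and simple start-to-end returns; the prices are made-up placeholders and the function names are hypothetical.

```python
def total_return(prices):
    """Simple return from the first to the last price in a series."""
    return prices[-1] / prices[0] - 1.0


def basket_return(price_series):
    """Equal-weighted average of each member's total return (an assumption)."""
    returns = [total_return(prices) for prices in price_series.values()]
    return sum(returns) / len(returns)


def excess_return(price_series, benchmark_prices):
    """Basket return minus benchmark return over the same period."""
    return basket_return(price_series) - total_return(benchmark_prices)


# Placeholder prices for a hypothetical concept basket and benchmark.
basket_prices = {"AAA": [100.0, 110.0], "BBB": [50.0, 60.0]}
benchmark_prices = [200.0, 210.0]

excess = excess_return(basket_prices, benchmark_prices)
```

    A positive value means the basket outperformed the benchmark over the period; a negative value means it lagged.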



    Google Search Trends for Batteries, Bioengineering, Graphene, Blockchain & Machine Learning





  • Contact us here: