Updating NLP/U correlation matrix datasets

Dataset APIs:

Dataset examples:

Example questions that can be answered:

Which drugs have potential hidden relationships with other drugs?

Which publicly traded pharmaceutical companies may have hidden relationships to drug compounds?

  • Within a context, e.g. DNA repair genes?
  • Based on real-time news?
  • Based on recently published scientific papers?

Example scenarios:

  • An approved drug is discovered to have a new application for a different disease (drug repositioning/repurposing) - extract a cluster of other drug compounds that may have known or hidden relationships to it.
  • A drug compound passes phase 3 clinical trials for Company XYZ - cluster other publicly traded pharmaceutical companies that may have known or hidden relationships to Company XYZ.
  • Company XYZ has a drug compound that is found to cause significant health problems - find a cluster of other publicly traded pharmaceutical companies that may share the same risk as Company XYZ because similar drugs are being developed in their pipelines.

Example use cases (with Python code):
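
A minimal sketch of the cluster-extraction scenarios above, assuming the dataset can be loaded as a list of entity labels plus a symmetric correlation matrix. The labels, values, threshold, and function names below are hypothetical and only illustrate the idea; they are not the dataset's actual API or contents.

```python
from collections import deque
from typing import List, Tuple

# Hypothetical toy data: entity labels and a symmetric correlation matrix.
# The real dataset's labels, values, and storage format are assumptions here.
LABELS: List[str] = ["drug_a", "drug_b", "drug_c", "company_xyz"]
MATRIX: List[List[float]] = [
    [1.00, 0.82, 0.10, 0.65],
    [0.82, 1.00, 0.05, 0.71],
    [0.10, 0.05, 1.00, 0.20],
    [0.65, 0.71, 0.20, 1.00],
]


def related_entities(seed: str, labels: List[str], matrix: List[List[float]],
                     threshold: float = 0.6) -> List[Tuple[str, float]]:
    """Return entities directly correlated with `seed` at or above `threshold`,
    strongest first."""
    i = labels.index(seed)
    hits = [(labels[j], matrix[i][j])
            for j in range(len(labels))
            if j != i and matrix[i][j] >= threshold]
    return sorted(hits, key=lambda pair: pair[1], reverse=True)


def cluster(seed: str, labels: List[str], matrix: List[List[float]],
            threshold: float = 0.6) -> List[str]:
    """Return the seed's cluster: every entity reachable from `seed` through
    links whose correlation is at or above `threshold` (breadth-first search)."""
    start = labels.index(seed)
    seen = {start}
    queue = deque([start])
    while queue:
        i = queue.popleft()
        for j in range(len(labels)):
            if j not in seen and matrix[i][j] >= threshold:
                seen.add(j)
                queue.append(j)
    return sorted(labels[k] for k in seen)


if __name__ == "__main__":
    print(related_entities("drug_a", LABELS, MATRIX))
    print(cluster("drug_a", LABELS, MATRIX))
```

The direct lookup corresponds to "known" relationships with the seed entity, while the transitive cluster can surface entities that are linked only indirectly. The threshold is a tuning choice, not a value taken from the dataset.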

Data sources & dataset provenance:

[ Data Pipeline Provenance (DPP) Hash: c7106d99a4e073143badee0f3686ef2058e316e0 ]

[ Static, dynamic & real-time ]

Real-time data engineering pipeline output log:

Real-time peer-reviewed scientific papers, public company news & social information:

Data acquisition & model training sources:

Data object types processed:

  • FDA label
  • Product name
  • Description
  • Pharmacodynamics
  • Molecular Weight
  • Molecular Formula
  • Mechanism of Action (MOA)
  • Genes/Proteins
  • Genomic Pathways
  • Associated FASTA sequence
  • Melting Point
  • Hydrophobicity
  • Isoelectric Point
  • Metabolism
  • Dosage form
  • Dosage strength
  • Absorption
  • Drug-to-drug interactions
  • Indications
  • Adverse effects
  • Toxicity descriptions
  • Start/End marketing dates
  • Half-life
  • Route of elimination
  • Synonyms
  • PubMed IDs/abstracts/papers
  • Book reference IDs
  • Generic: true/false
  • Approved: yes/no
  • Country
  • Manufacturer
  • Ontology
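
As an illustration only, a subset of the data object types listed above could be held in a record like the following. The class name, field names, types, and the `text_fields` helper are assumptions made for this sketch, not the pipeline's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DrugRecord:
    """Hypothetical container for a few of the listed data object types."""
    product_name: str
    description: str = ""
    mechanism_of_action: Optional[str] = None
    molecular_formula: Optional[str] = None
    molecular_weight: Optional[float] = None
    genes_proteins: List[str] = field(default_factory=list)
    indications: List[str] = field(default_factory=list)
    synonyms: List[str] = field(default_factory=list)
    pubmed_ids: List[str] = field(default_factory=list)
    generic: bool = False
    approved: bool = False
    country: Optional[str] = None
    manufacturer: Optional[str] = None

    def text_fields(self) -> str:
        """Join the free-text fields into one string, e.g. as input to an
        NLP step. Empty or missing fields are skipped."""
        parts = [self.product_name, self.description,
                 self.mechanism_of_action or ""]
        return " ".join(p for p in parts if p)


if __name__ == "__main__":
    # Placeholder values, not real drug data.
    record = DrugRecord(
        product_name="ExampleDrug",
        description="A hypothetical compound used for illustration.",
        genes_proteins=["GENE1"],
    )
    print(record.text_fields())
```

Keeping list-valued fields (genes/proteins, indications, synonyms) as lists rather than delimited strings makes it easier to use them later as separate entities in a correlation matrix.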


  • Encyclopedia of Biological Chemistry - 1st & 2nd Edition
  • Encyclopedia of Genetics, Genomics, Proteomics and Bioinformatics
  • Encyclopedia of Molecular Cell Biology and Molecular Medicine Vol 1-16
  • The Encyclopedia of Molecular Biology - Creighton
  • The Gale Encyclopedia of Medicine Vol 1-6 4th Edition
  • Van Nostrand's Encyclopedia of Chemistry - 5th Edition
  • Encyclopedia of Earth And Space Science
  • Encyclopedia of Plant and Crop Science - 1st Edition
  • Encyclopedia of Earth Science
  • Encyclopedia of Solid Earth Geophysics
  • Encyclopedia of Marine Science
  • Encyclopedia of Physical Science Volume 1 & 2
  • Encyclopedia of Business and Finance Vol 1 & 2
  • SAGE Publications Encyclopedia of Business in Today's World
  • The Encyclopedia of Political Science Set - 1st Edition
  • Encyclopedia of World Geography
  • Encyclopedia of Geology - 5 Volume Set
  • Encyclopedia of World History 7 Volumes Set Facts on File
  • Encyclopedia of Mathematical Physics Vol 1-5
  • Encyclopedia of Mathematics Science
  • Encyclopedia of Condensed Matter Physics
  • Encyclopedia of Physics Research
  • Encyclopedia of Nonlinear Science
  • Rourke's World of Science Encyclopedia - Vol 1-10
  • McGraw-Hill Encyclopedia of Science & Technology, 10th Edition, Vol 1-19
  • Wiley Encyclopedia of Computer Science and Engineering - Vol 1-5
  • Wiley Encyclopedia of Food Science and Technology - Vol 1-4