Vectorspace AI Careers



For more information on any of the open requisitions below, please contact:




Mid-level Data Engineer


Job Description

As a mid-level Data Engineer, you will be responsible for improving the architecture of a general-purpose data pipeline that performs ETL on a variety of data sources: crawled web pages, APIs, PDFs, text files, etc.

You will build, maintain, scale, and support existing data pipelines and deploy them on a cloud service provider, ensuring that both raw and processed data are stored properly.

You will set up monitoring for this pipeline, re-run it on failure, and roll out improvements to it without disrupting ongoing operations.
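
For illustration, here is a minimal sketch of what automatic retries and failure alerting could look like in Airflow (one of the orchestration tools listed under nice-to-haves below); the DAG name, schedule, task, and alert callback are hypothetical, not a description of the existing pipeline.

    from datetime import datetime, timedelta

    from airflow import DAG
    from airflow.operators.python import PythonOperator

    def alert_on_failure(context):
        # Hypothetical alerting hook: forward the failed task's id to a monitoring channel.
        print(f"Task failed: {context['task_instance'].task_id}")

    def extract_and_load():
        # Placeholder for a real ETL step.
        pass

    with DAG(
        dag_id="example_etl",                        # hypothetical pipeline name
        start_date=datetime(2023, 1, 1),
        schedule_interval="@daily",
        catchup=False,
        default_args={
            "retries": 3,                            # re-run automatically on failure
            "retry_delay": timedelta(minutes=10),
            "on_failure_callback": alert_on_failure,
        },
    ) as dag:
        PythonOperator(task_id="extract_and_load", python_callable=extract_and_load)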

You will integrate this pipeline to feed data to our machine learning / language models, archive each model's artifacts (weights, hyperparameters, training code versions) on every run, and load these artifacts into an API server. You will have the opportunity to learn more about NLP / NLU language models and how to train and evaluate them.
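
Purely as a sketch of the artifact-archiving step, assuming a local directory layout and a git commit hash as the code version (the function name, file names, and layout are made up; a production pipeline would more likely push to object storage):

    import json
    import subprocess
    from datetime import datetime, timezone
    from pathlib import Path

    def archive_run_artifacts(weights_path, hyperparameters, archive_root="artifacts"):
        """Store one training run's weights, hyperparameters, and code version together."""
        run_id = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
        run_dir = Path(archive_root) / run_id
        run_dir.mkdir(parents=True, exist_ok=True)

        # Record which version of the training code produced this model.
        commit = subprocess.run(
            ["git", "rev-parse", "HEAD"], capture_output=True, text=True, check=True
        ).stdout.strip()

        (run_dir / "weights.bin").write_bytes(Path(weights_path).read_bytes())
        (run_dir / "metadata.json").write_text(json.dumps(
            {"run_id": run_id, "hyperparameters": hyperparameters, "code_version": commit},
            indent=2,
        ))
        return run_dir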

You are expected to share knowledge with the team and mentor junior data engineers.

Job Requirements

Required
  • 2-4 years of experience building and deploying production-grade data pipelines
  • Strong Python fundamentals
  • Experience with at least one of the following
    • Microsoft Azure
    • Google Cloud Platform
    • AWS
  • Solid working understanding of servers
Nice to haves
  • Experience with a data orchestration tool
    • Dagster
    • Airflow
  • Experience in CI/CD, DevOps
  • Experience in API engineering
  • Experience mentoring junior engineers
  • Basic experience training machine learning models and archiving their artifacts
  • Side projects
  • Technical blog



Mid-level NLP / NLU Engineer


Job Description

As a mid-level NLP / NLU Engineer, you will be responsible for training and evaluating the language models we use to find hidden relationships in proprietary and public data, which we may obtain through our partners or crawl from websites, APIs, and files.

You will examine data thoroughly to determine how best to clean, normalize, and/or preprocess it, and to create stop-word lists. If you find that you lack the data the model needs to perform well, you will be expected to help the Data Engineering team find sources for it.

You will research model architectures and choose the right ones through experimentation. These experiments must be replicable (via Jupyter notebooks or your preferred experimentation stack), and their results should be meticulously logged. You will train and optimize these models, logging the hyperparameters, metrics, and performance of each run.
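
As one possible shape for that logging, here is a minimal sketch using Weights & Biases (listed under nice-to-haves below); the project name, hyperparameters, and training/evaluation stubs are placeholders, not our actual setup:

    import random

    import wandb

    def train_one_epoch():
        # Stand-in for the real training step; returns a dummy loss.
        return random.uniform(0.5, 1.0)

    def evaluate():
        # Stand-in for the real evaluation step; returns a dummy validation loss.
        return random.uniform(0.5, 1.0)

    # Every run records its configuration and metrics so runs stay comparable.
    run = wandb.init(
        project="language-model-experiments",    # hypothetical project name
        config={"model": "distilbert-base-uncased", "lr": 3e-5, "batch_size": 32, "epochs": 3},
    )

    for epoch in range(run.config.epochs):
        wandb.log({"epoch": epoch, "train_loss": train_one_epoch(), "val_loss": evaluate()})

    run.finish()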

You will be asked to experiment with ensembles of language models, optimizing for the business-specific metric our partners care about; for financial institutions, for example, this would be the Sharpe or Sortino ratio.
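
As a refresher on the metric, the Sharpe ratio is the mean excess return over its volatility (annualized), and the Sortino ratio is the same idea but penalizes only downside deviation. A small numpy illustration with made-up returns:

    import numpy as np

    # Hypothetical daily strategy returns and daily risk-free rate, for illustration only.
    daily_returns = np.array([0.002, -0.001, 0.0015, 0.003, -0.0005, 0.001])
    risk_free_daily = 0.0001

    excess = daily_returns - risk_free_daily

    # Annualized Sharpe: mean excess return over its standard deviation, scaled by sqrt(252) trading days.
    sharpe = excess.mean() / excess.std(ddof=1) * np.sqrt(252)

    # Sortino: same numerator, but only negative excess returns count toward the risk term.
    downside = np.minimum(excess, 0.0)
    sortino = excess.mean() / np.sqrt((downside ** 2).mean()) * np.sqrt(252)

    print(round(sharpe, 2), round(sortino, 2))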

You will create modules for the different models; our pipelines will use these modules to load saved models and run inference on incoming data.

You will help us improve our correlation matrix datasets by tweaking the code that generates them. You will research the best methods for achieving context control (e.g., how similar are Moderna and AstraZeneca in the context of space biosciences? How about in the context of DNA repair?).
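
To make the idea concrete, here is a toy sketch of context-controlled similarity: score a pair of entities, weighted by how strongly each relates to the context term. The tiny hand-written vectors and the weighting scheme are illustrative assumptions only, not the method actually used:

    import numpy as np

    def cosine(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    # Tiny hand-written vectors standing in for embeddings produced by a real language model.
    embeddings = {
        "Moderna":           np.array([0.80, 0.10, 0.40]),
        "AstraZeneca":       np.array([0.75, 0.15, 0.35]),
        "space biosciences": np.array([0.20, 0.90, 0.10]),
        "DNA repair":        np.array([0.60, 0.20, 0.70]),
    }

    def similarity_in_context(a, b, context):
        # Crude illustration: weight the pairwise similarity by how strongly
        # each entity relates to the context term.
        return cosine(embeddings[a], embeddings[b]) * min(
            cosine(embeddings[a], embeddings[context]),
            cosine(embeddings[b], embeddings[context]),
        )

    print(similarity_in_context("Moderna", "AstraZeneca", "space biosciences"))
    print(similarity_in_context("Moderna", "AstraZeneca", "DNA repair"))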

You are expected to share your NLP / machine learning knowledge to upskill the team and mentor more junior NLP engineers.

Job Requirements

Required
  • 1-4 years of experience training and evaluating a variety of language models and deploying them to production
  • Strong machine learning fundamentals and Python skills
  • Proficiency in at least one machine learning library, e.g. PyTorch, TensorFlow, Keras, fastai
  • A scientific mindset and interest in research and development
Nice to haves
  • Experience in MLOps / the deployment and serving of models
  • Experience with experiment tracking tools, e.g. Weights & Biases
  • Actual NLP / NLU research experience (published papers, experiments)
  • Experience in web crawling, pulling data from APIs, and processing different filetypes
  • Experience mentoring junior NLP / NLU practitioners
  • Side projects
  • Technical blog

