gabrielcretin.fr
Gabriel Cretin

Building AI systems for structural biology — from distributed GPU training to production infrastructure

Deep Learning Researcher

AI for Computational Biology

Specialized in Protein Language Models and distributed GPU training using PyTorch Lightning, with 40k+ A100/H100 GPU hours on national HPC resources (IDRIS/CNRS). Published research on protein structure prediction and analysis, protein flexibility, and generative modeling.

[Diagram: input embeddings → training model → 3D structure, flexibility, fold recognition]
PyTorch Lightning · ESM-2 / Ankh · Autoencoders · Contrastive Learning

SRE & MLOps

Linux System Administrator

Managing 14 web servers, 5 GPU nodes, and an HPC cluster (708 cores). Running self-hosted GitLab, JupyterHub, Mattermost, and OpenLDAP, with 1 PB+ of backup infrastructure and 380+ CPUs across the lab.

Ansible · Docker Swarm · SLURM / HPC · OpenLDAP
Paris, France · PhD defended October 2025 · 🇫🇷 French / 🇬🇧 English (C2)

Professional Profile

A rare hybrid profile combining 6 years of Linux infrastructure management with 4 years of deep learning research. I design novel AI architectures and deploy them on the infrastructure I build — from training on national supercomputers to production APIs serving 18K+ weekly requests.

  • 40k+ GPU hours (IDRIS/CNRS)
  • 8 peer-reviewed papers
  • 700+ HPC cores managed
  • 18K+ weekly API requests

Scientific Track

Deep Learning Research

Focus on Protein Language Models (ESM-2, Ankh, ProtTrans) and generative architectures. Developed Adversarial Autoencoders for embedding compression and contrastive learning for improved fold recognition — published in top-tier journals.

  • 8 peer-reviewed publications
  • 40k+ GPU hours on IDRIS/CNRS
  • 4 production web tools (PYTHIA, SWORD2, ICARUS, PEGASUS)

Engineering Track

SRE & Infrastructure

Managing a complete Linux ecosystem: web servers, GPU clusters, HPC, and centralized authentication. Building reliable platforms for research teams with automated provisioning and monitoring.

  • 14 web servers, 12 databases (~18K views/week)
  • HPC cluster: 61 nodes, 708 cores
  • 1 PB+ of total storage infrastructure

AI & Protein Science

PhD Research (2021–2025)

Representation Learning & Generative Modeling

Designed Adversarial Autoencoder (AAE) architectures to compress high-dimensional protein language model (pLM) embeddings (ESM-2, Ankh) into fixed-size latent spaces. Implemented contrastive triplet learning to improve structural fold recognition, surpassing state-of-the-art structure-based methods. Explored de novo protein design through latent space interpolation.

Adversarial Autoencoders · Triplet Loss · ESM-2 / Ankh / ProtTrans · Fold Recognition

Tech Stack

ML Engineering

  • PyTorch / Lightning: Expert
  • HPC / SLURM: Advanced
  • Python / Bash / R: Fluent
  • AlphaFold / Foldseek: Proficient

SRE & Infrastructure

HPC & Compute

Cluster Management

5 GPU servers for deep learning & molecular dynamics. SGI HPC cluster: 61 nodes, 708 cores. 25 Linux workstations with 700+ CPUs.

SLURM · Spack / Environment Modules

Services

Self-Hosted Stack

14 web servers, 2 APIs, 12 databases serving ~18K views/week. GitLab, Mattermost, JupyterHub, centralized auth.

GitLab CI/CD · OpenLDAP · JupyterHub

Automation & Networking

Infrastructure as Code

1 PB+ backup infrastructure (3 dedicated servers). Ansible playbooks, Docker Swarm orchestration, Samba/NFS storage.

Ansible · Docker Swarm · Apache2 / NFS

Published Tools & Research

PYTHIA

Deep learning predictor for local protein conformations using the Protein Blocks structural alphabet.

Int. J. Mol. Sci. 2021

SWORD2

Interactive web server for hierarchical 3D structure decomposition into domains & Protein Units.

Nucleic Acids Res. 2022

ICARUS

Flexible non-sequential structural alignment method that uses Protein Peeling to identify rigid sub-units.

Bioinformatics 2023

PEGASUS

Sequence-based predictor of MD-derived flexibility (RMSF, LDDT) using Protein Language Models.

Protein Science 2025

Selected Publications

8 peer-reviewed papers · Journals: NAR, Bioinformatics, Protein Science

PEGASUS: Prediction of MD-derived protein flexibility

Protein Science, 2025 — co-first author

ICARUS: Flexible protein structural alignment

Bioinformatics, 2023 — first author

SWORD2: Hierarchical analysis of protein 3D structures

Nucleic Acids Research, 2022 — first author

PYTHIA: Deep learning for local protein conformation

Int. J. Mol. Sci., 2021 — first author

CV / Resume

Download the PDF and browse a short timeline snapshot.

  1. PhD Thesis Defense

    Université Paris Cité - "Deep learning approaches for protein analysis, prediction, and generation."

  2. PhD Student

    DSIMB Lab - Compression of Protein Language Model embeddings into a continuous latent space for generation, and protein structure analysis and prediction.

  3. Lead Systems Engineer (SRE)

    Managing the full-stack infrastructure: GPU clusters, 1 PB+ storage, authentication (LDAP), and containerized web services.

  4. Master - Biology, Computer Science, Bioinformatics

    Université Paris Cité - with honors (rank 2/23).

  5. BSc - Biology, Computer Science, Bioinformatics

    Université Paris Cité - rank 6/21.

  6. Two-year degree (DUT) - Bioengineering & Bioinformatics

    Université de Clermont-Ferrand (Campus Aurillac) - ranked 4/45 (1st year) and 5/34 (2nd year).

Contact

Best way to reach me: gabriel.cretin@u-paris.fr

Paris, France

© Gabriel Cretin