Gabriel Cretin
Building AI systems for structural biology — from distributed GPU training to production infrastructure
Deep Learning Researcher
AI for Computational Biology
Specialized in Protein Language Models and distributed GPU training using PyTorch Lightning, with 40k+ A100/H100 GPU hours on national HPC (IDRIS/CNRS). Published research on protein structure prediction, analysis and flexibility, and generative modeling.
SRE & MLOps
Linux System Administrator
Managing 14 web servers, 5 GPU nodes, and an HPC cluster (708 cores). Running self-hosted GitLab, JupyterHub, Mattermost, OpenLDAP, with 1 PB+ backup infrastructure and 380+ CPUs across the lab.
Professional Profile
A rare hybrid profile combining 6 years of Linux infrastructure management with 4 years of deep learning research. I design novel AI architectures and deploy them on the infrastructure I build — from training on national supercomputers to production APIs serving 18K+ weekly requests.
Scientific Track
Deep Learning Research
Focus on Protein Language Models (ESM-2, Ankh, ProtTrans) and generative architectures. Developed Adversarial Autoencoders for embedding compression and contrastive learning for improved fold recognition — published in top-tier journals.
- 8 peer-reviewed publications
- 40k+ GPU hours on IDRIS/CNRS
- 4 production web tools (PYTHIA, SWORD2, ICARUS, PEGASUS)
Engineering Track
SRE & Infrastructure
Managing a complete Linux ecosystem: web servers, GPU clusters, HPC, and centralized authentication. Building reliable platforms for research teams with automated provisioning and monitoring.
- 14 web servers, 12 databases (~18K views/week)
- HPC cluster: 61 nodes, 708 cores
- 1 PB+ of cumulative storage infrastructure
AI & Protein Science
PhD Research (2021–2025)
Representation Learning & Generative Modeling
Designed Adversarial Autoencoder (AAE) architectures to compress high-dimensional pLM embeddings (ESM-2, Ankh) into fixed-size latent spaces. Implemented contrastive triplet learning to improve structural fold recognition, surpassing state-of-the-art structure-based methods. Explored de novo protein design through latent space interpolation.
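As a rough illustration of the contrastive triplet objective mentioned above, here is a minimal sketch in plain Python. It is not the published implementation: the function names, the margin value, and the toy 2-D vectors are assumptions chosen for readability, and real training would operate on batched pLM embeddings in PyTorch.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: max(0, d(a, p) - d(a, n) + margin).

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


# Toy 2-D embeddings (hypothetical values, for illustration only).
anchor = [0.0, 0.0]
positive = [3.0, 4.0]  # assumed to share the anchor's fold
negative = [6.0, 8.0]  # assumed to belong to a different fold

# Positive is closer than the negative by more than the margin: no penalty.
loss_ok = triplet_margin_loss(anchor, positive, negative)

# Swapping the roles puts the "positive" farther away, so the loss is positive.
loss_bad = triplet_margin_loss(anchor, negative, positive)
```

Minimizing this quantity over many (anchor, positive, negative) triplets pulls embeddings of proteins with the same fold together and pushes different folds apart, which is the property that fold recognition by nearest-neighbor search relies on.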
Tech Stack
ML Engineering
- PyTorch / Lightning Expert
- HPC / SLURM Advanced
- Python / Bash / R Fluent
- AlphaFold / Foldseek Proficient
SRE & Infrastructure
HPC & Compute
Cluster Management
5 GPU servers for deep learning & molecular dynamics. SGI HPC cluster: 61 nodes, 708 cores. 25 Linux workstations with 700+ CPUs.
Services
Self-Hosted Stack
14 web servers, 2 APIs, 12 databases serving ~18K views/week. GitLab, Mattermost, JupyterHub, centralized auth.
Automation & Networking
Infrastructure as Code
1 PB+ backup infrastructure (3 dedicated servers). Ansible playbooks, Docker Swarm orchestration, Samba/NFS storage.
Published Tools & Research
PYTHIA
Deep learning predictor for local protein conformations using Protein Blocks alphabet.
Int. J. Mol. Sci. 2021
SWORD2
Interactive web server for hierarchical 3D structure decomposition into domains & Protein Units.
Nucleic Acids Res. 2022
ICARUS
Flexible non-sequential structural alignment method using Protein Peeling for rigid sub-units.
Bioinformatics 2023
PEGASUS
Sequence-based predictor of MD-derived flexibility (RMSF, LDDT) using Protein Language Models.
Protein Science 2025
Selected Publications
PEGASUS: Prediction of MD-derived protein flexibility
Protein Science, 2025 — co-first author
ICARUS: Flexible protein structural alignment
Bioinformatics, 2023 — first author
SWORD2: Hierarchical analysis of protein 3D structures
Nucleic Acids Research, 2022 — first author
PYTHIA: Deep learning for local protein conformation
Int. J. Mol. Sci., 2021 — first author
CV / Resume
Download the PDF and browse a short timeline snapshot.
-
PhD Thesis Defense
Université Paris Cité - "Deep learning approaches for protein analysis, prediction, and generation."
-
PhD Student
DSIMB Lab - Compression of Protein Language Model embeddings into a continuous latent space for generation, and protein structure analysis and prediction.
-
Lead Systems Engineer (SRE)
Managing the full-stack infrastructure: GPU clusters, 1PB+ storage, authentication (LDAP), and containerized web services.
-
Master - Biology, Computer Science, Bioinformatics
Université Paris Cité - with honors (rank 2/23).
-
BSc - Biology, Computer Science, Bioinformatics
Université Paris Cité - rank 6/21.
-
Two-year degree (D.U.T) - Bioengineering & Bioinformatics
Université de Clermont-Ferrand (Campus Aurillac) - ranks 4/45 (1st year) and 5/34 (2nd year).
© Gabriel Cretin