README

RosettaMPNN

Overview

Description

RosettaMPNN is a community-driven repository for protein sequence design tools based on Message Passing Neural Networks (MPNNs). Starting from the LigandMPNN infrastructure, this repository combines many of the MPNN-based tools developed by Rosetta Commons, including ProteinMPNN and HyperMPNN, to serve as a centralized home for MPNN-based sequence design tools. If you would like your MPNN-based tool incorporated into this repository, open a pull request or reach out to Hope Woods, the Rosetta Commons Technical Product Lead.

As one of the tools maintained by Rosetta Commons, the MPNN tools that compose RosettaMPNN have been refactored to create a single, unified Python API and command-line interface. This, along with the creation of unit and integration test infrastructure, will streamline development of RosettaMPNN, facilitate long-term maintenance, and promote collaboration between contributors.

This README is a great place to start, but for more information about what RosettaMPNN can do and how to contribute, see the documentation.

What MPNN tools are currently included?

  • ProteinMPNN: The original MPNN tool that can couple amino acid sequences in different chains and is symmetry aware. It can be used to design monomers, cyclic oligomers, protein nanoparticles, and protein-protein interfaces.

  • LigandMPNN: Extends ProteinMPNN to design protein sequences in the context of small molecules, nucleotides, and metals. This allows for the design of small-molecule-binding proteins, sensors, and enzymes.

  • HyperMPNN: Adds a model for designing highly thermostable proteins. Such proteins are useful for creating vaccines, protein nanoparticles for drug delivery, and industrial biocatalysts. For more information on how this model was trained, please see the HyperMPNN GitHub page.

  • Multistate Design: Enables sequence design for multiple protein conformations at once, improving protein flexibility and resulting in more realistic protein structures.

Key Publications

The following publications describe the underlying methods and models integrated in RosettaMPNN:



Features

  • Multiple MPNN model variants: ProteinMPNN, LigandMPNN, HyperMPNN, and more

  • Unified Python API and CLI: Consistent interface for scripting and command-line use

  • Flexible, extensible framework: Add your own models or design protocols

  • Actively maintained: Community contributions encouraged

  • Tested workflows: Integration and unit tests, reproducible pipelines


Getting Started

Installation Guide

1. Clone the repository:

git clone https://github.com/woodsh17/RosettaMPNN.git
cd RosettaMPNN

2. Download the model weights (includes weights for HyperMPNN):

bash get_model_params.sh model_params

3. Set up your Python environment and install (choose one of the following options):

Option A: Using Conda

conda create -n rosettampnn python=3.11
conda activate rosettampnn
pip install -r requirements.txt
pip install -e .

(Optional but recommended) Add RosettaMPNN to your PYTHONPATH:

export PYTHONPATH=/PATH/TO/RosettaMPNN:$PYTHONPATH

Whenever you want to run RosettaMPNN, activate your environment:

conda activate rosettampnn

Option B: Using uv and venv

# Create a virtual environment with Python 3.11
uv venv --python=python3.11
source .venv/bin/activate
# If CUDA is available (the quotes keep shells such as zsh from expanding the brackets)
uv pip install -e ".[cuda]"
# If CUDA is not available
uv pip install -e .

(Optional but recommended) Add RosettaMPNN to your PYTHONPATH:

export PYTHONPATH=/PATH/TO/RosettaMPNN:$PYTHONPATH

Whenever you want to run RosettaMPNN, activate your environment:

source .venv/bin/activate

If you do not have uv installed, run:

curl -LsSf https://astral.sh/uv/install.sh | sh
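
After installing with either option, it can help to confirm which interpreter is active and whether a GPU is visible before choosing between the CUDA and CPU installs. The sketch below is a minimal check, not part of RosettaMPNN. It assumes that PyTorch is the backend (suggested by the .pt weight files, but not stated in this README), and the function name describe_environment is made up for illustration.

```python
import importlib.util
import sys


def describe_environment() -> dict:
    """Summarize the Python version and whether PyTorch and CUDA are usable."""
    info = {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # Assumption: RosettaMPNN runs on PyTorch; find_spec avoids importing it.
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if info["torch_installed"]:
        import torch  # imported lazily so the check also works without PyTorch

        info["cuda_available"] = bool(torch.cuda.is_available())
    return info


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

If python does not report 3.11, the environment from the steps above is probably not activated.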

Docker image

Docker image coming soon

Examples

Basic Use Case

For this example we will use 1BC8.pdb from the example inputs. Flags explained:

  • --out_folder: Output directory for results

  • --pdb_path: Input structure in PDB format

  • --checkpoint_protein_mpnn: Path to the model weights; required if you are not running from inside the RosettaMPNN directory

Example Command Line

python -m RosettaMPNN \
--out_folder ./out/ \
--pdb_path ~/RosettaMPNN/inputs/1BC8.pdb \
--checkpoint_protein_mpnn ~/RosettaMPNN/model_params/proteinmpnn_v_48_020.pt
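
For scripted use, the same call can be launched from Python. The following is a minimal sketch that shells out to the command line shown above rather than using the Python API, whose function names are not covered in this README. The helper names build_command and run_design are hypothetical; only the three flags explained above are used.

```python
import subprocess
import sys
from pathlib import Path


def build_command(pdb_path, out_folder, checkpoint=None):
    """Assemble the `python -m RosettaMPNN` call from the flags explained above."""
    cmd = [
        sys.executable, "-m", "RosettaMPNN",
        "--out_folder", str(out_folder),
        # expanduser() resolves "~" because no shell is involved to do it for us.
        "--pdb_path", str(Path(pdb_path).expanduser()),
    ]
    if checkpoint is not None:
        cmd += ["--checkpoint_protein_mpnn", str(Path(checkpoint).expanduser())]
    return cmd


def run_design(pdb_path, out_folder, checkpoint=None):
    """Run the design; raises CalledProcessError if the CLI exits with an error."""
    subprocess.run(build_command(pdb_path, out_folder, checkpoint), check=True)
```

Using sys.executable keeps the subprocess in the same environment as the calling script, so the activated conda or uv environment is reused.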

Expected outputs:

  • seqs/: Designed sequence as 1BC8.fa. The confidence metric and sequence recovery are reported in the FASTA file. The overall_confidence reflects the average confidence over the redesigned residues: overall_confidence=exp[-mean_over_residues(log_probs)], with a minimum value of 0 and a maximum value of 1. Higher values mean the model is more confident about that sequence. Sequence recovery with respect to the input sequence is calculated only over the redesigned residues.

  • backbones/: Output structure with predicted sequence as 1BC8.pdb

  • packed/: (empty unless side-chain packing is specified)
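
To make the overall_confidence formula concrete, here is a small worked sketch. It assumes that log_probs in the formula refers to the per-residue negative log-probability of the chosen amino acid, which is the reading under which the result falls between 0 and 1; this is an interpretation, not something confirmed by the RosettaMPNN source. The function name and the example probabilities are made up for illustration.

```python
import math


def overall_confidence(neg_log_probs):
    """Return exp(-mean(neg_log_probs)) over the redesigned residues."""
    if not neg_log_probs:
        raise ValueError("need at least one redesigned residue")
    mean_nll = sum(neg_log_probs) / len(neg_log_probs)
    return math.exp(-mean_nll)


# Hypothetical probabilities assigned to the chosen residue at three positions.
probs = [0.9, 0.5, 0.2]
# Assumption: the per-residue values are negative log-probabilities.
nll = [-math.log(p) for p in probs]
print(round(overall_confidence(nll), 3))
```

Under this reading, the score is the geometric mean of the per-residue probabilities, so a single low-probability position pulls the confidence down more than an arithmetic average would.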

Multi-State Design

⚠️ Experimental Feature: The multi-state implementation is not yet scientifically validated. Use with caution.

Multi-state design allows you to design sequences compatible with multiple structures or states. Originally implemented by the Kuhlman lab (GitHub).

Flags explained:

  • --multi_state_pdb_path: Path to a JSON file listing the PDBs to be included

  • --multi_state_constraints: Semicolon-separated list of multi-state design constraints; commas separate the individual residue sets within a constraint

Example Command Line

# Copy the PDB files to the working directory
cp ~/RosettaMPNN/inputs/4GYT_dimer.pdb .
cp ~/RosettaMPNN/inputs/4GYT_monomer.pdb .

# Create a JSON file that points to the input PDBs
# (">" overwrites the file, so re-running does not append a second JSON object)
cat <<EOF > msd_pdbs.json
{
    "./4GYT_dimer.pdb": "",
    "./4GYT_monomer.pdb": ""
}
EOF

# Run RosettaMPNN with the multi-state design options,
# pointing at the JSON file created in the working directory above
python -m RosettaMPNN \
--out_folder ./out_msd \
--multi_state_pdb_path ./msd_pdbs.json \
--multi_state_constraints 4GYT_dimer:A7-A183:0.5,4GYT_dimer:B7-B183:0.5,4GYT_monomer:A7-A183:1 \
--checkpoint_protein_mpnn ~/RosettaMPNN/model_params/proteinmpnn_v_48_020.pt

Expected outputs are the same as in the basic use case, plus:

  • msd/: Combined multi-state structure as msd.pdb

  • Extra FASTA/PDB files for each input structure
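
When there are many states, the JSON file passed to --multi_state_pdb_path can be generated instead of written by hand. The sketch below mirrors the layout used in the example (PDB paths as keys, empty strings as values). The helper name write_state_file is hypothetical, and the meaning of the empty-string values is not documented in this README, so they are simply reproduced.

```python
import json
from pathlib import Path


def write_state_file(pdb_paths, json_path="msd_pdbs.json"):
    """Write a JSON object with PDB paths as keys and empty strings as values."""
    mapping = {str(p): "" for p in pdb_paths}
    # write_text overwrites any existing file, so the result is always valid JSON.
    Path(json_path).write_text(json.dumps(mapping, indent=4) + "\n")
    return mapping


if __name__ == "__main__":
    write_state_file(["./4GYT_dimer.pdb", "./4GYT_monomer.pdb"])
```

Relative keys such as ./4GYT_dimer.pdb are presumably resolved against the directory the command is run from, so the JSON and the PDB files are easiest to keep together.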

Using HyperMPNN Weights

The retrained HyperMPNN weights were downloaded when you ran get_model_params.sh. You can use these weights with the protein_mpnn model option. These weights are not compatible with the ligand_mpnn model.

Example Command Line

python -m RosettaMPNN \
--out_folder ./out_hyper/ \
--pdb_path ~/RosettaMPNN/inputs/1BC8.pdb \
--model_type protein_mpnn \
--checkpoint_protein_mpnn ~/RosettaMPNN/model_params/hypermpnn_v48_020_epoch300.pt

For more information on how to run RosettaMPNN and the options available, see the documentation.


Developing

Contributing

We welcome contributions to improve RosettaMPNN. We use a fork-and-PR workflow: fork the RosettaMPNN repo under your own GitHub account and develop your additions there. Once you’re ready to contribute them back, open a PR against the main RosettaMPNN repo.

Testing

  • Unit and integration tests are located in the test/ directory.

  • To run tests locally, use:

    pytest test/
    
  • Continuous integration (CI) is set up with GitHub Actions to automatically run all unit and integration tests for pull requests targeting the main branch.

  • Please ensure that you add appropriate tests for any new code contributed to the repository.


Support & Help

More detailed documentation is available on the documentation site.


Citing RosettaMPNN

If you use RosettaMPNN in your work, please cite the relevant publications listed in Key Publications.