README¶
RosettaMPNN¶
Overview¶
Description¶
RosettaMPNN is a community-driven repository for protein sequence design tools based on Message Passing Neural Networks (MPNNs). Starting from the LigandMPNN infrastructure, this repository combines many of the MPNN-based tools developed by Rosetta Commons, including ProteinMPNN and HyperMPNN to serve as a centralized home for MPNN-based sequence design tools. If you would like your MPNN-based tool incorporated into this repository, create a pull request or reach out to Hope Woods, the Rosetta Commons Technical Product Lead.
As one of the tools maintained by Rosetta Commons, the MPNN tools that compose RosettaMPPN have been refactored to create a single, unified Python API and command-line interface. This, along with the creation of unit and integration test infrastructure, will streamline development of RosettaMPNN, facilitate long-term maintenance, and promote collaboration between contributors.
This README is a great place to start, but for more information about what RosettaMPNN can do and how to contribute, see the documentation.
What MPNN tools are currently included?¶
ProteinMPNN: The original MPNN tool that can couple amino acid sequences in different chains and is symmetry aware. It can be used to design monomers, cyclic oligomers, protein nanoparticles, and protein-protein interfaces.
LigandMPNN: Extends the capabilities of ProteinMPNN to also be able to design protein sequences in the context of small molecules, nucleotides and metals. This allows for the design of small molecule binding proteins, sensors, and enzymes.
HyperMPNN: Adds a new model to construct highly thermostable proteins. These proteins are incredibly useful for the creation of vaccines, protein nanoparticles for drug delivery, and industrial biocatalysts. For more information on how this model was trained please see the HyperMPNN github page.
Multistate Design: Enables sequence design for multiple protein conformations at once, improving protein flexiblity and resulting in more realistic protein structures.
Key Publications¶
The following publications describe the underlying methods and models integrated in RosettaMPNN:
ProteinMPNN: General protein backbone-based sequence design Science, 2023
LigandMPNN: Extends sequence design for protein-ligand complexes, while maintaining compatibility with ProteinMPNN models Nature Methods, 2025
HyperMPNN: A set of weights that can be used with the ProteinMPNN model that generate highly thermostable protein sequences. bioRxiv, 2024
Table of Contents¶
Features¶
Multiple MPNN model variants: ProteinMPNN, LigandMPNN, HyperMPNN, and more
Unified Python API and CLI: Consistent interface for scripting and command-line use
Flexible, extensible framework: Add your own models or design protocols
Actively maintained: Community contributions encouraged
Tested workflows: Integration and unit tests, reproducible pipelines
Getting Started¶
Installation Guide¶
1. Clone the repository:
git clone https://github.com/woodsh17/RosettaMPNN.git
cd RosettaMPNN
2. Download the model weights (includes weights for HyperMPNN):
bash get_model_params.sh model_params
3. Set up your Python environment and install (choose one of the following options):
Option A: Using Conda
conda create -n rosettampnn python=3.11
conda activate rosettampnn
pip install -r requirements.txt
pip install -e .
(Optional but recommended) Add RosettaMPNN to your PYTHONPATH:
export PYTHONPATH=/PATH/TO/RosettaMPNN:$PYTHONPATH
Whenever you want to run RosettaMPNN, activate your environment:
conda activate rosettampnn
Option B: Using uv
and venv
#create virtual environment with python 3.11
uv venv --python=python3.11
source .venv/bin/activate
#if cuda is available
uv pip install -e .[cuda]
#if cuda is not available
uv pip install -e .
(Optional but recommended) Add RosettaMPNN to your PYTHONPATH:
export PYTHONPATH=/PATH/TO/RosettaMPNN:$PYTHONPATH
Whenever you want to run RosettaMPNN, activate your environment:
source .venv/bin/activate
If you do not have uv
installed, run:
curl -LsSf https://astral.sh/uv/install.sh | sh
Docker image¶
Docker image coming soon
Examples¶
Basic Use Case
For this example we will use 1BC8.pdb from the example inputs. Flags explained:
--out_folder
: Output directory for results--pdb_path
: Input structure in PDB format--checkpoint_protein_mpnn
: Path to model weights, necessary if you are not running inside RosettaMPNN
Example Command Line
python -m RosettaMPNN \
--out_folder ./out/ \
--pdb_path ~/RosettaMPNN/inputs/1BC8.pdb \
--checkpoint_protein_mpnn ~/RosettaMPNN/model_params/proteinmpnn_v_48_020.pt
Expected outputs:
seqs/
: Designed sequence as1BC8.fa
. Confidence metric and sequence recovery is reported in the fasta file. Theoverall_confidence
reflects the average confidence over the redesigned residues:overall_confidence=exp[-mean_over_residues(log_probs)]
with a miniumum value of 0 and a max value of 1. Higher numbers mean the model is more confident about that sequence. Sequence recovery with respect to the input sequence is calculated only over the redesigned residues.backbones/
: Output structure with predicted sequence as1BC8.pdb
packed/
: (empty unless side-chain packing is specified)
Multi-State Design
⚠️ Experimental Feature: The multi-state implementation is not yet scientifically validated. Use with caution.
Multi-state design allows you to design sequences compatible with multiple structures or states. Originally implemented by the Kuhlman lab (GitHub).
Flags explained:
--multi_state_pdb_path
: Path to a JSON file listing the PDBs to be included--multi_state_constraints
: Semicolon-separated list of multi-state design constraints, commas separate individual residue sets within a constraint
Example Command Line
#copy PDB files to working directory
cp PATH/TO/RosettaMPNN/inputs/4GYT_dimer.pdb .
cp PATH/TO/RosettaMPNN/inputs/4GYT_monomer.pdb .
#create json file that points to input pdbs
cat <<EOF >> msd_pdbs.json
{
"./4GYT_dimer.pdb": "",
"./4GYT_monomer.pdb": ""
}
EOF
#run RosettaMPNN with multi_state design options
python -m RosettaMPNN \
--out_folder ./out_msd \
--multi_state_pdb_path ~/RosettaMPNN/inputs/msd_pdbs.json \
--multi_state_constraints 4GYT_dimer:A7-A183:0.5,4GYT_dimer:B7-B183:0.5,4GYT_monomer:A7-A183:1 \
--checkpoint_protein_mpnn ~/RosettaMPNN/model_params/proteinmpnn_v_48_020.pt
Same as basic use case, plus:
msd/
: Combined multi-state structure asmsd.pdb
Extra FASTA/PDB files for each input structure
Using HyperMPNN Weights
The retrained HyperMPNN weights were downloaded when you ran get_model_params.sh
. You can use these weights with the protein_mpnn
model option. These weights are not compatible with the ligand_mpnn
model.
Example Command Line
python -m RosettaMPNN \
--out_folder ./out_hyper/ \
--pdb_path ~/RosettaMPNN/inputs/1BC8.pdb \
--model_type protein_mpnn \
--checkpoint_protein_mpnn ~/RosettaMPNN/model_params/hypermpnn_v48_020_epoch300.pt
For more information on how to run RosettaMPNN and different options available see the documentation.
Developing¶
Contributing¶
We welcome contributions to improve RosettaMPNN. We use a fork-and-PR system for contribution. To contribute to RosettaMPNN, please fork the RosettaMPNN repo under your own GitHub user space. You can then develop your additions in your own space. Once you’re ready to contribute it back, open a PR against the main RosettaMPNN repo.
Testing¶
Unit and integration tests are located in the
test/
directory.To run tests locally, use:
pytest test/
Continuous integration (CI) is set up with GitHub Actions to automatically run all unit and integration tests for pull requests targeting the
main
branch.Please ensure that you add appropriate tests for any new code contributed to the repository.
Support & Help¶
You can find more detailed documentation on the documentation site
Full documentation: https://woodsh17.github.io/RosettaMPNN/
Open an issue for bugs or feature requests: GitHub Issues
General questions: RosettaCommons contact form
Citing RosettaMPNN¶
If you use RosettaMPNN in your work, please cite the relevant publications listed in Key Publications