| | --- |
| | license: cc-by-4.0 |
| | datasets: |
| | - sfisch/DirectContacts2 |
| | pipeline_tag: tabular-classification |
| | repo: https://github.com/KDrewLab/DirectContacts2_analysis.git |
| | --- |
| | # DirectContacts2: A network of direct physical protein interactions derived from high throughput mass spectrometry experiments |
| | Proteins carry out cellular functions by self-assembling into functional complexes, a process that depends on direct physical interactions |
| | between components. While tools like AlphaFold and RoseTTAFold have advanced structure prediction, they remain limited in scaling to the full |
| | human proteome. DirectContacts2 addresses this challenge by integrating diverse large-scale protein interaction datasets, including AP/MS (BioPlex1–3, Boldt et al., Hein et al.),
| | biochemical fractionation (Wan et al.), proximity labeling (Gupta et al., Youn et al.), and RNA pulldown (Treiber et al.), to predict whether ~26 million |
| | human protein pairs interact directly or indirectly. |
| |
|
| | ## Funding |
| | NIH R00, NSF/BBSRC |
| |
|
| | ## Citation |
| |
|
| | Erin R. Claussen, Miles D Woodcock-Girard, Samantha N Fischer, Kevin Drew |
| | |
| | ## References |
| | Kevin Drew, Christian L. Müller, Richard Bonneau, Edward M. Marcotte (2017) Identifying direct contacts between protein complex subunits from their conditional dependence in proteomics datasets. PLOS Computational Biology 13(10): e1005625. https://doi.org/10.1371/journal.pcbi.1005625
| | Samantha N. Fischer, Erin R Claussen, Savvas Kourtis, Sara Sdelci, Sandra Orchard, Henning Hermjakob, Georg Kustatscher, Kevin Drew. hu.MAP3.0: Atlas of human protein complexes by integration of > 25,000 proteomic experiments. Molecular Systems Biology 1–33 (2025). doi: 10.1038/s44320-025-00121-5.
| | Erickson, Nick, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. "Autogluon-tabular: Robust and accurate automl for structured data." arXiv preprint arXiv:2003.06505 (2020). |
| | Huttlin et al. Dual proteome-scale networks reveal cell-specific remodeling of the human interactome Cell. 2021 May 27;184(11):3022-3040.e28. doi: 10.1016/j.cell.2021.04.011. |
| | Huttlin et al. Architecture of the human interactome defines protein communities and disease networks. Nature. 2017 May 25;545(7655):505-509. DOI: 10.1038/nature22366. |
| | Treiber et al. A Compendium of RNA-Binding Proteins that Regulate MicroRNA Biogenesis. Mol Cell. 2017 Apr 20;66(2):270-284.e13. doi: 10.1016/j.molcel.2017.03.014.
| | Boldt et al. An organelle-specific protein landscape identifies novel diseases and molecular mechanisms. Nat Commun. 2016 May 13;7:11491. doi: 10.1038/ncomms11491. |
| | Youn et al. High-Density Proximity Mapping Reveals the Subcellular Organization of mRNA-Associated Granules and Bodies. Mol Cell. 2018 Feb 1;69(3):517-532.e11. doi: 10.1016/j.molcel.2017.12.020. |
| | Gupta et al. A Dynamic Protein Interaction Landscape of the Human Centrosome-Cilium Interface. Cell. 2015 Dec 3;163(6):1484-99. doi: 10.1016/j.cell.2015.10.065. |
| | Wan, Borgeson et al. Panorama of ancient metazoan macromolecular complexes. Nature. 2015 Sep 17;525(7569):339-44. doi: 10.1038/nature14877. Epub 2015 Sep 7. |
| | Hein et al. A human interactome in three quantitative dimensions organized by stoichiometries and abundances. Cell. 2015 Oct 22;163(3):712-23. doi: 10.1016/j.cell.2015.09.053. Epub 2015 Oct 22. |
| | Huttlin et al. The BioPlex Network: A Systematic Exploration of the Human Interactome. Cell. 2015 Jul 16;162(2):425-40. doi: 10.1016/j.cell.2015.06.043. |
| | Reimand et al. g:Profiler-a web server for functional interpretation of gene lists (2016 update). Nucleic Acids Res. 2016 Jul 8;44(W1):W83-9. doi: 10.1093/nar/gkw199. |
| | |
| | ## Associated Code |
| | Code examples using the DirectContacts2 model can be found on our
| | [GitHub](https://github.com/KDrewLab/DirectContacts2_analysis.git).
| | All feature matrices and associated files can be found in the [DirectContacts2 dataset](https://huggingface.co/datasets/sfisch/DirectContacts2).
| | |
| | # Usage |
| |
|
| | ## Accessing and using the model |
| | DirectContacts2 was constructed using [AutoGluon](https://auto.gluon.ai/stable/index.html), an auto-ML tool. The [TabularPredictor](https://auto.gluon.ai/stable/api/autogluon.tabular.TabularPredictor.html) class
| | is used to train, test, and make predictions with the model.
| |
|
| | AutoGluon can be installed with pip:
| |
|
| | $ pip install autogluon==0.8.2 |
| | |
| | Then it can be imported as: |
| |
|
| | >>> from autogluon.tabular import TabularPredictor |
| | Note that the **0.8.2 version** must be used to perform operations with our model.
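
As a minimal sketch of the version requirement and the load-then-predict flow, the helpers below check the installed AutoGluon version before loading a predictor from disk. The function names and the `model_dir` argument are hypothetical and chosen for illustration; only `TabularPredictor.load` and `predict_proba` are AutoGluon calls. The linked notebooks remain the authoritative examples.

```python
from importlib import metadata

REQUIRED_VERSION = "0.8.2"  # AutoGluon version stated in this model card


def is_supported_version(installed: str, required: str = REQUIRED_VERSION) -> bool:
    """Return True if the installed AutoGluon version matches the required one."""
    # Drop any local build suffix (for example "+cu118") before comparing.
    return installed.split("+", 1)[0] == required


def load_directcontacts2(model_dir: str):
    """Load the predictor from a local directory.

    model_dir is a placeholder: point it at wherever the model files were saved.
    """
    installed = metadata.version("autogluon.tabular")
    if not is_supported_version(installed):
        raise RuntimeError(
            f"AutoGluon {REQUIRED_VERSION} is required, found {installed}"
        )
    # Imported lazily so the version check runs before AutoGluon is loaded.
    from autogluon.tabular import TabularPredictor

    return TabularPredictor.load(model_dir)


def score_pairs(predictor, features):
    """Return class probabilities for each row of a pandas DataFrame of features."""
    return predictor.predict_proba(features)
```

Under these assumptions, `score_pairs(load_directcontacts2("path/to/model"), feature_matrix)` would return one row of class probabilities per protein pair, where `feature_matrix` is a DataFrame with the same feature columns the model was trained on.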
| | |
| | To use the model and make predictions, we provide two full code examples as Jupyter notebooks, one using the [full feature matrix](https://github.com/KDrewLab/DirectContacts2_analysis/blob/main/machine_learning/generating_predictions_w_DirectContacts2.ipynb)
| | and one using the [test feature matrix](https://github.com/KDrewLab/DirectContacts2_analysis/blob/main/machine_learning/DirectContacts2_testing.ipynb).
| |
|
| | All feature matrices can be pulled using the `datasets` library from Hugging Face. Examples are available on our [GitHub](https://github.com/KDrewLab/DirectContacts2_analysis.git)
| | and on our [DirectContacts2 Hugging Face dataset](https://huggingface.co/datasets/sfisch/DirectContacts2).
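
A minimal sketch of pulling one feature matrix with `datasets.load_dataset` is shown below. The helper name and the `data_file` argument are hypothetical; the file names in the dataset repository are not listed here, so check the dataset page for the one you need. The sketch also assumes the file is in a format that `load_dataset` can infer from its extension, such as CSV.

```python
DATASET_REPO = "sfisch/DirectContacts2"  # dataset repository named in this model card


def load_feature_matrix(data_file: str, split: str = "train"):
    """Download one feature matrix from the dataset repo as a pandas DataFrame.

    data_file is a placeholder for a file name inside the dataset repository.
    When a single file is passed through data_files, the datasets library
    exposes it under the "train" split, hence the default.
    """
    # Imported lazily so this module can be imported without datasets installed.
    from datasets import load_dataset

    dataset = load_dataset(DATASET_REPO, data_files=data_file, split=split)
    return dataset.to_pandas()
```

The returned DataFrame can then be passed to `TabularPredictor.predict_proba`, provided its columns match the features the model expects.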
| |
|
| |
|
| | ## Model card authors |
| | Samantha Fischer (sfisch6@uic.edu) |