AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings
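| Main authors: | Navarro, Alvaro; Palacios, Santiago; Galmarini, Thierry; Bárcenas, Oriol; Ventura, Salvador; Marino-Buslje, Cristina |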
|---|---|
| Format: | Preprint |
| Language: | en_US |
| Published: | Journal of Molecular Biology, 2026 |
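| Subjects: | Protein aggregation; aggregation-prone regions; deep learning; protein language models; sequence-based prediction; neurodegenerative diseases; pathogenic mutations |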
| Online access: | https://hdl.handle.net/20.500.14769/5227 https://doi.org/10.1016/j.jmb.2026.169643 |
| Contributed by: | Instituto Tecnológico de Buenos Aires (ITBA) |

| id | I32-R138-20.500.14769-5227 |
|---|---|
| record_format | dspace |
| citation | Navarro, A. M., Palacios, S., Galmarini, T., Bárcenas, O., Ventura, S., & Marino-Buslje, C. (2026). AggrescanAI: Prediction of aggregation-prone regions using contextualized embeddings. Journal of Molecular Biology. https://doi.org/10.1016/j.jmb.2026.169643 |
| date_issued | 2026-01-16 |
| date_accessioned | 2026-01-19T13:54:24Z |
| last_modified | 2026-01-19T14:01:45Z |
| issn | 1089-8638 |
| institution | Instituto Tecnológico de Buenos Aires (ITBA) |
| institution_str | I-32 |
| repository_str | R-138 |
| collection | Repositorio Institucional Instituto Tecnológico de Buenos Aires (ITBA) |
| language | en_US |
| topic | Protein aggregation; aggregation-prone regions; deep learning; protein language models; sequence-based prediction; neurodegenerative diseases; pathogenic mutations |
| description | Protein aggregation plays a central role in the pathogenesis of many neurodegenerative diseases and poses major challenges in protein engineering. A key driver of this process is the presence of aggregation-prone regions (APRs) within protein sequences. We present AggrescanAI, a deep learning-based tool that predicts residue-level aggregation propensity directly from sequence. It leverages contextual embeddings from the ProtT5 protein language model, which captures rich information implicitly encoded in the sequence, without requiring structural data. The model was trained on a set of experimentally annotated APRs expanded via homology transfer, evaluated by cross-validation, and validated on an external benchmark. AggrescanAI outperforms state-of-the-art predictors and captures aggregation shifts induced by pathogenic mutations. To facilitate accessibility, we provide a user-friendly, fully open Google Colab notebook: https://gitlab.com/bioinformatics-fil/aggrescanai. AggrescanAI represents a new generation of sequence-based aggregation predictors, powered by deep learning and protein language models. |
| format | Preprint |
| author | Navarro, Alvaro; Palacios, Santiago; Galmarini, Thierry; Bárcenas, Oriol; Ventura, Salvador; Marino-Buslje, Cristina |
| title | AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings |
| publisher | Journal of Molecular Biology |
| publishDate | 2026 |
| url | https://hdl.handle.net/20.500.14769/5227 https://doi.org/10.1016/j.jmb.2026.169643 |
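
The abstract describes a tool that outputs residue-level aggregation propensity from ProtT5 embeddings, with APRs as the regions of interest. The following is a minimal, hypothetical sketch of how such per-residue output could be post-processed; it is not AggrescanAI's code or architecture. `residue_probabilities` stands in for a trained predictor with a simple logistic head over each residue's embedding vector, and `extract_aprs`, `mutation_shift`, the 0.5 threshold, and the 5-residue minimum length are illustrative assumptions, not values taken from the paper.

```python
import math
from typing import List, Sequence, Tuple


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function (avoids overflow in math.exp).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def residue_probabilities(embeddings: Sequence[Sequence[float]],
                          weights: Sequence[float],
                          bias: float) -> List[float]:
    """Hypothetical stand-in for a trained per-residue head: a logistic
    model applied independently to each residue's embedding vector."""
    probs = []
    for vec in embeddings:
        if len(vec) != len(weights):
            raise ValueError("embedding and weight dimensions differ")
        z = sum(x * w for x, w in zip(vec, weights)) + bias
        probs.append(_sigmoid(z))
    return probs


def extract_aprs(scores: Sequence[float],
                 threshold: float = 0.5,
                 min_len: int = 5) -> List[Tuple[int, int]]:
    """Return half-open, 0-based (start, end) runs where every score is
    >= threshold and the run spans at least min_len residues."""
    segments: List[Tuple[int, int]] = []
    start = None
    for i, s in enumerate(scores):
        if s >= threshold:
            if start is None:
                start = i  # a candidate region begins here
        else:
            if start is not None and i - start >= min_len:
                segments.append((start, i))
            start = None
    # Close a run that extends to the C-terminus.
    if start is not None and len(scores) - start >= min_len:
        segments.append((start, len(scores)))
    return segments


def mutation_shift(wt_scores: Sequence[float],
                   mut_scores: Sequence[float]) -> float:
    """Hypothetical summary of a mutation's effect: mean per-residue score
    difference (mutant minus wild type); assumes a point substitution, so
    both profiles have the same length."""
    if len(wt_scores) != len(mut_scores) or not wt_scores:
        raise ValueError("profiles must be non-empty and of equal length")
    return sum(m - w for w, m in zip(wt_scores, mut_scores)) / len(wt_scores)
```

With the illustrative defaults, `extract_aprs([0.1, 0.2, 0.8, 0.9, 0.7, 0.6, 0.9, 0.1, 0.95, 0.9])` returns `[(2, 7)]`: the five-residue run at indices 2 to 6 qualifies, while the two-residue run at the C-terminus is shorter than `min_len`. A positive `mutation_shift` would indicate that the variant raises predicted propensity on average.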