AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings

Protein aggregation plays a central role in the pathogenesis of many neurodegenerative diseases and poses major challenges in protein engineering. A key driver of this process is the presence of aggregation-prone regions (APRs) within protein sequences. We present AggrescanAI, a deep learning-based...

Bibliographic Details
Main authors: Navarro, Alvaro, Palacios, Santiago, Galmarini, Thierry, Bárcenas, Oriol, Ventura, Salvador, Marino-Buslje, Cristina
Format: Preprint
Language: en_US
Published: Journal of Molecular Biology 2026
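Subjects: Protein aggregation, Aggregation-prone regions, Deep learning, Protein language models, Sequence-based prediction, Neurodegenerative diseases, Pathogenic mutations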
Online access: https://hdl.handle.net/20.500.14769/5227
https://doi.org/10.1016/j.jmb.2026.169643
id I32-R138-20.500.14769-5227
record_format dspace
spelling I32-R138-20.500.14769-5227 2026-01-19T14:01:45Z AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings Navarro, Alvaro Palacios, Santiago Galmarini, Thierry Bárcenas, Oriol Ventura, Salvador Marino-Buslje, Cristina PROTEIN AGGREGATION, AGGREGATION-PRONE REGIONS, DEEP LEARNING, PROTEIN LANGUAGE MODELS, SEQUENCE-BASED PREDICTION, NEURODEGENERATIVE DISEASES, PATHOGENIC MUTATIONS Protein aggregation plays a central role in the pathogenesis of many neurodegenerative diseases and poses major challenges in protein engineering. A key driver of this process is the presence of aggregation-prone regions (APRs) within protein sequences. We present AggrescanAI, a deep learning-based tool that predicts residue-level aggregation propensity directly from sequence. It leverages contextual embeddings from the ProtT5 protein language model, which captures rich information implicitly encoded in the sequence, without requiring structural data. The model was trained on a set of experimentally annotated APRs, expanded via homology transfer, evaluated by cross-validation, and validated with an external benchmark. AggrescanAI outperforms state-of-the-art predictors and captures aggregation shifts induced by pathogenic mutations. To facilitate accessibility, we provide a user-friendly and fully open Google Colab notebook: https://gitlab.com/bioinformatics-fil/aggrescanai. AggrescanAI represents a new generation of sequence-based aggregation predictors, powered by deep learning and protein language models. 2026-01-19T13:54:24Z 2026-01-19T13:54:24Z 2026-01-16 Preprint Navarro, A. M., Palacios, S., Galmarini, T., Bárcenas, O., Ventura, S., & Marino-Buslje, C. (2026). AggrescanAI: Prediction of aggregation-prone regions using contextualized embeddings. Journal of Molecular Biology.
https://doi.org/10.1016/j.jmb.2026.169643 1089-8638 https://hdl.handle.net/20.500.14769/5227 https://doi.org/10.1016/j.jmb.2026.169643 /10.1016/j.jmb.2026.169643 en_US Journal of Molecular Biology
institution Instituto Tecnológico de Buenos Aires (ITBA)
institution_str I-32
repository_str R-138
collection Repositorio Institucional Instituto Tecnológico de Buenos Aires (ITBA)
language en_US
topic PROTEIN AGGREGATION, AGGREGATION-PRONE REGIONS, DEEP LEARNING, PROTEIN LANGUAGE MODELS, SEQUENCE-BASED PREDICTION, NEURODEGENERATIVE DISEASES, PATHOGENIC MUTATIONS
spellingShingle PROTEIN AGGREGATION, AGGREGATION-PRONE REGIONS, DEEP LEARNING, PROTEIN LANGUAGE MODELS, SEQUENCE-BASED PREDICTION, NEURODEGENERATIVE DISEASES, PATHOGENIC MUTATIONS
Navarro, Alvaro
Palacios, Santiago
Galmarini, Thierry
Bárcenas, Oriol
Ventura, Salvador
Marino-Buslje, Cristina
AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings
topic_facet PROTEIN AGGREGATION, AGGREGATION-PRONE REGIONS, DEEP LEARNING, PROTEIN LANGUAGE MODELS, SEQUENCE-BASED PREDICTION, NEURODEGENERATIVE DISEASES, PATHOGENIC MUTATIONS
description Protein aggregation plays a central role in the pathogenesis of many neurodegenerative diseases and poses major challenges in protein engineering. A key driver of this process is the presence of aggregation-prone regions (APRs) within protein sequences. We present AggrescanAI, a deep learning-based tool that predicts residue-level aggregation propensity directly from sequence. It leverages contextual embeddings from the ProtT5 protein language model, which captures rich information implicitly encoded in the sequence, without requiring structural data. The model was trained on a set of experimentally annotated APRs, expanded via homology transfer, evaluated by cross-validation, and validated with an external benchmark. AggrescanAI outperforms state-of-the-art predictors and captures aggregation shifts induced by pathogenic mutations. To facilitate accessibility, we provide a user-friendly and fully open Google Colab notebook: https://gitlab.com/bioinformatics-fil/aggrescanai. AggrescanAI represents a new generation of sequence-based aggregation predictors, powered by deep learning and protein language models.
format Preprint
author Navarro, Alvaro
Palacios, Santiago
Galmarini, Thierry
Bárcenas, Oriol
Ventura, Salvador
Marino-Buslje, Cristina
author_facet Navarro, Alvaro
Palacios, Santiago
Galmarini, Thierry
Bárcenas, Oriol
Ventura, Salvador
Marino-Buslje, Cristina
author_sort Navarro, Alvaro
title AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings
title_short AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings
title_full AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings
title_fullStr AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings
title_full_unstemmed AggrescanAI: Prediction of Aggregation-Prone Regions Using Contextualized Embeddings
title_sort aggrescanai: prediction of aggregation-prone regions using contextualized embeddings
publisher Journal of Molecular Biology
publishDate 2026
url https://hdl.handle.net/20.500.14769/5227
https://doi.org/10.1016/j.jmb.2026.169643
work_keys_str_mv AT navarroalvaro aggrescanaipredictionofaggregationproneregionsusingcontextualizedembeddings
AT palaciossantiago aggrescanaipredictionofaggregationproneregionsusingcontextualizedembeddings
AT galmarinithierry aggrescanaipredictionofaggregationproneregionsusingcontextualizedembeddings
AT barcenasoriol aggrescanaipredictionofaggregationproneregionsusingcontextualizedembeddings
AT venturasalvador aggrescanaipredictionofaggregationproneregionsusingcontextualizedembeddings
AT marinobusljecristina aggrescanaipredictionofaggregationproneregionsusingcontextualizedembeddings
_version_ 1865139399223672832