CNN–LSTM with Soft Attention Mechanism for Human Action Recognition in Videos
Saved in:
| Main Authors: | , , |
|---|---|
| Format: | Article, publishedVersion |
| Language: | Spanish |
| Published: | FIUBA, 2021 |
| Subjects: | |
| Online Access: | https://elektron.fi.uba.ar/elektron/article/view/130 https://repositoriouba.sisbi.uba.ar/gsdl/cgi-bin/library.cgi?a=d&c=elektron&d=130_oai |
| Contributed by: | |
| Summary: | Action recognition in videos is currently a topic of interest in computer vision, owing to potential applications such as multimedia indexing and surveillance in public spaces. Attention mechanisms have become an important concept in deep learning; they imitate the human visual ability to focus on the relevant parts of a scene in order to extract important information. In this paper we propose a soft attention mechanism adapted to a base CNN–LSTM architecture. First, a VGG16 convolutional neural network extracts features from the input video. Then an LSTM classifies the video into a particular class. For the training and testing phases, we used the HMDB-51 and UCF-101 datasets. We evaluate the performance of our system using accuracy as the evaluation metric, obtaining 40.7% (base approach) and 51.2% (with attention) on HMDB-51, and 75.8% (base approach) and 87.2% (with attention) on UCF-101. |
|---|---|
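The soft attention step described in the summary, weighting the spatial locations of a CNN feature map by their relevance to the current LSTM hidden state, can be sketched as follows. This is a minimal illustration, not the authors' implementation: the parameter names (`w_f`, `w_h`, `b`) and the additive scoring form are assumptions, and the dimensions are typical of VGG16 conv5 features.

```python
import numpy as np

def soft_attention(feats, hidden, w_f, w_h, b):
    """One soft-attention step over a frame's CNN feature map.

    feats:  (L, D) conv features for one frame (e.g. VGG16 conv5: L = 7*7 = 49, D = 512)
    hidden: (H,)   previous LSTM hidden state
    w_f (D,), w_h (H,), b (scalar): hypothetical attention parameters

    Returns (context, alpha): the attended feature vector fed to the LSTM,
    and the normalized attention weights over the L spatial locations.
    """
    scores = feats @ w_f + hidden @ w_h + b        # (L,) unnormalized relevance scores
    scores = scores - scores.max()                 # shift for numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()  # softmax over spatial locations
    context = alpha @ feats                        # (D,) attention-weighted feature
    return context, alpha

# Toy usage with random features and parameters
rng = np.random.default_rng(0)
feats = rng.standard_normal((49, 512))
hidden = rng.standard_normal(256)
context, alpha = soft_attention(feats, hidden,
                                rng.standard_normal(512),
                                rng.standard_normal(256),
                                0.0)
```

At each time step the LSTM would consume `context` instead of a uniformly pooled feature, which is what lets the model focus on the relevant region of the scene.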