Abstract

In this thesis, we propose a Pyramid-attentive module that relies on multi-part features and multiple attention mechanisms to aggregate features from multiple levels and to learn attention from multiple aspects. Self-attention is used to strengthen the most discriminative features in the spatial and channel domains in order to capture global information. We propose a part relation between different levels to learn robust features from parts, while temporal attention is used to aggregate the temporal features. We integrate the most discriminative features from the global view and from multiple local views, and we study the effect on four challenging datasets. We also explore the generalization ability of our model through cross-dataset evaluation. On the PRID2011 dataset, the model achieves 98.9% Rank-1 accuracy (the estimated probability that the correct pedestrian is the highest-ranked match), an improvement of 2.6% over the state of the art, and reaches 100% at Rank-5. On the iLIDS-VID dataset, it achieves 92.8% at Rank-1, an improvement of 3.9% over the state of the art, and reaches 100% at Rank-10. On the DukeMTMC-VideoReID dataset, it achieves 97.2% at Rank-1, an improvement of 1% over the state of the art, and reaches 100% at Rank-20. On the MARS dataset, it achieves 90.6% at Rank-1, an improvement of 0.6% over the state of the art.
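To make the Rank-k figures quoted above concrete, the following is a minimal sketch of how such a score can be computed from a query-to-gallery distance matrix. It is an illustration only and not the evaluation code used in the thesis: the function name `rank_k_accuracy`, its arguments, and the toy data are assumptions, and the sketch omits the camera-based gallery filtering that standard re-identification protocols usually apply.

```python
import numpy as np

def rank_k_accuracy(dist, query_ids, gallery_ids, ks=(1, 5, 10, 20)):
    """Fraction of queries whose correct identity appears among the k nearest gallery items.

    Hypothetical helper for illustration; dist has shape (num_query, num_gallery),
    and smaller values mean more similar.
    """
    dist = np.asarray(dist, dtype=float)
    query_ids = np.asarray(query_ids)
    gallery_ids = np.asarray(gallery_ids)

    # Sort gallery items for each query from nearest to farthest.
    order = np.argsort(dist, axis=1)
    # matches[i, j] is True when the j-th nearest gallery item has the query's identity.
    matches = gallery_ids[order] == query_ids[:, None]

    # Position of the first correct match; queries with no match count as failures.
    has_match = matches.any(axis=1)
    first_hit = matches.argmax(axis=1)
    return {k: float(np.mean(has_match & (first_hit < k))) for k in ks}

# Toy example (assumed data): 3 queries, 3 gallery items.
toy_dist = [[0.1, 0.9, 0.5],
            [0.8, 0.2, 0.3],
            [0.4, 0.3, 0.9]]
toy_scores = rank_k_accuracy(toy_dist, query_ids=[1, 2, 3], gallery_ids=[1, 3, 2], ks=(1, 2))
print(toy_scores)
```

In this toy data, two of the three queries have the correct identity as their nearest gallery item, and the third finds it in second place, so Rank-1 is 2/3 and Rank-2 is 1.0. A reported value of 100% at Rank-5 therefore means that every query had its correct identity within the five nearest gallery items.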