Analysis and Development of Human Action Recognition Techniques for Video Sequences / by Chirag Ishwarbhai Patel. Material type: Visual material. Publisher: Ahmedabad: Nirma Institute of Technology, 2015. Description: 171 p.; Ph.D. thesis with synopsis and CD. DDC classification: TT000030. Online resources: Institute Repository (Campus Access)
| Item type | Current location | Collection | Call number | Status | Date due | Barcode | Item holds |
| Thesis | Institute of Technology | Reference | TT000030 PAT (Browse shelf) | Not For Loan | | TT000030 | |
| CD/DVD | Institute of Technology | Reference | TT000030 PAT (Browse shelf) | Not For Loan | | TT000030-1 | |
| Synopsis | Institute of Technology | Reference | TT000030 PAT (Browse shelf) | Not For Loan | | TT000030-2 | |
Guided by: Dr. R. N. Patel. With Synopsis and CD. 11EXTPHDE05
Video surveillance has long been used to monitor security-sensitive areas such as banks, department stores, highways, crowded public places and borders. Automating the tracking and monitoring process requires the system to mimic human observation. The main objective of this work is to detect a moving object and thereby recognize the human action in surveillance video. The process encompasses two phases: moving object detection against static and dynamic backgrounds, and human action recognition in constrained and unconstrained video sequences.
Moving object detection is a crucial and critical task for any surveillance system. In this work, moving object detection is explored using visual attention, which removes the need for background formulation. Various bottom-up approaches, as well as combinations of bottom-up and top-down approaches, are proposed. The study shows that the presented models detect moving objects efficiently, as they require neither a learned background model nor previous video frames. Implementation results also indicate that the proposed approach works efficiently even in the presence of minor background movements and other outdoor conditions. Due attention is also given to detecting moving objects in videos with dynamic backgrounds. To improve the performance of the bottom-up approach, top-down features such as Local Binary Patterns (LBP), Haar wavelet features and Histograms of Oriented Gradients (HOG) are combined with the saliency map. Visual attention and feature-based cues are used to compute regions of interest in a given video, with the cues computed over the temporal dimension of the video. To prove the efficiency of the proposed algorithm, both indoor and outdoor backgrounds are considered. The experimental study shows that this approach works properly, with significant improvement in results when compared to existing methods.
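The combination of a bottom-up saliency map with a top-down feature cue described above can be sketched as a weighted fusion followed by thresholding to obtain a region-of-interest mask. This is a minimal illustrative sketch, not the thesis's actual implementation: the function names, the weight `alpha`, the threshold `tau`, and the toy 3x3 maps are all assumptions introduced here, and both input maps are assumed to be pre-normalised to [0, 1].

```python
# Hypothetical sketch of cue fusion for moving-object localisation.
# Assumes both maps are already normalised to [0, 1]; alpha and tau
# are illustrative parameters, not values from the thesis.

def combine_cues(saliency, feature, alpha=0.6):
    """Weighted fusion of a bottom-up saliency map and a top-down
    feature-response map (e.g. an LBP/HOG cue), given as 2-D lists."""
    rows, cols = len(saliency), len(saliency[0])
    return [[alpha * saliency[r][c] + (1 - alpha) * feature[r][c]
             for c in range(cols)] for r in range(rows)]

def threshold_map(fused, tau=0.5):
    """Binary region-of-interest mask: 1 where the fused cue exceeds tau."""
    return [[1 if v > tau else 0 for v in row] for row in fused]

# Toy 3x3 example: only the centre pixel is strong in both cues.
saliency = [[0.1, 0.2, 0.1],
            [0.2, 0.9, 0.2],
            [0.1, 0.2, 0.1]]
feature  = [[0.0, 0.1, 0.0],
            [0.1, 0.8, 0.1],
            [0.0, 0.1, 0.0]]

mask = threshold_map(combine_cues(saliency, feature))
# The mask keeps only the centre pixel as the region of interest.
```

Because the fused cue is computed per frame, no background model or history of previous frames is needed, which matches the motivation given above for the visual-attention formulation.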
An approach that exploits action dissimilarities is proposed, using feature fusion and treating action recognition as a multi-class classification problem. The proposed method prepares feature descriptors to improve classification performance. In this approach, the moving object is first detected and segmented from the background. Features are then extracted from the segmented object using the Histogram of Oriented Gradients (HOG) descriptor. The descriptors used in the proposed work are the average of HOG features over ten non-overlapping video frames, velocity, and displacement, as well as a combination of these three features. Artificial Neural Networks (ANN), Support Vector Machines (SVM), Multiple Kernel Learning (MKL) and late-fusion methods are used for classification. The proposed approach is evaluated against state-of-the-art action recognition methods on two publicly available benchmark datasets (KTH and Weizmann), and the results demonstrate that it outperforms the existing state-of-the-art methods. By fusing several modalities, features and/or classifier decision scores, the gap between an observed action and its class label is bridged. Six fusion models, inspired by early, intermediate and late fusion schemes, are presented. The performance of all models is evaluated on the ASLAN benchmark dataset of action videos, where the proposed fusion schemes achieve significant improvement over conventional fusion schemes and prove better than the available state-of-the-art methods.
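Two of the building blocks described above can be sketched in a few lines: averaging per-frame descriptors over a clip (as with the HOG features averaged over ten non-overlapping frames), and late fusion of per-classifier decision scores. This is an illustrative sketch under stated assumptions, not the thesis's implementation: the function names, the equal default weights, and the toy score vectors are hypothetical, and real HOG vectors would come from a feature extractor rather than the toy lists used here.

```python
# Hypothetical sketch of descriptor averaging and late score fusion.
# Descriptors and class scores are plain lists of floats for clarity.

def average_descriptors(frame_descriptors):
    """Mean of per-frame feature vectors over a clip, e.g. HOG
    features from ten non-overlapping frames."""
    n = len(frame_descriptors)
    dim = len(frame_descriptors[0])
    return [sum(d[i] for d in frame_descriptors) / n for i in range(dim)]

def late_fusion(score_lists, weights=None):
    """Weighted sum of each classifier's class-score vector; returns
    the index of the winning action class (equal weights by default)."""
    n = len(score_lists)
    weights = weights or [1.0 / n] * n
    classes = len(score_lists[0])
    fused = [sum(w * s[c] for w, s in zip(weights, score_lists))
             for c in range(classes)]
    return max(range(classes), key=fused.__getitem__)

# Toy example: a 2-D descriptor averaged over two frames.
clip_descriptor = average_descriptors([[1.0, 2.0], [3.0, 4.0]])

# Toy example: two classifiers voting over three action classes;
# both lean towards class 1, so the fused decision is class 1.
predicted = late_fusion([[0.2, 0.7, 0.1], [0.3, 0.4, 0.3]])
```

In a late-fusion scheme of this kind, each base classifier (e.g. an ANN and an SVM) is trained independently and only their decision scores are combined, which is what distinguishes it from the early and intermediate schemes mentioned above, where features are merged before or during training.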