Description
This project covers the deployment of, and research on, semi-supervised Deep Learning models for the Anomaly Detection problem in surveillance videos, developed on a synchronous federated training architecture under the Federated Learning paradigm, in which training is distributed over many training nodes.
This research is accompanied by the deployment of an Active Learning framework that continuously updates the model from ongoing video recording streams.
Spatio-temporal Learning of the normal behaviour
Video-segment reconstruction autoencoder models, as proposed in [Rash_19, Yong_17, Mahm_16], are used to learn spatio-temporal features from video sequences containing normal events. The model is trained to accurately reconstruct video segments containing normal events. Since no abnormal event is fed in during training, reconstructions of normal video segments are expected to be more accurate than those of segments containing abnormal events.
The root of the sum of squared errors is proposed as the loss function to measure reconstruction accuracy for the input video segments. A normalized anomaly threshold μ is then set to separate the two classes of video segments. Additionally, a second, temporal threshold λ is introduced to mitigate false positives due to error peaks caused by occlusions, sudden variations in illuminance, the appearance of new objects, etc. The temporal threshold determines the minimum number of consecutive anomalous cuboids required for a video time strip to be labelled as containing an abnormal event.
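The double-threshold rule above can be sketched as follows. This is a minimal illustration, not the project's code: the function names, the min-max normalization of the error signal, and the half-open event intervals are assumptions made for the example.

```python
import numpy as np

def reconstruction_error(x, x_hat):
    """Root of the sum of squared errors between a video segment
    and its reconstruction, as used for the anomaly score."""
    return np.sqrt(np.sum((np.asarray(x) - np.asarray(x_hat)) ** 2))

def anomaly_scores(errors):
    """Normalize per-cuboid reconstruction errors to [0, 1]
    (min-max scaling is an assumption for this sketch)."""
    e = np.asarray(errors, dtype=float)
    span = e.max() - e.min()
    return (e - e.min()) / span if span else np.zeros_like(e)

def detect_events(errors, mu=0.5, lam=9):
    """Return (start, end) index pairs of abnormal time strips:
    runs of at least `lam` consecutive cuboids whose normalized
    score exceeds the anomaly threshold `mu`."""
    above = anomaly_scores(errors) > mu
    events, run_start = [], None
    for i, flag in enumerate(above):
        if flag and run_start is None:
            run_start = i                      # a run of anomalous cuboids begins
        elif not flag and run_start is not None:
            if i - run_start >= lam:           # keep only runs long enough
                events.append((run_start, i))
            run_start = None
    if run_start is not None and len(above) - run_start >= lam:
        events.append((run_start, len(above)))
    return events
```

A short error spike (e.g. two anomalous cuboids caused by an occlusion) is discarded by λ, while a sustained run of anomalous cuboids is reported as an event.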
Federated learning from multiple data sources
A synchronous federated learning architecture is proposed to learn the autoencoder model from multiple data sources. Training is performed across two simulated client nodes, each of which trains a local autoencoder model on an exclusive set of video segments. FedAvg [Kone_15] is applied to aggregate the local models into the global model.
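The FedAvg aggregation step amounts to a data-size-weighted average of the clients' model parameters, layer by layer. A minimal sketch with NumPy arrays standing in for the autoencoder's layer weights (the function name and list-of-arrays representation are assumptions of this example):

```python
import numpy as np

def fedavg(client_weights, client_sizes):
    """Aggregate local models into the global model (FedAvg):
    each layer of the global model is the average of the clients'
    layers, weighted by each client's number of training samples.

    client_weights: list of per-client models, each a list of
                    per-layer numpy arrays (same shapes across clients).
    client_sizes:   number of training samples per client.
    """
    total = sum(client_sizes)
    n_layers = len(client_weights[0])
    global_weights = []
    for layer in range(n_layers):
        agg = sum((n / total) * w[layer]
                  for w, n in zip(client_weights, client_sizes))
        global_weights.append(agg)
    return global_weights
```

In a synchronous round, both simulated clients train locally, send their weights to the aggregator, and receive back the averaged global model before the next round starts.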
Two different sets of experiments have been designed. In the first set, each dataset is split into two disjoint subsets, one per client; local training is performed on each subset, followed by aggregation. This experiment lets us evaluate the real performance gain achieved through federated aggregation, compared with the baseline performance of centralized training approaches. In the second set, the federated model is trained from multiple video sources drawn from different datasets that capture the same scenario from different views, with each client node performing local training on its own video dataset. This evaluates the ability to aggregate heterogeneous spatial-structure information, compared against the baseline performance of the single models trained on each individual dataset.
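The disjoint partition used in the first experiment set can be sketched as below. The segment count, seed, and shuffling scheme are illustrative assumptions, not the project's actual indexing.

```python
import numpy as np

# First experiment set (sketch): shuffle a dataset's video-segment
# indices and split them into two disjoint subsets, one per client.
rng = np.random.default_rng(seed=42)
segment_ids = rng.permutation(200)           # 200 hypothetical video segments
client_a, client_b = np.array_split(segment_ids, 2)

# The two client subsets are disjoint and jointly cover the dataset.
assert set(client_a).isdisjoint(client_b)
assert len(client_a) + len(client_b) == len(segment_ids)
```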
The experimental results show quality metrics for the federated models that are close to those of the single models trained on each individual dataset. This demonstrates the robust and accurate aggregation capabilities of the Federated Learning paradigm, both for identically structured spatial data and for spatially heterogeneous data.
Resources
- Source code for the experiments, together with full documentation and the analysis of the results, can be found in the following repo
Examples
Below, some examples of the global model's event-prediction capability can be played over samples from each dataset. For all samples, μ=0.5 and λ=9 are used. Move the mouse over each sample to see the reconstruction made by the model:
References
[Rash_19] Rashmika Nawaratne, Damminda Alahakoon, Daswin De Silva, and Xinghuo Yu. Spatiotemporal anomaly detection using deep learning for real-time video surveillance. IEEE Transactions on Industrial Informatics, PP:1–1, 08 2019.
[Yong_17] Yong Shean Chong and Yong Haur Tay. Abnormal event detection in videos using spatiotemporal autoencoder. In International Symposium on Neural Networks, pages 189–196. Springer, 2017.
[Mahm_16] Mahmudul Hasan, Jonghyun Choi, Jan Neumann, Amit K Roy-Chowdhury, and Larry S Davis. Learning temporal regularity in video sequences. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 733–742, 2016.
[Kone_15] Jakub Konečný, Brendan McMahan, and Daniel Ramage. Federated optimization: Distributed optimization beyond the datacenter. arXiv preprint arXiv:1511.03575, 2015.