Real-World Anomaly Detection in Surveillance through Semi-supervised Federated Active Learning

Nicolás Cubero Torres, Francisco Barranco Expósito, Eduardo Ros Vidal

Description

This project presents the deployment and research of semi-supervised deep learning models for anomaly detection in surveillance video, developed on a synchronous federated training architecture under the Federated Learning paradigm, in which training is distributed over multiple training nodes.

This research is accompanied by the deployment of an Active Learning framework that enables the model to learn continuously from live video recording streams.

Fig 1. Federated learning system scheme.

Spatio-temporal learning of normal behaviour

Fig 2. Spatio-temporal learner autoencoder architecture and event classification workflow.

Video-segment reconstruction autoencoder models, as proposed in [Rash_19, Yong_17, Mahm_16], are used to learn spatio-temporal features from video sequences containing normal events. The model is trained to reconstruct video segments containing normal events accurately. Since no abnormal event is fed during training, the reconstruction of normal video segments is expected to be more accurate than that of segments containing abnormal events.
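As an illustration, below is a minimal sketch of such a reconstruction autoencoder in the spirit of the convolutional/ConvLSTM architectures of [Yong_17]. The cuboid length, frame resolution, and layer sizes are illustrative assumptions, not the exact configuration used in this project.

```python
# Sketch of a spatio-temporal reconstruction autoencoder (assumptions:
# Keras backend, 8-frame greyscale cuboids at 224x224; hyperparameters
# are illustrative, not the project's exact architecture).
import tensorflow as tf
from tensorflow.keras import layers, models

CUBOID_LEN = 8               # frames per cuboid (assumed)
FRAME_H, FRAME_W = 224, 224  # input resolution (assumed)

def build_autoencoder():
    inp = layers.Input(shape=(CUBOID_LEN, FRAME_H, FRAME_W, 1))
    # Spatial encoder: 2D convolutions applied frame by frame
    x = layers.TimeDistributed(
        layers.Conv2D(128, 11, strides=4, padding="same", activation="tanh"))(inp)
    x = layers.TimeDistributed(
        layers.Conv2D(64, 5, strides=2, padding="same", activation="tanh"))(x)
    # Temporal bottleneck: ConvLSTM layers model motion across frames
    x = layers.ConvLSTM2D(64, 3, padding="same", return_sequences=True)(x)
    x = layers.ConvLSTM2D(32, 3, padding="same", return_sequences=True)(x)
    x = layers.ConvLSTM2D(64, 3, padding="same", return_sequences=True)(x)
    # Spatial decoder: mirror the encoder to reconstruct the input cuboid
    x = layers.TimeDistributed(
        layers.Conv2DTranspose(128, 5, strides=2, padding="same", activation="tanh"))(x)
    out = layers.TimeDistributed(
        layers.Conv2DTranspose(1, 11, strides=4, padding="same", activation="sigmoid"))(x)
    model = models.Model(inp, out)
    # Trained on normal cuboids only; the RSSE anomaly score is computed
    # at evaluation time (see the detection sketch further below)
    model.compile(optimizer="adam", loss="mse")
    return model
```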

Fig 3. Frame cuboid analysis.

The root of the sum of squared errors (RSSE) is used as the loss function to measure reconstruction accuracy for the input video segments. A normalized anomaly threshold μ is then set to separate the two classes of video segments. Additionally, a second, temporal threshold λ is introduced to mitigate false positives caused by error peaks due to occlusions, sudden illumination changes, the appearance of new objects, etc. The temporal threshold determines the minimum number of consecutive anomalous cuboids required for a video time strip to be flagged as containing an abnormal event.
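A minimal sketch of this decision rule follows: per-cuboid RSSE scores, min-max normalization over the sequence, the normalized threshold μ, and the temporal threshold λ. Function and variable names are hypothetical.

```python
# Hedged sketch of the anomaly decision rule described above; names are
# illustrative, not the project's API.
import numpy as np

def cuboid_scores(model, cuboids):
    """RSSE reconstruction error for each input cuboid."""
    recons = model.predict(cuboids)
    err = np.sqrt(((cuboids - recons) ** 2).sum(axis=(1, 2, 3, 4)))
    # Normalize to [0, 1] over the sequence so mu is comparable across videos
    return (err - err.min()) / (err.max() - err.min() + 1e-12)

def detect_anomalies(scores, mu=0.5, lam=9):
    """Flag a time strip as abnormal only when at least `lam` consecutive
    cuboids exceed the normalized threshold `mu`, suppressing isolated
    error peaks from occlusions, illumination changes, etc."""
    flags = scores > mu
    out = np.zeros_like(flags)
    run = 0
    for i, f in enumerate(flags):
        run = run + 1 if f else 0
        if run >= lam:
            out[i - lam + 1 : i + 1] = True
    return out
```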

Federated learning from multiple data sources

Fig 4. Federated aggregation from multiple video data sources.

A synchronous federated learning architecture is proposed to train the autoencoder model from multiple data sources. Training is performed with two simulated client nodes, each of which trains a local autoencoder model on an exclusive set of video segments. FedAvg [Kone_15] is applied to aggregate the local models into the global model.
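As a sketch, FedAvg replaces the global weights after each synchronous round with the example-count-weighted average of the client weights. The snippet below assumes Keras-style `get_weights()`/`set_weights()` lists of numpy arrays; names are illustrative.

```python
# Minimal FedAvg aggregation sketch (assumed weight representation:
# per-client lists of numpy arrays, one array per layer).
import numpy as np

def fedavg(client_weights, client_sizes):
    """client_weights: list of per-client weight lists;
    client_sizes: number of training cuboids held by each client."""
    total = float(sum(client_sizes))
    coeffs = [n / total for n in client_sizes]
    return [
        sum(c * w[layer] for c, w in zip(coeffs, client_weights))
        for layer in range(len(client_weights[0]))
    ]

# One synchronous round with two simulated clients (hypothetical names):
#   for client, data in zip(clients, client_datasets):
#       client.set_weights(global_weights)
#       client.fit(data, data, epochs=local_epochs)
#   global_weights = fedavg([c.get_weights() for c in clients], sizes)
```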

Two sets of experiments have been designed. In the first set, each dataset is split into two disjoint subsets, one per client, and local training is performed on each subset followed by aggregation; this lets us evaluate the actual performance gain of federated aggregation against the baseline performance of centralized training. In the second set, the federated model is trained from multiple video sources drawn from different datasets that capture the same scenario from different views, with each client node training locally on its own video dataset; this evaluates the capability to aggregate information with heterogeneous spatial structure, compared against the baseline performance of single models trained on each individual dataset.
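For the first experiment set, the disjoint per-client partition can be sketched as follows (a minimal illustration; function name and shard count are hypothetical):

```python
# Split one dataset into disjoint client shards for simulated clients.
import numpy as np

def split_clients(cuboids, n_clients=2, seed=0):
    # Shuffle once, then hand each client a disjoint slice of the indices
    idx = np.random.default_rng(seed).permutation(len(cuboids))
    return [cuboids[part] for part in np.array_split(idx, n_clients)]
```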

The experimental results show that the federated models obtain quality metrics close to those of the single models trained on each individual dataset. This demonstrates the robust and accurate aggregation capabilities of the Federated Learning paradigm, both for data with identical spatial structure and for spatially heterogeneous data.

Fig 5. Training loss evolution of federated training on each client node (clients #0 and #1) for the first experiment set on the UCSD Ped 1 dataset.
Fig 6. Test per-class reconstruction error and training time comparison between the centralized learning paradigm and the federated learning paradigm for the first experiment set on the UCSD Ped 1 dataset.

Resources

Examples

Below, some examples of the global model's event prediction capability can be played over samples from each dataset. For all samples, μ=0.5 and λ=9 are used. Move the mouse over each sample to see the reconstruction made by the model:

UCSD Ped 1 - Test 002: original video / reconstructed video
UCSD Ped 1 - Test 010: original video / reconstructed video
UCSD Ped 1 - Test 027: original video / reconstructed video
UCSD Ped 2 - Test 2: original video / reconstructed video
UCSD Ped 2 - Test 7: original video / reconstructed video

References