This research presents a prototype for crowd localization and counting during earthquakes, based on deep learning and the infrastructure of a state-of-the-art 5G standalone network deployed at the Universidad de Concepcion, Chile. The system uses an 8 MP panoramic network camera to capture real-time crowd images, which are sent to a Deep Learning Server (DLS) over the 5G network. The camera provides visible-light color images, and its sensor technology can deliver color images even at night. The DLS takes frames from the video feed and generates Focal Inverse Distance Transform (FIDT) maps, on which the counting and localization of people are carried out. In particular, the FIDT maps are generated from the crowd images by a deep-learning model composed of two cascaded autoencoders. The 5G network lets the system transfer data from the camera to the DLS at high speed, an essential feature for a system intended to help authorities make critical decisions during natural disasters. In this scenario, and given that the number of rescuers is usually limited, the system supports a better distribution of rescuers among several crowded places by instantly reporting the number of people at each place at any time of the day or night.
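To make the idea of counting and locating people on an FIDT map concrete, the following is a minimal sketch and not the authors' implementation. It assumes the FIDT definition commonly given in the crowd-localization literature, I = 1 / (P^(alpha*P + beta) + C), where P is the Euclidean distance from a pixel to the nearest annotated head, with alpha = 0.02, beta = 0.75, and C = 1. The function names, the 3x3 neighbourhood, and the 0.5 threshold are illustrative assumptions. In the described system the map would be predicted by the cascaded autoencoders; here it is built from known points only to show how peaks in the map correspond to people.

```python
import math

def fidt_map(points, height, width, alpha=0.02, beta=0.75, c=1.0):
    """Build an FIDT map from head positions given as (row, col) pairs.

    Assumed definition: I = 1 / (P ** (alpha * P + beta) + c), where P is
    the Euclidean distance to the nearest head. Brute force, small grids only.
    """
    fidt = [[0.0] * width for _ in range(height)]
    if not points:
        return fidt  # no heads: treat the whole map as background
    for r in range(height):
        for col in range(width):
            d = min(math.hypot(r - pr, col - pc) for pr, pc in points)
            fidt[r][col] = 1.0 / (d ** (alpha * d + beta) + c)
    return fidt

def locate_peaks(fidt, threshold=0.5):
    """Return (row, col) of pixels that are 3x3 local maxima above threshold.

    The threshold is a hypothetical value chosen for this sketch.
    """
    height, width = len(fidt), len(fidt[0])
    peaks = []
    for r in range(height):
        for col in range(width):
            value = fidt[r][col]
            if value < threshold:
                continue
            neighbourhood = [
                fidt[rr][cc]
                for rr in range(max(0, r - 1), min(height, r + 2))
                for cc in range(max(0, col - 1), min(width, col + 2))
            ]
            if value >= max(neighbourhood):
                peaks.append((r, col))
    return peaks

def count_people(fidt, threshold=0.5):
    """The crowd count is the number of detected peaks."""
    return len(locate_peaks(fidt, threshold))

# Usage: three well-separated heads on a 10x10 grid.
heads = [(2, 3), (7, 8), (5, 1)]
demo_map = fidt_map(heads, 10, 10)
print(count_people(demo_map))  # prints 3
```

Each head produces a sharp peak of value 1 that decays quickly with distance, so a single pass yields both the locations (the peak coordinates) and the count (the number of peaks). A predicted map would be noisier than this synthetic one, so the threshold would need tuning on real data.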
Earthquakes, and their cascading threats to economic and social sustainability, are a problem shared by China and Chile. In such emergencies, automatic image recognition systems have become critical tools for preventing and reducing civilian casualties. Human crowd detection and estimation are fundamental to automatic recognition during life-threatening natural disasters. However, detecting and estimating crowds in scenes is nontrivial due to occlusion, complex behaviors, posture changes, and camera angles, among other issues. This paper presents the first steps in developing an intelligent Earthquake Early Warning System (EEWS) between China and Chile. The EEWS exploits the ability of deep learning architectures to properly model the different spatial scales at which people appear and the varying degrees of crowd density. We propose an autoencoder architecture for crowd detection and estimation because it creates compressed representations of the original crowd input images in its latent space. The proposed architecture consists of two cascaded autoencoders. The first performs reconstructive masking of the input images, while the second generates Focal Inverse Distance Transform (FIDT) maps. Thus, the cascaded autoencoders improve the ability of the network to locate people and crowds, thereby generating high-quality crowd maps and more reliable count estimates.