Driver fatigue monitoring system based on multi-layer deep learning framework and motion analysis

Abstract: Recent developments in the automotive industry have stimulated research interest in monitoring driver fatigue. The aim is to develop an effective driver monitoring system that can detect abnormal psychological and physical states in time and reduce traffic accidents caused by fatigued driving. Much of the current literature focuses on physiological signals, measuring heart rate variability (HRV) to obtain information about cardiac activity. HRV is also a valid measure of physiological stress, because it provides information about the activity of the cardiovascular system, which is innervated by the autonomic nervous system. This paper aims to reconstruct the photoplethysmogram (PPG) signal robustly by extracting facial landmarks and analyzing the subtle skin movements caused by blood pressure changes. The experimental results indicate that the PPG signal detected by the sensor correlates strongly with the PPG signal reconstructed from the facial landmarks.

1 Introduction

Drowsiness is a physiological state characterized by a reduced level of consciousness and difficulty staying awake. According to the National Safety Council, the proportion of fatal accidents caused by drowsy driving is rising significantly in the United States. It is therefore important to develop an effective early warning system that can detect in advance that the driver's physiological condition is not suitable for driving. Studies have shown that heart rate variability (HRV) is associated with the driver's level of attention. More precisely, HRV is an important indicator of an individual's physiological adaptability and behavioral flexibility. HRV is assessed by measuring blood pressure changes through the PPG signal. Specifically, the PPG signal consists of a sequence of blood volume peaks, each corresponding to one cardiac cycle. PPG detection uses an LED light source to illuminate the skin and a photodiode to measure the intensity of the reflected light.

Although physiological signals make it possible to monitor drowsiness, recent research has focused on assessing driver fatigue with computer vision techniques. Developing a face detection system for an automotive environment is certainly challenging, but there are many ways to use cameras to determine the blink rate and thus assess fatigue. Unlike other studies, our method uses computer vision techniques to detect and extract facial landmarks from previously recorded video sequences and analyzes the pixel intensity changes at those landmarks to define a time series for each landmark. The rationale of our method is to reveal, in the manner of video magnification, the subtle facial movements caused by changes in blood pressure. The purpose of this study is to construct a PPG signal from the time series of facial landmarks instead of using sensors.

The rest of this paper is structured as follows: Section 2 presents related research; Section 3 provides an overview of PPG signals and introduces our pipeline based on long short-term memory and convolutional neural networks; Section 4 explains the experimental procedure; finally, Section 5 discusses the advantages of our method and future research directions.

2 Related research

Most previously published work has used physiological signals to detect driver drowsiness and has achieved high detection accuracy. Several studies have shown that driver fatigue monitoring solutions based solely on computer vision are not always effective; in particular, visual methods that rely on analyzing traffic signs tend to fail when road conditions are poor.

Some researchers have reported results on photoplethysmography (PPG) signal detection, achieving good detection performance with low-power wireless PPG sensors. In another approach, the authors assessed fatigue using the low- and high-frequency components of PPG signals detected at the fingers and earlobes. The studies cited here mainly evaluate HRV by analyzing ECG and PPG signals. However, these methods are computationally demanding and require expensive detection equipment to be integrated into the vehicle. Even when the integrated sensor is not a direct measurement tool, the driver must still place a hand or another part of the body (such as an earlobe or finger) on the sensor to obtain accurate physiological signals, which is an important limiting factor for adoption in automobiles. This paper takes a different approach and proposes an innovative framework. The basic principle is to capture images of the driver's face, collect facial landmarks, and reconstruct the PPG signal in order to evaluate the HRV signal and the fatigue level.

3 Background and pipeline scheme

As mentioned before, we propose an innovative method for monitoring driver drowsiness that does not use sensors to acquire the PPG signal. Previous work has explained how video magnification can reveal subtle changes in the human face by amplifying ordinary video, because blood pressure changes over successive cardiac cycles cause color changes in different parts of the skin. Studies have demonstrated that autonomic nervous system activity modulates certain physiological processes, such as blood pressure and breathing rate. This activity can be measured indirectly by assessing the HRV signal, which changes during periods of physiological stress, extreme fatigue, and drowsiness.

Assessing HRV requires biofeedback tools or software, high-quality sensors to detect ECG signals, and a powerful processor to handle the large amount of data. The ECG signal is the traditional basis for assessing HRV, but it has some drawbacks in practice: although detection quality is good, subtle movements of the body during data acquisition introduce noise and artifacts into the signal. To overcome these problems, the PPG signal has been proposed as a reliable alternative. Because it detects changes in blood volume, PPG can effectively capture subtle skin movements that are difficult to observe with the naked eye. In particular, by analyzing the PPG signal we can trace changes in heart rate over a specific period of time, which shows whether both branches of the autonomic nervous system (parasympathetic and sympathetic) are functioning properly. Generally, a small HRV value indicates nearly constant intervals between heartbeats, while a large HRV value indicates irregular intervals. A very regular heart rhythm with only slight changes in heart rate can indicate that attention is reduced due to chronic physiological stress. However, there is no single standard HRV value, because HRV values vary from person to person.
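As a minimal sketch of how a single HRV value can be derived from PPG peaks, the following Python snippet computes two widely used time-domain measures, SDNN and RMSSD, from the timestamps of successive peaks. The function names and the example timestamps are illustrative assumptions; the paper does not state which HRV measure it uses.

```python
import math
import statistics


def inter_beat_intervals_ms(peak_times_s):
    """Intervals between successive peaks, in milliseconds.

    peak_times_s: peak timestamps in seconds, in increasing order.
    """
    return [(b - a) * 1000.0 for a, b in zip(peak_times_s, peak_times_s[1:])]


def sdnn(intervals_ms):
    """Sample standard deviation of the inter-beat intervals."""
    return statistics.stdev(intervals_ms)


def rmssd(intervals_ms):
    """Root mean square of successive interval differences."""
    diffs = [b - a for a, b in zip(intervals_ms, intervals_ms[1:])]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


# Hypothetical peak timestamps (seconds), for illustration only.
steady_peaks = [0.0, 1.0, 2.0, 3.0, 4.0]   # perfectly regular rhythm
varied_peaks = [0.0, 0.8, 1.9, 2.6, 3.8]   # irregular rhythm

steady_ibi = inter_beat_intervals_ms(steady_peaks)
varied_ibi = inter_beat_intervals_ms(varied_peaks)
```

With these inputs, the regular rhythm yields zero for both measures, while the irregular rhythm yields clearly larger values, which matches the qualitative reading of small and large HRV values given above.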

With this in mind, we developed a driver drowsiness monitoring system that combines a long short-term memory (LSTM) neural network with a convolutional neural network (CNN). The proposed pipeline represents an advance in cardiac activity assessment, as it uses a low frame rate (25 fps) camera to detect and extract key landmarks in face images and analyzes the pixel changes in each video frame. The LSTM is well suited to this task because it can capture hidden nonlinear correlations in sequential data.

Specifically, the LSTM pipeline takes the time series of facial landmarks as input and, trained against the raw PPG signal detected by the sensor as target data, outputs the predicted PPG time series.

In addition, accurate classification by the CNN model indicates that the LSTM predictions are valid, and the CNN determines the attention level of the driver.

4 Experiments

In total, 71 subjects took part in the evaluation of our LSTM-CNN pipeline. The dataset consists of PPG samples from patients/drivers of different gender, age (between 20 and 70 years), and pathology: we collected data not only from healthy subjects, but also from patients with high blood pressure, diabetes, and other conditions. Because the two drowsiness states differ, PPG signal samples were measured separately for each state. Specifically, we simulated two scenarios, full wakefulness and sleepiness, confirmed by synchronized ECG sampling signals, with Beta and Alpha waveforms confirming the state of brain activity during wakefulness and sleepiness, respectively. Each scenario was simulated for 5 minutes to ensure that the system had sufficient time to complete its initial calibration and to learn continuously in real time. At the same time, we recorded a video of the driver's face with a low frame rate (25 fps) full HD camera. We first use the dlib library, whose landmark detector is based on the machine learning algorithm of Kazemi and Sullivan, to process the recorded video frames and extract the facial landmarks. We then calculate the pixel intensity associated with each landmark and its change from frame to frame, which determines the time series of facial landmarks that is fed into the LSTM neural network.
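The last step above, turning per-frame landmark positions into intensity time series, can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes grayscale frames indexable as frame[row][col] and landmark coordinates already produced by a detector such as dlib, and the averaging window (radius) and all function names are hypothetical.

```python
def patch_mean(frame, row, col, radius=1):
    """Mean intensity in a square window centred on (row, col).

    The window is clipped at the image borders. Averaging over a small
    patch is an assumption of this sketch; radius=0 reads a single pixel.
    """
    height, width = len(frame), len(frame[0])
    values = [
        frame[r][c]
        for r in range(max(0, row - radius), min(height, row + radius + 1))
        for c in range(max(0, col - radius), min(width, col + radius + 1))
    ]
    return sum(values) / len(values)


def landmark_time_series(frames, landmarks_per_frame, radius=1):
    """Return series[k][t]: intensity at landmark k in frame t.

    landmarks_per_frame[t] is a list of (row, col) pairs for frame t,
    with the same number of landmarks in every frame.
    """
    n_landmarks = len(landmarks_per_frame[0])
    series = [[] for _ in range(n_landmarks)]
    for frame, landmarks in zip(frames, landmarks_per_frame):
        for k, (row, col) in enumerate(landmarks):
            series[k].append(patch_mean(frame, row, col, radius))
    return series


def frame_to_frame_change(series):
    """Difference between consecutive samples of each landmark series."""
    return [[b - a for a, b in zip(s, s[1:])] for s in series]


# Two tiny synthetic 3x3 frames with one landmark at the centre.
frames = [
    [[10, 10, 10], [10, 10, 10], [10, 10, 10]],
    [[12, 12, 12], [12, 12, 12], [12, 12, 12]],
]
landmarks = [[(1, 1)], [(1, 1)]]

series = landmark_time_series(frames, landmarks)
changes = frame_to_frame_change(series)
```

At 25 fps, a 5-minute recording would give 7500 samples per landmark, so each series stays small enough to process in memory.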

4.1 CNN pipeline

This section describes the CNN model architecture used in the experiments in more detail. The CNN architecture proposed in this paper provides supporting evidence for validating the LSTM predictions. Specifically, our CNN model is able to track and learn the facial expressions of car drivers, which improves drowsiness detection. To train the model, we set the batch size to 32 and the initial learning rate to 0.0001. Furthermore, we used 32 neurons in the hidden layer and two output neurons for binary classification.

The experimental results are encouraging: the model reaches an accuracy of 80%.

4.2 Long Short-Term Memory (LSTM) pipeline

Fig. 1. LSTM pipeline

To exploit the ability of long short-term memory (LSTM) networks to capture dependencies in sequential data (time series), we constructed an LSTM model with the facial landmark time series as input data and the raw PPG signal as target data, in order to reconstruct the PPG signal (Fig. 1). After rescaling all time series values to the range (0.2, 0.8) using the MinMaxScaler algorithm, we trained the model with the following parameters: 256 neurons, a batch size of 128, an initial learning rate of 0.001, and a dropout rate of 0.2. To evaluate the robustness of the reconstructed PPG signal, we computed the Fourier spectrum of the PPG minima and compared the frequencies of the original PPG minima with those of the reconstructed PPG minima.
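The preprocessing and evaluation steps described above can be illustrated with a short standard-library sketch. It rescales a signal to (0.2, 0.8), locates its local minima, and takes the magnitude spectrum of an impulse train marking those minima. Several points are assumptions of this sketch rather than details from the paper: the scaling is a plain reimplementation of min-max scaling instead of a call to a library scaler, the impulse-train construction is only one possible reading of "frequencies of the PPG minima", the naive DFT stands in for an FFT, and the 1 Hz sine is a synthetic stand-in for a PPG trace.

```python
import cmath
import math


def min_max_scale(values, low=0.2, high=0.8):
    """Linearly map values onto the interval [low, high]."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Constant series: map to the lower bound (a choice of this sketch).
        return [low for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


def local_minima(signal):
    """Indices whose value is strictly below both neighbours."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] < signal[i - 1] and signal[i] < signal[i + 1]
    ]


def impulse_train(indices, length):
    """Series of zeros with a 1.0 at each given index."""
    marks = [0.0] * length
    for i in indices:
        marks[i] = 1.0
    return marks


def dft_magnitude(samples):
    """Magnitudes of DFT bins 0..n/2 (naive O(n^2) transform)."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * cmath.pi * k * t / n)
                for t, x in enumerate(samples)))
        for k in range(n // 2 + 1)
    ]


FS = 25    # sampling rate in Hz, matching the 25 fps camera
N = 100    # 4 seconds of samples
ppg_like = [math.sin(2 * math.pi * t / FS) for t in range(N)]  # 1 Hz

scaled = min_max_scale(ppg_like)
minima = local_minima(scaled)
spectrum = dft_magnitude(impulse_train(minima, N))
bin_width_hz = FS / N  # frequency of bin k is k * bin_width_hz
```

Because the minima of this synthetic signal recur once per second, the spectrum has its first non-zero peak at bin 4, that is, at 4 × 0.25 Hz = 1 Hz. Applying the same steps to an original and a reconstructed PPG signal would give two spectra whose peak positions can then be compared.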

5 Conclusion

Figure 2. Fast Fourier Transform (FFT) spectrum of the original PPG minima (blue) and FFT of the reconstructed PPG minima (green).

In conclusion, we present an effective LSTM-CNN-based monitoring system that determines the driver's drowsiness by assessing cardiac activity through PPG signals. Unlike other methods, ours reconstructs the PPG signal from facial landmark data and does not involve sensor systems. We construct an LSTM pipeline that uses the facial landmark time series as input data and the PPG signal detected by sensors as target data, in order to demonstrate the robustness of the reconstructed PPG signal. In addition, we build a CNN model that not only classifies the driver's physiological state, but also validates the LSTM predictions. Finally, we computed the Fast Fourier Transform (FFT) spectrum of the original PPG minima and that of the reconstructed PPG minima (Figure 2). The experimental results show the promise of our method, as we are able to distinguish between sleepy and awake subjects with nearly 100% accuracy, which is consistent with the average results achieved by similar pipelines reported in the scientific literature. Improved PPG sensors and deeper processing of the PPG signal, using the features learned by a stacked autoencoder architecture, are expected to improve the pipeline proposed in this paper. This is the direction the authors are currently pursuing.