Estimation of Heart Rate from Vocal Frequency Based on Support Vector Machine

Heart rate (HR) is one of the vital signs used to assess our physical condition; it would be beneficial if HR could easily be obtained without special medical instruments. In this study, features of vocal frequency were used to estimate HR, because voice can easily be recorded with a common device such as a smartphone. Previous studies proposed that a support vector machine (SVM) that adopted the inner product as the kernel function was efficient for estimating HR to a certain extent. However, these studies did not evaluate the effectiveness of other kernel functions, such as the hyperbolic tangent function. Therefore, this study sought the most suitable kernel function for kernel ridge regression (KRR). In addition, the features of vocal frequency that effectively estimate HR were investigated. To evaluate the effectiveness, experiments were conducted with two subjects. In the experiments, 60 sets of HR and voice data were measured per subject. To identify the most effective kernel function, four kernel functions (the inner product function, Gaussian function, polynomial function, and hyperbolic tangent function) were compared. Moreover, effective features of vocal frequency were selected with the sequential feature selection (SFS) method. As a result, the hyperbolic tangent function worked best, and high-frequency components of voice were effective. However, the results of this research indicated that the vocal spectrum components that are effective for estimating HR differ depending on the prediction model.


Introduction
Heart rate (HR) is one of the vital signs used to assess a patient's physical condition. In recent years, HR has often been measured outside hospitals with simplified measuring instruments. For instance, mobile cardio-tachometers are used for the prevention or treatment of lifestyle diseases, such as obesity. Obesity is responsible for many disorders, including diabetes, hypertension, and hyperlipidemia, and can lead to death [1]. Adequate exercise is effective for the treatment of obesity. However, high-intensity exercise is not advised for obese patients, and it is important to exercise in good physical condition and at an appropriate intensity. Thus, it is important to assess both physical condition and exercise intensity. HR is recommended as a simple and effective way to evaluate a patient's physical condition [2,3].
Currently, there are many applications that measure HR without special medical equipment. For example, HR can be measured with a smartphone camera or with wearable devices. These tools are useful for maintaining health, because physical condition should be assessed routinely and easily.
However, existing cardio-tachometers that use smartphone cameras have drawbacks from a practical perspective. For example, cardio-tachometers using smartphone cameras are easily influenced by the user's body motion or external light. Therefore, this research proposes a method to more easily estimate HR by measuring the human voice.
The relationship between HR and voice has already been observed, because HR and voice have been simultaneously measured in stress assessments and other similar assessments [4]. As shown in Fig. 1, a change in vocal frequency relates to HR variation. Fig. 1 illustrates that formants are shifted to the right (higher frequency) with a decrease in HR, and the amplitude of each formant also changes with HR variation. In particular, the higher the HR, the larger the amplitude of the fifth formant. Although the factors that change vocal frequency are not limited to HR, this example indicates that HR is one factor that influences vocal frequency.
Given this relationship between HR and vocal frequency, several methods to estimate HR from voice have been proposed [5,6,7,8]. Studies [5] and [6] take advantage of fluctuations in the vocal spectrogram that are synchronized with the R-waves of the electrocardiogram (ECG) signal. In [5], HR is roughly estimated by image processing of 2D images of a vocal spectrogram. In [6], the authors chose the component of vocal frequency that is most affected by R-waves in a one-dimensional scan of the speech spectrogram; this algorithm realized accurate HR estimations. On the other hand, studies [7] and [8] proposed HR estimation methods based on the support vector machine (SVM), and HR values were estimated to a certain extent. In [7] and [8], the inner product was used as the kernel function for the SVM, but evaluation results obtained with other kernel functions, such as the Gaussian or hyperbolic tangent function, were not presented. In general, estimation accuracy is known to depend on the adopted kernel function. Therefore, this research first compares several kernel functions and identifies the most suitable one for estimating HR. In addition, the frequency bands that effectively estimate the HR value are examined. Identifying a more effective combination of kernel function and voice feature would facilitate the practical use of HR estimation from voice.
Fig. 2 shows a processing diagram of HR estimation from voice data. The proposed HR estimation method consists of the following three steps. First, the recorded voice signal is transformed into a frequency representation by using the fast Fourier transform (FFT). Next, the vocal frequency components that effectively estimate HR are selected by the sequential feature selection (SFS) method [9], and the selected vocal frequency components are used as explanatory variables in the subsequent estimation step. Finally, an SVM-based [10,11] HR estimation is performed using the explanatory variables.
(Generally, the term "explanatory variable" is also called "feature," but the term "feature" means a variable mapped into an unknown feature space in the context of the SVM.)
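The first step, transforming a recording into vocal frequency components, can be sketched as follows. This is a minimal illustration rather than the authors' implementation: the function name `vocal_spectrum` is hypothetical, the band limits of 120 Hz to 10000 Hz and the 44.1 kHz sampling rate are taken from later sections of this paper, and a synthetic 440 Hz tone stands in for a real voice recording.

```python
import numpy as np

def vocal_spectrum(signal, fs=44100, f_lo=120.0, f_hi=10000.0):
    """Magnitude spectrum of `signal`, restricted to the band [f_lo, f_hi] in Hz."""
    signal = np.asarray(signal, dtype=float)
    magnitudes = np.abs(np.fft.rfft(signal))          # one-sided FFT magnitude
    freqs = np.fft.rfftfreq(signal.size, d=1.0 / fs)  # frequency of each bin in Hz
    in_band = (freqs >= f_lo) & (freqs <= f_hi)       # keep only the band of interest
    return freqs[in_band], magnitudes[in_band]

# Synthetic stand-in for a recording: a 1 s, 440 Hz tone (not real voice data).
fs = 44100
t = np.arange(fs) / fs
tone = np.sin(2 * np.pi * 440.0 * t)

freqs, mags = vocal_spectrum(tone, fs=fs)
peak_hz = freqs[np.argmax(mags)]  # frequency of the strongest component
```

How the spectrum is grouped into individual explanatory variables (for example, the width of each frequency band) is not specified in this text, so the sketch returns the raw band-limited spectrum.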

Selecting effective explanatory variables
In this research, explanatory variables to effectively estimate HR are selected with the SFS method. (Explanatory variables are vocal frequency components ranging from 120 Hz to 10000 Hz. In this research, time-domain features are not used.) In other words, the SFS method is performed to select the best subset to increase the accuracy of HR estimation. The SFS method is performed according to the following steps:
I. SFS begins with an evaluation of the accuracy of the kernel ridge regression (KRR)-based estimation for each explanatory variable, identifying the one explanatory variable that most accurately estimates HR.
II. SFS is performed using a pair of explanatory variables. The explanatory variable selected in Step I is coupled with each of the other variables from the remaining explanatory variable subset, identifying the pair yielding the highest estimation accuracy.
III. SFS next evaluates three variables: the pair from Step II and another selected from the remaining subset.
IV. Steps I-III are repeated with four or more explanatory variables.
Usually, the estimation accuracy peaks when the number of selected explanatory variables reaches a certain number. In sum, the best explanatory variable subset can be singled out for maximum accuracy in the SFS procedure. (Note that the "accuracy" is defined as a correlation coefficient between the estimated and measured HRs in this research.)
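The greedy procedure in Steps I-IV can be sketched as below. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the scoring function uses a linear-kernel KRR with an assumed regularization value `lam`, the score is the leave-one-out correlation coefficient, and the data are synthetic (only columns 2 and 4 carry signal).

```python
import numpy as np

def loo_correlation(X, y, lam=1.0):
    """Correlation between targets and leave-one-out linear-kernel KRR estimates."""
    n = len(y)
    preds = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i                  # hold out sample i
        Xt, yt = X[train], y[train]
        K = Xt @ Xt.T                              # linear-kernel Gram matrix
        alpha = np.linalg.solve(K + lam * np.eye(n - 1), yt)
        preds[i] = (Xt @ X[i]) @ alpha             # estimate for the held-out sample
    return np.corrcoef(y, preds)[0, 1]

def sequential_forward_selection(X, y, score_fn, max_features):
    """Greedily add the variable that maximizes score_fn; return the best subset seen."""
    selected, best_subset, best_score = [], [], -np.inf
    remaining = list(range(X.shape[1]))
    while remaining and len(selected) < max_features:
        # Score every candidate added to the variables selected so far.
        scores = [(score_fn(X[:, selected + [j]], y), j) for j in remaining]
        score, j = max(scores)
        selected.append(j)
        remaining.remove(j)
        if score > best_score:                     # remember where accuracy peaks
            best_score, best_subset = score, list(selected)
    return best_subset, best_score

# Synthetic demonstration data (not measurements from the paper).
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 6))
y = 3.0 * X[:, 2] - 2.0 * X[:, 4] + 0.1 * rng.normal(size=40)
best_subset, best_score = sequential_forward_selection(
    X, y, loo_correlation, max_features=4)
```

Tracking the best subset separately from the growing `selected` list mirrors the observation that accuracy peaks at a certain number of variables, after which further additions no longer help.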

Support vector machine-based regression method
The SVM, which was proposed by Vapnik [12], can realize an effective regression in a feature space reached by a nonlinear transformation. The SVM implicitly maps the input data into a high-dimensional feature space by means of a kernel function. The kernel function is defined as an inner product in the feature space, so the features themselves never need to be specified explicitly. In this research, KRR, a kernel-based regression method closely related to the SVM, was used to estimate HR from the vocal frequency. KRR is ridge regression expressed in terms of a kernel matrix, and it can also be derived from a regularized least-squares method.
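Here, k is a vector whose elements are $k_i = K(x_i, x_s)$, and K is a matrix whose elements are $K_{ij} = K(x_i, x_j)$, where $x_i$ and $x_j$ denote training inputs and $x_s$ denotes the input for which HR is estimated. This vector and matrix are used to construct the linear regression in the feature space. In this research, $f(x_s)$ is the estimated HR, and y is the measured HR.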
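For orientation, the standard textbook dual form of the KRR predictor, which is consistent with the vector k and matrix K described above, can be written as follows. The regularization symbol $\lambda$ and the identity matrix $\mathbf{I}$ are assumptions introduced here, because the paper's own notation for this equation is not available in this text.

```latex
f(x_s) = \mathbf{y}^{\top} \left( \mathbf{K} + \lambda \mathbf{I} \right)^{-1} \mathbf{k},
\qquad
k_i = K(x_i, x_s),
\qquad
K_{ij} = K(x_i, x_j)
```

In this form, $\mathbf{y}$ collects the measured HR values of the training data, and a larger $\lambda$ gives a smoother, more strongly regularized estimate.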
A kernel function $K(x, z)$ computes the inner product of the two vectors x and z under a given feature mapping, and it can take several forms. The kernel functions described in equations (2) through (5) are called the inner product function, Gaussian function, polynomial function, and hyperbolic tangent function, respectively. The parameters of these functions (including a, b, c, and m) must be optimized for the obtained dataset, and several algorithms exist for choosing them. In this research, these parameters were set empirically.
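The four kernel families named above can be sketched in their commonly used forms as follows. These are illustrative textbook definitions, not a reproduction of equations (2) through (5): the parameter names `sigma`, `scale`, `offset`, and `degree` are hypothetical and are not mapped to the paper's symbols, and the default values are arbitrary.

```python
import math

def dot(x, z):
    """Inner product of two equal-length sequences."""
    return sum(xi * zi for xi, zi in zip(x, z))

def inner_product_kernel(x, z):
    """Linear kernel: the plain inner product."""
    return dot(x, z)

def gaussian_kernel(x, z, sigma=1.0):
    """Gaussian kernel: decays with the squared distance between x and z."""
    sq_dist = sum((xi - zi) ** 2 for xi, zi in zip(x, z))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def polynomial_kernel(x, z, scale=1.0, offset=1.0, degree=2):
    """Polynomial kernel: a shifted, scaled inner product raised to a power."""
    return (scale * dot(x, z) + offset) ** degree

def tanh_kernel(x, z, scale=1.0, offset=0.0):
    """Hyperbolic tangent kernel: tanh of a shifted, scaled inner product."""
    return math.tanh(scale * dot(x, z) + offset)

# Small worked example with two 2-dimensional vectors.
x, z = [1.0, 2.0], [3.0, 4.0]
linear_value = inner_product_kernel(x, z)   # 1*3 + 2*4
poly_value = polynomial_kernel(x, z)        # (11 + 1) ** 2
```

Each function takes two input vectors and returns a scalar, so any of them can be used to fill the kernel matrix K and the vector k of the KRR estimator.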
In the following evaluation step, these four kernel functions were tested to estimate HR, and the accuracies of HR estimation obtained with them were compared.

Experiment
To evaluate the proposed method to estimate HR from vocal frequency, experiments measuring the HR and voice were conducted involving two subjects: a 33-year-old male and a 25-year-old female.
In the measurement experiments, factors that might affect the recorded vocal frequency were controlled. The experiments were conducted in a sufficiently quiet room; moreover, subjects always sat with their head upright in a relaxed manner, because physical posture can influence the production of sound. First, the subject's voice was recorded using a voice recorder (Sony ICD-TX50). The voice recorder was kept 5 cm from the subject's mouth, and the subject was asked to sustain the sound [ӓ] (a Japanese vowel) for 7 s. The sampling frequency of the voice recording was 44.1 kHz with 16-bit resolution. Next, immediately following the voice recording, HR was measured with a cardio-tachometer (Omron HEM-1010), on the assumption that HR does not change suddenly in the absence of external stimuli. In fact, subjects were not given any stimuli after the voice measurement. These two steps were repeated 60 times for each subject.
IJASR|VOL 02|ISSUE 01|2016 www.ssjournals.com

Evaluation and results
As described above, KRR-based HR estimations were performed using the four kernel functions described in equations (2) through (5). To evaluate the estimation accuracy, the correlation coefficient between measured and estimated HRs was computed for each kernel function using leave-one-out cross-validation. For each subject, one data set was first selected from the 60 trials as evaluation data; the remaining 59 were used as training sets, and the HR value of the evaluation data was estimated. Next, another data set that had not yet been selected as evaluation data was selected as the new evaluation data, and its HR was estimated with the remaining 59 training sets. This selection and estimation procedure was repeated 60 times, yielding 60 estimated HR values. Table 1 shows the resulting correlation coefficients for the male subject, female subject, and both subjects.
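The evaluation procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, a Gaussian kernel with assumed values of `sigma` and `lam` is used, the training mean is subtracted before fitting as a design choice of this sketch, and the 60 trials are synthetic rather than measured data.

```python
import numpy as np

def gaussian_gram(A, B, sigma):
    """Gaussian kernel values between every row of A and every row of B."""
    sq = (A ** 2).sum(1)[:, None] + (B ** 2).sum(1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * sigma ** 2))

def krr_fit_predict(X_train, y_train, X_test, sigma=1.0, lam=1e-2):
    """Fit KRR on the training data and estimate targets for X_test."""
    K = gaussian_gram(X_train, X_train, sigma)
    k = gaussian_gram(X_train, X_test, sigma)
    mean = y_train.mean()                          # centre targets (assumed choice)
    alpha = np.linalg.solve(K + lam * np.eye(len(y_train)), y_train - mean)
    return k.T @ alpha + mean

def leave_one_out_correlation(X, y, **krr_params):
    """Estimate each trial from the other trials; correlate with measured values."""
    n = len(y)
    estimates = np.empty(n)
    for i in range(n):
        train = np.arange(n) != i                  # the remaining n - 1 trials
        estimates[i] = krr_fit_predict(X[train], y[train], X[i:i + 1],
                                       **krr_params)[0]
    return np.corrcoef(y, estimates)[0, 1], estimates

# Synthetic stand-in for 60 trials of one subject (not data from the paper).
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(60, 1))
y = 70.0 + 5.0 * np.sin(X[:, 0]) + 0.3 * rng.normal(size=60)
r, estimates = leave_one_out_correlation(X, y, sigma=1.0, lam=1e-2)
```

Because every trial is estimated by a model that never saw it during training, the resulting correlation coefficient reflects performance on unseen data rather than the quality of the fit to the training set.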

Discussion
To estimate the HR value from the vocal frequency, KRR was used in this research, and four kernel functions (inner product function, Gaussian function, polynomial function, and hyperbolic tangent function) were compared to examine which works best for HR estimation.
As shown in Table 1, the hyperbolic tangent function yielded the most accurate estimation of HR for the male, female, and both subjects (correlation coefficient = 1.0), and the Gaussian function was the second-most effective (correlation coefficient > 0.9). By contrast, the inner product function and polynomial function were effective for HR estimation of each individual subject (male and female), but they did not produce excellent results when the KRR models were trained on the vocal frequency components of both the male and female subjects.
Figs. 3 through 8 show histograms of the vocal frequency components that were selected by the SFS method as effective features to estimate HR when the hyperbolic tangent function or Gaussian function was used as the kernel function. These figures illustrate that the vocal frequency components higher than approximately 9000 Hz were selected most often. (For example, Fig. 4 shows that eight out of the nine effective frequency components were selected from frequencies higher than 9000 Hz.) In addition, these figures illustrate that frequency components ranging from 6000 Hz to 7000 Hz were the second-most effective, behind frequencies higher than 9000 Hz.
These results indicate that high-frequency components of voice are effective for HR estimation. By contrast, the previous study [13], which proposed an explicit model of the relationship between HR and vocal frequency components, reported that low- and medium-frequency components were effective for HR estimation. The reason for this difference is unknown. Therefore, in future work, a physical or physiological interpretation of the differences in effective vocal frequency components between prediction models will be investigated with more subjects.

Conclusion
The goal of this research was to identify an effective combination of KRR kernel function and vocal frequency features for HR estimation. To evaluate the effectiveness of kernel functions and features of vocal frequency, HR and voice data were recorded from one male and one female subject. To identify the most suitable kernel function, the inner product function, Gaussian function, polynomial function, and hyperbolic tangent function were compared. In addition, effective vocal spectrum features were selected with the SFS method. As a result, the hyperbolic tangent function realized the most accurate HR estimation, and the results imply that high-frequency voice components were the most effective. However, the results of this research also indicated that the vocal spectrum components that are effective for estimating HR differ substantially depending on the prediction model.