Antenna Beamforming using Support Vector Machines


<Contributed to MeLi Wiki by Professor Christos Christodoulou, Department of Electrical and Computer Engineering, University of New Mexico>


Abstract

Support Vector Machines (SVMs) offer improved generalization performance over other classical optimization techniques. Here, we introduce an SVM-based approach for linear array processing and beamforming. The development of a modified cost function is presented, and it is shown how it can be applied to the problem of linear beamforming. Finally, comparison examples are included to show the validity of the new minimization approach.


Introduction

Support Vector Machines (SVMs) have shown a clear advantage in prediction, regression, and estimation over classical approaches in a wide range of applications, due to their improved generalization performance. We introduce here a framework to apply the SVM approach to linear antenna array processing.

Antenna array signal processing involves complex signals, for which a complex formulation of the SVM is needed. We obtain this formulation by introducing the real and imaginary parts of the error into the primal optimization and then proceeding as usual to solve a complex-valued constrained optimization problem. We formulate the algorithm using a modified cost function which includes the numerical regularization that is unavoidable in the optimization. Adjusting the parameters of this cost function improves the robustness of the method against thermal and multiuser noise.

The resulting algorithm is a natural counterpart of the real valued Support Vector Regressor, which can be immediately applied to array signal processing.

We introduce the derived formulation as the optimizer for a linear beamformer. Several experiments illustrate the advantage of the SVM over minimum mean squared error (MMSE) based algorithms, due to its robustness against outliers and its improved generalization ability. The first experiments compare the behavior of both algorithms in an environment in which interfering signals are close to the desired ones, thus producing nongaussian noise. The last experiment illustrates the improved generalization ability of the SVM when small data sets are used for training, which is common in communications applications.


The Support Vector Approach and the Cost Function

Let <math>\mathbf{x}[n]</math> be spatially sampled data. A linear beamformer can be written as


Eq (1)
<math>d[n]=\mathbf{w}^T \mathbf{x}[n]+\epsilon[n]</math>

where <math>{\mathbf{x}}[n]</math> is the vector of <math>M</math> elements of the array output and <math>\epsilon[n]</math> is the output error.

Coefficients <math>\mathbf{w}</math> are usually estimated through the minimization of a certain cost function on <math>\epsilon[n]</math>.
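As a point of reference, the following minimal NumPy sketch simulates the model of Eq. (1) and estimates <math>\mathbf{w}</math> with a classical least-squares fit; the array size, number of snapshots, and noise level are illustrative placeholders rather than values taken from the text.

<syntaxhighlight lang="python">
import numpy as np

# Sketch of Eq. (1): d[n] = w^T x[n] + eps[n], with a least-squares estimate of w.
rng = np.random.default_rng(0)
M, N = 6, 50                                            # array elements, training snapshots
X = rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))   # columns are x[n]
w_true = rng.standard_normal(M) + 1j * rng.standard_normal(M)
noise = 0.05 * (rng.standard_normal(N) + 1j * rng.standard_normal(N))
d = w_true @ X + noise                                  # desired outputs d[n]

# Classical estimate: minimize sum_n |d[n] - w^T x[n]|^2 (the SVM below replaces this cost).
w_ls, *_ = np.linalg.lstsq(X.T, d, rcond=None)
</syntaxhighlight>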

The SVM approach can be applied to the adjustment of this model. The main idea of the SVM is to obtain the solution with the minimum norm of <math>\mathbf{w}</math>. Due to the minimization of the weight vector norm, the solution is regularized in the sense of Tikhonov [1], which improves the generalization performance. The minimization has to be subject to the constraints


Eq (2)
<math>d[n]-{\mathbf{w}}^T {\mathbf{x}}[n] \leq \varepsilon + \xi_n</math>
<math>-d[n]+{\mathbf{w}}^T {\mathbf{x}}[n]\leq \varepsilon + \xi'_n</math>
<math>\xi_n, \xi'_n \geq 0 </math>

not to be trivial. <math>\xi_n</math> and <math>\xi'_n</math> are the "slack" variables or losses. The optimization is intended to minimize a cost function over these variables. The parameter <math>\varepsilon</math> allows those <math>\xi_n</math> or <math>\xi'_n</math> for which the error is less than <math>\varepsilon</math> to be zero. This is equivalent to the minimization of the so-called <math>\varepsilon</math>-insensitive or Vapnik loss function [2], given by


Eq (3)
<math>L_{\varepsilon}(e)=
\begin{cases} |e|-\varepsilon & |e| \geq \varepsilon\\ 0 & |e| < \varepsilon
\end{cases}</math>
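For reference, a direct NumPy implementation of this loss (vectorized over an array of errors) could look as follows; the sample values in the call are arbitrary.

<syntaxhighlight lang="python">
import numpy as np

def vapnik_loss(e, eps):
    """epsilon-insensitive (Vapnik) loss of Eq. (3), applied elementwise to |e|."""
    a = np.abs(e)
    return np.where(a >= eps, a - eps, 0.0)

print(vapnik_loss(np.array([0.05, -0.3, 1.2]), eps=0.1))   # -> [0.  0.2  1.1]
</syntaxhighlight>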

The functional to be minimized is then


Eq (4)
<math>L_p=||{\mathbf{w}}||^2+C\sum_n L_\varepsilon({\xi_n},{\xi'_n})</math>

subject to <math>\xi_n, \xi'_n \geq 0</math> where <math>C</math> is the trade-off between the minimization of the norm (to improve generalization ability) and the minimization of the errors [2].

The optimization of the above constrained problem through Lagrange multipliers <math>\alpha_i</math>, <math>\alpha'_i</math> leads to the dual formulation [3]


Eq (5)
<math>L_d=-({\boldsymbol \alpha}-{\boldsymbol \alpha'})^T{\mathbf{R}}({\boldsymbol \alpha}-{\boldsymbol \alpha'})+({\boldsymbol \alpha}-{\boldsymbol \alpha'})^T{\mathbf{y}}-({\boldsymbol \alpha}+{\boldsymbol \alpha'})^T{\mathbf{1}}\varepsilon</math>

to be maximized with respect to <math>({\alpha}_i-{\alpha'}_i)</math>.

It involves the Gram matrix <math>\mathbf{R}</math> of the dot products of the data vectors <math>\mathbf{x}[n]</math>. This matrix may be singular, thus producing an ill-conditioned problem. To avoid this numerical inconvenience, a small diagonal matrix <math>\gamma {\mathbf{I}}</math> is added to it prior to the numerical optimization.
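A small sketch of this diagonal loading is shown below; the snapshot data and the value of <math>\gamma</math> are placeholders.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((6, 50)) + 1j * rng.standard_normal((6, 50))   # columns are x[n]

R = X.T @ X.conj()                           # Gram matrix of dot products between snapshots (N x N)
gamma = 1e-6
R_loaded = R + gamma * np.eye(R.shape[0])    # small diagonal term gamma*I added before optimization
</syntaxhighlight>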

We present here a modified derivation of the SVM regressor which leads to a more convenient, equivalent cost function (Figure 1)


Eq (6)
<math>L_{R}(e)=
\begin{cases} 0 & |e|<\varepsilon\\ \frac{1}{2\gamma}(|e|-\varepsilon)^2 & \varepsilon \leq |e| \leq e_C\\ C(|e|-\varepsilon)-\frac{1}{2}\gamma C^2 & e_C \leq |e|
\end{cases}</math>

where <math>e_C=\varepsilon+\gamma C</math>

Figure 1. Cost function applied to the SVM beamformer.

This cost function provides a functional that is numerically regularized by the matrix <math>\gamma \mathbf{I}</math>. As can be seen, the cost function is quadratic for the data which produce errors between <math>\varepsilon</math> and <math>e_C</math>, and linear for errors above <math>e_C</math>. Thus, one can adjust the parameter <math>e_C</math> so that a quadratic cost is applied to the samples which are mainly affected by thermal noise (i.e., for which the quadratic cost is maximum likelihood), while the linear cost is applied to the samples which are outliers [4], [5]. With a linear cost, the contribution of an outlier to the solution does not depend on its error value, but only on its sign, thus avoiding the bias that a quadratic cost function would produce.
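A direct NumPy implementation of the cost in Eq. (6), with its insensitive, quadratic and linear zones, is sketched below; the parameter values in the example call are arbitrary.

<syntaxhighlight lang="python">
import numpy as np

def robust_cost(e, eps, gamma, C):
    """Piecewise cost of Eq. (6): insensitive zone, quadratic zone, linear zone."""
    a = np.abs(e)
    e_C = eps + gamma * C
    quad = (a - eps) ** 2 / (2.0 * gamma)        # quadratic zone (thermal-noise regime)
    lin = C * (a - eps) - 0.5 * gamma * C ** 2   # linear zone (outlier regime)
    return np.where(a < eps, 0.0, np.where(a <= e_C, quad, lin))

print(robust_cost(np.array([0.05, 0.5, 5.0]), eps=0.1, gamma=0.1, C=10.0))
</syntaxhighlight>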

Finally, we generalize the derivation to the complex-valued case, as it is necessary for array processing.

Support Vector Machine Beamformer

The signal received by an antenna array consists of a number of desired or friendly signals, which may come from one or more directions of arrival with different amplitudes and phases, plus the interfering or undesired signals. The idea here is to use the SVM to find the optimum weights <math>\mathbf{w}</math> that optimize the chosen cost function and steer the antenna array so that its radiation pattern (its beams) points towards the directions of the desired signals and places nulls in the directions from which the interfering signals arrive. Next, we show how this optimization can be achieved using SVMs on a linear antenna array. The output vector of an <math>M</math>-element array receiving <math>K</math> signals, shown in Figure 2, can be written in matrix notation as


Eq (7)
<math>{\mathbf{x}[n]} = {\mathbf{As}}[n] + {\mathbf{g}}[n]</math>

where


Eq (8)
<math>{\mathbf{A}} = [{\mathbf{a}}(\theta_1) \cdots {\mathbf{a}}(\theta_K)]</math>
<math>{\mathbf{a}}(\theta_i)=[1\; e^{-jk_i} \cdots e^{-j(M-1)k_i}]^T</math>
<math>{\mathbf{s}}[n]=[s_1[n] \cdots s_K[n]]^T</math>
<math>{\mathbf{g}}[n]=[g_1[n] \cdots g_M[n]]^T</math>

are, respectively, the <math>M \times K</math> steering matrix, the steering vectors, the vector of received signals, and the thermal noise present at the output of each array element. The spatial correlation matrix of the received signals is:


Eq (9)
<math>\begin{align}
{\mathbf{R}}&=\frac{1}{N}{\mathbf{X}}{\mathbf{X}}^H\\
&=E[({\mathbf{A}}{\mathbf{s}}[n]+{\mathbf{g}}[n])({\mathbf{A}}{\mathbf{s}}[n]+{\mathbf{g}}[n])^H]\\
&={\mathbf{A}}{\mathbf{P}}{\mathbf{A}}^H+\sigma_g^2{\mathbf{I}}
\end{align}</math>

provided that the signal and noise are independent, where <math>{\mathbf{P}}</math> is the autocorrelation of <math>{\mathbf{s}}[n]</math>, and <math>\sigma_g^2</math> is the noise power.

Figure 2. An M-element antenna array with K incident signals.
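A hedged sketch of the signal model in Eqs. (7)-(9) is given below for a uniform linear array with half-wavelength spacing (so that <math>k_i = \pi \sin\theta_i</math>); the angles, amplitudes, modulation and noise level are illustrative assumptions, not values fixed by the text.

<syntaxhighlight lang="python">
import numpy as np

rng = np.random.default_rng(2)
M, N = 6, 50                                   # elements, snapshots
thetas = np.array([-0.1, 0.25]) * np.pi        # assumed directions of arrival
amps = np.array([1.0, 0.3])                    # assumed amplitudes
sigma_g = 0.1                                  # assumed thermal-noise standard deviation

def steering(theta, M):
    """ULA steering vector, half-wavelength element spacing."""
    return np.exp(-1j * np.pi * np.sin(theta) * np.arange(M))

A = np.column_stack([steering(t, M) for t in thetas])                 # M x K steering matrix
S = amps[:, None] * np.sign(rng.standard_normal((thetas.size, N)))    # K x N BPSK-like symbols s[n]
G = sigma_g * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N))) / np.sqrt(2)

X = A @ S + G                                  # Eq. (7): x[n] = A s[n] + g[n]
R = X @ X.conj().T / N                         # Eq. (9): sample spatial correlation matrix
</syntaxhighlight>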

The optimal beamformer needs to satisfy


Eq (10)
<math>\min \{{\mathbf{w}}^H{\mathbf{R}}{\mathbf{w}}\}</math>

subject to the constraint


Eq (11)
<math>{\mathbf{w}}^H{\mathbf{s}}_d[n]=y[n]</math>

That is, it minimizes the output power subject to the condition that the desired signal <math>{\mathbf{s}}_d[n]</math> produces a given output <math>y[n]</math>. Applying Lagrange multipliers [6] to that constrained optimization problem leads to the solution


Eq (12)
<math>{\mathbf{w}}_{opt} = {\mathbf{R}}^{-1}{\mathbf{S}}_d[{\mathbf{S}}^H_d {\mathbf{R}}^{-1} {\mathbf{S}}_d]^{-1}{\mathbf{y}}</math>

where <math>{\mathbf{S}}_d</math> is the matrix of all desired input vectors, and <math>{\mathbf{y}}</math> is the vector of all desired output signals.
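The closed-form solution of Eq. (12) translates directly into a few lines of NumPy; the matrices in the example call are toy placeholders meant only to show the shapes involved.

<syntaxhighlight lang="python">
import numpy as np

def optimal_weights(R, S_d, y):
    """Eq. (12): w_opt = R^{-1} S_d (S_d^H R^{-1} S_d)^{-1} y."""
    Ri_Sd = np.linalg.solve(R, S_d)                              # R^{-1} S_d
    return Ri_Sd @ np.linalg.solve(S_d.conj().T @ Ri_Sd, y)

# Toy example: 6-element array, one desired steering vector at broadside, unit response.
M = 6
S_d = np.ones((M, 1), dtype=complex)
R = np.eye(M, dtype=complex) + S_d @ S_d.conj().T
w_opt = optimal_weights(R, S_d, np.array([1.0]))
</syntaxhighlight>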

How do we apply SVMs?

The output vector <math>{\mathbf{x}[n]}</math> is linearly processed to obtain the desired output <math>d[n]</math> [7], [8]. The expression for the output of the array processor is


Eq (13)
<math>{y[n]} = {\mathbf{w}}^T{\mathbf{x}}[n] = d[n] + \epsilon[n]</math>

where <math>{\mathbf{w}} = [w_1 \cdots w_M]</math> is the weight vector of the array and <math>{\epsilon[n]}</math> is the estimation error.

For a set of <math>N</math> observed samples of <math>\{\mathbf{x}[n]\}</math> and when nonzero empirical errors are expected, the functional to be minimized is:


Eq (14)
<math>\frac{1}{2} ||{\mathbf{w}}||^2 + \sum_{n=1}^N L_R\left(e[n],\varepsilon,\gamma,C\right)</math>

Therefore, particularizing to the error cost function (6), we have to minimize


Eq (15)
<math>\frac{1}{2} ||{\mathbf{w}}||^2 + \sum_{n} L_R\left({\xi_n +\xi'_n}\right) + \sum_{n} L_R\left({\zeta_n + \zeta'_n}\right)</math>

subject to


Eq (16)
<math>\begin{align}
&Re\left(d[n]-{\mathbf{w}}^T {\mathbf{x}}[n]\right) \leq \varepsilon + \xi_n\\
&Re\left(-d[n]+{\mathbf{w}}^T {\mathbf{x}}[n]\right) \leq \varepsilon + \xi'_n\\
&Im\left(d[n]-{\mathbf{w}}^T {\mathbf{x}}[n]\right) \leq \varepsilon + \zeta_n\\
&Im\left(-d[n]+{\mathbf{w}}^T {\mathbf{x}}[n]\right) \leq \varepsilon + \zeta'_n\\
&\xi_n, \xi'_n, \zeta_n, \zeta'_n \geq 0
\end{align}</math>

where <math>\xi_n</math> (<math>\xi'_n</math>) stand for positive (negative) errors in the real part of the output; analogously, <math>\zeta_n</math> (<math>\zeta'_n</math>) stand for positive (negative) errors in the imaginary part. Note that an error is either negative or positive and, therefore, only one of the losses takes a nonzero value; that is, either <math>\xi_n</math> or <math>\xi'_n</math> (either <math>\zeta_n</math> or <math>\zeta'_n</math>) is null. This constraint can be written as <math>\xi_n \xi'_n = 0</math> (<math>\zeta_n \zeta'_n = 0</math>). Finally, as in other SVM formulations, the parameter <math>C</math> can be seen as a trade-off factor between the empirical risk and the structural risk.
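To illustrate how the four slack variables split the complex error, the sketch below evaluates them for a given weight vector; the names and data are placeholders, and for each sample only one slack of each pair is nonzero, as stated above.

<syntaxhighlight lang="python">
import numpy as np

def slacks(d, X, w, eps):
    """Slack variables of Eq. (16) for a given weight vector w (columns of X are x[n])."""
    e = d - w @ X                         # complex errors d[n] - w^T x[n]
    xi   = np.maximum(e.real - eps, 0.0)  # positive real-part errors
    xi_p = np.maximum(-e.real - eps, 0.0) # negative real-part errors
    ze   = np.maximum(e.imag - eps, 0.0)  # positive imaginary-part errors
    ze_p = np.maximum(-e.imag - eps, 0.0) # negative imaginary-part errors
    return xi, xi_p, ze, ze_p
</syntaxhighlight>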

It is possible to transform the minimization of the primal functional (15), subject to the constraints in (16), into the optimization of the dual or Lagrange functional. First, we introduce the constraints into the primal functional by means of Lagrange multipliers, obtaining the following primal-dual functional:


Eq (17)
<math>\begin{align}
L_{pd}=&\frac{1}{2} ||{\mathbf{w}}||^2 + C\sum_{n \in I_1}(\xi_n+\xi'_n) + C \sum_{n \in I_1}(\zeta_n+\zeta'_n)\\
&+ \frac{1}{2\gamma}\sum_{n \in I_2} \left({\xi_n^2 + {\xi'}_{n}^{2}}\right) +\frac{1}{2\gamma} \sum_{n \in I_2} \left({\zeta_n^2 +{\zeta'}_{n}^{2}}\right)\\
&-\sum_{n=1}^N(\lambda_n\xi_n+\lambda'_n\xi'_n)-\sum_{n=1}^N(\eta_n\zeta_n+\eta'_n\zeta'_n)\\
&+\sum_{n=1}^N \alpha_n \left[Re\left(d[n] -{\mathbf{w}}^T {\mathbf{x}}[n] \right) -\varepsilon -\xi_n\right] +\sum_{n=1}^N \alpha'_n \left[Re\left(-d[n] +{\mathbf{w}}^T {\mathbf{x}}[n] \right) -\varepsilon -\xi'_n\right]\\
&+\sum_{n=1}^N \beta_n \left[Im\left(d[n] -{\mathbf{w}}^T {\mathbf{x}}[n] \right) -\varepsilon -\zeta_n\right] +\sum_{n=1}^N \beta'_n \left[Im\left(-d[n]+{\mathbf{w}}^T {\mathbf{x}}[n]\right) -\varepsilon -\zeta'_n\right]
\end{align}</math>

with the dual variables or Lagrange multipliers constrained to <math>\alpha_n, \beta_n, \lambda_n, \eta_n, \alpha'_n, \beta'_n, \lambda'_n, \eta'_n \geq 0</math> and with <math>\xi_n, \zeta_n, \xi'_n, \zeta'_n \geq 0</math>. Note that the cost function (6) has two active segments, a quadratic one and a linear one. Thus, the sets <math>n\in I_2</math> and <math>n \in I_1</math> in (17) denote respectively the slacks <math>\xi_n</math>, <math>\xi'_n</math> which lie in the quadratic and in the linear segments of the cost function.

The following constraints must also be fulfilled:


Eq (18)
<math>\begin{align}
\alpha_n \alpha'_n =0\\
\beta_n \beta'_n =0
\end{align}</math>

Besides, the Karush-Kuhn-Tucker (KKT) conditions [2] force <math>\lambda_n \xi_n=0</math>, <math>\lambda'_n \xi'_n=0</math> and <math>\eta_n\zeta_n=0</math>, <math>\eta'_n \zeta'_n=0</math>. Functional (17) has to be minimized with respect to the primal variables and maximized with respect to the dual variables. By minimizing <math>L_{pd}</math> with respect to <math>w_i</math> we obtain an optimal solution for the weights


Eq (19)
<math>{\mathbf{w}} = \sum_{n=1}^N \psi_n {\mathbf{x}}^*[n]</math>

where <math>\psi_n=\alpha_n - \alpha'_n +j(\beta_n - \beta'_n)</math>. This result is analogous to the one for the real-valued SVM problem, except that now Lagrange multipliers <math>\alpha_n</math> and <math>\beta_n</math> for both real and imaginary components have been considered. Optimizing <math>L_{pd}</math> with respect to <math>\xi_n</math> and <math>\zeta_n</math> and applying the KKT conditions leads to an analytical relationship between the residuals and the Lagrange multipliers, this relationship being given by


Eq (20)
<math>\begin{align}
(\alpha-\alpha')& =
 \begin{cases}
   -C, & Re(e) \leq -e_C\\
   \frac{1}{\gamma}(Re(e)+\varepsilon), & -e_C \leq Re(e) \leq -\varepsilon\\
   0, & -\varepsilon \leq Re(e) \leq \varepsilon\\
   \frac{1}{\gamma}(Re(e)-\varepsilon), & \varepsilon \leq Re(e) \leq e_C\\
   C, & e_C \leq Re(e)
 \end{cases}\\
(\beta-\beta') & =
 \begin{cases}
   -C, & Im(e) \leq -e_C\\
   \frac{1}{\gamma}(Im(e)+\varepsilon), & -e_C \leq Im(e) \leq -\varepsilon\\
   0, & -\varepsilon \leq Im(e) \leq \varepsilon\\
   \frac{1}{\gamma}(Im(e)-\varepsilon), & \varepsilon \leq Im(e) \leq e_C\\
   C, & e_C \leq Im(e)
 \end{cases}
\end{align}</math>
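The piecewise map of Eq. (20) can be coded directly; the function below returns <math>(\alpha-\alpha')</math> when fed the real part of the error and <math>(\beta-\beta')</math> when fed the imaginary part (a sketch, with placeholder parameter names).

<syntaxhighlight lang="python">
import numpy as np

def multiplier_from_error(e, eps, gamma, C):
    """Eq. (20): multiplier difference as a function of one (real) error component e."""
    e_C = eps + gamma * C
    out = np.zeros_like(e, dtype=float)
    out = np.where(e >= e_C, C, out)                                 # linear zone, positive side
    out = np.where((e >= eps) & (e < e_C), (e - eps) / gamma, out)   # quadratic zone, positive side
    out = np.where((e <= -eps) & (e > -e_C), (e + eps) / gamma, out) # quadratic zone, negative side
    out = np.where(e <= -e_C, -C, out)                               # linear zone, negative side
    return out
</syntaxhighlight>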

As remarked earlier, it is possible to continue towards the purely dual formulation of the problem, solvable using Quadratic Programming (QP) approaches as is usually done in the literature, but this approach is computationally intensive and lacks the flexibility required for the communications problem at hand. Alternative optimization methods relying on Iterative Re-Weighted Least Squares (IRWLS) have been introduced in [9], [10], with clear advantages in terms of computational cost and flexibility of operation.
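The sketch below shows a greatly simplified IRWLS-style loop for fitting the beamformer under the robust cost of Eq. (6): at each iteration the residuals are mapped to per-sample weights and a regularized weighted least-squares problem is solved. It is only an illustration of the idea, not the exact procedure of [9], [10], and all parameter values are placeholders.

<syntaxhighlight lang="python">
import numpy as np

def svm_beamformer_irwls(X, d, eps=1e-3, gamma=1e-6, C=1.0, iters=20):
    """Simplified IRWLS-style fit of the robust-cost beamformer (illustrative sketch only)."""
    M, N = X.shape
    w = np.zeros(M, dtype=complex)
    e_C = eps + gamma * C
    for _ in range(iters):
        e = d - w @ X                       # complex residuals d[n] - w^T x[n]
        u = np.abs(e)
        # per-sample weights a_n = L_R'(u)/u for the three zones of Eq. (6)
        a = np.zeros(N)
        quad = (u >= eps) & (u <= e_C)
        lin = u > e_C
        a[quad] = (u[quad] - eps) / (gamma * u[quad])
        a[lin] = C / u[lin]
        # regularized weighted LS update: (I + sum_n a_n x*[n] x^T[n]) w = sum_n a_n d[n] x*[n]
        Xc = X.conj()
        w = np.linalg.solve(np.eye(M) + (Xc * a) @ X.T, Xc @ (a * d))
    return w
</syntaxhighlight>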

Applying (19), the norm of the complex coefficients can be written as


Eq (21)
<math>\left \|{\mathbf{w}} \right \|^2 = \sum_{i=1}^N \sum_{j=1}^N \psi_j \psi^*_i {\mathbf{x}}^T[j] {\mathbf{x}}^*[i]</math>

At this point, it is worth using matrix notation again and storing the partial correlations in (21) in a matrix


Eq (22)
<math>R[j,i] = {\mathbf{x}}^T[j] {\mathbf{x}}^*[i]</math>

so that the norm of the coefficients can be written as


Eq (23)
<math>||{\mathbf{w}}||^2 ={\mathbf{\psi}}^H {\mathbf{R}} {\mathbf{\psi}}</math>

<math>{\mathbf{R}}</math> being the matrix with elements <math>R[j,i]</math>, and <math>{\mathbf{\psi}} =(\psi_{1} \cdots \psi_{N})^T</math>. By substituting (19) into the functional (17), the dual functional to be maximized is as follows:


Eq (24)
<math>\begin{align}
L_d=&\frac{1}{2}{\mathbf{\psi}}^H {\mathbf{R}} {\mathbf{\psi}} - Re[{\mathbf{\psi}}^H {\mathbf{R}}({\mathbf{\alpha - \alpha'}})] + Im[{\mathbf{\psi}}^H {\mathbf{R}}({\mathbf{\beta - \beta'}})] \\
&+ Re [({\mathbf{\alpha - \alpha'}})^T {\mathbf{y}}] - Im[({\mathbf{\beta - \beta'}})^T {\mathbf{y}}] - ({\mathbf{\alpha}} +{\mathbf{\alpha}}')^T{\mathbf{1}}\varepsilon - ({\mathbf{\beta}} +{\mathbf{\beta}}')^T{\mathbf{1}}\varepsilon + L_C
\end{align}</math>

<math>L_C</math> being a function of <math>\psi</math>. In order to find <math>L_C</math> in (24), note that, provided <math>\xi_m^{(')}, \zeta_m^{(')} \neq 0</math> in both intervals, the KKT conditions force <math>\lambda_m^{(')} = \eta_m^{(')} = 0</math>. Intervals <math>I_1</math> and <math>I_2</math> must be treated separately:

  • Taking (20) into account in interval <math>I_1</math>, <math>\alpha_m^{(')}=\beta_m^{(')}=C</math>. Then, the last term of the functional for <math>I_1</math> is


Eq (25)
<math>L_C(I_1)=C {\mathbf{I}}</math>
<math>{\mathbf{I}}</math> being the identity.
  • Taking (20) into account in interval <math>I_2</math>, <math>\alpha_m^{(')}=\frac{1}{\gamma}\xi_m^{(')}</math> and <math>\beta_m^{(')}=\frac{1}{\gamma}\zeta_m^{(')}</math>. The last term for this interval is


Eq (26)
<math>L_C(I_2)=\frac{\gamma}{2} \psi_2^H {\mathbf{I}} \psi_2</math>
<math>\psi_2</math> being the vector of the elements of <math>\psi</math> that belong to interval <math>I_2</math>.

Both terms can be grouped as


Eq (27)
<math>L_C=\frac{\gamma}{2} \psi^H I \psi + (1-\frac{\gamma}{2})C D_{I_2}</math>

<math>D_{I_2}</math> being a diagonal matrix with terms corresponding to <math>I_1</math> set to 1 and the remaining set to <math>0</math>. As the last term of <math>L_C</math> is just a constant, it can be removed from the optimization.

By regrouping terms and taking into account that <math>{\mathbf{\psi}}^H{\mathbf{R}} {\mathbf{\psi}} = {\mathbf{\psi}}^H Re({\mathbf{R}}) {\mathbf{\psi}}</math>, the functional (24) can be written in a more compact form


Eq (28)
<math>L_d=-\frac{1}{2}{\mathbf{\psi}}^H Re({\mathbf{R}} + \frac{\gamma}{2}{\mathbf{I}}){\mathbf{\psi}} + Re[\psi^H {\mathbf{y}}]-(\alpha+\alpha'+\beta+\beta'){\mathbf{1}}\varepsilon</math>

Expression (28) is formally identical to the real-valued SVM functional.

Examples

Bit Error Rate performance

We first compared the algorithm with the regular least squares (LS) approach for an antenna array of 6 elements. The desired signals come from the angles <math>-0.1 \pi</math> and <math>0.25\pi</math>, with amplitudes <math>1</math> and <math>0.3</math>, and the interfering signals come from the angles <math>-0.05\pi</math>, <math>0.1\pi</math> and <math>0.3\pi</math>, with an amplitude of <math>1</math>.

In order to train the beamformer, a burst of 50 known symbols is sent. Then, the Bit Error Rate (BER) is measured with bursts of 10,000 unknown symbols.
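A minimal sketch of this BER measurement, assuming BPSK symbols and a beamformer already trained on the known burst (all names are placeholders):

<syntaxhighlight lang="python">
import numpy as np

def bit_error_rate(w, X_test, bits_true):
    """Detect BPSK symbols with a trained beamformer w and count bit errors."""
    y = w @ X_test                  # beamformer output for the unknown burst (columns are x[n])
    bits_hat = np.sign(y.real)      # hard decisions, +1 / -1
    return np.mean(bits_hat != bits_true)
</syntaxhighlight>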

In the first two examples, we fix <math>\gamma</math> of Eq. (6) to <math>10^{-6}</math>, and the product <math>\gamma C</math> is adjusted to the noise standard deviation. That way, most of the samples which are corrupted only by (thermal) noise will fall in the quadratic zone, whereas the outliers produced by the interfering signals will fall in the linear zone.
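In code, this parameter rule amounts to the following (the noise standard deviation is assumed known or estimated beforehand):

<syntaxhighlight lang="python">
gamma = 1e-6              # fixed, as in the text
sigma_n = 0.1             # assumed (estimated) thermal-noise standard deviation
C = sigma_n / gamma       # so that gamma*C, the width of the quadratic zone, matches the noise level
</syntaxhighlight>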


We calculated the BER performance of LS and SVM for different noise levels from <math>0</math> dB to <math>-15</math> dB. Each BER value has been obtained by averaging the results of 100 independent trials. The results can be seen in Figure 3.

Figure 3. BER performance for example 1. SVM (cont. line) and regular LS (dash) beamformers.

In the next case, the desired signals come from the angles of arrival (AOA) <math>-0.1 \pi</math> and <math>0.25 \pi</math>, with amplitudes <math>1</math> and <math>0.3</math>, while the interfering signals come from the AOAs <math>-0.02\pi</math>, <math>0.2\pi</math> and <math>0.3\pi</math>, with amplitude <math>1</math> (see Figure 4).

Figure 4. BER performance for example 2. SVM (cont. line) and regular LS (dash) beamformers.

In this second example, the interfering signals are much closer to the desired ones, thus biasing the LS solution. The better performance of the SVM is due to its robustness against the nongaussian outliers produced by the interfering signals.

Robustness against overfitting

One advantage of the SVM is that its generalization ability is controlled by the regularization imposed by the minimization of the weight vector norm. We highlight this fact by calculating the Bit Error Rate for different numbers of training samples. The results are shown in Figure 5.

Figure 5. BER performance against the number of training samples. SVM (cont. line) and regular LS (dash) beamformers.


Conclusions

We introduce in this work a way to apply the SVM framework to linear array beamforming. SVMs have a clear advantage over minimum mean square error based algorithms in those cases in which only small data sets are available for training and in which nongaussian noise is present, because the generalization ability of the machine is explicitly controlled. In order to make the algorithm adequate for array processing purposes, we first apply an alternative cost function which is suitable for problems in which gaussian noise coexists with nongaussian sources, such as multiuser interference, which may produce outliers in the signal. This cost function also provides a natural way to explain the numerical regularization present in any SVM.

Ongoing work addresses the application of nonlinear Support Vector Machines to beamforming and angle-of-arrival detection.


Student Project

Consider two antenna arrays: one composed of 6 elements and the other of 10 elements. The elements are isotropic in both cases. Assume the desired signals come from the angles of arrival (AOA) <math>-0.1 \pi</math> and <math>0.25 \pi</math>, with amplitudes <math>0.5</math> and <math>0.3</math>, and the interfering signals come from the AOAs <math>-0.05\pi</math>, <math>0.1\pi</math> and <math>0.35\pi</math>, with amplitude <math>1</math>.

Can you train the beamformer to maximize its reception at the desired signals for both the 6 and the 10 element arrays?

What are the differences between the two cases in terms of training, accuracy, and BER?


Wiki Assessment

  1. What cost function is being used here?
  2. Why are we using the complex formulation of the SVM instead of the real-valued one?
  3. What happens if the desired signals to be detected are very close to the interfering signals?
  4. What happens if the interfering signals have a higher amplitude than the desired signals to be detected?
  5. What do we mean by primal optimization?
  6. What do we mean by robustness?
  7. What is the Gram matrix?
  8. What is Tikhonov regularization?
  9. What other experiments could we have done with this particular algorithm?


References

  1. A. Tikhonov, V. Arsenin, Solutions of Ill-Posed Problems, V.H. Winston & Sons, 1977.
  2. V. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, 1998.
  3. A. Smola, B. Scholkopf, A Tutorial on Support Vector Regression, NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998.
  4. P. J. Huber, Robust Statistics: a review, Vol. 43, Ann. Statistics, 1972.
  5. K.-R. Muller, A. Smola, G. Ratsch, B. Scholkopf, J. Kohlmorgen, V. Vapnik, Predicting time series with support vector machines, in: B. Scholkopf, C. Burges, A. Smola (Eds.), Advances in Kernel Methods — Support Vector Learning, MIT Press, Cambridge, MA, 1999, pp. 243–254.
  6. S. Haykin, Adaptive Filter Theory, 4th Edition, Prentice-Hall, Englewood Cliffs, N. J., 2001.
  7. M. Martínez-Ramon, N. Xu, C. Christodoulou, Beamforming using Support Vector Machines, IEEE Antennas and Wireless Propagation Letters 4 (2005) 439–442.
  8. M. Martínez-Ramon, C. Christodoulou, Support Vector Array Processing, IEEE Transactions on Antennas and Propagation, submitted.
  9. A. Navia-Vazquez, F. Perez-Cruz, A. Artes-Rodríguez, A. Figueiras-Vidal, Weighted Least Squares Training of Support Vector Classifiers Leading to Compact and Adaptive Schemes, IEEE Trans. Neural Networks, Vol. 12, No. 5, pp. 1047-1059.
  10. F. Perez-Cruz, P. L. Alarcon-Diana, A. Navia-Vazquez, A. Artes-Rodríguez, Fast Training of Support Vector Classifiers, in: T. K. Leen, T. G. Dietterich, V. Tresp (Eds.), Advances in Neural Information Processing Systems 13, MIT Press, 2001.