norm of <math>\mathbf{w}</math>. Due to the minimization of the weight vector norm, the solution will be regularized in the sense of Tikhonov <ref name="Tikhonov1977">A. Tikhonov, V. Arsenin, Solutions of Ill-Posed Problems, V.H. Winston & Sons, 1977.</ref>, improving the generalization performance.
The minimization has to be subject to the constraints

<math>\varepsilon</math> to be zero. This is equivalent to the minimization of the so-called <math>\varepsilon</math>-insensitive or Vapnik loss function <ref name="Vapnik1998">V. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, 1998.</ref>, given by

<center><math>L_{\varepsilon}(\epsilon)=
\begin{cases}
0, & |\epsilon| \leq \varepsilon \\
|\epsilon| - \varepsilon, & |\epsilon| > \varepsilon
\end{cases}</math></center>
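As an illustrative aside, the <math>\varepsilon</math>-insensitive loss is only a few lines of Python. This is a minimal sketch; the function name and sample values are illustrative, not from the article:

```python
def vapnik_loss(e, eps):
    """Epsilon-insensitive (Vapnik) loss: zero inside the eps-tube, linear outside."""
    return max(abs(e) - eps, 0.0)

# Errors inside the tube (|e| <= eps) cost nothing; larger errors grow linearly.
inside = vapnik_loss(0.05, 0.1)    # within the tube, loss is zero
outside = vapnik_loss(0.30, 0.1)   # outside the tube, loss is |e| - eps
```

Because errors inside the tube cost nothing, the solution ends up depending only on the samples lying on or outside the tube (the support vectors).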
subject to <math>\xi_n, \xi'_n \geq 0</math>, where <math>C</math> is the trade-off between the minimization of the norm (to improve generalization ability) and the minimization of the errors <ref name="Vapnik1998"/>.

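To make the role of <math>C</math> concrete, here is a sketch of the usual SVR primal objective, ½||w||² + C Σ(ξ + ξ'). The ½ factor and the names are the common convention, assumed here rather than taken verbatim from the article:

```python
def primal_objective(w, xi, xi_prime, C):
    """Usual SVR primal: 0.5*||w||^2 plus C times the total slack."""
    norm_term = 0.5 * sum(wi * wi for wi in w)
    slack_term = C * (sum(xi) + sum(xi_prime))
    return norm_term + slack_term

# A large C penalizes slack (training errors) heavily; a small C favors a
# smaller norm, i.e. stronger Tikhonov-style regularization.
loose = primal_objective([1.0, 0.0], [0.1], [0.2], C=1.0)
tight = primal_objective([1.0, 0.0], [0.1], [0.2], C=100.0)
```

Sweeping <math>C</math> in such a sketch shows the trade-off directly: as <math>C</math> grows, the slack term dominates and the fit tightens onto the data.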
The optimization of the above constrained problem through Lagrange multipliers <math>\alpha_i</math>, <math>\alpha'_i</math> leads to the dual formulation <ref name="Scholkopf1988">A. Smola, B. Schölkopf, A Tutorial on Support Vector Regression, NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK (1998).</ref>

<center><math>L_d=-({\boldsymbol \alpha}-{\boldsymbol \alpha'})^T{\mathbf{R}}({\boldsymbol \alpha}-{\boldsymbol \alpha'})+({\boldsymbol \alpha}-{\boldsymbol \alpha'})^T{\mathbf{y}}-\varepsilon({\boldsymbol \alpha}+{\boldsymbol \alpha'})^T\mathbf{1}</math></center>
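The dual objective can likewise be evaluated numerically. The sketch below assumes the standard <math>\varepsilon</math>-SVR dual with Gram matrix <math>\mathbf{R}</math> and target vector <math>\mathbf{y}</math>; variable names and the exact scaling follow common convention and are not necessarily the article's:

```python
def dual_objective(alpha, alpha_p, R, y, eps):
    """Standard eps-SVR dual: negative quadratic term in (alpha - alpha'),
    a linear data term, and an eps penalty on the total multiplier mass."""
    d = [a - b for a, b in zip(alpha, alpha_p)]
    quad = sum(d[i] * R[i][j] * d[j]
               for i in range(len(d)) for j in range(len(d)))
    lin = sum(di * yi for di, yi in zip(d, y))
    return -quad + lin - eps * (sum(alpha) + sum(alpha_p))

# Toy example: identity Gram matrix (two orthonormal samples).
R = [[1.0, 0.0], [0.0, 1.0]]
val = dual_objective([1.0, 0.0], [0.0, 1.0], R, [1.0, -1.0], eps=0.1)
```

Maximizing this objective over the box <math>0 \leq \alpha_n, \alpha'_n \leq C</math> is what a standard QP solver does in practice.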
the samples which are mainly affected by thermal noise (i.e., for which the quadratic cost is Maximum Likelihood). The linear cost is then