norm of <math>\mathbf{w}</math>. Due to the minimization of the weight vector norm, the solution will be regularized in the sense of Tikhonov<ref name="Tikhonov1977">A. Tikhonov, V. Arsenin, Solutions of Ill-Posed Problems, V.H. Winston & Sons, 1977.</ref>, improving the generalization performance.
 
The minimization has to be subject to the constraints

<math>\varepsilon</math> to be zero. This is equivalent to the minimization of the so-called <math>\varepsilon</math>-insensitive or Vapnik Loss Function<ref name="Vapnik1998">V. Vapnik, Statistical Learning Theory, Adaptive and Learning Systems for Signal Processing, Communications, and Control, John Wiley & Sons, 1998.</ref>, given by
    
<center><math>L_{\varepsilon}(\epsilon)=\begin{cases}0, & |\epsilon|\leq\varepsilon\\ |\epsilon|-\varepsilon, & |\epsilon|>\varepsilon\end{cases}</math></center>
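As a concrete illustration, the following short Python sketch (added here for clarity and not part of the original derivation; it assumes NumPy and uses a hypothetical function name) evaluates this loss on a vector of residuals:

<syntaxhighlight lang="python">
import numpy as np

def eps_insensitive_loss(residuals, eps):
    """Vapnik epsilon-insensitive loss: zero inside the |e| <= eps tube,
    growing linearly as |e| - eps outside of it."""
    return np.maximum(np.abs(residuals) - eps, 0.0)

# Residuals inside the tube contribute no cost; larger ones are penalized linearly.
e = np.array([-0.30, -0.05, 0.00, 0.08, 0.50])
print(eps_insensitive_loss(e, eps=0.1))  # [0.2 0.  0.  0.  0.4]
</syntaxhighlight>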
 
subject to <math>\xi_n, \xi'_n \geq 0</math>, where <math>C</math> is the trade-off between the minimization of the norm (to improve generalization ability) and the minimization of the errors <ref name="Vapnik1998"/>.
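For reference, in the standard <math>\varepsilon</math>-SVR formulation this trade-off corresponds to the primal functional (written here in its usual textbook form, which may differ slightly in notation from the expression used earlier in this article)

<center><math>\min_{\mathbf{w},b,\xi_n,\xi'_n}\;\frac{1}{2}\|\mathbf{w}\|^2+C\sum_{n}\left(\xi_n+\xi'_n\right)</math></center>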
    
The optimization of the above constrained problem through Lagrange multipliers <math>\alpha_i</math>, <math>\alpha'_i</math> leads to the dual formulation<ref name="Scholkopf1988">A. Smola, B. Schölkopf, A Tutorial on Support Vector Regression, NeuroCOLT Technical Report NC-TR-98-030, Royal Holloway College, University of London, UK, 1998.</ref>
    
<center><math>L_d=-({\boldsymbol \alpha}-{\boldsymbol \alpha'})^T{\mathbf{R}}({\boldsymbol \alpha}-{\boldsymbol \alpha'})+({\boldsymbol \alpha}-{\boldsymbol \alpha'})^T\mathbf{y}-\varepsilon({\boldsymbol \alpha}+{\boldsymbol \alpha'})^T\mathbf{1}</math></center>
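In practice, this dual is the quadratic program solved, up to scaling conventions, by standard <math>\varepsilon</math>-SVR implementations. The following Python sketch, assuming scikit-learn and synthetic data (both illustrative choices, not part of the original text), fits such a model and inspects the dual coefficients <math>\alpha_n-\alpha'_n</math>:

<syntaxhighlight lang="python">
import numpy as np
from sklearn.svm import SVR

# Synthetic 1-D regression data: a noisy sinusoid.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 2 * np.pi, 200)).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.standard_normal(X.shape[0])

# C is the norm/error trade-off and epsilon the width of the insensitive tube,
# matching the roles of C and epsilon in the formulation above.
model = SVR(kernel="rbf", C=10.0, epsilon=0.1)
model.fit(X, y)

# Samples with nonzero (alpha - alpha') are the support vectors.
print("support vectors:", model.support_vectors_.shape[0])
print("first dual coefficients:", model.dual_coef_[0, :5])
</syntaxhighlight>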
 
the samples which are mainly affected by thermal noise (i.e., for which the quadratic cost is Maximum Likelihood). The linear cost is then