Given a matrix or data.frame of dimension NxD, with N>1 and D>0, `Dirac()` computes the simplest kernel for categorical data. Samples should be in the rows and features in the columns. With a single feature (D=1), the kernel is 1 if two samples have the same category (or class, or level) and 0 otherwise. When D>1, the per-feature results are combined by a sum, a mean, or a weighted mean.
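The description above can be sketched in a few lines of base R. This is only an illustration of the definition, not the package's implementation; `dirac_one` is a hypothetical helper name, and the built-in `CO2` dataset is used as input.

```r
# Hypothetical helper for illustration; not the package's implementation.
# Dirac kernel for a single categorical feature.
dirac_one <- function(x) {
  x <- as.character(x)      # compare category labels rather than factor codes
  1 * outer(x, x, "==")     # N x N matrix: 1 if same category, 0 otherwise
}

X <- CO2[, 1:3]             # three categorical columns, so D = 3

# Combine the D per-feature kernels with a mean.
K <- Reduce(`+`, lapply(seq_along(X), function(j) dirac_one(X[[j]]))) / ncol(X)

# Inspect a few pairs of samples.
idx <- c(1, 15, 50, 65)
round(K[idx, idx], 3)
```

Each entry of `K` is the fraction of features on which two samples agree, so the diagonal is always 1.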
Arguments
- X
 Matrix of class "character", or data.frame whose columns are of class "character" or "factor". The elements in X are assumed to be categorical in nature.
- comp
 When D>1, this argument indicates how the variables of the dataset are combined. Options are: "mean", "sum" and "weighted". (Defaults: "mean")
"sum" gives the same importance to all variables, and returns an unnormalized kernel matrix.
"mean" gives the same importance to all variables, and returns a normalized kernel matrix (all its elements range between 0 and 1).
"weighted" weights each variable according to the `coeff` parameter, and returns a normalized kernel matrix.
- coeff
 (optional) A vector of weights with length D, one per variable. Used when comp = "weighted".
- feat_space
 If FALSE, only the kernel matrix is returned. Otherwise, the feature space is also returned. (Defaults: FALSE).
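To make the three `comp` options concrete, here is a minimal base R sketch on a toy dataset. It is not the package's code: `dirac_sketch` is a hypothetical helper, and rescaling the weights so that they sum to 1 is an assumption made here so that the "weighted" result stays between 0 and 1. The last lines build one explicit feature map for this kernel (a one-hot encoding of every variable), which may differ in layout from what `feat_space = TRUE` returns.

```r
# Hypothetical helper (not the package's code): combine per-feature Dirac kernels.
dirac_sketch <- function(X, comp = c("mean", "sum", "weighted"), coeff = NULL) {
  comp <- match.arg(comp)
  X <- as.data.frame(X)
  D <- ncol(X)
  # One N x N 0/1 matrix per feature: 1 where two samples share the category.
  per_feature <- lapply(seq_len(D), function(j) {
    x <- as.character(X[[j]])
    1 * outer(x, x, "==")
  })
  if (comp == "sum")  return(Reduce(`+`, per_feature))
  if (comp == "mean") return(Reduce(`+`, per_feature) / D)
  # "weighted": assumption of this sketch, weights are rescaled to sum to 1.
  stopifnot(length(coeff) == D)
  w <- coeff / sum(coeff)
  Reduce(`+`, Map(`*`, per_feature, w))
}

# Toy data: 3 samples, 2 categorical features.
X <- data.frame(colour = c("red", "red", "blue"),
                shape  = c("round", "square", "square"))

K_sum  <- dirac_sketch(X, "sum")                       # entries in 0..D
K_mean <- dirac_sketch(X, "mean")                      # entries in 0..1
K_w    <- dirac_sketch(X, "weighted", coeff = c(3, 1)) # colour counts 3x more

# One explicit feature map: one-hot encode every variable and bind the columns.
Z <- do.call(cbind, lapply(seq_len(ncol(X)), function(j) {
  x <- as.character(X[[j]])
  1 * outer(x, unique(x), "==")
}))

# The inner products of the one-hot rows reproduce the "sum" kernel.
isTRUE(all.equal(tcrossprod(Z), K_sum, check.attributes = FALSE))
```

In this sketch, "sum" and "mean" differ only by the constant factor D, while "weighted" lets agreement on some variables count more than on others.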
References
Belanche, L. A., and Villegas, M. A. (2013). Kernel functions for categorical variables with application to problems in the life sciences. Artificial Intelligence Research and Development (pp. 171-180). IOS Press.
Examples
# Categorical data
summary(CO2)
#>      Plant             Type         Treatment       conc          uptake     
#>  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70  
#>  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90  
#>  Qn3    : 7                                    Median : 350   Median :28.30  
#>  Qc1    : 7                                    Mean   : 435   Mean   :27.21  
#>  Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12  
#>  Qc2    : 7                                    Max.   :1000   Max.   :45.50  
#>  (Other):42                                                                  
Kdirac <- Dirac(CO2[,1:3])
## Display a subset of the kernel matrix:
Kdirac[c(1,15,50,65),c(1,15,50,65)]
#>            1        15        50        65
#> 1  1.0000000 0.6666667 0.3333333 0.0000000
#> 15 0.6666667 1.0000000 0.3333333 0.0000000
#> 50 0.3333333 0.3333333 1.0000000 0.3333333
#> 65 0.0000000 0.0000000 0.3333333 1.0000000
