From a matrix or data.frame of dimension NxD (N>1, D>0), `Dirac()` computes the simplest kernel for categorical data. Samples should be in the rows and features in the columns. When there is a single feature, `Dirac()` returns 1 if the category (or class, or level) is the same in two given samples, and 0 otherwise. When D>1, the results for the D features are combined by a sum, a mean, or a weighted mean.
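The definition above can be sketched in base R. This is only an illustration of the idea, not the package's implementation: `dirac_single()` and the `toy` data are hypothetical names introduced here.

```r
# Hypothetical helper (not part of the package): Dirac kernel for a single
# categorical feature. Entry [i, j] is 1 when samples i and j share a level.
dirac_single <- function(x) {
  x <- as.character(x)
  outer(x, x, FUN = "==") * 1  # logical matrix converted to numeric 0/1
}

# Illustrative data: 4 samples (rows), 2 categorical features (columns)
toy <- data.frame(
  colour = c("red", "blue", "red", "green"),
  size   = c("S", "S", "L", "S")
)

# Single feature: 1 where the colours match, 0 otherwise
K_colour <- dirac_single(toy$colour)

# D > 1: average the per-feature kernels (the "mean" combination)
K_mean <- Reduce(`+`, lapply(toy, dirac_single)) / ncol(toy)
K_mean
```

Each off-diagonal entry of `K_mean` is the fraction of features on which two samples agree, so samples 1 and 2 (same size, different colour) get 0.5.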
Arguments
- X
Matrix (class "character") or data.frame (columns of class "character" or "factor"). The elements in X are assumed to be categorical in nature.
- comp
When D>1, this argument indicates how the variables of the dataset are combined. Options are: "mean", "sum" and "weighted". (Defaults: "mean")
"sum" gives the same importance to all variables, and returns an unnormalized kernel matrix.
"mean" gives the same importance to all variables, and returns a normalized kernel matrix (all its elements range between 0 and 1).
"weighted" weights each variable according to the `coeff` parameter, and returns a normalized kernel matrix.
- coeff
(optional) A vector of weights with length D.
- feat_space
If FALSE, only the kernel matrix is returned. Otherwise, the feature space is also returned. (Defaults: FALSE).
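The three `comp` options and the idea of a feature space can be sketched in base R as follows. All names here (`dirac_single()`, `one_hot()`, `toy`, `w`) are hypothetical. The sketch assumes that the weights are rescaled to sum to 1, and that the feature space is a one-hot (indicator) encoding of the levels. Neither detail is stated above, so the layout returned by `feat_space = TRUE` may differ.

```r
# Hypothetical helper: Dirac kernel for one categorical feature
dirac_single <- function(x) {
  x <- as.character(x)
  outer(x, x, FUN = "==") * 1
}

toy <- data.frame(
  colour = c("red", "blue", "red", "green"),
  size   = c("S", "S", "L", "S")
)
per_feature <- lapply(toy, dirac_single)

# "sum": unnormalized, entries range from 0 to D
K_sum <- Reduce(`+`, per_feature)

# "mean": the sum divided by D, entries range from 0 to 1
K_mean <- K_sum / ncol(toy)

# "weighted": one weight per feature (a vector of length D)
w <- c(colour = 3, size = 1)  # illustrative weights
w <- w / sum(w)               # assumption: rescale so the weights sum to 1
K_weighted <- Reduce(`+`, Map(`*`, per_feature, w))

# Assumed feature space: one indicator column per level of each feature.
# Its cross-product reproduces the "sum" kernel.
one_hot <- function(x) {
  x <- as.character(x)
  outer(x, unique(x), FUN = "==") * 1
}
Z <- do.call(cbind, lapply(toy, one_hot))
isTRUE(all.equal(tcrossprod(Z), K_sum, check.attributes = FALSE))
```

With these weights, agreeing on `colour` contributes 0.75 and agreeing on `size` contributes 0.25, so the diagonal of `K_weighted` is still 1.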
References
Belanche, L. A., and Villegas, M. A. (2013). Kernel functions for categorical variables with application to problems in the life sciences. Artificial Intelligence Research and Development (pp. 171-180). IOS Press.
Examples
# Categorical data
summary(CO2)
#>      Plant             Type         Treatment       conc          uptake
#>  Qn1    : 7   Quebec     :42   nonchilled:42   Min.   :  95   Min.   : 7.70
#>  Qn2    : 7   Mississippi:42   chilled   :42   1st Qu.: 175   1st Qu.:17.90
#>  Qn3    : 7                                    Median : 350   Median :28.30
#>  Qc1    : 7                                    Mean   : 435   Mean   :27.21
#>  Qc3    : 7                                    3rd Qu.: 675   3rd Qu.:37.12
#>  Qc2    : 7                                    Max.   :1000   Max.   :45.50
#>  (Other):42
Kdirac <- Dirac(CO2[,1:3])
## Display a subset of the kernel matrix:
Kdirac[c(1,15,50,65),c(1,15,50,65)]
#>            1        15        50        65
#> 1  1.0000000 0.6666667 0.3333333 0.0000000
#> 15 0.6666667 1.0000000 0.3333333 0.0000000
#> 50 0.3333333 0.3333333 1.0000000 0.3333333
#> 65 0.0000000 0.0000000 0.3333333 1.0000000
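The values in this subset can be read as the fraction of the three categorical columns on which two rows agree. The following base R check does not use the package: `frac_match()` and `K_check` are hypothetical names, and the code only counts matching columns between pairs of rows.

```r
rows <- c(1, 15, 50, 65)

# Keep the three categorical columns and compare them as plain strings
cats <- as.data.frame(CO2)[rows, c("Plant", "Type", "Treatment")]
cats[] <- lapply(cats, as.character)

# Hypothetical helper: fraction of columns with the same value in rows i and j
frac_match <- function(i, j) {
  mean(unlist(cats[i, ]) == unlist(cats[j, ]))
}

idx <- seq_along(rows)
K_check <- outer(idx, idx, Vectorize(frac_match))
dimnames(K_check) <- list(rows, rows)
round(K_check, 7)
```

For example, rows 1 and 15 share `Type` and `Treatment` but come from different plants, which gives 2/3. Rows 1 and 65 differ in all three columns, which gives 0.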