In machine learning, single-label classification is concerned with learning a model from a set of instances, each associated with exactly one label $l$ from a set of disjoint labels $L$. If $|L| = 2$, the learning problem is called $\it{binary}$ classification; if $|L| > 2$, it is a $\it{multi\text{-}class}$ classification problem.
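In applications such as text categorization, medical diagnosis, and music categorization, however, an instance may belong to more than one class at the same time. For example, in text categorization a single document can cover several topics at once. These types of problems belong to $\it{multi\text{-}label}$ classification.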

## Single-label vs. Multi-label

Let $L$ be a finite set of labels $L = \{ \lambda_j : j = 1, \ldots, n\}$ and $D$ be the set of instances $D = \{(x_i , Y_i) : i = 1, \ldots, m\}$, where $x_i$ is the feature vector of an instance and $Y_i \subseteq L$ is the subset of labels relevant to $x_i$. The subset $Y_i$ can equivalently be written as a binary vector $(y_1, y_2, \ldots, y_n)$ with each $y_j \in \{0, 1\}$, where $y_j = 1$ indicates that label $\lambda_j$ is in the set of relevant labels for $x_i$. Suppose every instance $x_i$ has $k$ features $f_1, f_2, \ldots, f_k$.

$\newcommand\T{\Rule{0pt}{1em}{0em}}$
Single-Label $\bf{y \in \{0,1\}}$

$$
\begin{array}{|c|c c c c c|c|}
\hline
 & f_1 & f_2 & f_3 & f_4 & f_5 & \lambda \T \\
\hline
x_1 \T & 2 & 0.1 & 4 & 1.3 & 2 & 1 \\
x_2 \T & 1 & 0.5 & 2 & 1.7 & 0 & 0 \\
x_3 \T & 3 & 0.4 & 1 & 2.1 & 3 & 0 \\
x_4 \T & 0 & 0.2 & 3 & 1.6 & 1 & 1 \\
x_5 \T & 5 & 0.3 & 0 & 1.1 & 2 & 1 \\
x_6 \T & 4 & 0.6 & 6 & 1.5 & 3 & 0 \\
\hline
\end{array}
$$

Multi-Label $\bf{(y_1, y_2, \ldots, y_n) \in \{0,1\}^n}$

$$
\begin{array}{|c|c c c c c|c c c c|}
\hline
 & f_1 & f_2 & f_3 & f_4 & f_5 & \lambda_1 & \lambda_2 & \lambda_3 & \lambda_4 \T \\
\hline
x_1 \T & 2 & 0.1 & 4 & 1.3 & 2 & 1 & 0 & 1 & 0 \\
x_2 \T & 1 & 0.5 & 2 & 1.7 & 0 & 0 & 0 & 0 & 1 \\
x_3 \T & 3 & 0.4 & 1 & 2.1 & 3 & 0 & 1 & 0 & 0 \\
x_4 \T & 0 & 0.2 & 3 & 1.6 & 1 & 1 & 0 & 0 & 1 \\
x_5 \T & 5 & 0.3 & 0 & 1.1 & 2 & 1 & 0 & 0 & 0 \\
x_6 \T & 4 & 0.6 & 6 & 1.5 & 3 & 0 & 1 & 0 & 0 \\
\hline
\end{array}
$$
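The correspondence between a label subset $Y_i$ and its binary vector can be sketched in R. This is only an illustration: the label names, the `labelSets` list, and the helper `toIndicator` are hypothetical, and the three instances simply mirror the first three rows of the multi-label table.

```r
# Hypothetical label set L and per-instance label subsets Y_i (illustrative only).
labels <- c("lambda1", "lambda2", "lambda3", "lambda4")
labelSets <- list(
  x1 = c("lambda1", "lambda3"),
  x2 = c("lambda4"),
  x3 = c("lambda2")
)

# Encode a subset Y_i as a 0/1 vector of length n = |L|:
# entry j is 1 when lambda_j is relevant for the instance.
toIndicator <- function(labelSet, labels) {
  as.integer(labels %in% labelSet)
}

# One row per instance, one column per label.
Y <- t(sapply(labelSets, toIndicator, labels = labels))
colnames(Y) <- labels
print(Y)
```

Each row of `Y` is the binary vector for one instance, so the whole label matrix has one column per label.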

## Multi-label methods

The multi-label learning approaches can be organized in three main families: problem transformation, algorithm adaptation, and ensemble methods.

## Problem Transformation

(1) Binary Relevance (BR) is probably the most popular transformation method. It learns $|L|$ binary classifiers, one for each label.

## Binary Relevance (BR)

Binary Relevance learns $n$ binary classifiers ($n = |L|$), one for each label. To do so, BR transforms the original dataset into $n$ datasets, one per label. Each dataset contains all the instances of the original dataset, and in the dataset for label $\lambda_j$ an instance $x_i$ is labelled 1 if $\lambda_j \in Y_i$ and 0 otherwise.
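The Binary Relevance transformation can be sketched on toy data as follows. The values are taken from the first four rows of the multi-label table, restricted to two features and three labels. The helper name `binaryRelevanceSplit` and the target column name `y` are assumptions made for this illustration, not part of any library.

```r
# Toy data: features X and a 0/1 label matrix Y (illustrative values only).
X <- data.frame(f1 = c(2, 1, 3, 0), f2 = c(0.1, 0.5, 0.4, 0.2))
Y <- data.frame(lambda1 = c(1, 0, 0, 1),
                lambda2 = c(0, 0, 1, 0),
                lambda3 = c(1, 0, 0, 0))

# binaryRelevanceSplit is a hypothetical helper, not a library function.
# It returns one data frame per label: all original instances and features,
# plus a single 0/1 target column named "y".
binaryRelevanceSplit <- function(X, Y) {
  lapply(Y, function(target) cbind(X, y = target))
}

brDatasets <- binaryRelevanceSplit(X, Y)
print(names(brDatasets))
print(brDatasets[["lambda2"]])
```

Every element of `brDatasets` has the same rows and feature columns as `X`; only the target column differs from one label to the next.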
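Once these datasets are ready, any off-the-shelf binary classifier can be trained on each of them.

## Multi-label Data: Datasets

The R code that follows works with four multi-label datasets in ARFF format, each with a separate train and test file: yeast (103 features), emotions (72 features), scene (294 features), and medical (1449 features).

## R Code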
```r
# Load RWeka for read.arff() and the J48 decision tree
library(RWeka)

# Number of feature columns in each dataset; the remaining columns are labels
nFeatures <- list()
nFeatures[["yeast"]] <- 103
nFeatures[["emotions"]] <- 72
nFeatures[["scene"]] <- 294
nFeatures[["medical"]] <- 1449

# Load the train/test ARFF files and split them into features (X) and labels (Y)
MultiLabel.Load.arff <- function(dataset, nFeatures) {
  # load train data
  trainFile <- paste(".../multilabel/", dataset, "/", dataset, "-train.arff", sep = "")
  trainData <- read.arff(trainFile)
  # load test data
  testFile <- paste(".../multilabel/", dataset, "/", dataset, "-test.arff", sep = "")
  testData <- read.arff(testFile)
  return(list(trainDataX = trainData[, 1:nFeatures],
              trainDataY = trainData[, -(1:nFeatures)],
              testDataX = testData[, 1:nFeatures],
              testDataY = testData[, -(1:nFeatures)]))
}

dataset <- "scene" # yeast emotions scene medical

# load data
data <- MultiLabel.Load.arff(dataset, nFeatures[[dataset]])
trainDataX <- data$trainDataX
trainDataY <- data$trainDataY
testDataX <- data$testDataX
testDataY <- data$testDataY

# One prediction column per label
labelNames <- colnames(trainDataY)
predictions <- matrix(0, nrow = nrow(testDataX), ncol = length(labelNames))
predictions <- data.frame(predictions)
colnames(predictions) <- labelNames

# Binary Relevance: train one J48 tree per label
for (label in labelNames) {
  cat(label, "\n")
  # Use a fixed target name so that label names which are not valid R names
  # cannot break the formula
  J48Train <- cbind(target = trainDataY[[label]], trainDataX)
  model <- J48(target ~ ., data = J48Train)
  # predict the probability of the positive class
  # (assumes the labels are nominal attributes with levels "0" and "1")
  predictions[, label] <- predict(model, newdata = testDataX, type = "probability")[, "1"]
} # label
```

## Multi-label Evaluation

The evaluation of methods that learn from multi-label data requires metrics that differ from those employed for single-label data. Given an instance $x_i$, the set of labels predicted by a multi-label classifier is denoted by $Z_i$.
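One widely used metric of this kind is Hamming loss, the fraction of instance-label pairs on which the prediction disagrees with the truth. The text does not spell out the formula; under the usual definition, with $N$ test instances, $n = |L|$ labels, and $\Delta$ denoting the symmetric difference between the true set $Y_i$ and the predicted set $Z_i$, it is

$$
\text{HammingLoss} = \frac{1}{N\,n} \sum_{i=1}^{N} |Y_i \,\Delta\, Z_i| .
$$

A small worked example, using made-up 0/1 matrices for three instances and four labels, might look like this:

```r
# Toy true labels Y and thresholded predictions Z (illustrative values only):
# 3 test instances, 4 labels.
Y <- matrix(c(1, 0, 1, 0,
              0, 0, 0, 1,
              0, 1, 0, 0), nrow = 3, byrow = TRUE)
Z <- matrix(c(1, 0, 0, 0,
              0, 0, 0, 1,
              0, 1, 1, 0), nrow = 3, byrow = TRUE)

# Count label slots where prediction and truth disagree, then normalise
# by the total number of slots (instances * labels).
mismatches <- sum(Y != Z)
hammingLoss <- mismatches / (nrow(Y) * ncol(Y))
print(mismatches)
print(hammingLoss)
```

Here the first and third instances each have one wrong label, so two of the twelve slots disagree and the loss is 2/12. Lower values are better, and 0 means every label of every instance was predicted correctly.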
The following R code computes the Hamming loss of the Binary Relevance predictions:

```r
library(pROC) # loaded in the original script; not needed for Hamming loss
THRESHOLD <- 0.5

# Convert a data frame or matrix of 0/1 labels (possibly stored as factors)
# to a numeric matrix
toNumericMatrix <- function(M) {
  M <- as.data.frame(M)
  M[] <- lapply(M, function(col) as.numeric(as.character(col)))
  as.matrix(M)
}

#
# Compute Hamming loss. Predictions are thresholded into 0/1 first.
#
# Y is the true label matrix, N * L, where N is # of test instances and L is # of labels
# Z is the prediction matrix, N * L, where N is # of test instances and L is # of labels
#
HammingLoss <- function(Y, Z) {
  if (nrow(Y) != nrow(Z) || ncol(Y) != ncol(Z)) {
    stop("Dimensions of Y and Z do not match")
  }
  Y <- toNumericMatrix(Y)
  Z <- toNumericMatrix(Z)
  nRow <- nrow(Y)
  nCol <- ncol(Y)
  # threshold the predictions into 0/1
  Z <- ifelse(Z >= THRESHOLD, 1, 0)
  return(sum(Y != Z) / (nRow * nCol))
}

# evaluate
results <- HammingLoss(data$testDataY, predictions)
print(results)
```

## MEKA: A Multi-label Extension to WEKA

Java implementations of multi-label algorithms are available in the MEKA software package. MEKA is a WEKA-based framework for multi-label classification and evaluation. It also serves as a wrapper for MULAN. MEKA can be used from the command line or a GUI and in any ensemble scheme, and it includes many evaluation metrics. Thresholds can be calibrated automatically or, optionally, set ad hoc.
