## Multi-Relational Learning

In many real-world domains, hidden information is present in the inter-relationships between different classes within the data. This information can be relational, hierarchical or conditional in nature. Most of the time, it is implicitly codified when the data schemas for the problem at hand are designed. For data mining, such schemas are usually denormalized, since conventional data mining algorithms work on a single table at a time. Denormalization discards the implicit relationships present in the original schema, so data mining starts off by losing valuable information.
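The loss described above can be seen in a small sketch. The customer and order tables, their column names and the `denormalize` helper are all hypothetical, and the sketch assumes the key columns are dropped as mere identifiers before the flat table is handed to a single-table learner.

```python
# Hypothetical two-table schema: one customer has many orders.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 35.0},
    {"order_id": 12, "customer_id": 2, "amount": 500.0},
]


def denormalize(customers, orders):
    """Join orders onto customers, producing one flat row per order.

    The key columns are left out of the result (an assumption of this
    sketch), so only descriptive attributes reach the learner.
    """
    by_id = {c["customer_id"]: c for c in customers}
    return [
        {"segment": by_id[o["customer_id"]]["segment"], "amount": o["amount"]}
        for o in orders
    ]


flat_rows = denormalize(customers, orders)
for row in flat_rows:
    print(row)
```

The flat table has three rows, two of them for the retail segment, but nothing in it says that those two rows belong to the same customer. A single-table learner would treat them as independent examples.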

To overcome this problem, data analysts denormalize data interactively, using their background domain knowledge to preserve data inter-relationships. The challenge lies in fully automating this process, and that is where the emerging field of Relational Data Mining comes in. For a comprehensive treatment, see the book Relational Data Mining by Dzeroski and Lavrac.

Domains where Relational Data Mining (RDM) holds great potential include bioinformatics, social networking, viral marketing, natural language processing and text mining, to name a few. These domains share high data dimensionality, categorical data and data that can be represented as a graph.

In high-dimensional data mining, the main problem is the sparsity of the constructed feature vectors. The resulting feature vectors also tend to be larger than the original data set if the data is categorical. A naive approach in such a setting is to assume that the data follows a normal distribution, but this is not a robust strategy. In addition, in certain domains (like bioinformatics), the data miner often does not fully understand the domain and thus tends to miss important relationships while preparing the data for analysis.
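A minimal sketch of why categorical data inflates and sparsifies feature vectors, assuming a plain one-hot encoding (one of several possible encodings). The records, column names and the `one_hot_encode` helper are made up for illustration.

```python
def one_hot_encode(rows, columns):
    """Map every (column, value) pair to its own 0/1 feature."""
    vocab = sorted({(col, row[col]) for row in rows for col in columns})
    index = {pair: i for i, pair in enumerate(vocab)}
    vectors = []
    for row in rows:
        vec = [0] * len(vocab)
        for col in columns:
            vec[index[(col, row[col])]] = 1
        vectors.append(vec)
    return vocab, vectors


# Hypothetical categorical records with three attributes each.
rows = [
    {"organism": "human", "tissue": "liver", "assay": "rna-seq"},
    {"organism": "mouse", "tissue": "brain", "assay": "microarray"},
    {"organism": "yeast", "tissue": "none", "assay": "rna-seq"},
    {"organism": "human", "tissue": "brain", "assay": "chip-seq"},
]
columns = ["organism", "tissue", "assay"]

vocab, vectors = one_hot_encode(rows, columns)

# Fraction of entries that are zero across all encoded vectors.
total_cells = len(vectors) * len(vocab)
sparsity = 1 - sum(sum(v) for v in vectors) / total_cells

print(len(columns), "attributes become", len(vocab), "features")
print("fraction of zero entries:", round(sparsity, 3))
```

Three attributes turn into nine features, and two thirds of the encoded entries are zero. With more distinct values per attribute, both the width and the sparsity grow.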

The essence of RDM is an expressive language for patterns involving relational data: a language more expressive than conventional (propositional) data mining patterns and less complex than first-order logic. Inductive Logic Programming (ILP), in the context of KDD, provides this language, sometimes called relational logic, to express patterns containing data inter-relationships.
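To make the difference from propositional patterns concrete, here is a small sketch of one relational pattern checked against facts spread over two relations. The `order` and `item` relations, the `bulk_buyer` rule and the threshold of 100 are all invented for illustration; the rule is written by hand here, not learned by any ILP system.

```python
# Hypothetical ground facts, stored as tuples.
# order(Customer, Order)
orders = {("ann", "o1"), ("ann", "o2"), ("bob", "o3")}
# item(Order, Product, Quantity)
items = {("o1", "pen", 5), ("o2", "paper", 200), ("o3", "ink", 10)}


def bulk_buyer(customer):
    """Relational pattern, in clause form:

        bulk_buyer(C) :- order(C, O), item(O, _, Q), Q >= 100.

    True if the customer has at least one order containing an item
    with quantity of 100 or more.
    """
    return any(
        c == customer and o == item_order and quantity >= 100
        for (c, o) in orders
        for (item_order, _product, quantity) in items
    )


for name in ("ann", "bob"):
    print(name, bulk_buyer(name))
```

The pattern links a customer to an item through a shared order variable. A propositional pattern over a single row of a single table cannot state that link without first flattening the two relations.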

Many data mining algorithms have a relational counterpart: for CART there is S-CART, and for C4.5 there is a relational version called TILDE. Similarly, there are relational association rules and relational decision trees, as well as distance-based methods that build on a relational distance measure such as the one used in RIBL.

However, even though multi-relational learning holds promise, the field is still far from offering general methodologies for the whole spectrum of data mining problems. The field of Statistical Relational Learning, as it is sometimes called, rests on the assumption that models built over both the apparent data and the relational data within it yield better results than models built over the apparent data alone. As pointed out by [2], this does not hold in general: for certain data sets, models built on the intrinsic (apparent) data alone outperform those that also use relational data.

Secondly, due to the inherent complexity of relational data, it has been observed that deterministic relational learners do not produce results as good as those of probabilistic relational learners. Statistical relational learning can predict structured data and capture dependencies between data instances, which earlier machine learning setups largely ignored.

Beyond the nature of the data sets, relational learning algorithms also differ in how they approach the problem. Earlier relational learners concentrated on propositionalization, that is, converting relational data into ‘flat’ data and then applying conventional learners to it. More recent approaches incorporate the relational schema directly into the learner’s framework [1]. Both approaches continue to progress.
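As a rough illustration of propositionalization, the sketch below summarizes a one-to-many relation into fixed-length aggregate features, one row per customer. Aggregation is only one way to propositionalize, and the tables, the `propositionalize` helper and the chosen aggregates (count, sum, max) are assumptions made for this example.

```python
from collections import defaultdict

# Hypothetical relational data: one customer has zero or more orders.
customers = [
    {"customer_id": 1, "segment": "retail"},
    {"customer_id": 2, "segment": "wholesale"},
    {"customer_id": 3, "segment": "retail"},
]
orders = [
    {"order_id": 10, "customer_id": 1, "amount": 20.0},
    {"order_id": 11, "customer_id": 1, "amount": 35.0},
    {"order_id": 12, "customer_id": 2, "amount": 500.0},
]


def propositionalize(customers, orders):
    """Build one flat row per customer from aggregates over their orders."""
    amounts = defaultdict(list)
    for order in orders:
        amounts[order["customer_id"]].append(order["amount"])

    table = []
    for customer in customers:
        own = amounts.get(customer["customer_id"], [])
        table.append({
            "customer_id": customer["customer_id"],
            "segment": customer["segment"],
            "n_orders": len(own),
            "total_amount": sum(own),
            "max_amount": max(own, default=0.0),
        })
    return table


flat_table = propositionalize(customers, orders)
for row in flat_table:
    print(row)
```

Each customer now has exactly one row, so a conventional single-table learner can be applied. The price is that the aggregates fix in advance which aspects of the relation the learner can see, which is part of the motivation for learners that work on the relational schema directly.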

Thus, the field of relational learning is gaining wide acceptance, and suitable methodologies for applying it across general fields are being devised.
