Titre : Relational concept analysis through constraint satisfaction

Sujet proposé dans : M2 MOSIG, Projet --- M2 MSIAM, Projet

Responsable(s) :

Mots-clés : Constraint programming, Data mining, Formal concept analysis, Relational concept analysis, Artificial intelligence
Durée du projet : 5 month, possibility to pursue PhD studies
Nombre maximal d'étudiants : 1
Places disponibles : 1
Interrogation effectuée le : 19 avril 2024, à 23 heures 04


Description

Formal concept analysis, extracting concepts found in a data set, can be expressed as a constraint satisfaction problem. Relational concept analysis deals simultaneously with several interrelated data sets. We aim at answering if it, or which part of it, can be expressed with constraints.

Formal concept analysis (FCA, [Ganter 1999]) is a data mining approach that identifies concepts within a data set. More precisely, from a set of data items, e.g., Researchers, identified by attributes, e.g., topic, age, it finds out concepts which are characterized by the set of objects sharing a set of attributes, e.g., Researchers in artificial intelligence of less than 25 years old.

Relational concept analysis (RCA, [Rouane-Hacene 2013]) enables to find interdependent concepts within related data sets, i.e., that Researchers are also characterised by the Papers they wrote and the Labs they belong to which, in turns, depend on other Researchers. For that purpose, it splits the data in related datasets (here Researchers, Papers, Labs) which are processed individually by formal concept analysis. The resulting concepts are then injected back to the initial FCA problems with the help of 'scaling operators' and formal concept analysis is performed again on these problems, for instance new attributes would be: has-some-members-as-Reserachers-in-AI-of-less-than-25-years-old, or wrote-all-her-papers-with-AI-Researchers. This process is iterated until it reaches a fixed point, i.e., until no more concepts are added.

We use such techniques, in the ANR Elker project, to extract sameAs links in RDF data sets [Atencia 2014d, Vizzini 2017a]

A recent trend is to express data mining algorithms as constraint satisfaction problems (CSP) [Bessiere 2017, Guns 2017]. Such problems can then be solved automatically through techniques known as constraint programming. In that context, each solution of the constraint satisfaction problem corresponds to a concept for the corresponding data mining problem. This has been achieved for formal concept analysis whose principles may be expressed in a few constraints.

The goal of this work is to find out if it is possible to express relational concept analysis as a constraint satisfaction problem. For datasets without cycles, the answer should be positive as it simply consists in performing several FCA steps in a row. It however remains to express the mapping between both problems.

More interestingly, relational concept analysis is designed to work when there are cyclic dependencies between data sets and concepts, that a Researcher belongs to a Lab whose members are Researchers. This is what requires fixed point computation. This is not an obstacle to using constraint satisfaction since FCA is already the computation of a fixed point. However, this requires to be cleanly defined.

Hence, the contribution of this work would be to answer positively or negatively the raised question: is it possible or not to express relational concept analysis to constraint satisfaction. There is an option to answer this question partially, e.g., YES for acyclic RCA and NO for cyclic RCA, or to develop specific techniques to precise cases of RCA. The question could be investigated theoretically, by designing a constraint satisfaction problem equivalent to RCA, or practically, by finding encodings for such problems to CSP.

Answering this question, beside providing the opportunity to use constraint programming to perform relational concept analysis, would lead to new questions such as 'Can relational concept analysis be then reduced to formal concept analysis?' and especially provide an encoding for that.

References:

[Atencia 2014d] Manuel Atencia, Jérôme David, Jérôme Euzenat, What can FCA do for database link key extraction?, Proc. ECAI workshop on "what can FCA do for AI?", Prague (CK), 2014
[Bessiere 2017] Christian Bessiere, Luc De Raedt, Tias Guns, Lars Kotthoff, Mirco Nanni, Siegfried Nijssen, Barry O'Sullivan, Anastasia Paparrizou, Dino Pedreschi, Helmut Simonis, The Inductive Constraint Programming Loop, IEEE Intelligent systems 32(5):44-52, 2017
[Ganter 1999] Bernhard Ganter, Rudolf Wille, Formal concept analysis: mathematical foundations, Springer, Berlin (DE), 1999
[Guns 2017] Tias Guns, Anton Dries, Siegfried Nijssen, Guido Tack, Luc De Raedt, MiningZinc: A declarative framework for constraint-based mining, Artificial intelligence 244:6-29, 2017
[Hacene 2013a] Mohamed Rouane Hacene, Marianne Huchard, Amedeo Napoli, Petko Valtchev, Relational concept analysis: mining concept lattices from multi-relational data, Annals of Mathematics and Artificial Intelligence
[Vizzini 2017a] Jérémy Vizzini, Data interlinking with relational concept analysis, Mémoire de Master Informatique, option Data science, Université Grenoble Alpes, 2017

Links: