MIMOSA
Mining Interpretable Models explOiting Sophisticated Algorithms

Principal Investigator (PI): Riccardo Guidotti
Host Institution: University of Pisa, Italy
Project duration: 25/03/2024 - 24/03/2029
The MIMOSA project introduces a paradigm shift from “data mining” to “model mining” that aims to revolutionize the theory and practice of AI, to align human reasoning with the logic of machines, to bridge the gap between AI and cognitive science, and to promote the adoption of responsible AI systems. The project’s objective is to define a methodology for extracting interpretable, accurate, and ethically responsible predictive models for AI-based decision support systems used in critical contexts such as the medical or financial sector. To achieve this goal, MIMOSA defines a framework that exploits sophisticated algorithms such as Deep Learning, Evolutionary Algorithms, and Quantum-Inspired Machine Learning to generate models that are not only accurate and interpretable, but also fair and respectful of privacy.
News

| Date | News |
| --- | --- |
| Jul 25, 2024 | Website is up! :) |
| Apr 01, 2024 | Kickoff Meeting FIS MIMOSA |
| Aug 02, 2023 | Riccardo Guidotti was awarded funding from “Physical Sciences and Engineering” within FIS (Fondo Italiano per la Scienza) |
Selected Publications
- AAAI: A Practical Approach to Causal Inference over Time. Martina Cinquini, Isacco Beretta, Salvatore Ruggieri, and 1 more author. In AAAI-25, Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, USA, 2025.
In this paper, we focus on estimating the causal effect of an intervention over time on a dynamical system. To that end, we formally define causal interventions and their effects over time on discrete-time stochastic processes (DSPs). Then, we show under which conditions the equilibrium states of a DSP, both before and after a causal intervention, can be captured by a structural causal model (SCM). With such an equivalence at hand, we provide an explicit mapping from vector autoregressive models (VARs), broadly applied in econometrics, to linear, but potentially cyclic and/or affected by unmeasured confounders, SCMs. The resulting causal VAR framework allows us to perform causal inference over time from observational time series data. Our experiments on synthetic and real-world datasets show that the proposed framework achieves strong performance in terms of observational forecasting while enabling accurate estimation of the causal effect of interventions on dynamical systems. We demonstrate, through a case study, the potential practical questions that can be addressed using the proposed causal VAR framework.
@inproceedings{cinquini2025practical,
  title     = {A Practical Approach to Causal Inference over Time},
  author    = {Cinquini, Martina and Beretta, Isacco and Ruggieri, Salvatore and Valera, Isabel},
  editor    = {Walsh, Toby and Shah, Julie and Kolter, Zico},
  booktitle = {AAAI-25, Association for the Advancement of Artificial Intelligence, February 25 - March 4, 2025, Philadelphia, PA, {USA}},
  pages     = {14832--14839},
  publisher = {{AAAI} Press},
  year      = {2025},
  url       = {https://doi.org/10.1609/aaai.v39i14.33626},
  doi       = {10.1609/AAAI.V39I14.33626},
}
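The core idea described above, estimating how an intervention shifts the equilibrium of a dynamical system modeled as a VAR, can be illustrated with a tiny simulation. The sketch below is not the paper's causal VAR framework; the coefficient matrix, noise scale, and the choice of a hard intervention are illustrative assumptions.

```python
# Minimal sketch (illustrative, not the paper's implementation): simulate a
# stable VAR(1) process, then apply a hard intervention do(X_0 = 1) and compare
# the equilibrium mean of another variable before and after the intervention.
import numpy as np

rng = np.random.default_rng(0)

# VAR(1): x_t = A @ x_{t-1} + noise, with a stable (spectral radius < 1) matrix A
A = np.array([[0.5, 0.2, 0.0],
              [0.0, 0.4, 0.3],
              [0.1, 0.0, 0.6]])

def simulate(T=5000, intervene_on=None, value=0.0):
    """Simulate the process; optionally clamp one coordinate (a hard intervention)."""
    x = np.zeros(3)
    xs = []
    for _ in range(T):
        x = A @ x + rng.normal(scale=0.1, size=3)
        if intervene_on is not None:
            x[intervene_on] = value  # do(X_i = value)
        xs.append(x.copy())
    return np.array(xs)

obs = simulate()                          # observational regime
post = simulate(intervene_on=0, value=1)  # regime after do(X_0 = 1)

# Effect of the intervention on the equilibrium mean of X_2
print("equilibrium mean of X_2, observational:  ", obs[-1000:, 2].mean())
print("equilibrium mean of X_2, after do(X_0=1):", post[-1000:, 2].mean())
```

In the setting of the paper, the VAR coefficients would instead be estimated from observational time series and mapped to a (possibly cyclic) SCM before reasoning about interventions.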
- IEEE: A Bias Injection Technique to Assess the Resilience of Causal Discovery Methods. Martina Cinquini, Karima Makhlouf, Sami Zhioua, and 2 more authors. IEEE Access, 2025.
Causal discovery (CD) algorithms are increasingly applied to socially and ethically sensitive domains. However, their evaluation under realistic conditions remains challenging due to the scarcity of real-world datasets annotated with ground-truth causal structures. Whereas synthetic data generators support controlled benchmarking, they often overlook forms of bias, such as dependencies involving sensitive attributes, which may significantly affect the observed distribution and compromise the trustworthiness of downstream analysis. This paper introduces a novel synthetic data generation framework that enables controlled bias injection while preserving the causal relationships specified in a ground-truth causal graph. The framework aims to evaluate the reliability of CD methods by examining the impact of varying bias levels and outcome binarization thresholds. Experimental results show that even moderate bias levels can lead CD approaches to fail to correctly infer causal links, particularly those connecting sensitive attributes to decision outcomes. These findings underscore the need for expert validation and highlight the limitations of current CD methods in fairness-critical applications. Our proposal thus provides an essential tool for benchmarking and improving CD algorithms in biased, real-world data settings.
@article{cinquini2025bias,
  author   = {Cinquini, Martina and Makhlouf, Karima and Zhioua, Sami and Palamidessi, Catuscia and Guidotti, Riccardo},
  title    = {A Bias Injection Technique to Assess the Resilience of Causal Discovery Methods},
  journal  = {IEEE Access},
  year     = {2025},
  volume   = {13},
  pages    = {97376--97391},
  keywords = {Synthetic data; Data models; Europe; Resilience; Mathematical models; Generators; Benchmark testing; Reliability; Prevention and mitigation; Technological innovation; Fairness; bias; synthetic data generation; machine learning; causal discovery},
  doi      = {10.1109/ACCESS.2025.3573201},
}
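To make the bias-injection idea concrete, the following minimal sketch generates synthetic data from a small hand-written causal graph S -> X -> Y with an additional S -> Y dependency whose strength is controlled by a `bias` parameter, and binarizes the outcome with a tunable threshold. All variable names, structural equations, and parameter values are illustrative assumptions, not the paper's generator.

```python
# Minimal sketch of controlled bias injection into synthetic data generated
# from a known causal graph (illustrative assumptions throughout).
import numpy as np
import pandas as pd

def generate(n=10_000, bias=0.0, threshold=0.5, seed=0):
    rng = np.random.default_rng(seed)
    S = rng.binomial(1, 0.5, n)                   # sensitive attribute
    X = 0.8 * S + rng.normal(size=n)              # mediator caused by S
    # Ground-truth structural equation for the outcome; `bias` controls the
    # strength of the injected S -> Y dependency.
    Y_cont = 0.5 * X + bias * S + rng.normal(scale=0.5, size=n)
    Y = (Y_cont > np.quantile(Y_cont, threshold)).astype(int)
    return pd.DataFrame({"S": S, "X": X, "Y": Y})

# A causal discovery method would be benchmarked on datasets with increasing
# bias levels; here we just report the induced S-Y association at each level.
for bias in (0.0, 0.5, 1.5):
    df = generate(bias=bias)
    print("bias =", bias, "| corr(S, Y) =", round(df.corr().loc["S", "Y"], 2))
```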
- Inf. Fusion: Explainable Artificial Intelligence (XAI) 2.0: A manifesto of open challenges and interdisciplinary research directions. Luca Longo, Mario Brcic, Federico Cabitza, and 16 more authors. Information Fusion, 2024.
Understanding black box models has become paramount as systems based on opaque Artificial Intelligence (AI) continue to flourish in diverse real-world applications. In response, Explainable AI (XAI) has emerged as a field of research with practical and ethical benefits across various domains. This paper highlights the advancements in XAI and its application in real-world scenarios and addresses the ongoing challenges within XAI, emphasizing the need for broader perspectives and collaborative efforts. We bring together experts from diverse fields to identify open problems, striving to synchronize research agendas and accelerate XAI in practical applications. By fostering collaborative discussion and interdisciplinary cooperation, we aim to propel XAI forward, contributing to its continued success. We aim to develop a comprehensive proposal for advancing XAI. To achieve this goal, we present a manifesto of 28 open problems categorized into nine categories. These challenges encapsulate the complexities and nuances of XAI and offer a road map for future research. For each problem, we provide promising research directions in the hope of harnessing the collective intelligence of interested stakeholders.
@article{longo2024manifesto,
  author  = {Longo, Luca and Brcic, Mario and Cabitza, Federico and Choi, Jaesik and Confalonieri, Roberto and Ser, Javier Del and Guidotti, Riccardo and Hayashi, Yoichi and Herrera, Francisco and Holzinger, Andreas and Jiang, Richard and Khosravi, Hassan and L{\'{e}}cu{\'{e}}, Freddy and Malgieri, Gianclaudio and P{\'{a}}ez, Andr{\'{e}}s and Samek, Wojciech and Schneider, Johannes and Speith, Timo and Stumpf, Simone},
  title   = {Explainable Artificial Intelligence {(XAI)} 2.0: {A} manifesto of open challenges and interdisciplinary research directions},
  journal = {Information Fusion},
  volume  = {106},
  pages   = {102301},
  year    = {2024},
  url     = {https://doi.org/10.1016/j.inffus.2024.102301},
  doi     = {10.1016/J.INFFUS.2024.102301},
}
- Quantum: The role of encodings and distance metrics for the quantum nearest neighbor. Alessandro Berti, Anna Bernasconi, Gianna M. Del Corso, and 1 more author. Quantum Machine Intelligence, 2024.
Over the past few years, we observed a rethinking of classical artificial intelligence algorithms from a quantum computing perspective. This trend is driven by the peculiar properties of quantum mechanics, which offer the potential to enhance artificial intelligence capabilities, enabling it to surpass the constraints of classical computing. However, redesigning classical algorithms into their quantum equivalents is not straightforward and poses numerous challenges. In this study, we analyze in depth two orthogonal designs of the quantum K-nearest neighbor classifier. In particular, we show two solutions based on amplitude encoding and basis encoding of data, respectively. These two types of encoding impact the overall structure of the respective algorithms, which employ different distance metrics and show different performances. By breaking down each quantum algorithm, we clarify and compare implementation aspects ranging from data preparation to classification. Eventually, we discuss the difficulties associated with data preparation, the theoretical advantage of quantum algorithms, and their impact on performance with respect to the classical counterpart.
@article{berti2024role,
  title     = {The role of encodings and distance metrics for the quantum nearest neighbor},
  author    = {Berti, Alessandro and Bernasconi, Anna and Del Corso, Gianna M. and Guidotti, Riccardo},
  journal   = {Quantum Machine Intelligence},
  volume    = {6},
  number    = {2},
  pages     = {62},
  year      = {2024},
  publisher = {Springer},
}
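The two encodings compared in the paper induce different notions of distance. The classical sketch below illustrates that pairing (amplitude encoding with a fidelity-like distance, basis encoding with a Hamming distance) on plain NumPy vectors; it does not implement the quantum circuits, and the binarization thresholds are illustrative assumptions.

```python
# Classical illustration of the two data encodings and the distances they induce.
import numpy as np

def amplitude_encode(x):
    """L2-normalize a real vector so it could be loaded into state amplitudes."""
    x = np.asarray(x, dtype=float)
    return x / np.linalg.norm(x)

def fidelity_distance(a, b):
    """Distance derived from |<a|b>|^2 between two amplitude-encoded vectors."""
    return 1.0 - np.dot(a, b) ** 2

def basis_encode(x, thresholds):
    """Binarize features; each bit would sit in one computational-basis qubit."""
    return (np.asarray(x) > np.asarray(thresholds)).astype(int)

def hamming_distance(a, b):
    return int(np.sum(a != b))

x, y = [3.0, 1.0, 2.0], [1.0, 1.0, 4.0]
print(fidelity_distance(amplitude_encode(x), amplitude_encode(y)))
print(hamming_distance(basis_encode(x, [2, 2, 2]), basis_encode(y, [2, 2, 2])))
```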
- IEEE: Counterfactual and Prototypical Explanations for Tabular Data via Interpretable Latent Space. Simone Piaggesi, Francesco Bodria, Riccardo Guidotti, and 2 more authors. IEEE Access, 2024.
Artificial Intelligence decision-making systems have dramatically increased their predictive power in recent years, beating humans in many different specific tasks. However, this increase in performance has come with an increase in the complexity of the black-box models adopted by AI systems, making the decision process they follow entirely obscure. Explainable AI is a field that seeks to make AI decisions more transparent by producing explanations. In this paper, we propose CP-ILS, a comprehensive interpretable feature reduction method for tabular data capable of generating Counterfactual and Prototypical post-hoc explanations using an Interpretable Latent Space. CP-ILS optimizes a transparent feature space whose similarity and linearity properties enable the easy extraction of local and global explanations for any pre-trained black-box model, in the form of counterfactual/prototype pairs. We evaluated the effectiveness of the created latent space by showing its capability to preserve pair-wise similarities like well-known dimensionality reduction techniques. Moreover, we assessed the quality of counterfactuals and prototypes generated with CP-ILS against state-of-the-art explainers, demonstrating that our approach obtains more robust, plausible, and accurate explanations than its competitors under most experimental conditions.
@article{piaggesi2024counterfactual,
  title     = {Counterfactual and Prototypical Explanations for Tabular Data via Interpretable Latent Space},
  author    = {Piaggesi, Simone and Bodria, Francesco and Guidotti, Riccardo and Giannotti, Fosca and Pedreschi, Dino},
  journal   = {IEEE Access},
  year      = {2024},
  publisher = {IEEE},
}
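A rough way to see the counterfactual/prototype mechanism is to search a latent space for the nearest instance with the opposite prediction (counterfactual) and the nearest instance with the same prediction (prototype). The sketch below uses PCA as a stand-in for the interpretable latent space learned by CP-ILS and a random forest as the pre-trained black box; both, and the dataset, are assumptions for illustration only.

```python
# Minimal sketch of counterfactual/prototype retrieval in a (stand-in) latent space.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)
black_box = RandomForestClassifier(random_state=0).fit(X, y)   # pre-trained black box
latent = PCA(n_components=2, random_state=0)                   # stand-in latent space
Z = latent.fit_transform(X)
preds = black_box.predict(X)

def explain(i):
    """Return (prototype, counterfactual) indices for instance i, searched in latent space."""
    dists = np.linalg.norm(Z - Z[i], axis=1)
    dists[i] = np.inf                      # exclude the instance itself
    same = preds == preds[i]
    prototype = int(np.argmin(np.where(same, dists, np.inf)))        # same prediction
    counterfactual = int(np.argmin(np.where(~same, dists, np.inf)))  # opposite prediction
    return prototype, counterfactual

proto, cf = explain(0)
print("prototype index:", proto, "| counterfactual index:", cf)
```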
- Quantum: Quantum subroutine for variance estimation: algorithmic design and applications. Anna Bernasconi, Alessandro Berti, Gianna M. Del Corso, and 2 more authors. Quantum Machine Intelligence, 2024.
Quantum computing sets the foundation for new ways of designing algorithms, thanks to the peculiar properties inherited from quantum mechanics. Exploring this new paradigm raises the question of in which fields a quantum speedup can actually be achieved. A solid step toward an answer is the design of quantum subroutines that are more efficient than their classical counterparts and can serve as building blocks for new, powerful quantum algorithms. Here, we focus on a foundational subroutine, the computation of the variance, whose usefulness spans different fields of application, particularly artificial intelligence (AI). Indeed, finding a quantum counterpart of this building block directly impacts all the algorithms that leverage this metric. In this work, we propose QVAR, a quantum subroutine for computing the variance that exhibits a logarithmic complexity both in circuit depth and width, excluding the state preparation cost. With the vision of showing the use of QVAR as a subroutine for new quantum algorithms, we tackle two tasks from the AI domain: feature selection and outlier detection. In particular, we showcase two hybrid quantum AI algorithms that leverage QVAR: the hybrid quantum feature selection (HQFS) algorithm and the quantum outlier detection algorithm (QODA). In this manuscript, we describe the implementation of QVAR, HQFS, and QODA, providing their correctness and complexities and showing the effectiveness of these hybrid quantum algorithms with respect to their classical counterparts.
@article{bernasconi2024quantum,
  title     = {Quantum subroutine for variance estimation: algorithmic design and applications},
  author    = {Bernasconi, Anna and Berti, Alessandro and Del Corso, Gianna M. and Guidotti, Riccardo and Poggiali, Alessandro},
  journal   = {Quantum Machine Intelligence},
  volume    = {6},
  number    = {2},
  pages     = {78},
  year      = {2024},
  publisher = {Springer},
  doi       = {10.1007/s42484-024-00213-9},
}
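Since QVAR is a subroutine, its value is easiest to see through the tasks it plugs into. The sketch below shows purely classical counterparts of the two applications mentioned above, variance-based feature selection and a simple variance-driven outlier score; it contains no quantum component and does not reproduce HQFS or QODA, and all parameters are illustrative.

```python
# Classical sketch of the two AI tasks that variance estimation supports.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
X[:, 3] *= 0.01                 # a nearly constant, uninformative feature
X[5] += 8.0                     # an injected outlier row

# Feature selection: keep the k features with the largest variance.
k = 4
variances = X.var(axis=0)
selected = np.argsort(variances)[-k:]
print("selected features:", sorted(selected.tolist()))

# Outlier scoring: score each point by how much removing it reduces total variance.
total = X.var(axis=0).sum()
scores = np.array([total - np.delete(X, i, axis=0).var(axis=0).sum() for i in range(len(X))])
print("most anomalous row:", int(np.argmax(scores)))
```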
- IEEE: Fast, Interpretable and Deterministic Time Series Classification with a Bag-Of-Receptive-Fields. Francesco Spinnato, Riccardo Guidotti, Anna Monreale, and 1 more author. IEEE Access, 2024.
The current trend in the literature on Time Series Classification is to develop increasingly accurate algorithms by combining multiple models in ensemble hybrids, representing time series in complex and expressive feature spaces, and extracting features from different representations of the same time series. As a consequence of this focus on predictive performance, the best time series classifiers are black-box models, which are not understandable from a human standpoint. Even the approaches that are regarded as interpretable, such as shapelet-based ones, rely on randomization to maintain computational efficiency. This poses challenges for interpretability, as the explanation can change from run to run. Given these limitations, we propose the Bag-Of-Receptive-Field (BORF), a fast, interpretable, and deterministic time series transform. Building upon the classical Bag-Of-Patterns, we bridge the gap between convolutional operators and discretization, enhancing the Symbolic Aggregate Approximation (SAX) with dilation and stride, which can more effectively capture temporal patterns at multiple scales. We propose an algorithmic speedup that reduces the time complexity associated with SAX-based classifiers, allowing the extension of the Bag-Of-Patterns to the more flexible Bag-Of-Receptive-Fields, represented as a sparse multivariate tensor. The empirical results from testing our proposal on more than 150 univariate and multivariate classification datasets demonstrate good accuracy and great computational efficiency compared to traditional SAX-based methods and state-of-the-art time series classifiers, while providing easy-to-understand explanations.
@article{spinnato2024fast,
  title     = {Fast, Interpretable and Deterministic Time Series Classification with a Bag-Of-Receptive-Fields},
  author    = {Spinnato, Francesco and Guidotti, Riccardo and Monreale, Anna and Nanni, Mirco},
  journal   = {IEEE Access},
  year      = {2024},
  publisher = {IEEE},
}
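The building blocks named in the abstract, SAX discretization extended with dilation and stride, can be sketched compactly. The window length, number of segments, alphabet, and breakpoints below are illustrative choices, and the plain dictionary of word counts only hints at the sparse multivariate tensor actually used by BORF.

```python
# Minimal sketch of SAX-style words extracted over dilated, strided receptive fields.
import numpy as np
from scipy.stats import norm

def sax_word(window, n_segments=4, alphabet="abcd"):
    """Z-normalize a window, apply PAA, then map segment means to symbols."""
    w = (window - window.mean()) / (window.std() + 1e-8)
    paa = w.reshape(n_segments, -1).mean(axis=1)
    breakpoints = norm.ppf(np.linspace(0, 1, len(alphabet) + 1)[1:-1])  # Gaussian breakpoints
    return "".join(alphabet[np.searchsorted(breakpoints, v)] for v in paa)

def bag_of_words(ts, window=8, dilation=2, stride=4):
    """Count the SAX words produced by dilated, strided windows of a series."""
    span = (window - 1) * dilation + 1
    bag = {}
    for start in range(0, len(ts) - span + 1, stride):
        idx = start + np.arange(window) * dilation   # dilated receptive field
        word = sax_word(ts[idx])
        bag[word] = bag.get(word, 0) + 1
    return bag

ts = np.sin(np.linspace(0, 6 * np.pi, 128)) + np.random.default_rng(0).normal(0, 0.1, 128)
print(bag_of_words(ts))
```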
- ECML: Data-Agnostic Pivotal Instances Selection for Decision-Making Models. Alessio Cascione, Mattia Setzu, and Riccardo Guidotti. In Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2024.
As decision-making processes become increasingly complex, machine learning tools have become essential resources for tackling business and social issues. However, many methodologies rely on complex models that experts and everyday users cannot really interpret or understand. This is why constructing interpretable models is crucial. Humans typically make decisions by comparing the case at hand with a few exemplary and representative cases imprinted in their minds. Our objective is to design an approach that can select such exemplary cases, which we call pivots, to build an interpretable predictive model. To this aim, we propose a hierarchical and interpretable pivot selection model inspired by Decision Trees, and based on the similarity between pivots and input instances. Such a model can be used both as a pivot selection method, and as a standalone predictive model. By design, our proposal can be applied to any data type, as we can exploit pre-trained networks for data transformation. Through experiments on various datasets of tabular data, texts, images, and time series, we have demonstrated the superiority of our proposal compared to naive alternatives and state-of-the-art instance selectors, while minimizing the model complexity, i.e., the number of pivots identified.
@inproceedings{cascione2024data,
  title        = {Data-Agnostic Pivotal Instances Selection for Decision-Making Models},
  author       = {Cascione, Alessio and Setzu, Mattia and Guidotti, Riccardo},
  booktitle    = {Joint European Conference on Machine Learning and Knowledge Discovery in Databases},
  pages        = {367--386},
  year         = {2024},
  organization = {Springer},
}
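A naive version of pivot-based prediction, here using one class medoid per class instead of the hierarchical, tree-inspired selection proposed in the paper, already conveys how a handful of exemplary instances can act as the whole model. The dataset and distance below are illustrative assumptions.

```python
# Naive sketch of pivot selection (class medoids) and nearest-pivot classification.
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

def select_pivots(X, y):
    """One medoid per class: the training instance closest to its class mean."""
    pivots, labels = [], []
    for c in np.unique(y):
        Xc = X[y == c]
        medoid = Xc[np.argmin(np.linalg.norm(Xc - Xc.mean(axis=0), axis=1))]
        pivots.append(medoid)
        labels.append(c)
    return np.array(pivots), np.array(labels)

pivots, pivot_labels = select_pivots(X_tr, y_tr)

# Predict by the label of the nearest pivot; the handful of pivots is the model.
pred = pivot_labels[np.argmin(np.linalg.norm(X_te[:, None] - pivots[None], axis=2), axis=1)]
print("accuracy with", len(pivots), "pivots:", (pred == y_te).mean())
```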
- AAAI: Generative Model for Decision Trees. Riccardo Guidotti, Anna Monreale, Mattia Setzu, and 1 more author. Proceedings of the AAAI Conference on Artificial Intelligence, Mar 2024.
Decision trees are among the most popular supervised models due to their interpretability and knowledge representation resembling human reasoning. Commonly-used decision tree induction algorithms are based on greedy top-down strategies. Although these approaches are known to be an efficient heuristic, the resulting trees are only locally optimal and tend to have overly complex structures. On the other hand, optimal decision tree algorithms attempt to create an entire decision tree at once to achieve global optimality. We place our proposal between these approaches by designing a generative model for decision trees. Our method first learns a latent decision tree space through a variational architecture using pre-trained decision tree models. Then, it adopts a genetic procedure to explore such latent space to find a compact decision tree with good predictive performance. We compare our proposal against classical tree induction methods, optimal approaches, and ensemble models. The results show that our proposal can generate accurate and shallow, i.e., interpretable, decision trees.
@article{guidotti2024generative,
  title   = {Generative Model for Decision Trees},
  author  = {Guidotti, Riccardo and Monreale, Anna and Setzu, Mattia and Volpi, Giulia},
  journal = {Proceedings of the AAAI Conference on Artificial Intelligence},
  volume  = {38},
  number  = {19},
  pages   = {21116--21124},
  year    = {2024},
  month   = mar,
  url     = {https://ojs.aaai.org/index.php/AAAI/article/view/30104},
  doi     = {10.1609/aaai.v38i19.30104},
}
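The genetic exploration step can be illustrated with a toy loop that searches for a decision tree balancing accuracy and compactness. Unlike the paper, which explores a learned latent space of decision trees, the sketch below merely mutates hyperparameters of greedily induced scikit-learn trees; the fitness function and all parameter values are illustrative assumptions.

```python
# Toy genetic search for an accurate yet compact decision tree (illustrative only).
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_tr, X_va, y_tr, y_va = train_test_split(X, y, random_state=0)
rng = np.random.default_rng(0)

def fitness(depth, min_leaf):
    tree = DecisionTreeClassifier(max_depth=depth, min_samples_leaf=min_leaf,
                                  random_state=0).fit(X_tr, y_tr)
    # Reward validation accuracy, penalize size (node count) to favor shallow trees.
    return tree.score(X_va, y_va) - 0.002 * tree.tree_.node_count, tree

population = [(int(d), int(l)) for d, l in zip(rng.integers(1, 12, 8), rng.integers(1, 30, 8))]
best_score, best_tree = -np.inf, None
for generation in range(10):
    scored = sorted(population, key=lambda g: fitness(*g)[0], reverse=True)
    if fitness(*scored[0])[0] > best_score:
        best_score, best_tree = fitness(*scored[0])
    parents = scored[:4]
    # Mutation: jitter the genes of the best individuals to form the next generation.
    population = [(max(1, d + rng.integers(-2, 3)), max(1, l + rng.integers(-5, 6)))
                  for d, l in parents for _ in range(2)]

print("best fitness:", round(best_score, 3), "| nodes:", best_tree.tree_.node_count)
```

The penalty weight in the fitness function is an arbitrary choice here; in practice it controls the trade-off between predictive performance and tree compactness.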