Development of a tool for the democratisation of data analytics

Machine learning and artificial intelligence have become drivers of change in science, engineering and society. Computational approaches that extract interesting patterns from comprehensive databases and develop predictive models are becoming ubiquitous. Yet only a few experts, and even fewer laypeople, understand the basics of data science. What is needed is the democratisation of machine learning: ways to explain to anyone, on a conceptual level, what machine learning can do and how it can be used. The Bioinformatics Laboratory at the Faculty of Computer and Information Science of the University of Ljubljana has developed a suitable environment, computational techniques and pedagogical approaches for this purpose.

In articles published in the journals Nature Communications and Bioinformatics, researchers from the Faculty of Computer and Information Science (researcher Dr. Primož Godec, Assistant Matjaž Pančur, Technical Specialist Aleš Erjavec, Assistant Ajda Pretnar, Prof. Janez Demšar, Assistant Marko Toplak, researcher Jaka Kokošar, researcher Vesna Tanko, Assistant Pavlin Gregor Poličar, Assistant Lan Žagar, researcher Jan Hartman, Prof. Blaž Zupan) and Prof. Uroš Petrovič from the Biotechnical Faculty described an approach that facilitates the use of machine learning techniques and makes them available to domain experts in biomedical laboratories, either for image analysis (Nature Communications) or for the analysis of the expression profiles of individual cells (Bioinformatics).

The proposed approach is based on the Orange environment, which is being developed in the Bioinformatics Laboratory. Orange employs visual programming: the user determines the course of an analysis by assembling basic analytical building blocks. In the article published in Nature Communications, the authors presented the use of this tool on four different image collections, covering mouse bone healing, the development of mouse egg cells, the morphogenesis of social amoebae, and protein localisation in yeast cells. They demonstrated that accurate models for predicting phenotypes can be constructed easily from image collections in the Orange environment.

The article published in Bioinformatics addressed a different problem, again with the visual programming approach: the authors presented the use of the Orange environment to analyse gene expression in individual cells. Here, too, the main achievement is the breaking down of the data analysis problem into simple analytical building blocks. The user can stack these like Lego blocks into an analytical scheme and search for patterns in a given data set by combining graphical representations, model building and interactive exploratory interfaces. Although the articles focus on molecular biology, the approach is generally applicable in science, industry and wherever else data are handled.

Sources:

Godec P., Pančur M., Ilenič N., Čopar A., Stražar M., Erjavec A., Pretnar A., Demšar J., Starič A., Toplak M., Žagar L., Hartman J., Wang H., Bellazzi R., Petrovič U., Garagna S., Zuccotti M., Park D., Shaulsky G., Zupan B. (2019). Democratised image analytics by visual programming through integration of deep models and small-scale machine learning, Nature Communications 10(1):4551, doi: 10.1038/s41467-019-12397-x, [COBISS.SI-ID 32755751], IF(2019)=12.1, "multidisciplinary sciences": 1A1 (Z, A'', A', A1/2).

Stražar M., Žagar L., Kokošar J., Tanko V., Erjavec A., Poličar P., Starič A., Demšar J., Shaulsky G., Menon V., Lamire A., Parikh A., Zupan B. (2019). scOrange – A Tool for Hands-On Training of Concepts from Single Cell Data Analytics, Bioinformatics 35(14):i4-i12, doi: 10.1093/bioinformatics/btz348, [COBISS.SI-ID 1538307523], IF(2019)=5.6, "mathematical & computational biology": 1A1 (Z, A'', A', A1/2).