Traditionally, research in many scientific fields has proceeded by observing (generally dynamic) phenomena of interest and then repeatedly generalizing and analysing them through mathematical models. With the rapid development of measurement technologies and information infrastructure, the importance of data-driven approaches that use observation and measurement data has been recognized in recent years. Indeed, in areas such as materials informatics, numerous scientific discoveries and engineering results based on machine-learning predictions (of static properties, such as physical properties in a material composition search) are being reported and are gaining attention. At present, however, the methodology for extracting scientific knowledge from data through a direct understanding of phenomena, rather than through prediction (of static characteristics), is not well established, and it remains an area awaiting future development. Here, direct understanding of a phenomenon means extracting information that is directly linked to a mathematical model representing the (dynamic) phenomenon and its parts, or obtaining a reduced model or the associated physical laws (underlying mechanisms). In general, knowledge of (dynamic) phenomena is represented mathematically by ordinary or partial differential equations. The main impediment to developing such a methodology is the absence of an established statistical framework or computational method that allows this kind of knowledge to be extracted directly from data and applied.

Meanwhile, in some fields of applied mathematics and physics (particularly fluid dynamics), operator-theoretic analysis, and in particular a family of methods for analysing dynamics using Koopman operators, has attracted much attention in recent years. By expressing the time evolution of a (generally nonlinear) system as an operator and treating it in a linear setting (such as a function space), this framework analyses the universal dynamic characteristics of the system while avoiding nonlinearities that are difficult to handle directly with mathematical methods. A key feature of this framework is that it does not depend on the specific functional form of the mathematical model, so it can be applied across phenomena and across fields. Recently, data-driven estimation methods that can be interpreted through operator-theoretic analysis have been proposed; dynamic mode decomposition is one example. Because these estimators can be directly linked, under certain assumptions, to various physical properties and principles (such as reduced models), they have found applications in various fields and have been drawing considerable academic attention.
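As a rough illustration of the kind of estimator mentioned above, the following is a minimal sketch of the standard "exact" dynamic mode decomposition in Python with NumPy. It fits a best-fit linear map between successive snapshots, which can be read as a finite-dimensional approximation of the Koopman operator restricted to linear observables. The function name `dmd`, its parameters `snapshots` and `rank`, and the toy matrix `A` are assumptions introduced here for illustration; this is not the specific method developed in this research.

```python
import numpy as np

def dmd(snapshots, rank):
    """Exact DMD of a snapshot matrix (columns are states at successive times)."""
    X, Y = snapshots[:, :-1], snapshots[:, 1:]        # pairs (x_k, x_{k+1})
    U, s, Vh = np.linalg.svd(X, full_matrices=False)  # X = U diag(s) Vh
    U, s, Vh = U[:, :rank], s[:rank], Vh[:rank]       # truncate to the leading modes
    V = Vh.conj().T
    # Reduced operator: the best-fit linear map Y ~ A X projected onto span(U).
    A_tilde = (U.conj().T @ Y @ V) / s                # dividing columns by s applies S^{-1}
    eigvals, W = np.linalg.eig(A_tilde)               # DMD eigenvalues
    modes = ((Y @ V) / s) @ W                         # exact DMD modes
    return eigvals, modes

# Toy data (assumed for illustration): a linear system x_{k+1} = A x_k
# whose eigenvalues, 0.9 and 0.5, are known in advance.
A = np.array([[0.9, 0.2],
              [0.0, 0.5]])
states = [np.array([1.0, 1.0])]
for _ in range(10):
    states.append(A @ states[-1])
snapshots = np.column_stack(states)                   # shape (2, 11)

eigvals, modes = dmd(snapshots, rank=2)
print(np.sort(eigvals.real))                          # close to [0.5, 0.9]
```

For this linear toy system the recovered eigenvalues coincide with those of `A` up to rounding. For nonlinear systems, the eigenvalues and modes are only approximations whose quality depends on the data and on the choice of observables.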

Against this background, this research approaches operator-theoretic analysis and its estimation simultaneously from two perspectives: forward analysis by mathematical and physical methods, and backward analysis by machine learning and mathematical statistics. By fusing and extending the two, we construct a methodology that directly links the mathematical models developed in each domain with the information extracted from data, through their dynamic characteristics. We believe that this will enable the extraction of scientific knowledge from data based on an understanding of the phenomena, leading to a new framework that helps realize more precise predictions and simulations of complex phenomena. Given this stance, the main purposes of this research proposal are 1) to establish a method for identifying, through statistical inference, the major dynamics of complex phenomena based on operator-theoretic analysis, 2) to create a theory for the physical interpretation, use, and mathematical extension of the method, and 3) to construct a methodology for knowledge discovery, learning, and prediction through the integration of these with mathematical models. The scope of this research extends from applying the resulting methods and theoretical framework to data analysis across phenomena in various scientific fields, such as complex biological phenomena and fluid phenomena, to demonstrating the practical usefulness of the methods. The computational infrastructure technology created in this research can be applied to dynamic and complex phenomena of interest across multiple scientific and engineering fields and is expected to have a large ripple effect in academic circles. In addition, because this study aims to construct a generic methodology that uses mathematical models and data-driven information extraction in an integrated manner, it can be applied to detailed simulations and predictions of complex phenomena. It can therefore become an information technology for addressing a wide range of social issues, such as disaster prevention and medical care.

Four groups work on this study: the Machine Learning and Mathematical Statistics Group (Kawahara (IMI, Kyushu University / RIKEN AIP)), the Mathematics Group (Sakauchi (Keio University / RIKEN AIP)), the Nonlinear Physics Group (Nakao (Tokyo Institute of Technology)), and the Biological Modelling Group (Kurosawa (RIKEN iTHEMS)). The Machine Learning and Mathematical Statistics Group works from a data-driven perspective, the Nonlinear Physics Group works from a mathematical model perspective, and the Mathematics Group explores the mathematical principles that connect the two. The Biological Modelling Group verifies the applicability of the developed methods throughout the research and provides feedback on methodologies and principles. Research items with specific applications are pursued by applying the developed methodologies in collaboration with domain researchers across multiple fields, building mainly on the participants' past collaborative research.