# BSc/MSc Projects '23

### LEONARDO STELLA

# BSc and MSc Project Topics

For 2023, I am planning to organise a short hybrid (in-person/online) session to discuss the project proposals below (I will post updates on this page in due course). I welcome students' own ideas, but I will decide on a case-by-case basis whether I am happy to supervise projects beyond the ones proposed below. In general, the following areas of research are of interest to me: Reinforcement Learning, Multi-Agent Systems, Game Theory and Control, including sequential and collective decision-making.

Projects flagged as experimental would require less research work and involve more practical work around the implementation of an algorithm. Projects flagged as research-intensive would be suitable for students with the ambition to have a great project and carry out research in a cutting-edge area.

Finally, it is worth mentioning that the proposed projects are not set in stone and can be modified or updated to fit a range of research developments and directions. Outstanding work in some of these projects can lead to a conference submission (this has happened in the past and I would be happy to supervise it, but it is optional).

## Reinforcement Learning & Multi-agent Systems

Credits: proposed decentralised approach for MARL in sensing design (project 4).

Multi-agent Reinforcement Learning (MARL)

In this area, we look at the situation where a group of agents has to complete a task in a distributed (or, sometimes, also centralised) manner. Agents can communicate through a network, with the aim to achieve cooperation to tackle a given problem. Several issues arise, including scalability, non-stationarity of the environment, communication bottlenecks, etc.

MARL for Cascading Failures in Financial Markets (experimental): The first project looks at the situation (which has become more prominent again with the bankruptcy of large financial organisations worldwide) where financial failures can propagate through the financial market, leading to catastrophic consequences for investors as well as the market as a whole. The goal is to design a MARL model for interconnected financial organisations where the action is to choose where to invest and to what extent. The evaluation of the algorithm will follow some theoretical results in control and economics.
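To make the idea of failure propagation concrete, the following is a minimal sketch of a threshold cascade on a network of institutions. It is not the model used in the project: the function name `simulate_cascade`, the `exposures` and `buffers` data layout, and the rule that an institution fails once its accumulated losses exceed its capital buffer are all assumptions made for illustration.

```python
from collections import deque


def simulate_cascade(exposures, buffers, initial_failures):
    """Propagate failures through a network of institutions (toy model).

    exposures[i][j] is the loss that institution i suffers if j fails.
    buffers[i] is the capital buffer of i (hypothetical quantity): i fails
    once its accumulated losses exceed this buffer.
    Returns the set of institutions that have failed when the cascade stops.
    """
    failed = set(initial_failures)
    losses = {name: 0.0 for name in buffers}
    queue = deque(initial_failures)

    while queue:
        defaulter = queue.popleft()
        for creditor, holdings in exposures.items():
            if creditor in failed:
                continue
            # The creditor absorbs the loss on its exposure to the defaulter.
            losses[creditor] += holdings.get(defaulter, 0.0)
            if losses[creditor] > buffers[creditor]:
                failed.add(creditor)
                queue.append(creditor)
    return failed


if __name__ == "__main__":
    exposures = {"A": {"B": 5.0}, "B": {"C": 4.0}, "C": {}, "D": {"C": 1.0}}
    buffers = {"A": 3.0, "B": 3.0, "C": 2.0, "D": 2.0}
    print(sorted(simulate_cascade(exposures, buffers, ["C"])))
```

In a MARL formulation, each institution would be an agent whose action changes its own row of `exposures` (where to invest and how much), so the cascade above could serve as one possible environment dynamic.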

Controlling Learning in Networks of Agents (research-intensive): Starting from an initial implementation of a provably efficient multi-agent reinforcement learning algorithm for parallel Markov decision processes (MDPs), the goal of this project is to extend the initial algorithm and understand the mechanisms for achieving cooperation in a group of agents. This would involve the extension from parallel MDPs to networked MDPs, where the action of an agent can substantially change the reward of another agent involved in the task. Another option is to look at heterogeneous agents and their interaction patterns to improve the quality of the learning. We can focus on a toy problem to evaluate the performance of the work, such as a robot navigation problem or a pattern-matching problem.

Decision-Making in Robotics Swarms (research-intensive): In multi-agent systems, consensus denotes the situation where all agents reach an agreement on how to carry out a given task. Imagine you have a discrete set of options and a group of robots (a swarm) that has to reach consensus on one of the options. The communication element can also be explored: for example, if there are N features in an environment, but each robot only has a buffer of size M < N, each feature can be present multiple times. Could a swarm determine the most abundant features collectively? Another variant could be where a swarm tries to identify which feature is present the least often. Perhaps the mobility of the robots could be explored: while each robot may have insufficient memory, if the robots collaborated effectively with others (perhaps requiring them to assume certain topologies), they could collectively have sufficient information. For example, a robot that has a full buffer and detects a feature may decide to forward the detection to its neighbours. Using a real drone may be possible for this project.
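The limited-buffer setting can be sketched as follows. This is only an illustration under simplifying assumptions: the `Robot` class, its fixed-size buffer that drops the oldest detection, and the idea of pooling all buffers in `swarm_most_abundant` are hypothetical choices, and a real swarm would have to approximate the pooling step through local communication.

```python
from collections import Counter, deque


class Robot:
    """A robot that remembers only its last `buffer_size` detections."""

    def __init__(self, buffer_size):
        # deque(maxlen=...) silently discards the oldest entry when full.
        self.buffer = deque(maxlen=buffer_size)

    def detect(self, feature):
        self.buffer.append(feature)


def swarm_most_abundant(robots):
    """Pool every robot's buffer and return the most frequent feature.

    Pooling is an idealisation: it assumes all buffers can be combined,
    which in practice would require a suitable communication topology.
    """
    pooled = Counter()
    for robot in robots:
        pooled.update(robot.buffer)
    return pooled.most_common(1)[0][0]


if __name__ == "__main__":
    # N = 3 features ("x", "y", "z"), buffer size M = 2 < N.
    streams = [["x", "y", "z"], ["z", "x"], ["z", "z", "y"]]
    robots = []
    for stream in streams:
        robot = Robot(buffer_size=2)
        for feature in stream:
            robot.detect(feature)
        robots.append(robot)
    print(swarm_most_abundant(robots))
```

No single robot can store all three features here, yet the pooled counts still identify the most frequent one. Replacing `most_common(1)` with a search for the smallest count would give the "least often" variant, with the caveat that features absent from every buffer are invisible to the swarm.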

MARL in sensing design (experimental): This project aims to provide a decentralised approach for a problem where many sensors have to coordinate for sensing design. The goal is to build a MARL model that can be trained. The goal of this project is to extend the above approach to consider three process parameters, namely, power, scan speed and hatch spacing, and their correlation to the laser parameters, namely, beam compensation (BC) and contour distance (CD). Finally, you will conduct experiments to validate the proposed framework and evaluate its performance against previous applications. This project, although challenging, can be a candidate for publication as well as future research.

Game AI for Cooperative Multi-Agent Contexts (experimental): This research project focuses on the development of algorithms with an emphasis on cooperation and communication in a multi-agent context. The aim is to combine ideas from recent research on hierarchical reinforcement learning and possibly evolutionary game theory into a framework for exploring mechanisms for effective communication in cooperative multi-agent systems. For example, one goal would be to understand how crowds behave in computer games and the potential to improve their collective behaviour in different contexts.

Credits: Francesco Careri (PhD student, UoB).

Materials Science

Motivated by the rapid traction and impact of metal additive manufacturing (AM), the proposed projects aim at addressing some of the problems in metal AM via a combined approach that underpins experimental results via machine learning (ML):

Systematic Analysis of ML Approaches in Materials Science (experimental): The first project aims to provide a comprehensive analysis of common predictive approaches in ML, combining the available set of experimental measurements with the predictive power of ML models. Given a dataset of past experiments, the task is to infer the best combination of the parameters in the context of statistical learning. The goal of the project is to provide a systematic analysis of recent advances in ML and deploy a set of models that predict the optimal thicknesses or the optimal combination of process parameters to get a specified dimension.

Reinforcement Learning for AM Process Optimisation (research-intensive): The second project aims at leveraging the strengths of model-free reinforcement learning (RL) to optimise the process parameters for metal AM. The goal is to build an RL model that can be extensively trained on the above problem and used to query the best combination of parameters to be tested in experimental setups, which in turn extend and improve the RL model based on the data. The goal of this project is to extend the above approach to consider three process parameters, namely, power, scan speed and hatch spacing, and their correlation to the laser parameters, namely, beam compensation (BC) and contour distance (CD). Finally, you will conduct experiments to validate the proposed framework and evaluate its performance against previous applications. This project, although challenging, can be a candidate for publication as well as future research.

## Game Theory & Control

Credits: game-environment dynamics for interconnected agents (project 1).

Evolutionary Game Theory

Game theory is the study of mathematical models of strategic interaction among rational decision-makers. It has applications in all fields of social science, as well as in computer science, economics, logic and systems science, to name a few. Evolutionary game theory has extended the notions obtained from traditional game theory to explain the evolution of species and their behaviours, as well as being used in several other contexts.
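As a concrete example of the evolutionary viewpoint, the sketch below integrates the replicator dynamics, a standard model in evolutionary game theory in which the share of a strategy grows when its payoff exceeds the population average. The function names, the Euler step size and the Prisoner's Dilemma payoff matrix are assumptions chosen for illustration.

```python
def replicator_step(shares, payoff, dt=0.1):
    """One Euler step of the replicator dynamics x_i' = x_i * (f_i - f_avg).

    shares[i] is the population share of strategy i (shares sum to 1).
    payoff[i][j] is the payoff of strategy i against strategy j.
    """
    n = len(shares)
    fitness = [sum(payoff[i][j] * shares[j] for j in range(n)) for i in range(n)]
    average = sum(shares[i] * fitness[i] for i in range(n))
    return [shares[i] + dt * shares[i] * (fitness[i] - average) for i in range(n)]


def evolve(shares, payoff, steps=500, dt=0.1):
    """Apply `replicator_step` repeatedly and return the final shares."""
    for _ in range(steps):
        shares = replicator_step(shares, payoff, dt)
    return shares


if __name__ == "__main__":
    # Illustrative Prisoner's Dilemma: strategy 0 = cooperate, 1 = defect.
    prisoners_dilemma = [[3.0, 0.0], [5.0, 1.0]]
    print(evolve([0.5, 0.5], prisoners_dilemma))
```

In this example defection earns more than cooperation against any population mix, so the share of defectors approaches one. Mechanisms that sustain cooperation have to change the payoffs or the interaction structure.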

Evolutionary Game Theory for Cooperation and Bio-inspired Collective Decision-Making (research-intensive): Honeybees choose their future nest in a collaborative fashion, through different behaviours such as the waggle dance and the stop signal. Other systems take inspiration from ants (see, for example, the shortest path problem and stigmergy) or termites. This project focuses on what we can learn from many different biological systems and how this knowledge can be applied to different contexts. Initial results include an evolutionary game theoretic framework for collective decision-making in honeybee swarms.

Evolutionary Game Theory for Reinforcement Learning (experimental): This project aims to investigate how players learn in a framework at the intersection between machine learning and game theory. When a player chooses a strategy, this can be seen as an action in a typical reinforcement learning problem formulation, and the payoff corresponds to the reward. How can players benefit from learning the strategies in a population and how does this contribute to cooperation in a competitive setting? Some preliminary works have explored the intersection of evolutionary game theory and reinforcement learning (more precisely, a stateless approach called cross learning): the goal of this project is to look at practical aspects of this link to benefit the design of RL approaches and their explainability.
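The correspondence between strategies and actions can be illustrated with a minimal sketch of the cross learning update: after playing an action and receiving a reward in [0, 1], the probability of that action moves towards one in proportion to the reward, and the other probabilities shrink accordingly. The function names, the `rate` parameter and the payoff matrix are assumptions made for illustration, not part of the project specification.

```python
import random


def cross_learning_step(policy, action, reward, rate=1.0):
    """Return the updated mixed strategy after one cross learning update.

    policy is a list of action probabilities, action is the index played,
    and reward must lie in [0, 1]. The probabilities keep summing to 1.
    """
    step = rate * reward
    return [
        p + step * (1.0 - p) if i == action else p - step * p
        for i, p in enumerate(policy)
    ]


def self_play(payoff, rounds=2000, rate=0.05, seed=0):
    """Two cross learners repeatedly play a symmetric two-action game.

    payoff[a][b] is the reward in [0, 1] for playing a against b.
    """
    rng = random.Random(seed)
    row, col = [0.5, 0.5], [0.5, 0.5]
    for _ in range(rounds):
        a = rng.choices(range(len(row)), weights=row)[0]
        b = rng.choices(range(len(col)), weights=col)[0]
        row = cross_learning_step(row, a, payoff[a][b], rate)
        col = cross_learning_step(col, b, payoff[b][a], rate)
    return row, col


if __name__ == "__main__":
    # Illustrative Prisoner's Dilemma payoffs rescaled to [0, 1].
    game = [[0.6, 0.0], [1.0, 0.2]]
    print(self_play(game))
```

The update is stateless: the learner keeps only its current mixed strategy, which is why it lends itself to comparison with population shares in evolutionary game theory.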

Credits: cascading failure propagation in a network (project 1).

Mean-field Game Theory & Control

Game theory is the study of mathematical models of strategic interaction among rational decision-makers. It has applications in all fields of social science, as well as in computer science, economics, logic and systems science, to name a few. Evolutionary game theory has extended the notions obtained from traditional game theory to explain the evolution of species and their behaviours, as well as being used in several other contexts.

Mean-field Games for Cascading Failures in Financial Markets (research-intensive): Similar to the MARL counterpart, this project looks at the situation (which has become more prominent again with the bankruptcy of large financial organisations worldwide) where financial failures can propagate through the financial market, leading to catastrophic consequences for investors as well as the market as a whole. The goal is to study a mean-field game model for interconnected financial organisations where the action is to choose where to invest and to what extent. This involves the use of .

Controlling Learning in Networks of Agents (research-intensive): Starting from an initial implementation of a provably efficient multi-agent reinforcement learning algorithm for parallel Markov decision processes (MDPs), the goal of this project is to investigate the mechanisms for achieving cooperation in a group of agents and to control the network to achieve the most efficient learning, e.g., by controlling which agent(s) are responsible for leading/coordinating the others to achieve a desired state. Compared to the project in the MARL section, this looks at the control aspects of the communication.