Speaker: Milad Maleki Pirbazari, Bilkent University
Date & Time: October 8, 2021, Friday, 13:30.
Place: EE-01
Title: Risk-averse Allocation Indices for Multi-armed Bandit Problem
Abstract: In the classical multi-armed bandit problem, the aim is to find a policy that maximizes the expected total reward, implicitly assuming that the decision-maker is risk-neutral. In some real-life applications, however, decision-makers are risk-averse. In this study, we design a new setting based on the concept of dynamic risk measures, where the aim is to find a policy with the best risk-adjusted total discounted outcome. We provide a theoretical analysis of the multi-armed bandit problem in this novel setting and propose a priority-index heuristic that yields risk-averse allocation indices with a structure similar to that of the Gittins index. Although we show that an optimal policy does not always have an index-based form, empirical results demonstrate the strong performance of this heuristic and show that risk-averse allocation indices yield optimal or near-optimal interpretable policies.
Bio: Milad Maleki Pirbazari is currently a teaching instructor in the Department of Industrial Engineering at Bilkent University. He received his B.S. and M.S. degrees in Chemical Engineering from the University of Tehran and Tarbiat Modares University in 2008 and 2011, respectively. He also received an M.S. degree in Industrial and Systems Engineering from İstanbul Şehir University in 2015 and a Ph.D. degree in Industrial Engineering from Bilkent University in 2021. His research interests include stochastic and risk-averse optimization, Markov decision processes, and statistical machine learning.