Jérôme Bolte is a Full Professor at the Toulouse School of Economics and holds a Chair in Artificial Intelligence at the Artificial Intelligence Institute of Toulouse (ANITI). He studied pure and applied mathematics before completing a degree in mathematics and then a doctorate at Montpellier University. Prior to moving to Toulouse in 2010, he spent six years as an associate professor at Sorbonne University in Paris and one year at École Polytechnique. He received the SIAM Optimization Prize in 2017, along with S. Sabach and M. Teboulle, for work at the crossroads of semi-algebraic geometry and first-order methods. He is currently an associate editor at Mathematical Programming and Foundations of Computational Mathematics. His research interests range from continuous optimization to machine learning.
Title:
A Smooth Path to Nonsmoothness: An O-Minimal Optimization Tour
Abstract:
This talk explores o-minimal optimization, a framework that enables the treatment of a wide class of structured problems—such as piecewise linear, semi-algebraic, and, more generally, problems defined by simple analytic formulas—making it particularly well-suited to modern computational languages like Python. Rather than seeking generality, we shall focus on a simple tool of the o-minimal world called the projection formula. This formula bridges two views on nonsmoothness: for geometers, nonsmooth functions are smooth up to a partition of the domain into manifolds; for optimizers, “this is not necessary,” as subdifferentials carry the right amount of first-order information.
We demonstrate how a rather direct application of this formula yields powerful results such as nonsmooth Sard-type theorems, a nonsmooth Łojasiewicz inequality, a nonsmooth chain rule, and a formal subdifferential framework known as conservative calculus. These, in turn, have notable algorithmic implications: proximal gradient and Lagrangian-like methods, stochastic subgradient methods, as well as automatic differentiation.
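As one concrete algorithmic instance (a generic sketch, not material from the talk), the proximal gradient method for l1-regularized least squares alternates a gradient step on the smooth part with a closed-form proximal step, here soft-thresholding:

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t * ||.||_1 (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def proximal_gradient(A, b, lam, step, iters=200):
    """ISTA for min_x 0.5*||A x - b||^2 + lam*||x||_1."""
    x = np.zeros(A.shape[1])
    for _ in range(iters):
        grad = A.T @ (A @ x - b)                         # gradient of the smooth part
        x = soft_threshold(x - step * grad, step * lam)  # proximal step
    return x

# Toy instance: with A = I, the minimizer is soft_threshold(b, lam).
A = np.eye(2)
b = np.array([3.0, 0.5])
x = proximal_gradient(A, b, lam=1.0, step=1.0)
print(x)  # close to [2.0, 0.0]
```

The step size must not exceed the reciprocal of the Lipschitz constant of the smooth gradient (here 1, since A is the identity).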
John Duchi is an associate professor of Statistics and Electrical Engineering and (by courtesy) Computer Science at Stanford University. His work spans statistical learning, optimization, information theory, and computation, with a few driving goals: (1) to discover statistical learning procedures that optimally trade off real-world resources---computation, communication, privacy provided to study participants---while maintaining statistical efficiency; (2) to build efficient large-scale optimization methods that address the spectrum of optimization, machine learning, and data analysis problems we face, allowing us to move beyond bespoke solutions to methods that work robustly; (3) to develop tools to assess and guarantee the validity of---and the confidence we should have in---machine-learned systems.
He has won several awards and fellowships. His paper awards include the SIAM SIGEST award for "an outstanding paper of general interest" and best papers at the Neural Information Processing Systems conference, the International Conference on Machine Learning, the International Conference on Learning Theory, and an INFORMS Applied Probability Society Best Student Paper Award (as advisor). He has also received the Society for Industrial and Applied Mathematics (SIAM) Early Career Prize in Optimization, an Office of Naval Research (ONR) Young Investigator Award, an NSF CAREER award, a Sloan Fellowship in Mathematics, the Okawa Foundation Award, the Association for Computing Machinery (ACM) Doctoral Dissertation Award (honorable mention), and U.C. Berkeley's C.V. Ramamoorthy Distinguished Research Award.
Title:
Abstract:
Lin Xiao is a Research Scientist at Facebook AI Research (FAIR) in Seattle, Washington. He received his BE from Beijing University of Aeronautics and Astronautics (Beihang University) and his PhD from Stanford University, and was a postdoctoral fellow in the Center for the Mathematics of Information at the California Institute of Technology. Before joining Facebook, he spent 14 great years as a Researcher at Microsoft Research. He currently serves as an associate editor for the SIAM Journal on Optimization, and has served as an area chair for several machine learning conferences, including NeurIPS, ICML, and ICLR.
He won the Young Researcher competition at the first International Conference on Continuous Optimization in 2004 for his work on fastest mixing Markov chains, and the Test of Time Award at NeurIPS 2019 for his work on the regularized dual averaging method for sparse stochastic optimization and online learning. His current research interests include theory and algorithms for large-scale optimization and machine learning, reinforcement learning, and parallel and distributed computing.
Title:
Policy Mirror Descent with Dual Function Approximation
Abstract:
Policy gradient methods constitute a paradigm shift in reinforcement learning from value-based methods to a more direct approach of policy optimization. The mirror descent framework, specifically equipped with KL divergence, plays a critical role in their convergence analysis because of the simplex structure of stochastic policies in Markov decision processes. However, extending such analysis to function approximation, especially when the policy is parametrized by neural networks, remains a challenge due to the lack of convexity. In this talk, we first give an overview of policy mirror descent in the tabular setting, then present a dual function approximation approach that bridges the gap between theory and practice in deep reinforcement learning. In particular, this duality framework includes several well-known practical methods, such as Soft Actor-Critic, as special cases, thus immediately providing them with strong convergence guarantees.
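In the tabular setting, the mirror descent step with KL divergence has a closed form: the new policy is the old one reweighted by exponentiated action values and renormalized. A minimal single-state sketch (the fixed, hand-picked q vector stands in for the action values a real method would re-estimate at every iteration):

```python
import numpy as np

def kl_mirror_descent_step(pi, q, eta):
    """One mirror descent step with KL divergence: pi' proportional to pi * exp(eta * q)."""
    new_pi = pi * np.exp(eta * q)
    return new_pi / new_pi.sum()

# Single-state toy with fixed action values; in a real MDP the q-values
# would be re-estimated under the current policy at every step.
q = np.array([1.0, 0.5, 0.0])
pi = np.full(3, 1.0 / 3.0)          # start from the uniform policy
for _ in range(50):
    pi = kl_mirror_descent_step(pi, q, eta=1.0)
print(pi)  # mass concentrates on the best action (index 0)
```

The simplex constraint is handled implicitly: the multiplicative update keeps the iterates strictly positive and normalized, which is exactly why the KL geometry suits stochastic policies.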
Tim Hoheisel received his doctorate in mathematics from Julius-Maximilians University (Würzburg) under the supervision of Christian Kanzow in 2009. He was a postdoctoral researcher there until 2016. During this time, he was a visiting professor at Heinrich-Heine University (Düsseldorf) in the winter semester 2011/12, as well as a visiting researcher at the University of Washington (Seattle) under the mentorship of James V. Burke in 2012 and 2014. In 2016, he became an assistant professor in the Department of Mathematics and Statistics at McGill University (Montreal), where he was awarded early tenure in 2021 and promoted to associate professor. Since 2022 he has been director of the applied mathematics laboratory at the "Centre de Recherches Mathématiques" in Montreal. His research interests lie in nonsmooth optimization and variational analysis, where he is particularly interested in the stability of nonsmooth problems arising in various applications.
Title:
A computational framework for linear inverse problems via the maximum entropy on the mean method
Abstract:
We present a framework for solving linear inverse problems that is computationally tractable and comes with mathematical certificates. To this end, we interpret the ground truth of a linear inverse problem as a random vector with unknown distribution. We solve for a distribution which is close to a prior P (guessed or data-driven), measured in the KL divergence, while also having an expectation that yields high fidelity with the given data defining the problem. After reformulation, this yields a strictly convex, finite-dimensional optimization problem whose regularizer, the MEM functional, is paired in duality with the log-moment generating function of the prior P. We exploit this computationally via Fenchel-Rockafellar duality. When no obvious guess for P is available, we use data to generate an empirical prior. Using techniques from variational analysis and stochastic optimization, we show that, and at what rate, the solutions of the empirical problems converge (as the sample size grows) to the solution of the problem with known prior.
This is based on work with Matthew King-Roskamp (McGill) and Rustum Choksi (McGill).
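To make the duality concrete, here is a toy sketch (an illustration under strong simplifying assumptions, not the authors' method or implementation): with a standard Gaussian prior and an exact data constraint, the log-moment generating function is quadratic, the Fenchel dual is a smooth concave maximization, and the MEM reconstruction reduces to the minimum-norm solution of Ax = b:

```python
import numpy as np

# Toy MEM sketch: minimize KL(Q || P) over distributions Q subject to
# A E_Q[x] = b, with standard Gaussian prior P = N(0, I).  The dual is
#   max_lam  <b, lam> - kappa(A^T lam),
# where kappa(y) = 0.5*||y||^2 is the log-moment generating function of P,
# and the reconstruction is xhat = grad kappa(A^T lam) = A^T lam.
A = np.array([[1.0, 1.0, 0.0],
              [0.0, 1.0, 1.0]])
b = np.array([1.0, 2.0])

lam = np.zeros(2)
for _ in range(500):
    grad = b - A @ (A.T @ lam)   # gradient of the concave dual
    lam += 0.4 * grad            # gradient ascent on the dual
xhat = A.T @ lam

# For this particular prior, the MEM estimate coincides with the
# minimum-norm solution of A x = b.
print(np.allclose(xhat, np.linalg.pinv(A) @ b))  # True
```

Richer priors change the log-moment generating function (and hence the regularizer), but the pattern of solving a smooth dual and mapping back through its gradient is the same.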
Mingyi Hong received his Ph.D. degree from the University of Virginia, Charlottesville, in 2011. He is currently an Associate Professor in the Department of Electrical and Computer Engineering at the University of Minnesota, Minneapolis. His research has focused on developing optimization theory and algorithms for applications in signal processing and machine learning, most recently applying these techniques to foundation model training, fine-tuning, and alignment. He is an Associate Editor for the IEEE Transactions on Signal Processing. His work has received two IEEE Signal Processing Society (SPS) Best Paper Awards (2021, 2022), an International Consortium of Chinese Mathematicians Best Paper Award (2020), and a few Best Student Paper Awards at signal processing and machine learning conferences. He is an Amazon Scholar, and he is the recipient of an IBM Faculty Award, a Meta research award, a Cisco research award, and the 2022 Pierre-Simon Laplace Early Career Technical Achievement Award from the IEEE SPS.
Title:
Bilevel Optimization: Recent Algorithmic & Theoretical Advances, and Emerging Applications in Training Large Language Models
Abstract:
Bilevel Optimization (BLO) is a class of challenging optimization problems with two levels of nested optimization subproblems. It can be used to model applications in signal processing, machine learning, and game theory, and more recently, large language model training. In the first part of this talk, we will discuss recent advances that address two of its key challenges: (1) efficient implementation (e.g., how to efficiently deal with stochasticity, Hessian computation, etc.); (2) structural complexity (e.g., how to deal with non-convexity in lower-level problems). Together, these works provide a set of useful tools that practitioners can customize for different application domains. In the second part of this talk, we will dive deep into a recent application: aligning Large Language Models (LLMs) with human values. We will show that the challenging LLM alignment problem can be cast as a special BLO problem, namely the inverse reinforcement learning problem, whose upper level recovers a human reward model while the lower level solves for the optimal policy. This perspective unifies popular alignment pipelines, such as reinforcement learning from human feedback (RLHF) and its direct preference optimization (DPO) variants. Leveraging recent BLO advances yields algorithms that (i) outperform standard RLHF baselines in both sample and compute efficiency, and (ii) reveal key design principles that have already influenced industrial practice. Finally, we will discuss open problems and opportunities for future BLO research.
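As a minimal illustration of the nested structure (a one-dimensional toy, not any of the talk's algorithms), consider a quadratic lower level for which implicit differentiation gives the hypergradient of the upper objective exactly:

```python
# Toy bilevel problem: the lower level
#   y*(x) = argmin_y g(x, y),  g(x, y) = 0.5*y**2 - x*y,
# has the closed form y*(x) = x.  The upper level minimizes
#   F(x) = 0.5*(y*(x) - 1)**2.
# The implicit function theorem gives
#   dy*/dx = -(d2g/dy2)^(-1) * d2g/(dy dx) = -(1.0)**(-1) * (-1.0) = 1.0,
# so the hypergradient is dF/dx = (y*(x) - 1) * dy*/dx.

def lower_solution(x):
    return x                      # exact lower-level solver for this g

def hypergradient(x):
    y = lower_solution(x)
    dy_dx = 1.0                   # from implicit differentiation above
    return (y - 1.0) * dy_dx

x = 5.0
for _ in range(100):
    x -= 0.5 * hypergradient(x)   # gradient descent on the upper level
print(x)  # converges to 1.0
```

The practical challenges discussed in the talk (stochasticity, Hessian-vector products, nonconvex lower levels) all arise from the fact that, unlike here, the lower-level solution and the inverse Hessian in the implicit-differentiation formula are not available in closed form.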
Ruth Misener is Professor of Computational Optimisation at Imperial College, where she holds a BASF / Royal Academy of Engineering (RAEng) Research Chair in Data-Driven Optimisation. In 2017, Ruth received the MacFarlane Medal as overall winner of the RAEng Young Engineer Trust Engineer of the Year competition. Ruth also received the 2023 Roger Needham Award from the British Computer Society. Ruth's best paper awards are from the Journal of Global Optimization (2013), the International Conference on Autonomous Agents & Multi-Agent Systems (AAMAS 2020, Best Innovative Demo), the Conference on the Integration of Constraint Programming, Artificial Intelligence, & Operations Research (CPAIOR 2021), Optimization & Engineering (2021), and Computers & Chemical Engineering (2023). She has given named lectures at Princeton University (2023 Saville Lecture) and the Georgia Institute of Technology (2018 Mellichamp Lecture). Ruth leads a team of research engineers developing the Optimization & Machine Learning Toolkit (OMLT), which won the 2022 COIN-OR Cup as the best contribution to open-source operations research software development. Ruth is an associate editor for the INFORMS Journal on Computing and for Operations Research. She is also a NeurIPS Area Chair.
Title:
Bayesian optimization for mixed feature spaces using tree kernels and graph kernels
Abstract:
Bayesian optimization is effectively a two-step iterative process that first trains a surrogate model using continuous optimization over the hyperparameter space and then optimizes the acquisition function over the search space. We investigate Bayesian optimization for mixed-feature search spaces using both tree kernels and graph kernels for Gaussian processes. With respect to tree kernels, our Bayesian Additive Regression Trees Kernel (BARK) uses tree agreement to define a posterior over sum-of-tree functions. With respect to graph kernels, our acquisition function with shortest paths encoded allows us to optimize over graphs, for instance to find the best graph structure and/or node features. We formulate both acquisition functions using mixed-integer optimization and show applications to a variety of challenges in molecular design, engineering, and machine learning.
This work is joint with Toby Boyne, Alexander Thebelt, Yilin Xie, Shiqiang Zhang, Jixiang Qing, Jose Folch, Robert Lee, Nathan Sudermann-Merx, David Walz, Behrang Shafei, and Calvin Tsay.
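The two-step loop itself can be sketched generically (this uses a plain RBF kernel and a lower-confidence-bound acquisition optimized by enumeration over a discrete candidate set; it is a stand-in for, not an implementation of, BARK, the graph kernels, or the mixed-integer acquisition formulations of the talk):

```python
import numpy as np

def rbf_kernel(a, b, ls=0.2):
    """Squared-exponential kernel between 1-D point sets a and b."""
    d = a[:, None] - b[None, :]
    return np.exp(-0.5 * (d / ls) ** 2)

def gp_posterior(x_train, y_train, x_query):
    """GP posterior mean and standard deviation (zero prior mean)."""
    k = rbf_kernel(x_train, x_train) + 1e-6 * np.eye(len(x_train))
    k_star = rbf_kernel(x_query, x_train)
    alpha = np.linalg.solve(k, y_train)
    mu = k_star @ alpha
    var = 1.0 - np.sum(k_star * np.linalg.solve(k, k_star.T).T, axis=1)
    return mu, np.sqrt(np.maximum(var, 0.0))

def f(x):                      # black-box objective (to minimize)
    return (x - 0.3) ** 2

grid = np.linspace(0.0, 1.0, 101)      # discrete candidate set
x_obs = np.array([0.0, 1.0])
y_obs = f(x_obs)
for _ in range(20):
    mu, sd = gp_posterior(x_obs, y_obs, grid)            # 1) fit surrogate
    x_next = grid[np.argmin(mu - 2.0 * sd)]              # 2) optimize acquisition
    x_obs = np.append(x_obs, x_next)
    y_obs = np.append(y_obs, f(x_next))
print(x_obs[np.argmin(y_obs)])  # best point found, near 0.3
```

In the mixed-feature settings of the talk, step 2 is where the structure matters: enumerating candidates no longer scales, which is why the acquisition is cast as a mixed-integer optimization problem.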
Anthony Man-Cho So is currently Dean of the Graduate School, Deputy Master of Morningside College, and Professor in the Department of Systems Engineering and Engineering Management of The Chinese University of Hong Kong (CUHK). His research focuses on the theory and applications of optimization in various areas of science and engineering, including computational geometry, machine learning, signal processing, and statistics. Dr. So has been a Fellow of IEEE since 2023 and an Outstanding Fellow of the Faculty of Engineering at CUHK since 2019. He currently serves on the editorial boards of Journal of Global Optimization, Mathematical Programming, Mathematics of Operations Research, Open Journal of Mathematical Optimization, Optimization Methods and Software, and SIAM Journal on Optimization. He has also served as the Lead Guest Editor of IEEE Signal Processing Magazine Special Issue on Non-Convex Optimization for Signal Processing and Machine Learning. Dr. So has received a number of research and teaching awards, including the 2024 SIAM Review SIGEST Award, the 2018 IEEE Signal Processing Society Best Paper Award, the 2015 IEEE Signal Processing Society Signal Processing Magazine Best Paper Award, the 2014 IEEE Communications Society Asia-Pacific Outstanding Paper Award, and the 2010 INFORMS Optimization Society Optimization Prize for Young Researchers, as well as the 2022 University Grants Committee Teaching Award (General Faculty Members Category), the 2022 University Education Award, and the 2013 CUHK Vice-Chancellor’s Exemplary Teaching Award.
Title:
Universal Gradient Descent Ascent Method for Smooth Minimax Optimization
Abstract:
Smooth minimax optimization has attracted much attention over the past decade. Considerable research has focused on developing algorithms tailored to smooth minimax problems with specific structures, such as convexity of the primal function / concavity of the dual function and Polyak-Lojasiewicz (PL) / Kurdyka-Lojasiewicz (KL) conditions. However, verifying these structural properties is often challenging in practice, thus complicating, among other things, the choice of step sizes for those algorithms. In this talk, we present the doubly smoothed optimistic gradient descent ascent method (DS-OGDA), a universal single-loop algorithm for smooth minimax optimization. With a single set of parameters, DS-OGDA can be applied to convex-concave, nonconvex-concave, convex-nonconcave, nonconvex-KL, and KL-nonconcave scenarios. In particular, there is no need for prior structural knowledge to determine the step sizes. Moreover, by exploiting structural information, DS-OGDA can achieve the optimal or best-known iteration complexity result for each of these scenarios.
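DS-OGDA itself is beyond a short sketch, but the optimistic mechanism it builds on can be seen on a bilinear toy problem, where plain simultaneous gradient descent ascent spirals away from the saddle point while the optimistic variant converges (a generic illustration, not the talk's algorithm):

```python
import numpy as np

# Toy minimax problem: min_x max_y f(x, y) = x * y, saddle point (0, 0).
# The simultaneous-gradient field is F(x, y) = (df/dx, -df/dy) = (y, -x).
def field(z):
    x, y = z
    return np.array([y, -x])

z0 = np.array([1.0, 1.0])
eta = 0.1

# Plain gradient descent ascent spirals outward on bilinear problems.
z = z0.copy()
for _ in range(200):
    z = z - eta * field(z)
gda_norm = np.linalg.norm(z)

# Optimistic GDA uses the extrapolated field 2*F(z_k) - F(z_{k-1}).
z_prev, z = z0.copy(), z0.copy()
for _ in range(2000):
    z_next = z - eta * (2.0 * field(z) - field(z_prev))
    z_prev, z = z, z_next
ogda_norm = np.linalg.norm(z)

print(gda_norm, ogda_norm)  # GDA drifts away; OGDA approaches the saddle
```

The extrapolation term is what damps the rotational component of the gradient field; the double smoothing in DS-OGDA addresses the harder nonconvex-nonconcave regimes listed above.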
Angelika Wiegele is Professor at the Mathematics Department at Alpen-Adria-Universität Klagenfurt and is currently a member of the Global Faculty of the University of Cologne. She received her Ph.D. in Mathematics at the Alpen-Adria-Universität Klagenfurt in 2006 and was a researcher and lecturer at TU Graz and at Alpen-Adria-Universität Klagenfurt. She was also a researcher at IASI-CNR in Rome and at the University of Cologne, and visiting professor at the Università degli Studi di Roma "Tor Vergata".
Her research interests lie in the field of semidefinite optimization, in particular in the application of semidefinite methods to solving mixed-integer nonlinear optimization problems. Her research projects have been funded by the Austrian Science Fund (FWF) and by the European Union's Horizon 2020 program. She is an associate editor of the Open Journal of Mathematical Optimization, OR Spectrum, and TOP, and is currently a guest editor of Mathematical Programming, Series B.
Title:
Nonlinear and Semidefinite Approaches to Discrete Optimization
Abstract:
Semidefinite Programming (SDP) has proven to be a powerful tool for solving problems across various domains, such as combinatorial optimization, control theory, engineering, and polynomial optimization. Given its wide range of applications, solving SDPs has become a widely studied topic. While interior point methods are the most popular algorithms for solving SDPs, they become impractical for large-scale problems, either because of the high number of constraints or because of the size of the matrix variables.
In this talk, we will show how problems from discrete optimization can be modeled using semidefinite matrices. We will present alternative methods for computing approximate solutions to these SDPs in reasonable time and with affordable memory requirements. These approaches are based on the augmented Lagrangian algorithm and are particularly effective when the set of variables can be naturally split into two (or more) groups, enabling efficient optimization over each group. In cases where the SDP lacks strict feasibility, the use of facial reduction techniques becomes essential for developing effective solution algorithms. We will illustrate the use of facial reduction on the SDPs arising from our discrete optimization models.
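As a small illustration of the splitting idea (a toy sketch, not the talk's algorithms), consider the nearest-correlation-matrix problem, a classic small SDP, solved by ADMM, an augmented-Lagrangian-type method that alternates between two groups of variables, each with a cheap exact update:

```python
import numpy as np

def proj_psd(m):
    """Project a symmetric matrix onto the positive semidefinite cone."""
    w, v = np.linalg.eigh(m)
    return (v * np.maximum(w, 0.0)) @ v.T

# Nearest correlation matrix as a small SDP:
#   min 0.5*||X - C||_F^2   s.t.   X psd,  diag(X) = 1.
# ADMM splits the variables into two groups, X (data fit + unit diagonal)
# and Y (psd cone), coupled by X = Y in the augmented Lagrangian.
C = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 1.0],
              [0.0, 1.0, 1.0]])   # indefinite, so not a correlation matrix
rho = 1.0
Y = np.eye(3)
U = np.zeros((3, 3))
for _ in range(1000):
    X = (C + rho * (Y - U)) / (1.0 + rho)   # data-fit subproblem (entrywise)
    np.fill_diagonal(X, 1.0)                # enforce the unit diagonal
    Y = proj_psd(X + U)                     # psd-cone subproblem
    U = U + X - Y                           # dual (multiplier) update
print(np.linalg.eigvalsh(Y).min())  # minimum eigenvalue, nonnegative up to round-off
```

Each subproblem touches only one group of variables: the X-update is a closed-form entrywise formula, and the Y-update is a single eigendecomposition. For large structured SDPs this kind of splitting is what keeps the per-iteration cost and memory footprint affordable.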