Dan Gusfield

Tutorial on Integer Linear Programming in Computational Biology

Abstract:

Integer Linear Programming is a versatile modeling and optimization technique that was developed for complex planning and operational decision making.  However, it is increasingly used in computational biology in non-traditional ways, most importantly and  inventively as a computational tool and language to model and study biological phenomena, to analyze biological data, and to extract biological insight from the models and the data.  Integer linear programming is often very effective in solving *instances* of biological problems on realistic data of current importance, even for hard computational problems that lack a worst-case efficient solution method.  Moreover, even if a worst-case efficient algorithm is possible for a particular problem, the time and effort needed to find and implement such an algorithm is typically much greater than the time and effort needed to find and implement an ILP formulation for the problem.  Then, for any given problem instance, highly engineered ILP solvers can be used to solve the ILP formulation.  The improvement of the best ILP solvers has been spectacular, with an estimate that (combined with faster computers) benchmark ILP problems can now be solved 200-billion times faster than twenty-five years ago.  The effectiveness of the best modern ILP solvers on problem instances of importance in biology opens huge opportunities —  broad exploitation of integer programming could have a truly transformative effect on computation in biology and perhaps medicine.

The goal of the tutorial is to introduce and detail *modeling* and *solving* of real problems in computational biology using integer linear programming.  This is in contrast to the goal of explaining the theory of linear programming or how integer programming or linear programming solvers work. We will use a commercial ILP solver, from Gurobi Optimization, to solve specific ILPs that we formulate. The tutorial starts with an introduction for people with no prior background in linear programming or integer linear programming. We will discuss several problems, models and ILP solutions selected from network biology, phylogenetics, RNA and protein folding, clustering, SNP haplotyping, DNA sequence assembly, genome-wide association studies, sequence analysis, protein function analysis, etc.  The ILP formulations will illustrate many different ILP “idioms” and their use.  Finally, after studying several ILP examples in computational biology, we will discuss what is distinctive about the use of integer programming in computational biology, compared to more traditional uses of integer programming.

Educational material will be provided. If you want some hands on experience, please install the Gurobi ILP solver (free for academics and researchers) on your laptop before the tutorial.