Design of experiments (DOE) enables experimenters to plan data collection so as to maximize statistical information at minimal cost. Since the 1930s, Fisher's pioneering work on DOE has had a profound impact on agricultural science. The introduction of orthogonal arrays by C. R. Rao in the 1940s further revolutionized DOE, with far-reaching implications for manufacturing and quality control. More recently, the proliferation of large datasets has highlighted the need for efficient hyperparameter selection in machine learning. DOE-guided hyperparameter tuning can substantially reduce computational resource consumption, contributing to climate change mitigation efforts.
The collection of settings at which the inputs are fixed in an experiment is called a design. Mathematically, DOE constructs efficient designs by optimizing suitable criteria defined in terms of information gain. Experimental setups frequently need to be tailored to specific objectives. For instance, 'screening experiments' identify the key factors that influence the response and are crucial in manufacturing. As another example, 'choice experiments' are used to discern the implicit preferences of human respondents and are employed in public health and transportation. In this presentation, I will discuss some key challenges in screening and choice experiments and offer solutions to them.
Screening experiments use a small number of runs to study a large number of factors, of which only a small portion is expected to be important in explaining variability in a response variable. Designs for such experiments, in which the number of runs is too small to estimate all main effects simultaneously, are called supersaturated designs. The commonly used design optimality criteria are inadequate for selecting supersaturated designs. As a result, there is an extensive literature on alternative optimality criteria. Most of these criteria are rather ad hoc and, unlike in almost any other optimal design problem, are not directly related to the method of analysis. A popular method of analysis for supersaturated designs is the Gauss-Dantzig Selector. I will briefly introduce two new design selection criteria that we proposed, inspired by the asymptotic properties of the Gauss-Dantzig Selector. I will then introduce the Pareto-efficient designs that we constructed using a multi-objective, Pareto-based coordinate exchange algorithm. Finally, I will show that our designs perform better as screening designs than those obtained using other criteria.
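To make the notion of a design selection criterion concrete, here is a minimal sketch of E(s^2), a classical criterion from the supersaturated design literature. It is not one of the two new criteria mentioned above, and the function name `e_s2` and the toy design are my own illustrative assumptions. For a two-level design with levels coded +1/-1, E(s^2) averages the squared inner products over all pairs of factor columns, so smaller values indicate columns that are closer to orthogonal.

```python
from itertools import combinations


def e_s2(design):
    """Return E(s^2) for a two-level design given as a list of runs.

    Each run is a list of +1/-1 factor levels. E(s^2) is the mean of the
    squared inner products s_ij over all pairs of distinct factor columns.
    """
    columns = list(zip(*design))  # transpose: one tuple per factor
    pairs = list(combinations(range(len(columns)), 2))
    total = 0
    for i, j in pairs:
        s_ij = sum(a * b for a, b in zip(columns[i], columns[j]))
        total += s_ij ** 2
    return total / len(pairs)


# Hypothetical toy design: 4 runs, 4 factors. With an intercept there are
# 5 parameters and only 4 runs, so not all main effects are estimable.
toy_design = [
    [1, 1, 1, 1],
    [1, -1, -1, 1],
    [-1, 1, -1, 1],
    [-1, -1, 1, -1],
]

# Dropping the fourth factor leaves three mutually orthogonal columns.
orthogonal_part = [run[:3] for run in toy_design]

print(e_s2(toy_design))       # 2.0
print(e_s2(orthogonal_part))  # 0.0
```

The first three columns are mutually orthogonal, so all of the non-orthogonality in the toy design comes from the fourth column, whose inner product with each of the others is +2 or -2.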
A choice experiment consists of N choice sets, each containing m options. A respondent is shown each choice set in turn and asked to select the option they prefer, that is, the one with the highest perceived utility. Each option in a choice set is described by a set of attributes, where each attribute has two or more levels. For two-level choice experiments, I will first derive a simple form of the information matrix of a choice design for estimating the main effects. This form then enables us to construct D- and MS-optimal paired choice designs that attain their respective lower bounds under the main effects model for any number of choice sets. I will also show that optimal choice designs with choice sets of size two often outperform their counterparts with larger choice sets, which helps to reduce respondent fatigue. To further mitigate respondent fatigue, I will briefly introduce block effects into the main effects model, which enabled us to provide theoretically tractable optimal block designs. These designs give experimenters the flexibility to choose the number of questions asked of each respondent.
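As a rough illustration of what an information matrix for a paired choice design looks like, the sketch below uses the standard multinomial logit model under the assumption that both options in every pair are equally attractive (all main effects zero). Under that assumption each choice set contributes (1/4)(a - b)(a - b)^T, where a and b are the +1/-1 coded attribute vectors of the two options. This is a textbook form, not necessarily the simplified form derived in the talk, and the function name and example designs are hypothetical.

```python
def paired_choice_information(choice_sets):
    """Average information matrix for main effects of a paired choice design.

    choice_sets is a list of (option_a, option_b) pairs, each option a tuple
    of +1/-1 attribute levels. Assumes a multinomial logit model with equal
    choice probabilities (1/2 each) within every pair.
    """
    n_sets = len(choice_sets)
    k = len(choice_sets[0][0])  # number of attributes
    info = [[0.0] * k for _ in range(k)]
    for option_a, option_b in choice_sets:
        diff = [x - y for x, y in zip(option_a, option_b)]
        for i in range(k):
            for j in range(k):
                info[i][j] += diff[i] * diff[j] / (4 * n_sets)
    return info


# Hypothetical design: two attributes, each pair is an option and its foldover.
foldover_design = [
    ((1, 1), (-1, -1)),
    ((1, -1), (-1, 1)),
]

# Hypothetical poor design: the first attribute never differs within the pair.
degenerate_design = [
    ((1, 1), (1, -1)),
]

print(paired_choice_information(foldover_design))    # [[1.0, 0.0], [0.0, 1.0]]
print(paired_choice_information(degenerate_design))  # [[0.0, 0.0], [0.0, 1.0]]
```

In the second design the two options share the same level of the first attribute, so the choice carries no information about that attribute and its row and column in the matrix are zero.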