This year’s Conference on Digital Experimentation brought together analysts from academia and from companies including Uber, Facebook, Microsoft, and LinkedIn. Beyond the technical sophistication (covered in my last blog), a key theme was how to help humans make better decisions based on digital experimentation and machine learning.
First, Stanford’s Susan Athey discussed the trade-off between personalization and cost in her work with the World Bank. The same intervention program, say for unemployed men, may work for some but backfire for others, who may dislike the household chores you recommend they train for. Ideally, you would design a personalized intervention for each person, but that comes at a huge cost. Bringing this personalization cost into machine learning forces the algorithm to explore simpler policies. In projects where the cost of personalization is low, the algorithm virtually always chooses to personalize the treatment. When the cost of personalization is high, the algorithm often goes with the simpler solution, which is also much easier for decision makers to understand, communicate, and implement. For instance, we learned that telling people that many others voted helps, on average, to get out the vote. We can now research when this does not work or backfires, and whether personalizing would be worth the cost.
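The intuition can be captured in a minimal sketch (with simulated benefit estimates and hypothetical cost values, not Athey’s actual platform): once a personalization cost is subtracted from the value of a fully personalized policy, it only beats a single uniform treatment when the per-person gains are large enough.

```python
# Sketch: personalization cost pushing policy learning toward simpler rules.
# Hypothetical numbers throughout; a real system would estimate benefits
# with a causal model fit on experimentally varied data.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_arms = 1000, 3

# Estimated benefit of each treatment arm for each person.
benefit = rng.normal(loc=1.0, scale=0.5, size=(n_people, n_arms))

def choose_policy(benefit, personalization_cost):
    """Compare a personalized policy against the best uniform policy,
    net of the cost of personalizing."""
    personalized = benefit.max(axis=1).mean() - personalization_cost
    uniform = benefit.mean(axis=0).max()  # best single arm for everyone
    if personalized > uniform:
        return "personalized", personalized
    return "uniform", uniform

print(choose_policy(benefit, personalization_cost=0.05))  # low cost
print(choose_policy(benefit, personalization_cost=1.00))  # high cost
```

With a low cost the personalized policy wins; with a high cost the algorithm falls back to the simpler uniform treatment, mirroring the behavior described above.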
Second, MIT’s Edmond Awad discussed the findings from the Moral Machine Experiment: humans across the world show different trade-offs for whom a self-driving car should save, based on factors such as age, socio-economic status, and crossing the street illegally. Western countries had a stronger preference for saving the young than Eastern countries did. Countries with high income inequality, such as Colombia, showed a clear preference for saving the higher-status person, while Scandinavian countries, for example, did not. And countries with strong government institutions, such as Finland and Japan, showed a stronger preference for sacrificing the jaywalker than countries such as Nigeria or Pakistan did. Edmond noted that many of these findings contradict what experts believe the trade-offs for autonomous vehicles should be. Should the algorithms follow the experts’ normative preferences or the revealed preferences of humans in a given region of the world?
The discussion boiled down to the question: what does a machine need for us humans to entrust it with making decisions? For Susan Athey, it should be ‘a causal model and based on experimentally varied data’. For me, it also needs to be based on the right goals (its objective function). I do not fully trust Facebook or YouTube to have my or society’s best interest in mind when curating my feed: their algorithms more likely aim to optimize clicks or time spent rather than what I understand as my physical and emotional health. Indeed, a key conclusion of the discussion was that most organizations, despite claiming to ‘make the world a better place’, do not really have long-term metrics that reflect such ideals. Auditing and peer review by committees of experts could help, but a clear definition of our vision is needed for such measures to work. In the concluding words of the moderator: “Humanity can now optimize all we want, we just don’t know what we want”. What do we want?
For a full explanation of this contextual bandit platform, see Zhou, Athey, and Wager (2018), who use historical data to estimate policy benefits. Set up as a classification problem, each observation contributes to finding out which arm gives the best results for a given individual, using not her own data but cross-individual regressions of others, weighted by propensity scores.
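The propensity-weighting idea behind this kind of offline policy evaluation can be illustrated with a minimal sketch (simulated data and a hypothetical treatment effect, not the authors’ actual method): observations where the logged arm matches the candidate policy’s choice are up-weighted by the inverse of the probability with which that arm was assigned, so historical data can stand in for running the policy itself.

```python
# Sketch: inverse-propensity-weighted estimate of a policy's value
# from logged (historical) data. All data here is simulated.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

age = rng.uniform(20, 60, size=n)          # one observed covariate
propensity = 0.5                           # arms were assigned 50/50
arm = rng.binomial(1, propensity, size=n)  # logged treatment arm
# Hypothetical outcome model: arm 1 helps younger people more.
outcome = 1.0 + arm * (40 - age) * 0.02 + rng.normal(0, 0.1, size=n)

def ipw_value(policy_arm, logged_arm, outcome, propensity):
    """Estimate the mean outcome a candidate policy would achieve,
    using only observations where it agrees with the logged arm,
    reweighted by the inverse assignment probability."""
    match = (policy_arm == logged_arm).astype(float)
    weights = match / np.where(logged_arm == 1, propensity, 1 - propensity)
    return (weights * outcome).mean()

# Candidate policy: treat only people under 40.
policy = (age < 40).astype(int)
print(ipw_value(policy, arm, outcome, propensity))
```

Because the weights correct for how often each arm appeared in the logs, the estimate approximates what the policy would earn if deployed, without collecting new data.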