For every push to a Gecko repository, and for periodic things like nightlies, we generate a task-graph containing all of the tasks to run in response. This is all controlled in-tree, making it easy for any developer to add new tasks when necessary.
The full task-graph is currently just over 8000 tasks, and growing. Running 8000 tasks for every push would be slow and wasteful, so we have two mechanisms to limit the tasks we run. The first is “target tasks”, which selects the desired tasks based on try syntax or the tree to which the push took place. The second is the topic of this post: optimization.
Before diving into the details, let’s address why optimization is important.
First and foremost, in many cases it gets important results back to developers sooner, increasing development velocity. Not displaying feedback unrelated to a push also helps to focus attention on the important results.
Optimization also helps to ease real resource issues Mozilla faces. Tasks such as Talos or OS X tests must run on our own hardware, and we have a finite (but large - about 500 just for OS X!) quantity of that hardware. When the quantity of work to do exceeds the capacity of that hardware pool, tasks begin to queue, delaying important feedback to developers. Buying more hardware is always an option, but the fixed costs and long lead times mean that effort spent optimizing the work done on the existing assets has a big return.
Most tasks take place in the cloud (AWS), which provides a more elastic environment that is able to burst when load is high. We run well over 10,000 spot instances simultaneously every business day, producing terabytes of data. Even at pennies per hour and gigabyte, that quickly adds up to “real money”. Reducing that cost, or limiting its growth, is an important goal in its own right.
Optimization has risks, too: over-optimization can skip tasks with important information. That might mean that a try push looks fine but has failures when it lands, or that push containing an error appears green but causes a failure on a subsequent push. In any case, time-consuming bug hunts and backouts ensue.
The try server causes a lot of consternation - it’s difficult to figure out what syntax to use to run all of the tasks that might be relevant to a push, resulting in either over-estimation (and thus wasted capacity) or under-estimation (risking missed bugs and backouts). Ideally, machines could figure this out: a push to try with no syntax would run just the necessary jobs, no more and no less.
Back to the task-graph generation process. During the optimization step, each task is examined for reasons that it might not be run. For some tasks, such as toolchain or docker image builds, it is possible to find and substitute an existing task that used the same inputs. SETA also applies at this stage.
Finally, some jobs are annotated with “when.files-changed”, a list of filename patterns. When the set of files changed in a push does not match this list, the job is optimized away.
Fine-tuning when we run eslint is only 1/8000’th of the problem. We need an approach to optimization that can take some “big bites” out of task graphs, such as omitting entire platforms or test suites. We have done a little of this for servo, e10s, and a few other projects, but using ad-hoc approaches.
While pursuing large impact, we need to ensure we do not over-optimize, so the approach must fail open: if in doubt, a task should run. Once wasteful runs are observed, it should be fairly simple to define the circumstance and represent that in code. This approach is the opposite of “when.files-changed”, which over-optimizes unless the author has named every file that might affect the task.
The proposed approach is to tag each source file with named task groups that it
“affects”. In this case, “affects” is taken strictly. For example, a change to
a chrome file is best tested by the
browser-chome suite, but could
potentially affect other tests or even builds if it contains a syntax error.
However, a push containing only changes under
possibly affect anything but reftests. Similarly, changes limited to
possibly affect any platform but Android.
The tagging is done using familiar clauses in
with Files('mobile/android/**'): AFFECTS_TASKS += ['android']
(note that this syntax is illustrative; the details are not yet decided)
To “fail open”, files which do not have any tags are treated as having all tags. Given these annotations, for a given push, it is straightforward to calculate the set of affected tags.
Task configuration can contain an optimization element specifying a set of tags containing this task:
label: android-test-android-4.3-arm7-api-15/debug-reftest-1 optimizations: - [skip-unless-affects, [android, reftests]]
Meaning, skip unless the changes affect either
most cases, this value would be calculated by the
rather than included directly in the YAML source files.
Directories such as
browser/ that are only used on
certain platforms are an easy target. Changes limited to these directories can
omit entire platforms.
Test-only changes can also be aggressively optimized. While each test file technically affects only one task, we can achieve a good balance of effectiveness and detailed configuration by tagging by test suite (browser-chrome, wpt, mochitest, etc.)
Servo synchronizes its changes into the
servo/ directory in the Gecko source
tree, and changes to that directory only realistically affect the
linux64-servo platforms. Tagging
servo, and adding
skip-unless-affects for tasks on those two platforms would be enough to
achieve this result, with the advantage that tasks for other platforms can be
re-added easily using the “add jobs” or “backfill” actions in Treeherder.