mozilla :: #rust-infra

11 Aug 2017
16:05aidanhsnotriddle: have you deleted the aelita repo?
16:09notriddle is still here.
16:09notriddleI kept a copy. I just deleted the org.
17:16erickthere: everyone update their git!
17:17ericktmalicious ssh:// urls can do arbitrary code execution
19:55shep@acrichto did you mean to r+ ?
19:55acrichtoshep: nah just a few comments to address
19:56shepacrichto: ok, you may want to be clear which you meant to address and which for the future; the back-and-forth made it seem like it was good and there's fixes in the future
19:56shepacrichto: wait, you said "this seems great enough to land in the meantime."
19:57acrichtoI'll clarify
19:57shepenglish is hard :-)
21:30aturonTimNN: erickt: tomprince: howdy
21:31frewsxcvhey all, i haven't had much time to do anything rust infra-related the past few months and (unfortunately) i don't see that changing anytime soon, so i messaged aaron early today and told him i'm going to take myself off the infra roster. i'm still going to be on docs team though and still have time for that team, and i'll do a rollup occasionally :)
21:31ericktaturon: howdy! I'm finishing something up, but then I'll be around
21:31frewsxcvkeep up the great work everyone
21:31aturonthanks frewsxcv for all you've been doing -- and you're welcome back any time
21:32aturonsimulacrum and brson are away
21:32aturonok, let's get started!
21:32aturonoverall PR tracker seems to be following the usual pattern
21:32aturonqueue has been moving well
21:33aturonanybody have issues to report? i know some sccache failures cropped up this week
21:33acrichtoyeah those were
21:33aturonacrichto: any idea about those?
21:33acrichtono :(
21:33acrichtothe linked sccache issue has a bit of discussion
21:34acrichtowe'd have to have more debugging to figure it out
21:34aturonhow often are we seeing these? worth focusing on?
21:34acrichtokennytm may know more about the spurious failures though
21:34acrichtoafaik we've seen two so far, so not super urgent
21:34acrichto6 and 2 days ago
21:34aturonok; let's keep an eye
21:35acrichtoafaik I haven't seen much else
21:35aturonhow are we doing otherwise? any macos issues?
21:35* aturon looks forward to eventually having spurious failure logging on retries
21:36acrichtoI haven't looked at the build times in ages, but spurious-wise we've been pretty good lately
21:36aturonbut ok, it's sounding like another week with nothing on fire spurious-wise
21:36aidanhsoh, I think I saw the appveyor failure to remove a dll once
21:36acrichtooh our osx build times are still "real bad" :(
21:36acrichtoah yeah dll ==
21:37aturonacrichto: can you expand? the issue before was timeouts, do you mean that or just more generally they are hurting cycle times?
21:37aidanhsacrichto: have you seen any osx timeouts? I could do with a couple of recent examples if so for graph testing
21:37acrichtooh sure, osx build times have for many moons now been 2.5+ hrs
21:37acrichtothey're managing to come in below 3 hrs, so not timing out
21:37acrichtobut the target goal for travis is 2hrs
21:37acrichtowhich is what almost everything else runs at
21:37aturonacrichto: i wonder if we could record this info somewhere
21:38acrichtoafaik this just slows the queue down, so not an urgent problem
21:38aturon(in terms of what we'd like to see and what we commonly see)
21:38aturonactually we'd wanted to get the build matrix documented somewhere
21:38acrichtohm perhaps!
21:38aturonwhat would you think about a super simple google sheet
21:38acrichtodescribing the whole matrix?
21:38aturonjust showing the various builders, anything of note in terms of config/test differences, desired cycle time, and current cycle time?
21:39aturonotherwise it's hard to keep track of how we're doing on this front (for me at least)
21:39aidanhson the subject of PRs, the queue did get pretty huge until someone did a rollup so it'd be nice to get a writeup on forge on how to do them
21:39acrichtoseems fine, although I'd probably prefer it goes directly into .travis.yml if possible
21:39aturonacrichto: oh that'd be fine too :)
21:40aturonaidanhs: agreed
21:40aturonok so there are a couple of possible action-items
21:40aturonacrichto: so re: .travis.yml, you're imagining just putting a comment next to each?
21:41acrichtoperhaps yeah, or just a big block comment at the top describing what's what
21:41acrichtoalthough a google sheet is also fine
21:41acrichtowe can start w/ that
21:41aturonsomeone want to take that on and work out a good format?
21:41aturonkeeping it in the .yml seems like a good idea to me
21:42aturonacrichto: how's about i hit you with questions about it and write up the comment?
21:43aturonaidanhs: i agree re: rollup instructions, i don't think it's terribly complicated so should be an easy writeup
21:43aturonwho's equipped to do that?
21:43aidanhsjust so I'm clear - what information are we going for here? general commentary like "ideally everything should be below 2 hours but osx isn't"?
21:44aturonaidanhs: i was thinking overview of the matrix setup -- i.e. an explanation of how things are structured at a high level
21:44aidanhsaturon: frewsxcv could do it as a parting gift ;) I can look up rollupers and find one and get them to volunteer
21:45aturonaidanhs: (together with cycle time observations and expectations)
21:45aturonaidanhs: sounds good
21:45aturonOK, i think that's all for spurious/PRs
21:46aturonaidanhs: you had an item on cargobomb, want to talk about that real quick?
21:46aidanhsyeah, I've got a cargobomb run going
21:47aidanhsthere were some papercuts that I'm going to write down
21:47aturonaidanhs: this is running on the shared server, right?
21:47aidanhsseems like the current point of reference is the cargobomb readme, not sure if it should move to the forge or not
21:47aidanhsone of the shared servers, yes
21:48aturonreadme seems fine, we can just link to it from the forge
21:48aturonaidanhs: so other than the planned papercut notes, anything else to discuss right now?
21:48aidanhsanyway, I'll be caretaking that until it's done, so by next week there will hopefully be cargobomb 0 to pro instructions
21:49aturonooh that sounds awesome
21:49aturonok cool! so, thin agenda today -- the other topic is the roadmap
21:50aturonbasically, i think we should set out a rough plan for the remainder of 2017
21:50aidanhs(I'll just also note the jemalloc thing I mentioned last week should be fixed)
21:50aturonthere's a doc here:
21:50ericktaturon: actually get the dev account setup ;)
21:50aturonyeah so we should talk through it
21:50aturoni had an internal list of stuff i think is high priority, which matched with acrichto's mental list anyway :)
21:51aturonso i think the place to start is, what are the biggest things we want to focus on
21:51acrichtooh I forgot about perf!
21:51aturoni feel like cargobomb is the next most important service used by the project that isn't in great shape right now
21:51acrichtoI'd advocate for cargobomb, perf, and queue triage
21:52aturonacrichto: queue triage?
21:52acrichtohm I guess that's probably not right
21:52acrichtoin the sense of not much new there, mostly just "make sure we don't let up"
21:52aturoni def agree with that!
21:53acrichtook so s/queue triage/config management/
21:53aidanhsin place of queue triage, I would like to insert "more shell scripts to manage infra"
21:53aturonbut in terms of deeper work
21:53aturonaidanhs: this is what i mean by "config management" basically --
21:53aturonor at least, cfg management is a prerequisite
21:53aturonso we should get a bit more detailed on each of these
21:54aturonfor cargobomb, at minimum i'd like to get to a place where requests for a run are handled routinely via PR triage
21:54aturoneven if that requires a fair amount of manual handling for the time being
21:54ericktaturon: to trigger a cargobomb?
21:54aidanhsi.e. anyone capable of doing triage can start a run?
21:54ericktoh, you just said that
21:55aturonyes re: starting run, and also coordinating runs to the extent that's needed
21:55aturonthat seems like the MVP of having cargobomb as a reliable service
21:55aturonpartly human-powered :)
21:55ericktheh. how worried are we about the cargobomb cost?
21:56aidanhsI don't think aturon is suggesting we get more than two boxes
21:56aturonno i'm not :)
21:56aidanhspart of PR triage can just be checking in on any ongoing runs
21:56aturonyeah that's what i meant about manual coordination
21:57aturonlike again, this is very MVP territory
21:57aturoni'd just like to get us to a state where it's being done consistently
21:57aturonand then we can work on improving the process, automation, job management etc
21:57aturondoes that seem like the right goal to start with to everybody?
21:58aturonaidanhs: given your experience so far, it seems plausible we could meet that goal by the end of august?
21:59aturon(i'm trying to map things out on a rough calendar, to help us track what's in flight and set some concrete time-frames)
21:59aidanhsthe bastion may be problematic if some people don't have access to static ips
22:00aidanhspotentially we could have a separate rotation for cargobomb
22:00aidanhsso cautiously yes
22:00aturonalright, i'll put it down for now, and we can circle back to logistics
22:00aturon(and likewise, depending on how things go, there may be more we want to do on cargobomb this year, ofc)
22:01aturonaidanhs: i'll also assign you as lead for the MVP?
22:01aturon(as opposed to enhancements to cargobomb itself)
22:01aidanhsfine by me
22:01aturonso config management/scripted control
22:02aturonhere again, it'd be good once and for all to get on the same page for our MVP
22:02aturonso, first off, the service list: Play, perf, bastion, rfcbot/rusty-dash, highfive
22:02aturon(plus of course stuff on rcs today)
22:02aturonanything missing?
22:02aidanhsRustStatus ish
22:02aturonah yep
22:03acrichtonginx/doc instance
22:03* acrichto pulls up a list
22:03acrichtocrater (it's got instances)
22:04acrichtothere's a rustup-deploy but I don't know what that is, otherwise that's the list
22:04aturonok great
22:04aturonso next question: what do we all mean by config management/scripting?
22:04aturonacrichto: you go first?
22:04aturon(again with MVP focus)
22:05acrichtofor me it's a way to run a command and rebuild an instance from a "known and well documented good state"
22:05acrichtoe.g. `docker build` to get a container or "run a script on a fresh aws instance"
22:05acrichtoand then it also entails actually deploying our instances w/ this management
22:05shepbut not "create that aws instance", right?
22:05acrichtoso i guess just a declarative form of the infra, but starting from a "known good point"
22:06acrichtoso yeah shep not necessarily scripting the creation of the instance
22:06acrichtojust a way to, for example, go from a new instance to play and/or the bastion
22:06aturonok, aidanhs i believe this aligns well with your definition/goals as well
22:06acrichtothe primary goal here is to enable code review on deltas
22:07ericktacrichto: what do you mean by an instance? like the cargobomb instance?
22:07ericktor just the process in general?
22:07acrichtooh just like an ec2 instance
22:07acrichtoit's sort of vague b/c but I really just mean to leave us room
22:07aidanhsyeah, I'd just add - common operations defined by shell script (or other documentation, I just like shell scripts because they're 'living' documentation)
22:07acrichtoe.g. I wouldn't consider an MVP being "full terraform"
22:08acrichtobut the scripts so far are a good example
22:08acrichtoah good point!
22:08* erickt still really needs to write up some documentation on terraform
22:08aidanhsthere is a hairy aspect of setting up from known good state, which is secret management
22:08aturonacrichto: aidanhs: so, scripts to rebuild from known state, and scripts for common workflows?
22:09aidanhsaturon: +1
22:09aturonok sgtm
22:09aturonaidanhs: so, uh, i think you're lead on this MVP as well :)
22:09aidanhsif erickt has thoughts on secret management I'd like to hear them, possibly before the terraform docs
22:09ericktaidanhs: if we're okay with AWS support, it does have a service called Parameter Store, which provides a way to fetch key/values, and it'll take care of encrypting them on disk
22:10aidanhsok, I'll look into that and maybe ask you about it at some other time
22:10ericktthe whole "encrypt a s3 object with kms" also works out
22:10ericktpretty simply. I'm evaluating hashicorp vault, which is a hardcore secret management system
22:10ericktbut I'm not yet comfortable enough with it to put it into production
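A minimal sketch of the Parameter Store lookup erickt mentions, not something shown in the channel. It assumes a client object that behaves like `boto3.client("ssm")`, whose `get_parameter` call accepts `Name` and `WithDecryption`. The helper name `fetch_secret`, the `FakeSSM` stand-in, and the parameter path are all hypothetical and exist only so the example runs without AWS access.

```python
# Hypothetical helper: read one secret from AWS SSM Parameter Store.
# `client` is assumed to behave like boto3.client("ssm").


def fetch_secret(client, name):
    """Return the decrypted value of the parameter called `name`."""
    response = client.get_parameter(Name=name, WithDecryption=True)
    return response["Parameter"]["Value"]


class FakeSSM:
    """In-memory stand-in for the SSM client, so the sketch runs offline."""

    def __init__(self, values):
        self._values = dict(values)

    def get_parameter(self, Name, WithDecryption=False):
        # Mimic the service: without WithDecryption the caller does not
        # get the plaintext of an encrypted parameter back.
        value = self._values[Name]
        if not WithDecryption:
            value = "<encrypted>"
        return {"Parameter": {"Name": Name, "Value": value}}


if __name__ == "__main__":
    # The parameter path below is made up for illustration.
    fake = FakeSSM({"/infra/play/api-token": "s3cr3t"})
    print(fetch_secret(fake, "/infra/play/api-token"))
```

Against real AWS, a `boto3.client("ssm")` instance would presumably be passed in place of `FakeSSM`; that needs the boto3 package and credentials, neither of which is part of this sketch.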
22:10dikaiosunevault is fun
22:10dikaiosunebut hard
22:10aturonaidanhs: ok i've put you down for MVP lead here as well, at least for the moment
22:10aidanhsyes I've heard of vault
22:11aturoncan we carve out a milestone or two toward MVP?
22:11aturon(and schedule them)
22:11aidanhssecret management is most pressing in my mind, so I'd like to have a direction for that by the end of aug
22:12aturonaidanhs: ok and i know you also had a subset of the services you were hoping to get up for review
22:12aturoncan we write down that list on the calendar?
22:13aidanhsmaybe: full RCS with secret management, plus play (hi shep!)
22:14aturonacrichto: sgty?
22:14aidanhsI don't want to overcommit to ruststatus as well as I did last time, as secret management is too much of an unknown to me
22:14aturonok so that's milestone #2
22:14aturonshall we tentatively put that in sept?
22:15aidanhsRustStatus? yeah that sounds reasonable
22:15aturonoh sorry, i meant the RCS + secrets + play
22:15aturonor did you intend that the first milestone would cover that?
22:15aturoni'm trying to detail all this in the roadmap doc
22:15aturonwhich might make it more clear
22:15aidanhsahhh. let's put it in sept and maybe I can overachieve
22:16aidanhsplus ruststatus
22:16aturonaidanhs: ok, does the doc look right?
22:16aidanhsaturon: lgtm
22:16acrichtoI think another item for sept that may be good is perf "back online"
22:17aturonacrichto: say more?
22:17acrichtoright now it's got a few hiccups which make it difficult to work with
22:17acrichtoso as of this writing we have massive "regressions" due to difference in measurements, and we've got one benchmark that's been ICE'ing for a few weeks
22:17aturonacrichto: to be clear, are you adding this to the cfg management stuff or switching topics?
22:17acrichtooh sorry, yes switching
22:18acrichtoin terms of "more action items for september"
22:18acrichtoI will hold off!
22:18aturonnp, it was the next topic anyway
22:18aturonbut i'd like to approach it the same way, like what are reasonable MVP/milestones
22:18acrichtooh ok, basically I think we should have fully evaluated the alternative, instruction counting
22:18acrichtobasically we should also switch how we time things by default, using wall clock instead of summing time-passes
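A rough illustration of the difference between the two timing approaches acrichto contrasts, wall clock versus summing per-pass times. The log does not say why the two disagree; the sketch assumes, purely for illustration, that some work happens between passes and is attributed to none of them. The pass names and the `run_passes` helper are hypothetical.

```python
import time


def run_passes(passes):
    """Run each (name, fn) pass; return (per-pass seconds, wall-clock seconds)."""
    per_pass = {}
    wall_start = time.perf_counter()
    for name, fn in passes:
        start = time.perf_counter()
        fn()
        per_pass[name] = time.perf_counter() - start
        # Assumed overhead between passes (setup, I/O, teardown): it is not
        # attributed to any pass, so the per-pass sum never sees it.
        time.sleep(0.01)
    wall = time.perf_counter() - wall_start
    return per_pass, wall


if __name__ == "__main__":
    # Hypothetical pass names; each "pass" just sleeps for a bit.
    passes = [
        ("parsing", lambda: time.sleep(0.02)),
        ("typeck", lambda: time.sleep(0.03)),
    ]
    per_pass, wall = run_passes(passes)
    summed = sum(per_pass.values())
    print(f"sum of passes: {summed:.3f}s, wall clock: {wall:.3f}s")
```

Under that assumption the wall-clock figure is never smaller than the summed figure, which is one reason it can be the more honest default for an end-to-end view.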
22:19acrichtoI say "basically" too much
22:19aturonhm so to summarize:
22:19aturonmilestone #1: get robust perf measurements
22:19aturonseem right?
22:20acrichtoyeah and #2 is: avoid using time-passes summation for the default view
22:20aturon(which might involve switching timing/measurement strategies, going back to a dedicated server etc)
22:20acrichtobut yeah I can see these being lumped together
22:20aturonso #2, i feel like there are many issues on the visualization side
22:20aturon(which hasn't really been getting much attention)
22:21aturondo you feel like there's a useful view today that we could make the default?
22:21aturonwhen we talked before, it seemed like the answer is no, and that we need to take a different approach to visualization
22:21aidanhsdoes "get robust perf measurements" mean "decide on some appropriately robust perf measurements" or "given we know how we want to measure, make that measurement robust"?
22:21aturonaidanhs: yes :)
22:21acrichtoI do believe we have a useful view by default now --
22:21acrichtowe just need to do some work to reduce the noise there
22:22acrichtoso in essence yes, there's a more useful default view
22:22aturonOK, switching the default should be quite easy
22:22aturonmaybe that should be milestone 1
22:22acrichtoyeah that sgtm
22:22aturonafter both of these, we can talk about further improvements to UI
22:22acrichto"switch the default view to something more evocative"
22:23aidanhsyes please, the current default is totally opaque to me :P
22:23aturonok i've updated the doc
22:24aturonput milestone #2 in sept
22:24erickthey sorry folks, I gotta peel off a little early
22:24aturonah and a cross-cutting thing for planning: we want, by mid-sept, to have a good set of tasks ready to go for the impl period, for more folks to join
22:24aturonerickt: np, have a great weekend!
22:24ericktcan't wait to see some/all of you next week!!
22:25aturonah yeah, we won't have a meeting next week
22:25aturontoo many of us will be at RustConf
22:25ericktps: i'm showing up thursday, so if any of you are coming early let's get dinner
22:25aturonok so, between the milestones we laid out and impl period prep, i feel like we're full up through sept
22:26aturonso i propose we declare that the plan for now, and we can revisit at the end of august
22:26aturonhow's that sound?
22:27aturonwell that pretty much closes out the agenda
22:27aidanhsthere was at least one nominated issue
22:27aturonoh shoot
22:27aidanhsbut we're a little short on time
22:27aturonthank you
22:28aturoni don't mind going a bit long personally, but others should feel free to take off
22:28aturonso aidanhs i assume this comment is the one you want to discuss:
22:28acrichtoheh, so conceptually I don't even know what a 'merge driver' is
22:28aidanhsit's less directly related to this issue and more a) who has responsibility for 'scripts', e.g. in src/etc
22:29shepacrichto: it's a plugin to git that makes merges more better ;-)
22:29acrichtoeh I don't think there's really a "one owner" of src/etc
22:29aidanhsand b) if the who is 'the infra team', then what's the process for deciding whether a script is sufficiently useful to merge?
22:29acrichtoit's more just if the script touches your stuff you probably own it
22:29aturonyeah it seems likely to be split between infra and dev-tools
22:30aturonhowever for this particular PR, clearly infra
22:30aturonre: process, for this kind of thing we often use rfcbot
22:30aidanhsI raised on the issue the idea of a 'contrib' directory, as git has, for things that we won't support but people may find useful (paraphrasing)
22:30aturon(which you can use on PRs and issues)
22:30aidanhsah that's a good point
22:31aturonit's feeling like there are a few different issues to sort through here
22:31aturoni don't think etc has had a whole lot of direction over-all
22:32aturonso if someone wanted to work on a more careful organization and propose it, that'd be welcome imo
22:32aturon(and as you say could inform questions about merging)
22:32aidanhsre lack of direction: indeed, I started to wonder if older parts of etc would actually move into a hypothetical contrib directory
22:32aturonheh, seems about right :)
22:33shepaidanhs: I read this PR as being for people working on rust-lang/rust, not just in general; do you think it's intended to be more general?
22:33aidanhsno, it is just for contributors to rust-lang/rust
22:34aidanhsso it'd be slightly out of the scope of the git contrib directory
22:34shepI always parsed git's contrib as "these things are for users of git, but unmaintained"
22:34aturonseems like the etc/contrib stuff was a mostly distinct line of thought
22:34aidanhsbut there is an interesting note in the git contrib philosophy, which is to get acceptance on the mailing list first, or maybe internals in our case
22:35aturonso ok, i think the answer to the nominated question is: plausible! but we need a slightly more detailed strawman proposal
22:35aturonfor the sake of time, let's move on to the other nomination,
22:35aturonacrichto: ^
22:35shepAnd if we wanted such a thing, we could always have a nursery project
22:35aidanhsok, I'll have a think about that
22:35aturonaidanhs: <3
22:35acrichtooh right now I think this is a "no"
22:36acrichtoin terms of eventually yes, for now we've decided we need money for this
22:36aturonacrichto: ah right this is in the "big builder" category
22:36acrichtoand we don't quite have the infra set up to accept money
22:36aturonyep ok
22:36aturonwe can talk offline about whether to ask mozilla to sponsor this one
22:36aturonalright, that's a wrap folks!
22:36aturonthanks everybody, we've got an exciting couple of months planned!
22:36aturonsee y'all in two weeks (and many of you in one)
22:39shepaidanhs: you gonna be at the conf?
22:40aidanhsshep: sadly not
22:40shepphew! Now you can't remind me in person teehee
22:40aidanhsI'm going to rework my PR though to be more in line with what you suggested as a more incremental approach
22:42shepthat sounds wonderful
22:42shepAny idea how the *other* bits would occur?
22:42shepfor example, there's the crontab to update the compilers and there's an upstart file to make sure things are running
22:43aidanhsthe crontab is relatively simple, just docker pull rather than download
22:44aidanhsupstart is slightly trickier, but I happen to know of a docker book which talks about this so I'll dig it up
22:45aidanhs('this' being 'managing containers with a linux service manager')