mozilla :: #taskcluster

14 Jul 2017
00:33 <jonasfj> dmose: sorry, we only have spot VMs, and the interactive stuff probably has a few bugs to keep it interesting :)
00:34 <jonasfj> you can try the "edit loaner task" button, and tweak deadline, expires and maxRunTime, but it's all spot...
00:41 <bstack> jonasfj: a new error?
00:46 * Callek triggers a new win nightly set -- I think I have l10n fixed, and with luck we'll be at beetmover-ready for both 32 and 64 now including l10n
00:46 * Callek doesn't expect to see results until AM
02:20 <glandium> fwiw, having had a job fail because I forgot to add the script to run it, we now have, between docker image download and mercurial, a setup time of over 4 minutes
02:21 <glandium> correction: slightly under 4 minutes
02:21 <glandium> gps: ^
02:22 <glandium> (for new ec2 instances, that is)
02:42 <glandium> why, oh, why do windows builders not expose the same set of TASKCLUSTER_* variables as linux ones?
03:52 <dmose> jonasfj: thanks
06:18 <tomprince> dustin: Anything I can do to help with ? (Write a script that can live in the repo?)
07:53 <Callek> pmoore|away: grenade: ping, what happened to chain of trust in this job? ... we had a bunch of fails like that last night
08:09 <Callek> pmoore|away: grenade: this seems to also have broken for osx cross since the 12th (the 12th being the last good one). bisect doesn't show any obvious changes in tree.
08:10 <Callek> Of course I'm investigating on my phone while I wait for my daughter to get back to sleep
08:12 <pmoore|away> Callek: I also can't check now but can a bit later (my daughter had an ear operation this morning)
08:13 <Callek> pmoore|away: ouch, hope she made out ok.
08:14 <Callek> I think a double ear infection is what's keeping my daughter up
08:14 <Callek> She's going to see a doctor when they open today
09:40 <pmoore|away> :( Callek I'm sorry to hear that, hope she gets better quickly.
10:19 <grenade> Callek: the linked log is to a job that reports success. i haven't yet worked out which failures are interesting in the task group.
10:20 <grenade> ignore that, i see it
10:42 <pmoore> grenade: Callek: which task(s) in that task group had an issue?
10:43 <grenade> it looks like everything subsequent to the beetmover task to me
10:44 <grenade> i don't recognise the failure logs as coming from g-w though
10:44 <grenade> there's no headers showing the worker config, etc
10:45 <grenade> i've never looked at repackage logs before, so not sure but my guess is it's running on tc-worker
10:45 <pmoore> i think this might be a releng signingworker
10:46 <pmoore> but maybe it failed due to missing chain-of-trust artifacts in generic worker builds - this is my guess why callek pinged us
10:46 <pmoore> i'll have a look
10:46 <pmoore> e.g. i'll have a look into
10:46 <pmoore> grenade: thanks for the link above :)
10:47 <grenade> no worries
10:48 <grenade> scriptworker.exceptions.CoTError: 'path public/build/mk/target.complete.mar not in signing:repackage CKeH8b-cTgKISyQ6FoT5Eg chain of trust artifacts!'
10:48 <pmoore> chain of trust for that task is
10:48 <grenade> is a mar file an osx thing?
10:50 <grenade> it strikes me as not the right file extension for a windows package
10:50 <grenade> but maybe it's a signing thing..
10:50 <glandium> grenade: mar is the extension of the update packages for firefox
10:50 <glandium> on all platforms
10:50 <pmoore> mar i believe is the extension for firefox builds
10:51 <grenade> ah, thanks
10:51 <pmoore> strange, artifacts list is weird:
10:51 <pmoore> an explicit list of files, and then a catch-all directory at the end
10:52 <pmoore> this causes double production of artifacts, e.g. public/build/target.complete.mar and public/build/mk/target.complete.mar
10:52 <pmoore> for example
10:53 <pmoore> (and they are the same file)
10:53 <pmoore> i suspect 1) we either want to list files, or have a single directory artifact
10:54 <pmoore> 2) if we use a directory artifact, add "/mk" to the "name" property
10:54 <pmoore> since we now have:
10:55 <pmoore> (i.e. everything doubled)
10:55 <pmoore> Callek: ^
10:56 <pmoore> i'd propose removing the "directory" artifact, i suspect that will fix it
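The doubling pmoore describes happens when a catch-all directory artifact re-uploads files that are already listed individually. Below is a minimal sketch of a check for that, assuming a simplified artifact shape (`type`, `path`, `name`) loosely modelled on generic-worker's artifact list, and a directory listing passed in as a plain mapping instead of being read from disk. The local paths are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def expand_artifacts(artifacts: List[dict], listing: Dict[str, List[str]]) -> Dict[str, str]:
    """Map each uploaded artifact name to the local file it comes from.

    `listing` maps a local directory to the relative file paths inside it
    (a stand-in for walking the real filesystem).
    """
    uploads = {}
    for art in artifacts:
        if art["type"] == "file":
            uploads[art["name"]] = art["path"]
        elif art["type"] == "directory":
            # A directory artifact uploads every file under it, prefixed by its name.
            for rel in listing.get(art["path"], []):
                uploads[f"{art['name']}/{rel}"] = f"{art['path']}/{rel}"
    return uploads


def find_duplicates(uploads: Dict[str, str]) -> Dict[str, List[str]]:
    """Return local files that are uploaded under more than one artifact name."""
    by_local = defaultdict(list)
    for name, local in uploads.items():
        by_local[local].append(name)
    return {local: sorted(names) for local, names in by_local.items() if len(names) > 1}


if __name__ == "__main__":
    artifacts = [
        {"type": "file", "path": "build/target.complete.mar", "name": "public/build/target.complete.mar"},
        {"type": "directory", "path": "build", "name": "public/build/mk"},  # the catch-all
    ]
    print(find_duplicates(expand_artifacts(artifacts, {"build": ["target.complete.mar"]})))
```

Dropping the directory entry (pmoore's proposal) makes `find_duplicates` return an empty dict for this input.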
10:58 <pmoore> looks like this task is generated from
10:59 <pmoore> which doesn't seem to be indexed by dxr
11:03 * pmoore pulls down from date project
11:18 <pmoore> Callek: grenade: it also looks like generic-worker indexes artifacts by path rather than name in chain-of-trust (from legacy days when it didn't distinguish between the two) and i'll update it to index artifacts by name instead (and add a couple of tests)
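To illustrate the path-versus-name distinction: a downstream verifier (scriptworker, per the `CoTError` above) looks artifacts up by their published *name*, so the chain-of-trust index needs to be keyed the same way. A minimal sketch, assuming a simplified chain-of-trust `artifacts` map of name to `{"sha256": ...}`; the helper names are hypothetical and this is not generic-worker's actual code.

```python
import hashlib
from typing import Dict, List


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def build_cot_artifacts(uploads: List[dict]) -> Dict[str, Dict[str, str]]:
    """Index artifact hashes by artifact *name*, not by local path.

    Each upload is a dict with a local `path`, the public `name` it is
    published under, and its `content` (bytes, for this sketch).
    """
    return {u["name"]: {"sha256": sha256_hex(u["content"])} for u in uploads}


def verify_artifact(cot: Dict[str, Dict[str, str]], name: str, content: bytes) -> None:
    """Raise if `name` is missing from the chain of trust or its hash differs."""
    entry = cot.get(name)
    if entry is None:
        raise KeyError(f"{name} not in chain of trust artifacts!")
    if entry["sha256"] != sha256_hex(content):
        raise ValueError(f"sha256 mismatch for {name}")
```

If the index were keyed by local path instead, a lookup by published name would miss whenever the two differ, which is consistent with the error grenade pasted.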
11:19 <pmoore> Callek: you'll still want to remove that directory artifact, anyway, as it isn't needed and causes double artifact entries
11:24 <pmoore> Callek: i've created bug 1380976 for that
11:24 <firebot> NEW, l10n repacks have duplicate build artifacts
11:35 <glandium> what could explain that sometimes "relengapi" resolves and sometimes not? (sometimes as in for some workers)
11:35 <glandium> (for the same job)
11:35 <glandium> pmoore: ^
11:36 <pmoore> glandium: strange - no idea - do you have a link?
11:37 <glandium> pmoore|lunch: the two linux builds on
11:37 <glandium> the green is just a retrigger of the red
11:55 <Callek> pmoore|lunch: I think you may have gone down a rabbit hole in the wrong direction. Isn't .asc a *detached* sig?
11:58 <Callek> The log I remember reading said it couldn't find the chain of trust. Admittedly I was tired so could have misread
12:14 <glandium> is one-click loaner supposed to work for windows instances?
12:40 <jhford> the internet is barely working here today
12:41 <jhford> it's 3pm and I doubt it's going to get fixed at a reasonable time
12:41 <jhford> i've deployed node 8.1.4 on the remaining services, otherwise I'm calling it a day
12:49 <pmoore> Callek: there were two issues, bug 1380976 and bug 1380978
12:49 <firebot> NEW l10n repacks have duplicate build artifacts
12:49 <firebot> ASSIGNED, pmoore generic-worker: chain of trust artifacts should be indexed by artifact name, not artifact path
12:50 <pmoore> Callek: if you saw something that isn't covered by one of those bugs, please can you create another bug with details?
12:55 <pmoore> glandium: that is weird indeed
12:56 <pmoore> glandium: i'll raise a bug
13:07 <glandium> pmoore: fwiw, it's a PITA that windows workers have subtle differences with linux workers. Like the fact that the mercurial checkout is in $PWD/workspace/build/src in one and $PWD/build/src in the other; the fact that the artifacts directory is different; the fact that not all the same TASKCLUSTER_* variables are defined, etc.
13:08 <glandium> the fact that git is available on one end and not on the other...
13:10 <pmoore> glandium: these all sound like in-tree things, not taskcluster architecture things
13:11 <pmoore> glandium: i.e. can you fix them please? ;)
13:13 <dustin> pmoore: I don't think we have in-tree images yet
13:13 <dustin> but maybe the workspace thing could be fixed in-tree
13:14 <pmoore> right, i don't think these are machine-setup things
13:14 <dustin> git being installed is
13:14 <dustin> 'relengapi' resolving is
13:14 <dustin> the TASKCLUSTER_ variables is
13:14 <pmoore> git being installed is
13:14 <dustin> assuming 'relengapi' resolving has to do with g-w not supporting relengapiProxy
13:15 <glandium> dustin: relengapi is another problem
13:15 <pmoore> relengapi not resolving is a docker-worker problem
13:15 <dustin> ah, ok
13:15 <pmoore> see bug 1381000
13:15 <firebot> NEW, docker-worker: Host relengapi not resolved on some workers of a worker type where other workers reso
13:15 <dustin> anyway, I think wanting to have the environments as similar as possible is good.. maybe separating those out into individual bugs would help figure out what can be done in-tree
13:16 <pmoore> yeah, agreed
13:16 <pmoore> the checkout directory is certainly an in-tree thing
13:17 <pmoore> i'm not sure what the TASKCLUSTER_* variable differences are, whether those are task env vars, or native to the worker, but if we get it in a bug, then we'll be able to see
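One way to get the TASKCLUSTER_* differences into a bug is to run the same tiny script in a Linux task and a Windows task and compare the output. A sketch, assuming only that Python is available on both workers; it reports variable names only, since values can be task-specific or sensitive. The helper names are hypothetical.

```python
import json
import os
from typing import Dict, List, Mapping, Optional


def taskcluster_env_names(environ: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the sorted names of TASKCLUSTER_* variables in the environment."""
    env = os.environ if environ is None else environ
    return sorted(k for k in env if k.startswith("TASKCLUSTER_"))


def compare(linux: List[str], windows: List[str]) -> Dict[str, List[str]]:
    """Report which variable names are present on only one platform."""
    return {
        "only_linux": sorted(set(linux) - set(windows)),
        "only_windows": sorted(set(windows) - set(linux)),
    }


if __name__ == "__main__":
    # Run inside a task on each platform; feed both printed lists to compare().
    print(json.dumps(taskcluster_env_names()))
```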
13:17 <glandium> dustin: while I have your attention, does seem reasonable for ?
13:17 <firebot> Bug 1374940 NEW, Add taskgraph support for toolchain definitions
13:18 <glandium> (modulo variable names)
13:18 <dustin> yeah, I like it
13:18 <glandium> pmoore: how is meant to be used?
13:18 <dustin> glandium: so neither of those relengapi tasks has features.relengapi = true
13:19 <dustin> or am I missing something
13:19 <glandium> dustin: one is a retrigger of the other, so presumably they are configured the same way
13:19 <glandium> that's the disturbing thing
13:19 <dustin> from what I can tell, the bug is that one of them managed to talk to tooltool
13:20 <pmoore> glandium: if you want to use git in a windows task, this is an example of how you can use the mounts feature of the worker to make git available to your task (in this case, git version 2.11.0 64 bit)
13:20 <dustin> since neither should be able to
13:20 <glandium> dustin: the behavior of one of them is obviously not wanted. I'm not sure which is the wanted one :)
13:20 <glandium> pmoore: where do I put that?
13:21 <dustin> glandium: I suspect it will work fine if you enable the feature
13:21 <pmoore> glandium: see the mounts feature described in
13:21 <pmoore> it is inside task.payload
13:21 <pmoore> i.e. task.payload.mounts
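For reference, a read-only directory mount goes in `task.payload.mounts`. The sketch below builds such a payload fragment as a Python dict. The field names (`directory`, `content.url`, `format`) reflect my understanding of generic-worker's mounts schema and should be checked against the worker docs; the archive URL, directory name and command are placeholders, not the actual git 2.11.0 archive pmoore linked.

```python
import json


def git_mount_payload(archive_url: str, directory: str = "git") -> dict:
    """Build a (partial) generic-worker payload that mounts a zipped git into the task dir.

    Field names follow generic-worker's `mounts` feature as I understand it;
    verify against the worker's payload schema before use.
    """
    return {
        "command": [f"{directory}\\cmd\\git.exe --version"],  # placeholder command
        "maxRunTime": 600,
        "mounts": [
            {
                "directory": directory,  # relative to the task directory
                "content": {"url": archive_url},  # hypothetical archive location
                "format": "zip",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(git_mount_payload("https://example.com/git-2.11.0-64-bit.zip"), indent=2))
```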
13:22 <glandium> dustin: so the funny thing is that there is no reference to relengapi in taskcluster/ci except for "android-stuff", yet plenty of jobs use it and it works for them
13:22 <dustin> right, most stuff uses the public interface
13:22 <dustin> except android-stuff that needs the NDK
13:22 <pmoore> glandium: here is an example windows task that uses git:
13:24 <dustin> and any build task with 'tooltool-downloads: internal' has it enabled in a transform
13:24 <pmoore> glandium: can you raise a bug about the TASKCLUSTER_* env variables?
13:24 <pmoore> (and any other differences that are a "PITA"?)
13:24 <dustin> there are lots more references to android-stuff than you say actually :)
13:25 <dustin> & I don't mean to be arguing with these bugs - you've definitely uncovered something interesting with docker-worker, but I think the fix for you is going to be pretty easy
13:26 <glandium> dustin: fwiw, all the toolchain jobs using tooltool-download are using relengapi
13:26 <dustin> I wonder if this proxy gets started for one job on the host and doesn't go away?
13:26 <glandium> but they have "tooltool-downloads: public"
13:27 <glandium> but that specific one that had the failure doesn't
13:27 <glandium> so yeah, that "proxy leaks from a previous job" is a possibility
13:27 <dustin> oh, interesting, if it's set to public they still set up the proxy
13:28 <dustin> so why does not have that feature set
13:29 <dustin> I don't see it in taskcluster/ci/toolchain/linux.yml
13:29 <glandium> dustin: as I said, I forgot to add tooltool-downloads: public on that job
13:30 <glandium> so the question becomes why did the green job get the relengapi proxy?
13:30 <dustin> ah, I missed that flipping tabs
13:30 <dustin> yeah, that's the question
13:31 <glandium> dustin: and btw, windows workers seem to have unconditional access to the relengapi proxy
13:31 <dustin> that doesn't make any sense
13:32 <dustin> it's not even implemented there
13:32 <glandium> because I didn't have tooltool-downloads: public there either, and relengapi has always worked there
13:32 <dustin> are you sure `mach tooltool` doesn't fall back to the public IP?
13:32 <glandium> dustin: no it doesn't
13:32 <dustin> relengapiProxy is a docker container, so I am 101% sure it's not working on windows :)
13:32 <glandium> well maybe it resolves somehow
13:32 <dustin> I wonder to what
13:33 <dustin> pmoore: ^^ do you have a quick way to check that?
13:34 <glandium> as for the discrepancies, I'll file bugs later. It's already late :)
13:34 <pmoore> dustin: to check what, sorry?
13:35 <dustin> what 'relengapi' resolves to on a windows worker
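A quick way to answer that question is to ask the resolver directly from inside a task on each worker. A minimal sketch using only the standard library's `socket.getaddrinfo`; `relengapi` is just the bare hostname from the discussion.

```python
import socket
from typing import List, Optional


def resolve(host: str) -> Optional[List[str]]:
    """Return the distinct addresses `host` resolves to, or None if resolution fails."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        # e.g. "[Errno -2] Name or service not known" on Linux/glibc
        return None
    # info[4] is the sockaddr tuple; its first element is the IP for IPv4 and IPv6
    return sorted({info[4][0] for info in infos})


if __name__ == "__main__":
    print("relengapi ->", resolve("relengapi"))
```

Running it on a Linux worker with and without the relengapi proxy feature, and on a Windows worker, would show whether the name resolves at all and to what.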
13:37 <dustin> :(
13:38 <pmoore> oh you did it yourself already
13:39 <glandium> see where mach artifact toolchain is used with http://relengapi/tooltool
13:39 <dustin> pmoore: I did it on linux
13:39 <dustin> glandium: are you *sure* it's not using another hostname too?
13:40 <glandium> dustin: the same command line is what failed on the linux worker, using the same code
13:40 <glandium> with this error: (Caused by NewConnectionError('<requests.packages.urllib3.connection.HTTPConnection object at 0x7f8acb9fff50>: Failed to establish a new connection: [Errno -2] Name or service not known',))
13:40 <glandium> so yeah, I'm pretty sure that when the host doesn't exist, it doesn't fall back
13:40 <dustin> well, something here doesn't add up :)
13:40 <glandium> I certainly can agree with this :)
13:42 <dustin> I've never used mach tooltool (ugh, the world really needed a 27th way to use tooltool), so I don't know much about its implementation
13:42 <dustin> but that would be the place to look, I think.
13:42 <dustin> sorry, 'mach artifact toolchain'
13:42 <glandium> dustin: the 27th way to use tooltool is the only way used for builds now
13:43 <dustin> well that's good at least :)
13:43 <glandium> and the point was to have a wrapper that takes either from tooltool or taskcluster artifacts
13:44 <dustin> does it invoke the in-tree directly?
13:44 <dustin> if so then it's probably the best of the 27 :)
13:44 <glandium> it uses some of its code, but not for downloads
13:44 <dustin> ah, cool, so just importing, not invoking
13:46 <dustin> glandium: oh, maybe it's finding those files in the cache on success?
13:46 <dustin> eh, I'll let you go, sorry
13:46 <dustin> will comment in the bug
13:47 <glandium> let me check the logs
13:48 <dustin> it says "Downloaded .. to ../tooltool-cache/.."
13:48 <dustin> so I'm guessing not
13:48 <glandium> those messages are actually very confusing
13:48 <glandium> that is, I'm not sure they mean what they say
13:51 <glandium> yeah, there should be some "Downloading... x%" if it was really downloading
13:51 <glandium> so we can breathe
13:51 <glandium> that'd explain the weird stuff on windows too
13:51 <dustin> yep, code confirms that
13:51 <glandium> would you mind filing a bug about those messages being completely bonkers?
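The confusion came from the same "Downloaded ... to tooltool-cache" message being logged whether a file was fetched or merely found in the cache. A minimal sketch of a cache-aware fetch that logs the two cases differently. The function name, the cache layout (files stored under their sha512 hex digest) and the `download` callback are hypothetical; this is not the actual `mach artifact toolchain` code.

```python
import hashlib
import logging
import os
from typing import Callable

log = logging.getLogger("toolchain-fetch")


def fetch(digest: str, cache_dir: str, download: Callable[[], bytes]) -> str:
    """Return a path to the file with `digest`, downloading only on a cache miss.

    Cached files are assumed to be stored under their sha512 hex digest.
    """
    path = os.path.join(cache_dir, digest)
    if os.path.exists(path):
        # Cache hit: say so explicitly instead of claiming a download happened.
        log.info("Using cached %s from %s (no download)", digest[:12], path)
        return path
    log.info("Downloading %s ...", digest[:12])
    data = download()
    if hashlib.sha512(data).hexdigest() != digest:
        raise ValueError(f"digest mismatch for {digest[:12]}")
    os.makedirs(cache_dir, exist_ok=True)
    with open(path, "wb") as f:
        f.write(data)
    log.info("Downloaded %s to %s", digest[:12], path)
    return path
```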
13:52 <dustin> thanks for sticking around :)
13:52 <glandium> glad we could sort it out
13:53 <glandium> + find /c/ /z/ -name '*.lib' -o '*.LIB'
13:53 <glandium> find: paths must precede expression: *.LIB
13:54 <glandium> find, you're being unhelpful
13:54 <glandium> missing -name
13:54 <glandium> which means it's really time to stop
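For the record, the fix glandium spotted is to repeat the predicate: `find /c/ /z/ -name '*.lib' -o -name '*.LIB'` (GNU find's `-iname '*.lib'` matches both cases in one test). The same search as a small Python sketch, matching the extension case-insensitively; the roots are whatever you pass in.

```python
import os
from typing import Iterable, List


def find_libs(roots: Iterable[str], ext: str = ".lib") -> List[str]:
    """Walk each root and return files whose extension matches `ext`, ignoring case.

    Equivalent in intent to `find ROOTS -iname '*.lib'`; missing roots are skipped.
    """
    matches = []
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for fn in filenames:
                if fn.lower().endswith(ext.lower()):
                    matches.append(os.path.join(dirpath, fn))
    return sorted(matches)


if __name__ == "__main__":
    for p in find_libs(["/c/", "/z/"]):
        print(p)
```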
13:57 <dustin> oo, cool, duplicate issue tagging in github
14:00 <glandium> dustin: finally!
14:02 <glandium> pmoore: do windows workers ever have some sort of "recycling" like the linux ones do? (by which I mean they don't have to redo a full mercurial clone) I never seem to hit that if there is
14:03 <Callek> pmoore: sooo, looks like the CoT thing you cited may be an issue indeed, however I also did and just now, which may solve the 'multiple entries' thing
14:04 <Callek> pmoore: the windows not having a stable (absolute) directory to throw artifacts in, coupled with needing all artifacts (including directory) to exist or job fails is the biggest issue on why I did it this way
14:04 <Callek> pmoore: you'll notice in the log shows /home/worker/artifacts is missing but didn't fail
14:05 <pmoore> glandium: ah, is there a full clone on every checkout without any caching?
14:06 <glandium> pmoore: at least I've never seen a case where it's not happening, so I'm wondering if there's supposed to be one
14:06 <glandium> "not happening" being the full clone not happening
14:08 <pmoore> glandium: we should probably use the caches feature of the worker to optimise that - the mounts feature that allows you to mount a read only directory (like the git example before) can also be used to mount a read/write cache
14:08 <pmoore> glandium: i thought in the past, we had an hg share directory somewhere that was shared between jobs, in a fixed location (like C:\hg-shared or something like that)
14:08 <glandium> does z:/task_.... point to c:/windows/system32?
14:09 <catlee> jmaher: join #tcmigration for migration focused discussions
14:09 <pmoore> glandium: but maybe that is no longer used(?) in any case, now we have caches in generic worker, that is probably the best path forward (and means we have scope protected caches too)
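A sketch of what pmoore suggests: a writable directory cache mount, so that a shared hg store survives between tasks on the same worker. The `cacheName`/`directory` fields and the `generic-worker:cache:<cacheName>` scope reflect my understanding of generic-worker's caches feature and should be verified against its docs; the cache name, directory and command are hypothetical.

```python
def hg_cache_task_fragment(cache_name: str = "hg-shared", directory: str = "hg-shared") -> dict:
    """Build a (partial) task definition that mounts a persistent, writable cache.

    Caches are scope-protected, so the task also carries the matching
    generic-worker:cache:<cacheName> scope (my understanding; verify in the docs).
    """
    return {
        "scopes": [f"generic-worker:cache:{cache_name}"],
        "payload": {
            "command": ["<checkout command that uses the cache directory>"],  # placeholder
            "maxRunTime": 3600,
            "mounts": [{"cacheName": cache_name, "directory": directory}],
        },
    }
```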
14:10 <pmoore> glandium: i don't think so, but any symbolic links would be defined in the task, i believe
14:11 <glandium> pmoore: I'm asking because I see rustc and vs2015u3, that look like they come from tooltool, in c:/windows/system32
14:11 <pmoore> 302 grenade :)
14:13 <pmoore> Callek: ah ok - because docker worker doesn't fail a task if an artifact is missing?
14:14 <Callek> pmoore: I think it may on type: file, but I'm not sure there; I know it doesn't on type: directory
14:33 <jmaher> catlee: will do
14:44 <Callek> grenade: was there anything specific you pinged about win signing for, besides xpcshell?
14:44 <Callek> are there other failures you're suspecting break without signing?
14:50 <Callek> grenade: to be extra clear, it's tier *3* on central...
15:27 <pmoore> Callek: generic worker doesn't enforce public/build - that must be an in-tree thing
15:27 <pmoore> one of the transforms?
15:28 <Callek> pmoore: ooo yes, in tree enforces it, (de-enforcing is a bit convoluted)
15:28 <Callek> pmoore: but the main point was I couldn't easily/cleanly remove the directory entry without making lots of code harder to read, since it happens all the way down at the job/ level
15:28 <pmoore> ah ok
15:28 <Callek> pmoore: and since docker-worker doesn't fail if a directory artifact is missing....
15:29 <pmoore> glad you got it sorted, anyway
15:29 <Callek> yea, I'm waiting on results to *confirm* it's sorted
15:29 <Callek> my change may have even solved the CoT bug (by way of workaround) that blocked me
15:30 <Callek> we still need to fix that CoT bug, but this should at least avoid the urgency :-)
15:31 <pmoore> Callek: the cot bug is fixed and currently being deployed to gecko-1-b-win2012-beta for testing in bug 1380978
15:31 <firebot> ASSIGNED, generic-worker: chain of trust artifacts should be indexed by artifact name, not artifact path
15:31 <Callek> pmoore: well that was fast!
15:31 <pmoore> on that note, my dinner is ready! have a nice weekend folks!
15:32 <Callek> pmoore: you too
15:36 <dustin> gonna run some errands, but feel free to SMS if I'm needed (I know TC is thin on staff today..)
17:23 <bstack> that makes sense to me :)
17:23 <bstack> perhaps also it could do something other than just blankly display nothing when the old links are used?
17:23 <bstack> maybe a little warning that says "that is an old style link and won't work anymore"
17:24 <bstack> but that is less important than just fixing the wiki
17:38 * hassan requested an account for pending for approval
17:39 <armenzg> bstack: fine if I file a bug for moving "add new jobs" to use actions.json?
17:46 <armenzg> bstack: do you have a bug filed for the json-e work for retrigger?
17:46 <armenzg> bstack: I've composed this; would you mind reviewing it and see if it is accurate?
17:55 <bstack> armenzg: yes to all of the above :p
17:58 <bstack> armenzg: retrigger bug is
17:58 <firebot> Bug 1380454 ASSIGNED, port retrigger to action.json
18:39 <wcosta> dmose: I heard you are having problems with one-click-loaner
18:42 <arr> I just sshed to a variety of hosts
18:42 <arr> including to cruncher-aws from relengwebadm
18:42 <arr> just to make sure we're not blocking other users
18:43 <dmose> wcosta: yeah, but unforch i have to run out the door now. will ping another time. thanks!
18:43 <arr> derp, mischan
18:54 * hwine looks for someone who can answer quick questions about things deployed on
18:56 <jonasfj> hwine: bstack was sitting closest to Eli when they did it.. :)
18:56 <jonasfj> (Eli being PTO)
18:57 <hwine> jonasfj: thanks -- I'll hit him up on dm
18:57 <jonasfj> afaik, it's just a static site, either through S3/cloudfront or heroku (to get the headers right)
18:58 <hwine> oh, I thought the various dynamic sites hung off of there, too -- I meant *
19:07 <bstack> yeah, that's correct
19:32 <Callek> grenade: by some weird stroke of luck, are you around now?
19:34 <dustin> stroke of insomnia?
19:48 <catlee> was for working on the TC worker on windows?
19:48 <firebot> Bug 1379603 FIXED, Enable triggering of hardware and talos tests on TaskCluster Windows
19:48 <catlee> or HW tests still via BBB?
19:53 <dustin> I think tc worker on windows
19:54 <dustin> that was my understanding of the patch, I'm not sure how widespread he intended the results to be
19:54 <dustin> > So, the effect here is that anything that includes will *always* run on hardware, whereas everything else will run in BBB. Is that what you're going for?
19:54 <dustin> > yes!
20:06 <catlee> hm, ok
20:07 <catlee> we've just landed the BBB stuff on date
20:07 <catlee> which makes merging this in very confusing
20:16 <catlee> dustin: ^^
20:16 <catlee> err, wrong channel
21:35 <dmose> dustin: wcosta: here's a link of the most recent time my one-click got killed:
21:35 <dmose> i have to afk again now, unforch
21:35 <dmose> but maybe that'll be helpful
22:11 <jonasfj> yeah, ^ that task was resolved deadline-exceeded, which is so clearly the deadline bug in the one-click-loaner create process..
22:12 <jonasfj> hassan: were you looking into fixing that^ or do you mind if I jump on it?
22:12 <jonasfj> nvm, I see you already did..
22:13 * jonasfj doing the review..
22:16 <dmose> jonasfj: hawt! can you give me a link to the bug?
22:16 <firebot> Bug 1359468 REOPENED, In task-creator, adjust *all* timestamps relative to created
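The bug title suggests that, when a task is re-created (as the one-click-loaner "edit" flow does), not every timestamp was being moved forward relative to `created`, which fits jonasfj's reading of the deadline-exceeded loaner. A minimal sketch of the intended behaviour, assuming a task dict whose `created`, `deadline` and (possibly nested) `expires` fields are ISO 8601 UTC strings. The real task-creator is JavaScript; this Python version and its helper names are only illustrative.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Optional

TIMESTAMP_KEYS = {"created", "deadline", "expires"}


def _parse(ts: str) -> datetime:
    # Timestamps of the form 2017-07-14T22:11:00.000Z
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def _format(dt: datetime) -> str:
    return dt.astimezone(timezone.utc).isoformat(timespec="milliseconds").replace("+00:00", "Z")


def _shift(node: Any, delta: timedelta) -> Any:
    """Recursively shift every string value stored under a timestamp key by `delta`."""
    if isinstance(node, dict):
        return {
            k: _format(_parse(v) + delta) if k in TIMESTAMP_KEYS and isinstance(v, str) else _shift(v, delta)
            for k, v in node.items()
        }
    if isinstance(node, list):
        return [_shift(v, delta) for v in node]
    return node


def rebase_task(task: dict, now: Optional[datetime] = None) -> dict:
    """Move created, deadline and *all* expires fields forward so that created == now."""
    now = now or datetime.now(timezone.utc)
    delta = now - _parse(task["created"])
    return _shift(task, delta)
```

Because every timestamp moves by the same offset, the original gaps (for example `deadline - created`) are preserved; durations such as `maxRunTime` are left untouched.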
22:19 <jonasfj> hassan: I gave an r+ on
22:19 <jonasfj> hassan: I don't know if it's worth the hassle to avoid moment, but you and eli seem to like when we keep things small and lean :)
22:53 <glandium> some windows builds have a tough time cloning mercurial...
22:54 <glandium> actually, that's just checking out, the network part is finished
22:54 <glandium> IOW, terrible I/O
22:54 <glandium> gps: ^
22:57 <glandium> pmoore|away: it also seems I was wrong, there *is* a hg-shared and it's used
22:57 <glandium> which is good. but the terrible I/O makes the subsequent hg update terribly long in some cases
22:58 <gps> glandium: this came up in this channel yesterday
22:58 <gps> i /think/
22:58 <gps> it came up somewhere
22:59 <glandium> seriously, 1 hour estimated for the checkout
23:00 <glandium> (on another attempt)
23:00 <gps> bug 1378381 is related
23:00 <firebot> NEW, relops@infra-ops.bugs OpenCloudConfig: avoid long-running format of EBS backed Z: drive
23:00 <gps> yeah, the ebs volumes can't even do 1 MB/s if initialized from an AMI
23:00 <gps> at least for random I/O
23:01 <gps> sequential is /slightly/ better, but not by much
23:01 <glandium> why are they killing instance volumes?
23:01 <gps> it is only an issue if the EBS volume is initialized from an AMI
23:01 <gps> fresh EBS volumes are fine
23:02 <gps> and if you provision them with enough IOPS, they are just as good if not better than an instance volume
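To put gps's number in perspective: at a sustained rate r, touching D bytes takes t = D / r. (AWS documents that EBS volumes created from snapshots, which includes AMI-backed volumes, fetch blocks lazily on first access, which fits what gps describes.) The sketch just does that arithmetic, treating "1 MB/s" as 1 MiB/s; the 3.5 GiB working set is a hypothetical illustration, not a measured size of a Firefox checkout.

```python
MiB = 1024 ** 2
GiB = 1024 ** 3


def transfer_seconds(size_bytes: float, rate_bytes_per_s: float) -> float:
    """Time to read/write `size_bytes` at a sustained `rate_bytes_per_s`."""
    if rate_bytes_per_s <= 0:
        raise ValueError("rate must be positive")
    return size_bytes / rate_bytes_per_s


if __name__ == "__main__":
    # Hypothetical working set; at ~1 MiB/s this comes out at roughly an hour.
    t = transfer_seconds(3.5 * GiB, 1 * MiB)
    print(f"{t / 3600:.2f} h")
```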
23:02 <glandium> yay, it jumped to 1h40
15 Jul 2017
No messages