--- Log opened Wed Sep 14 00:00:07 2016
00:07 -!- rmatinata: has quit [Quit: This computer has gone to sleep]
01:01 -!- kaminohana (Kami no Hana): has joined #vdsm
05:17 -!- kaminohana: has quit [Quit: Computer on sleep...]
05:25 -!- kaminohana (Kami no Hana): has joined #vdsm
05:53 -!- tim (Tim): has joined #vdsm
06:12 -!- tim: has quit [Ping timeout: 240 seconds]
06:57 -!- kaminohana: has quit [Ping timeout: 265 seconds]
07:13 -!- tim (Tim): has joined #vdsm
07:28 -!- edwardh (purple): has joined #vdsm
07:41 -!- mode/#vdsm: by ChanServ
07:41 -!- danken (purple): has joined #vdsm
08:21 -!- sbonazzo (purple): has joined #vdsm
08:22 -!- irit (purple): has joined #vdsm
08:27 -!- ishaby_ (Idan Shaby): has joined #vdsm
08:27 -!- ishaby: has quit [Read error: Connection reset by peer]
08:38 -!- mzamazal (Milan Zamazal): has joined #vdsm
08:39 -!- fromani: has quit [Quit: Leaving]
08:57 -!- fromani (Francesco Romani): has joined #vdsm
08:58 -!- saggi (purple): has joined #vdsm
08:59 -!- irit: has quit [Quit: Leaving.]
09:03 -!- mskrivanek_away is now known as mskrivanek
09:05 -!- pkliczew (Piotr Kliczewski): has joined #vdsm
09:05 -!- saggi: has quit [Ping timeout: 255 seconds]
09:13 -!- irit (purple): has joined #vdsm
09:19 -!- pkliczew: has quit [Ping timeout: 276 seconds]
09:20 -!- irit: has quit [Quit: Leaving.]
09:21 -!- pkliczew (Piotr Kliczewski): has joined #vdsm
09:22 -!- phoracek (phoracek): has joined #vdsm
09:29 -!- fsimonce (Federico): has joined #vdsm
09:50 -!- mmirecki (Marcin Mirecki): has joined #vdsm
10:05 -!- fabiand (Fabian Deutsch): has joined #vdsm
10:31 -!- rmohr (Roman Mohr): has joined #vdsm
10:42 -!- ishaby_: has quit [Ping timeout: 265 seconds]
10:50 -!- ishaby_ (Idan Shaby): has joined #vdsm
12:00 -!- rmohr: has quit [Remote host closed the connection]
12:00 -!- rmohr (Roman Mohr): has joined #vdsm
12:13 -!- mliu (purple): has joined #vdsm
12:14 -!- wu_ng (wu_ng): has joined #vdsm
12:32 -!- fromani: has quit [Read error: No route to host]
12:33 -!- fromani (Francesco Romani): has joined #vdsm
12:34 -!- tim: has quit [Ping timeout: 244 seconds]
13:00 -!- apahim: has quit [Ping timeout: 244 seconds]
13:12 -!- mliu: has left #vdsm []
13:15 -!- apahim (Amador Pahim): has joined #vdsm
13:15 -!- danken: has quit [Ping timeout: 244 seconds]
13:40 -!- osvoboda (purple): has joined #vdsm
13:48 -!- mode/#vdsm: by ChanServ
13:48 -!- danken (purple): has joined #vdsm
13:52 -!- phoracek: has quit [Ping timeout: 260 seconds]
14:00 -!- osvoboda: has quit [Ping timeout: 250 seconds]
14:01 -!- phoracek (phoracek): has joined #vdsm
14:02 -!- wu_ng: has quit [Ping timeout: 265 seconds]
14:07 -!- apahim: has quit [Ping timeout: 244 seconds]
14:08 -!- apahim (Amador Pahim): has joined #vdsm
14:16 -!- ndarshan: has quit [Quit: Leaving]
14:18 -!- osvoboda (purple): has joined #vdsm
14:28 -!- nsoffer (Nir Soffer): has joined #vdsm
15:08 -!- osvoboda: has quit [Ping timeout: 244 seconds]
15:23 -!- mskrivanek is now known as mskrivanek_away
15:25 -!- phoracek: has quit [Ping timeout: 240 seconds]
15:27 -!- phoracek (phoracek): has joined #vdsm
15:29 -!- phoracek: has quit [Client Quit]
15:30 -!- wu_ng (wu_ng): has joined #vdsm
15:37 -!- mskrivanek_away is now known as mskrivanek
15:38 -!- danken: has quit [Quit: Leaving.]
15:46 -!- irit (purple): has joined #vdsm
15:56 -!- gshereme: has quit [Quit: Leaving]
16:21 -!- osvoboda (purple): has joined #vdsm
16:22 -!- rmohr: has quit [Quit: rmohr]
16:22 -!- rmohr (Roman Mohr): has joined #vdsm
16:24 -!- wu_ng: has quit [Read error: Connection reset by peer]
16:26 -!- wu_ng (wu_ng): has joined #vdsm
16:27 -!- rmatinata: has quit [Quit: This computer has gone to sleep]
16:32 -!- tim (Tim): has joined #vdsm
16:36 -!- ishaby_: has quit [Ping timeout: 260 seconds]
16:38 -!- jewnix (David Twersky): has joined #vdsm
16:45 -!- tim: has quit [Ping timeout: 265 seconds]
16:46 -!- osvoboda: has quit [Quit: Leaving.]
16:52 -!- pkliczew: has quit [Ping timeout: 265 seconds]
16:57 -!- irit: has quit [Ping timeout: 240 seconds]
16:57 -!- nsoffer: has quit [Ping timeout: 265 seconds]
17:01 -!- phoracek (phoracek): has joined #vdsm
17:15 -!- bazulay: has quit [Quit: Leaving.]
17:16 -!- mzamazal: has quit [Remote host closed the connection]
17:27 -!- mskrivanek is now known as mskrivanek_away
17:29 -!- sbonazzo: has quit [Quit: Leaving.]
17:44 -!- tim__ (Tim): has joined #vdsm
17:56 -!- ishaby_ (Idan Shaby): has joined #vdsm
18:16 -!- fabiand: has quit [Ping timeout: 250 seconds]
18:22 -!- ishaby_: has quit [Ping timeout: 240 seconds]
18:25 -!- ishaby_ (Idan Shaby): has joined #vdsm
18:27 -!- fromani: has quit [Remote host closed the connection]
18:29 -!- fabiand (Fabian Deutsch): has joined #vdsm
18:34 -!- ishaby_: has quit [Ping timeout: 276 seconds]
18:40 -!- irit (purple): has joined #vdsm
18:45 -!- #vdsm ircuser-1: has quit [Quit: because]
18:52 -!- ircuser-1 (Johnny Von Neumann): has joined #vdsm
18:52 -!- irit: has quit [Ping timeout: 248 seconds]
18:58 -!- irit (purple): has joined #vdsm
19:03 -!- mmirecki: has quit [Ping timeout: 250 seconds]
19:07 -!- irit: has quit [Ping timeout: 276 seconds]
19:10 -!- ishaby: has quit [Ping timeout: 265 seconds]
19:19 -!- nsoffer (Nir Soffer): has joined #vdsm
19:24 -!- irit (purple): has joined #vdsm
19:27 -!- #vdsm dyasny: has quit [Ping timeout: 244 seconds]
19:34 -!- dyasny (Dan Yasny): has joined #vdsm
19:42 -!- edwardh: has quit [Ping timeout: 250 seconds]
19:44 -!- irit: has quit [Ping timeout: 244 seconds]
19:50 -!- irit (purple): has joined #vdsm
19:56 -!- fabiand: has quit [Ping timeout: 250 seconds]
19:57 -!- irit: has left #vdsm []
20:06 < alitke> nsoffer, I am verifying the jobs patches now. Have you had a look?
20:06 < nsoffer> not yet, will look soon
20:09 -!- tim__: has quit [Ping timeout: 244 seconds]
20:18 < nsoffer> alitke, see https://gerrit.ovirt.org/#/c/63937/1/lib/vdsm/jobs.py@a166
20:19 < alitke> ok
20:22 < alitke> nsoffer, any other comments before I resend?
20:22 < nsoffer> not for this patch, I'm looking in the next patches now
20:22 < nsoffer> alitke, ^^^
20:24 < nsoffer> alitke, see: https://gerrit.ovirt.org/#/c/62002/8/lib/vdsm/jobs.py@a121
20:29 < nsoffer> alitke, https://gerrit.ovirt.org/#/c/63711/3/lib/vdsm/define.py@91
20:33 < alitke> ok, done with those.
20:44 < nsoffer> alitke, this one is tricky https://gerrit.ovirt.org/#/c/63712/4/lib/vdsm/jobs.py@134
20:44 -!- rmohr: has quit [Ping timeout: 244 seconds]
20:44 < nsoffer> alitke, taking the lock during abort works, but can cause a deadlock
20:45 < alitke> ugh.
20:46 < nsoffer> alitke, the best would be if abort would not wait until the job finishes, but simply set a flag
20:46 < alitke> ?
20:46 < nsoffer> Then we can take the lock while we call it, but we need a way to wake up the operation thread
20:47 < nsoffer> let's see what fromani thinks about it
20:47 < nsoffer> abort should be fast typically, you kill a running process
20:47 < alitke> seems we can merge all of the others before it. I don't want to block spdm work on every possible problem with a jobs manager.
20:47 < nsoffer> only in pathological cases it can be long, e.g. process in D state
20:48 < alitke> We can hold that race fixing patch until later.
20:48 < nsoffer> alitke, we cannot merge job management code which is not thread safe
20:48 < alitke> It's already not thread safe!
20:48 < nsoffer> alitke, but you want to start using it for real code now
20:49 < alitke> It is already being used for real code.
20:49 < nsoffer> alitke, which code?
20:49 -!- edwardh (purple): has joined #vdsm
20:49 < alitke> create_volume and copy_data
20:49 < alitke> it's merged.
20:49 < alitke> warts and all.
20:49 < alitke> Let's merge the earlier patches and keep this last one open.
20:50 < alitke> We'll be in a better state than we were before.
20:50 < nsoffer> we need to get ack from fromani/piotr
20:50 < nsoffer> send mail about the patches and ask them to review
20:51 < nsoffer> I'll check what can be merged now, anything up to autodelete should be safe
20:53 < nsoffer> The last patch is +1 for me, we can improve locking later, current way has no races
20:54 -!- edwardh: has quit [Quit: Leaving.]
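[Editor's note] The idea nsoffer describes above (abort only sets a flag instead of waiting for the job, and the operation thread gets woken up) can be sketched roughly as follows. This is a minimal illustration, not the code under review in gerrit: the Job class, its steps and step_time parameters, and the status strings are all hypothetical. A threading.Event serves as both the flag and the wake-up mechanism, so abort() holds the lock only for an instant.

```python
import threading


class Job(object):
    """Hypothetical job, for illustration only (not vdsm's jobs.Job)."""

    def __init__(self, steps, step_time):
        self._lock = threading.Lock()
        # The event is the abort flag and also wakes a sleeping worker.
        self._abort_requested = threading.Event()
        self._steps = steps
        self._step_time = step_time
        self.status = "pending"

    def run(self):
        """Runs in the operation thread."""
        with self._lock:
            self.status = "running"
        for _ in range(self._steps):
            # Stand-in for one slice of real work. wait() returns True as
            # soon as abort() sets the event, so the worker wakes up early.
            if self._abort_requested.wait(self._step_time):
                with self._lock:
                    self.status = "aborted"
                return
        with self._lock:
            self.status = "done"

    def abort(self):
        """Returns immediately; never waits for the job while holding the lock."""
        with self._lock:
            self._abort_requested.set()
```

Because abort() never blocks on the job while holding the lock, a caller that already holds another lock cannot deadlock against the operation thread here. The cost is that abort becomes asynchronous: the caller has to check the status later (or join the thread) to learn that the job really stopped.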
20:58 < nsoffer> alitke, see also this: https://gerrit.ovirt.org/#/c/63712/5/tests/jobsTests.py@203
20:58 < nsoffer> should be easy to fix, we have a lot of examples in current tests
20:59 -!- edwardh (purple): has joined #vdsm
21:00 < nsoffer> alitke, added a few comments on how to avoid flaky tests: https://gerrit.ovirt.org/#/c/63712/5/tests/jobsTests.py
21:08 < nsoffer> alitke, I fixed the commit message in https://gerrit.ovirt.org/#/c/63937/
21:08 < nsoffer> alitke, if it is ok, I can merge it
21:32 -!- edwardh: has quit [Ping timeout: 244 seconds]
21:50 < alitke> nsoffer, looks good.
21:50 < nsoffer> alitke, merged
21:51 < alitke> thanks!
22:10 < nsoffer> alitke, can you look at https://gerrit.ovirt.org/#/q/status:open+project:vdsm+branch:master+topic:storage-cleanup ?
22:11 < nsoffer> alitke, trivial thread renames, very easy review
22:13 < alitke> in a few minutes. Working on something else at the moment.
22:16 -!- phoracek: has quit [Ping timeout: 248 seconds]
22:26 < nsoffer> alitke, thanks
22:29 < nsoffer> alitke, https://gerrit.ovirt.org/#/c/63967/ is nasty :-)
22:30 < alitke> yes
22:30 < nsoffer> alitke, I'm not sure disabling autorelease is safe, this code is way too complex
22:30 < alitke> That's why I hate magic overcomplicated things like resourceManager.
22:30 < alitke> blockVolume uses it.
22:30 < alitke> see llprepare
22:30 < nsoffer> alitke, better keep a reference like old code if we can
22:31 < alitke> of course we can keep a reference but I thought this way is nicer.
22:31 < nsoffer> alitke, we have the locks in the context manager while the locks are held, right?
22:32 < nsoffer> so the ResourceManagerLock can hold a reference
22:32 < nsoffer> when we drop it from the lock list, the resource will be released
22:32 < alitke> Yeah, certainly. But that is magic behavior too.
22:32 < nsoffer> alitke, this is much safer than disabling auto release
22:33 < nsoffer> alitke, I don't know what will happen if you disable auto release
22:33 < alitke> same thing that happens in blockVolume. That part of the code is actually pretty simple,
22:33 < nsoffer> alitke, the only clean way to fix this is to drop this monster, replace it with the simpler lock manager you posted last year
22:34 < nsoffer> alitke, I'll look at it next week
22:34 < nsoffer> alitke, but this is very strange, we use this same api without saving the reference in many places
22:34 < alitke> I don't want to bother with a new lock manager. I'll be reviewing that code till Christmas
22:35 < nsoffer> alitke, I don't believe all the places do not have locks
22:35 < nsoffer> alitke, for example sp.py line 1627
22:36 < alitke> we use with a lot
22:36 < alitke> uses with
22:36 < alitke> I think the context manager must keep the reference internally
22:36 < nsoffer> alitke, the locks are released inside that with?
22:37 < alitke> no, but when you leave the context the ref is destroyed so autorelease kicks in.
22:37 < nsoffer> alitke, right, the code works with with, but not when you call without context
22:37 < nsoffer> alitke, I never saw worse infrastructure
22:38 < alitke> yes
22:38 < alitke> It's trying to be way too clever and it's awful.
22:38 < nsoffer> alitke, complex like hell, confusing api which silently does not work
22:40 < nsoffer> alitke, for 4.1 we can use it, I will not use this after that
22:40 < alitke> +1
22:42 < nsoffer> alitke, autorelease uses __del__, was a source of a leak a few months ago
22:42 < alitke> yep
22:42 < nsoffer> alitke, and it starts a new thread in __del__ - nothing can be more wrong
22:42 < alitke> heh
22:44 < nsoffer> alitke, how did you find this? tests failed randomly?
22:44 < alitke> Nope. The HostJobs api showed a ReleaseError
22:44 < alitke> Since it happened at the very end of the job my functional tests still passed.
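[Editor's note] The autorelease behavior discussed above (a resource released from __del__ once the last reference goes away, which works under "with" but surprises callers who do not keep a reference) can be illustrated with a small sketch. Everything here is hypothetical: Resource, acquire and held are stand-ins and do not reproduce vdsm's resourceManager API. The immediate release in the first case relies on CPython's reference counting; other interpreters may release later.

```python
import contextlib


class Resource(object):
    """Hypothetical resource with __del__-based autorelease."""

    def __init__(self, name, log):
        self.name = name
        self._log = log
        self._released = False

    def release(self):
        # Idempotent, so an explicit release followed by __del__ is harmless.
        if not self._released:
            self._released = True
            self._log.append(self.name)

    def __del__(self):
        # Autorelease: runs whenever the garbage collector decides to.
        self.release()


def acquire(name, log):
    """Returns a resource; dropping the return value releases it (autorelease)."""
    return Resource(name, log)


@contextlib.contextmanager
def held(name, log):
    """Keeps a reference for the whole block and releases explicitly on exit."""
    res = acquire(name, log)
    try:
        yield res
    finally:
        res.release()
```

Calling acquire("vol", log) and discarding the result releases the resource right away, which mirrors the "works with with, but not when you call without context" observation. The held() wrapper makes the lifetime explicit instead of depending on when the reference is collected.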
22:44 < alitke> Need to fix those up a bit.
22:45 < alitke> Also we saw the release errors in the log.
22:47 < nsoffer> alitke, at least our new infra works and reveals the issues
22:47 < alitke> yes
22:50 < alitke> approved and commented on your cleanup series...
22:50 < alitke> see comments for an open question about short-ids
22:51 < nsoffer> alitke, replied
22:52 < nsoffer> alitke, do you think it will be better to have the full uuid in the python thread name (seen in vdsm logs)
22:52 < nsoffer> alitke, and the short name in the system tools?
22:52 < nsoffer> alitke, do you use htop?
22:53 < nsoffer> in vdsm we will see names like monitor/95430c8c-bbfc-4811-84fc-4ea0351d32fe
22:53 < nsoffer> alitke, upgrade/95430c8c-bbfc-4811-84fc-4ea0351d32fe
22:54 < nsoffer> alitke, etc
22:54 < alitke> Short ones are ok. I guess it will not likely conflict but if it does it could be very annoying.
22:55 < nsoffer> maybe decrease the chance by using monitor/95430c8cbbfc4811
22:56 < nsoffer> or monitor/95430c8c-bbfc-4811
22:57 < nsoffer> alitke, 64 bits are unlikely to collide
22:58 -!- fabiand (Fabian Deutsch): has joined #vdsm
22:59 < nsoffer> alitke, ?
22:59 < alitke> nah. It was more about what happens if they do.
23:00 < alitke> I prefer the names without the dashes in them
23:00 < nsoffer> alitke, ok, if we see a clash, we can easily avoid it
23:01 < nsoffer> alitke, hopefully we will not find it in user logs, when you don't have any other clue about the thread/task etc.
23:01 < alitke> ok. Gotta run to dinner.
23:01 < nsoffer> alitke, night
23:01 < alitke> merge if you want, otherwise you can change it.
23:02 < nsoffer> I'll ask Dan about this
23:08 < nsoffer> I'll post a patch for this instead
23:08 -!- rmatinata: has quit [Quit: This computer has gone to sleep]
--- Log closed Thu Sep 15 00:00:09 2016
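[Editor's note] The short-id question above comes down to how many hex digits of the UUID to keep in a thread name and how likely two names are to clash. A rough sketch follows; short_thread_name and collision_probability are hypothetical helpers, not functions from vdsm. The probability uses the standard birthday approximation and assumes the kept digits are uniformly random. For a version 4 UUID, the first 16 hex digits include the fixed version digit, so they carry 60 random bits rather than 64.

```python
import math
import uuid


def short_thread_name(prefix, uuid_str, hex_digits=8):
    """Builds e.g. "monitor/95430c8c" from a prefix and a dashed UUID string."""
    compact = uuid.UUID(uuid_str).hex  # 32 hex digits, no dashes
    return "%s/%s" % (prefix, compact[:hex_digits])


def collision_probability(count, bits):
    """Birthday approximation: chance that any two of `count` random ids clash."""
    # 1 - exp(-n(n-1) / 2^(bits+1)); expm1 keeps precision for tiny values.
    return -math.expm1(-count * (count - 1) / 2.0 ** (bits + 1))


name = short_thread_name("monitor", "95430c8c-bbfc-4811-84fc-4ea0351d32fe")
longer = short_thread_name(
    "monitor", "95430c8c-bbfc-4811-84fc-4ea0351d32fe", hex_digits=16)
```

One point relevant to the system tools mentioned in the chat: Linux limits a thread's kernel-visible name to 15 characters, so "monitor/95430c8c" (16 characters) and anything longer would be cut off there, while the Python thread name in the logs has no such limit.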