--- Log opened Fri Jun 26 00:00:01 2015
00:02 -!- apuimedo (Antoni Segura Puimedon): has joined #vdsm
00:04 -!- mmirecki: has quit [Ping timeout: 276 seconds]
00:13 -!- ibarkan_ (Ido Barkan): has joined #vdsm
00:15 -!- ibarkan (purple): has joined #vdsm
00:17 -!- adahms (Andrew Dahms): has joined #vdsm
00:48 -!- mpolednik1: has quit [Ping timeout: 246 seconds]
01:24 -!- ybronhei (purple): has joined #vdsm
01:28 -!- nsoffer: has quit [Ping timeout: 276 seconds]
01:35 -!- #vdsm ybronhei: has quit [Ping timeout: 276 seconds]
01:41 -!- apuimedo: has quit [Ping timeout: 264 seconds]
02:44 -!- mpolednik1 (mpolednik): has joined #vdsm
02:55 -!- mpolednik1: has quit [Ping timeout: 265 seconds]
03:08 -!- bala (purple): has joined #vdsm
03:25 -!- bala: has quit [Ping timeout: 265 seconds]
03:38 -!- bala (purple): has joined #vdsm
03:48 -!- #vdsm dougsland: has quit [Ping timeout: 256 seconds]
03:51 -!- mpolednik1 (mpolednik): has joined #vdsm
03:55 -!- dougsland (Douglas): has joined #vdsm
03:56 -!- mpolednik1: has quit [Ping timeout: 244 seconds]
04:07 -!- gpadgett: has quit [Quit: Leaving]
04:07 -!- bala: has quit [Read error: Connection reset by peer]
04:30 -!- adahms: has quit [Quit: Computer on sleep...]
04:33 -!- adahms (Andrew Dahms): has joined #vdsm
04:52 -!- #vdsm dougsland: has quit [Ping timeout: 264 seconds]
05:54 -!- #vdsm dyasny_: has quit [Ping timeout: 248 seconds]
06:39 -!- sshnaidm: has quit [Ping timeout: 276 seconds]
06:48 -!- mmirecki (Marcin Mirecki): has joined #vdsm
06:59 -!- ndarshan (Darshan n): has joined #vdsm
07:22 -!- Madhu_ (Madhu_): has joined #vdsm
07:53 -!- mpolednik1 (mpolednik): has joined #vdsm
07:58 -!- timothy (Timothy Asir): has joined #vdsm
07:58 -!- mpolednik1: has quit [Ping timeout: 272 seconds]
07:58 -!- sbonazzo (purple): has joined #vdsm
07:58 -!- timothy is now known as Guest74192
08:07 -!- _tim_07 (Tim): has joined #vdsm
08:12 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
08:34 -!- ybronhei (purple): has joined #vdsm
09:02 -!- _tim_07: has quit [Ping timeout: 264 seconds]
09:03 -!- fabiand (Fabian Deutsch): has joined #vdsm
09:05 -!- bala (purple): has joined #vdsm
09:13 -!- adahms: has quit [Quit: Leaving]
09:50 -!- pkliczew (Piotr Kliczewski): has joined #vdsm
09:55 -!- mpolednik1 (mpolednik): has joined #vdsm
09:59 -!- mpolednik1: has quit [Ping timeout: 246 seconds]
10:10 -!- fromani (Francesco Romani): has joined #vdsm
10:20 < pkliczew> fromani, hi
10:46 < fromani> pkliczew: hi
11:04 -!- nsoffer (Nir Soffer): has joined #vdsm
11:10 < evilissimo1> hey fromani :-)
11:10 < fromani> hi evilissimo1 !
11:10 < evilissimo1> fromani: I have updated that comment on the filtering patch
11:11 < evilissimo1> https://gerrit.ovirt.org/#/c/36949/
11:14 < evilissimo1> fromani: and also I just commented on https://gerrit.ovirt.org/#/c/42570/3
11:20 -!- Guest74192: has quit [Ping timeout: 272 seconds]
11:26 -!- mpolednik (mpolednik): has joined #vdsm
11:35 < fromani> evilissimo1: ok! will have another look ASAP
11:49 < evilissimo1> fromani: thanks
12:00 -!- Madhu_: has quit [Remote host closed the connection]
12:00 -!- Madhu_ (Madhu_): has joined #vdsm
12:40 -!- mode/#vdsm: by ChanServ
12:40 -!- danken (purple): has joined #vdsm
12:42 < fromani> danken: hello! we had yet another followup discussion about https://gerrit.ovirt.org/#/c/42479/4 - do you have some time to chat about it? (it takes waaaaaay longer otherwise...)
12:43 <@danken> for me its very simple. is abarlev happy about this?
12:44 < fromani> danken: last time we spoke he was not exactly excited...
12:46 <@danken> fromani: you know what, I'll leave this politics to mskrivanek. The patch conforms to other hard-requirement of vdsm
12:47 <@danken> and frankly, you could drop https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=commit;h=b062834e458adc1ae3c546c91b469db78aec46b8
12:47 <@danken> but don't do that... When Engine is smarter and can control what to install on a host it could be useful.
12:48 < mskrivanek> danken: well, he's not, but i don't see any way out. To me crippling user experience and requiring few weeks of additional development is not the price I want to pay for something so insignificant
12:48 < nsoffer> fromani, can you review again https://gerrit.ovirt.org/40712?
12:49 < fromani> nsoffer: sure, added to my list
12:49 < mskrivanek> danken: and indeed that patch can remain. We anyway should plan a cleanup once we're far behind F20...
12:59 <@danken> mskrivanek: but I would LOVE to finally have custom properties per cluster. If Engine had that, making sure that ovirt-host-deploy installs a subset of requirements per host would have been soooo simple
13:01 <@danken> fromani: marking your patch as verified is a formality that I cannot ignore...
13:02 < fromani> danken: I'm _actually_ verifying right now
13:02 < fromani> I mean installing the actual rpm
13:07 < evilissimo1> danken: regarding the suicide patch I made, do you have any other suggestions?
13:08 < evilissimo1> danken: starting go awall on that thread, because we're getting spammed by errors for no obvious reason isn't gonna help us either that much, however indeed as francesco said we could end up being in a restarting loop then
13:09 < evilissimo1> because the issue seems to be caused somehow by qemu, at least that's how I understand that bug
13:09 < evilissimo1> but then again, why epoll is not closing those fd's is also a thing I don't really understand
13:10 < evilissimo1> or at least to stop track those
13:10 < pkliczew> danken, hi
13:11 < mskrivanek> danken: well, it would not really help us. we'd need proper reporting, sort of a policy on cluster to enforce or not, non operational reason, additional warning before you do 3.5->3.6 to say that all hosts are going to be non-operational, REST API.....
13:11 <@danken> evilissimo1: when does these errors pop up? how often does it happen? Does it really need a fix?
13:12 <@danken> mskrivanek: ok ok, it is more complex than a simple install
13:12 < evilissimo1> well, the reason for this patch is that we have a bug report in which the vmlistener thread starts sucking the life out of one core at 100%
13:12 < mskrivanek> danken: and we would still have a usability issue that before actually using the feature once you do host deploy you need to know *somehow*, that you have to remove the device, save vm, re-add the device
13:12 < evilissimo1> danken: https://bugzilla.redhat.com/show_bug.cgi?id=1226911
13:13 < evilissimo1> the thing is, that we have had removed that fd already
13:13 <@danken> mskrivanek: sorry, I do not understand your recent issue. remove of what device?
13:14 < mskrivanek> danken: on the engine side you would be required to remove the virtio-console device from vm properties (to get rid of the <3.6 format), save vm, open again, add it back (to get the 3.6 format with the right specparam)
13:14 <@danken> how reproducible is this bug? why is it for 3.5.4?
13:14 < evilissimo1> danken: I couldn't think of any reproducibility I think no one really knows what happened, that why we tried to add some defense
13:15 < evilissimo1> that's*
13:15 < mskrivanek> danken: as it can't be done on upgrade, it must happen only when you update to 3.6 cluster level with all hosts updated.....so even more complicated;-)
13:15 <@danken> mskrivanek: this could be done automatically, on upgrade to clusterLevel 3.6, isnt it?
13:15 <@danken> mskrivanek: ah, you just said that
13:16 <@danken> evilissimo1: "suicide is a permanent solution for a temporary problem" say the signs over tall bridges
13:17 < evilissimo1> well the logs didn't indicate an end there
13:17 <@danken> evilissimo1: unless this bug affects gazillions of customers, we should understand it better before killing ourselves
13:18 < evilissimo1> I wish I would, but I can't see how we would get more information from there
13:29 < mskrivanek> danken: I think we should eliminate the busy loop either way. we shouldn't spin even in case of errors
13:35 -!- firemanxbr (Marcelo Barbosa): has joined #vdsm
13:36 -!- dougsland (Douglas): has joined #vdsm
13:38 -!- apahim: has quit [Ping timeout: 264 seconds]
13:40 <@danken> mskrivanek: We can add a short sleep on error (which is ugly, but not as ugly as dying)
13:41 <@danken> evilissimo1: but we should know that the problem is really worse that vdsm dying due to qemu behavior (security! security!)
13:43 < evilissimo1> not sure what you are referring to with 'security' right now
13:43 < evilissimo1> danken: ^
13:44 < evilissimo1> and a sleep...
13:44 < evilissimo1> well, it's not solving our issue of having those errors
13:45 < evilissimo1> we're just going to slow down the processing and give the no more time. If I would slow down the whole things, what's the gain?
13:45 < evilissimo1> if+
13:46 < evilissimo1> ok, now that doesn't make any sense
13:47 -!- sshnaidm: has quit [Ping timeout: 276 seconds]
13:47 < evilissimo1> if we're just slowing down the processing we won't be able to get out of this for ages. The attempt to fix it was the patch before, in the hopes that we'll stop getting more messages. However, I have a suspicion. And that's that we're getting a HUGE list of those errors on the same fds
13:48 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
13:48 < evilissimo1> I wonder if we should place those events when received into a set, to only process them once per epoll reply
13:49 -!- bala: has quit [Quit: Leaving.]
13:54 -!- nsoffer: has quit [Ping timeout: 246 seconds]
13:54 -!- apahim (Amador Pahim): has joined #vdsm
14:01 < fromani> evilissimo1: danken: AFAIU the problem is that work piles up and VDSM is less and less capable to keep up, and that snowballs eventually into VDSM eating 100% of CPU and doing no useful work. (aka: consumer WAY slower than producer). If so, sleep won't help
14:02 < fromani> actually, if I'm correct, sleep will make things worse...
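The idea raised at 13:48, collecting the events of one epoll reply into a set so that each fd is handled only once, could look roughly like the sketch below. It is illustrative only: `coalesce_events` is a hypothetical helper, not vdsm code, and the numeric masks in the usage lines are arbitrary placeholders rather than real epoll flags. Nothing in the discussion confirms that a single reply actually contains repeated fds, so this shows the technique and is not a verified fix for the bug.

```python
def coalesce_events(events):
    """Merge (fd, mask) pairs from one poll reply into {fd: combined_mask}.

    Hypothetical helper for illustration; not part of vdsm.
    """
    merged = {}
    for fd, mask in events:
        # OR the masks together so no flag (e.g. an error bit) is dropped.
        merged[fd] = merged.get(fd, 0) | mask
    return merged


# Usage: a reply that reports fd 5 twice collapses to one entry per fd.
reply = [(5, 0x1), (5, 0x8), (7, 0x1)]
pending = coalesce_events(reply)
```

In a real loop the input would be the list of (fd, event) pairs returned by `select.epoll.poll()`, and each key of the result would be dispatched once per poll cycle.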
14:02 < evilissimo1> in that case, yeah
14:11 -!- sshnaidm: has quit [Ping timeout: 276 seconds]
14:11 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
14:37 -!- fabiand: has quit [Ping timeout: 256 seconds]
14:39 -!- ndarshan: has quit [Quit: Leaving]
14:50 -!- fabiand (Fabian Deutsch): has joined #vdsm
15:01 -!- dkuznets (Dima Kuznetsov): has joined #vdsm
15:08 -!- danken is now known as danken_afk
15:09 <@danken_afk> fromani: when we see fd errors, the damage has already been done, so a sleep after is not so harmful
15:15 -!- dyasny_ (Dan Yasny): has joined #vdsm
15:18 -!- dkuznets: has quit [Ping timeout: 246 seconds]
15:28 -!- amarchuk_ (Anton Marchukov): has joined #vdsm
15:28 -!- mskrivanek1 (mskrivan): has joined #vdsm
15:28 -!- evilissimo2 (Vinzenz Feenstra): has joined #vdsm
15:28 -!- amarchuk: has quit [Read error: Connection reset by peer]
15:28 -!- timothy_ (Timothy Asir): has joined #vdsm
15:28 -!- evilissimo1: has quit [Read error: Connection reset by peer]
15:31 -!- msivak_: has quit [Ping timeout: 252 seconds]
15:31 -!- mskrivanek: has quit [Ping timeout: 272 seconds]
15:32 -!- msivak (Martin Sivak): has joined #vdsm
15:32 -!- amarchuk_: has quit [Ping timeout: 256 seconds]
15:32 -!- mskrivanek1: has quit [Ping timeout: 244 seconds]
15:33 -!- evilissimo2: has quit [Ping timeout: 252 seconds]
15:33 -!- amarchuk_ (Anton Marchukov): has joined #vdsm
15:33 -!- mskrivanek1 (mskrivan): has joined #vdsm
15:33 -!- evilissimo2 (Vinzenz Feenstra): has joined #vdsm
15:35 -!- sshnaidm: has quit [Ping timeout: 276 seconds]
15:40 -!- hchiramm: has quit [Ping timeout: 244 seconds]
15:40 -!- timothy_: has quit [Ping timeout: 272 seconds]
15:54 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
15:58 -!- bala (purple): has joined #vdsm
16:07 -!- mmirecki: has quit [Ping timeout: 256 seconds]
16:15 -!- Madhu_: has quit [Quit: Madhu_]
16:29 -!- sshnaidm: has quit [Ping timeout: 276 seconds]
16:34 -!- mskrivanek1 is now known as mskrivanek1_away
16:37 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
16:42 -!- sshnaidm: has quit [Ping timeout: 276 seconds]
16:47 -!- gpadgett (Greg Padgett): has joined #vdsm
16:58 -!- Madhu_ (Madhu_): has joined #vdsm
16:58 -!- Madhu_: has quit [Client Quit]
16:59 -!- Madhu_ (Madhu_): has joined #vdsm
17:05 < fromani> danken_afk: please check this trivial one-liner: https://gerrit.ovirt.org/#/c/42917/1
17:09 -!- Madhu_: has quit [Quit: Madhu_]
17:19 -!- Madhu_ (Madhu_): has joined #vdsm
17:22 -!- sbonazzo: has quit [Quit: Leaving.]
17:24 -!- fabiand: has quit [Quit: Verlassend]
17:24 -!- fabiand (Fabian Deutsch): has joined #vdsm
17:25 -!- fabiand: has quit [Remote host closed the connection]
17:27 -!- pkliczew: has quit [Ping timeout: 246 seconds]
17:28 -!- ibarkan: has quit [Ping timeout: 246 seconds]
17:28 -!- ibarkan_: has quit [Ping timeout: 265 seconds]
17:31 -!- Madhu_: has quit [Quit: Madhu_]
17:37 -!- mpolednik: has quit [Ping timeout: 265 seconds]
17:42 -!- ibarkan_ (Ido Barkan): has joined #vdsm
17:43 -!- sshnaidm: has quit [Client Quit]
17:43 -!- ibarkan (purple): has joined #vdsm
17:44 -!- timothy_ (Timothy Asir): has joined #vdsm
18:04 -!- mpolednik (mpolednik): has joined #vdsm
18:09 -!- mpolednik: has quit [Ping timeout: 276 seconds]
18:10 -!- fromani: has quit [Quit: Leaving]
18:21 -!- mpolednik (mpolednik): has joined #vdsm
18:31 -!- gshereme: has quit [Ping timeout: 248 seconds]
18:53 -!- bala: has quit [Read error: Connection reset by peer]
19:14 -!- timothy_: has quit [Remote host closed the connection]
19:24 -!- apahim: has quit [Ping timeout: 272 seconds]
19:28 -!- mpolednik: has quit [Ping timeout: 256 seconds]
19:40 -!- apahim (Amador Pahim): has joined #vdsm
20:11 -!- gshereme (Greg Sheremeta): has joined #vdsm
20:17 -!- #vdsm ybronhei: has quit [Ping timeout: 250 seconds]
20:56 -!- ybronhei (purple): has joined #vdsm
21:01 -!- apahim: has quit [Quit: Leaving]
21:09 -!- nsoffer (Nir Soffer): has joined #vdsm
21:09 -!- #vdsm ybronhei: has quit [Ping timeout: 244 seconds]
21:30 -!- gshereme: has quit [Ping timeout: 248 seconds]
21:46 -!- Madhu_ (Madhu_): has joined #vdsm
21:52 -!- mpolednik (mpolednik): has joined #vdsm
22:26 -!- gshereme (Greg Sheremeta): has joined #vdsm
22:33 -!- firemanxbr: has quit [Quit: Leaving]
22:41 -!- nsoffer: has quit [Ping timeout: 264 seconds]
22:47 -!- Madhu_: has quit [Ping timeout: 246 seconds]
22:47 -!- Madhu_ (Madhu_): has joined #vdsm
--- Log closed Sat Jun 27 00:00:03 2015