--- Log opened Mon Oct 19 00:00:35 2015
00:15 -!- apahim: has quit [Ping timeout: 256 seconds]
00:42 -!- ybronhei (purple): has joined #vdsm
00:54 -!- ybronhei: has quit [Ping timeout: 264 seconds]
01:40 -!- ircuser-1 (ircuser 1): has joined #vdsm
02:37 -!- nsoffer: has quit [Ping timeout: 244 seconds]
05:19 -!- bala (purple): has joined #vdsm
05:31 -!- bala1 (purple): has joined #vdsm
05:34 -!- gpadgett: has quit [Quit: Leaving]
05:52 -!- Humble (Humble Chirammal): has joined #vdsm
06:04 -!- bala: has quit [Ping timeout: 240 seconds]
06:32 -!- bala1: has quit [Remote host closed the connection]
07:02 -!- shubhendu (Shubhendu Tripathi): has joined #vdsm
07:07 -!- ndarshan (Darshan n): has joined #vdsm
07:23 -!- ibarkan (Ido Barkan): has joined #vdsm
07:24 -!- ybronhei (purple): has joined #vdsm
07:42 -!- Humble: has quit [Read error: Connection reset by peer]
07:52 -!- sbonazzo (purple): has joined #vdsm
07:53 -!- ishaby (Idan Shaby): has joined #vdsm
07:57 -!- Humble (Humble Chirammal): has joined #vdsm
08:04 -!- rmohr: has quit [Remote host closed the connection]
08:14 -!- nsoffer (Nir Soffer): has joined #vdsm
08:19 -!- adahms: has quit [Quit: Leaving]
08:20 -!- fabiand (Fabian Deutsch): has joined #vdsm
08:23 -!- derez_ (Daniel Erez): has joined #vdsm
08:25 -!- bala (purple): has joined #vdsm
08:26 -!- mskrivanek_away is now known as mskrivanek
08:31 -!- hchiramm (Humble Chirammal): has joined #vdsm
08:35 -!- bala: has quit [Ping timeout: 255 seconds]
08:36 -!- rmohr: has quit [Quit: rmohr]
08:39 -!- ybronhei: has quit [Quit: Leaving.]
08:39 -!- ybronhei (purple): has joined #vdsm
08:43 -!- pkliczew (Piotr Kliczewski): has joined #vdsm
08:53 -!- fromani (Francesco Romani): has joined #vdsm
09:03 -!- nsoffer: has quit [Ping timeout: 265 seconds]
09:06 -!- mode/#vdsm: by ChanServ
09:06 -!- danken1 (purple): has joined #vdsm
09:18 -!- rmohr: has quit [Remote host closed the connection]
09:21 -!- amarchuk: has quit [Ping timeout: 240 seconds]
09:22 -!- ykaplan (Yeela Kaplan): has joined #vdsm
09:26 -!- mmirecki (Marcin Mirecki): has joined #vdsm
09:32 -!- fsimonce (Federico): has joined #vdsm
09:47 -!- oved (ovedo): has joined #vdsm
10:07 < pkliczew> fromani, hi
10:07 < fromani> pkliczew: hi
10:08 < pkliczew> fromani, I tried to analyze the heap but Dowser crashed vdsm
10:09 < pkliczew> fromani, when I asked for all the dicts it grew the heap to 40 GB and nicely crashed
10:09 < fromani> pkliczew: I see.
10:09 < pkliczew> fromani, it seems that this tool is good for small memory footprints
10:10 < fromani> pkliczew: yep. Looks like we're too large
10:10 < pkliczew> fromani, it makes the heap less stable: when it runs, the deltas are +/- 10MB
10:10 < pkliczew> fromani, whereas without it they are +/- 2MB
10:11 < pkliczew> fromani, so at this stage we know that the number of dicts, tuples and lists grew over time
10:11 < pkliczew> fromani, and the heap increased by ~90MB over the weekend
10:11 < fromani> pkliczew: yes, we didn't gain much more info
10:11 < pkliczew> fromani, and with the crash we lost the data
10:12 < fromani> pkliczew: yes. Not even a coredump
10:12 < fromani> (just checked)
10:12 < pkliczew> fromani, :/
10:12 < fromani> *so* frustrating
10:12 < pkliczew> fromani, the most significant change was for list
10:12 < pkliczew> it is
10:13 < pkliczew> fromani, the number of instances changed from 8k to 44k
10:13 < fromani> pkliczew: I agree. Furthermore I saw an increase in instancemethod, which together with the __dict__ growth may suggest an increase in Python objects
10:14 < pkliczew> fromani, instancemethod was 16k and it ended up being 17k
10:14 < fromani> pkliczew: ok, then maybe not :)
10:14 < pkliczew> fromani, at least according to objgraph logs
10:15 < fromani> pkliczew: we can run some analysis on objgraph logs to learn about the growth pattern, but still, this gives us little
10:16 < pkliczew> fromani, yes, I think we need more fine-grained analysis because our tools seem not to scale
10:16 < fromani> pkliczew: yes. Unfortunately they just can't handle the load
10:17 < pkliczew> fromani, is there any frequently used code which uses lists extensively?
10:17 < fromani> pkliczew: I'll review the virt sampling code in 3,5,4
10:17 < fromani> 3.5.4*
10:18 < fromani> pkliczew: but the problem is that primitive objects (list, tuple, dicts) are used extensively in libraries, modules and so forth
10:18 < fromani> so it may not be a direct usage
10:18 < pkliczew> hmm, right
10:19 < pkliczew> fromani, I will take a look at infra code
10:23 -!- ybronhei: has quit [Ping timeout: 244 seconds]
10:27 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
10:30 < pkliczew> fromani, I sent an update for everyone to understand the current situation
10:30 < fromani> pkliczew: reading
10:31 -!- sshnaidm: has quit [Ping timeout: 256 seconds]
10:34 -!- danken1: has quit [Ping timeout: 256 seconds]
10:35 < pkliczew> fromani, hmm, I am looking at the list again and something struck me
10:35 < fromani> pkliczew: shoot
10:36 < pkliczew> fromani, we focused on the highest # of instances, but looking at it again I see objects that were not there before and appeared on the list
10:37 < fromani> pkliczew: like InterfaceSample?
10:37 < pkliczew> fromani, and those objects are: Text, NodeList, InterfaceSample, Token, Attr and Element
10:38 < pkliczew> fromani, is it part of xml parsing code?
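Since Dowser ballooned the heap when asked for every dict, a lighter alternative is to record only per-type counts and diff two snapshots, which also surfaces types that were absent at first (like the Text, NodeList and InterfaceSample objects listed here). The following is a minimal, stdlib-only sketch of that idea, similar in spirit to what objgraph reports; it is not vdsm code, and `type_counts`/`growth` are hypothetical helper names. It counts only objects tracked by Python's garbage collector, so some types (for example dicts holding only atomic values) may be undercounted.

```python
import gc
from collections import Counter


def type_counts():
    """Count gc-tracked live objects by type name.

    Only names and integers are kept, so once this returns the snapshot
    holds no references to the objects it counted.
    """
    return Counter(type(obj).__name__ for obj in gc.get_objects())


def growth(before, after, limit=20):
    """Return (name, old, new, delta) rows for types whose count grew.

    A type missing from `before` is reported with old == 0, which is how
    newly appearing types show up. Rows are sorted by delta, largest
    first; limit=None returns every row.
    """
    rows = []
    for name, new in after.items():
        old = before.get(name, 0)
        if new > old:
            rows.append((name, old, new, new - old))
    rows.sort(key=lambda row: row[3], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    class Retained(object):
        """Stand-in for objects that something keeps alive by mistake."""

    keep = []
    before = type_counts()
    keep.extend(Retained() for _ in range(1000))
    after = type_counts()
    for name, old, new, delta in growth(before, after, limit=5):
        print("%-20s %8d -> %8d (+%d)" % (name, old, new, delta))
```

Taking a snapshot periodically (say, once per sampling cycle) and logging only the `growth` rows keeps the memory cost proportional to the number of distinct types rather than the number of objects.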
10:38 < fromani> pkliczew: InterfaceSample is sampling (as the name suggests :) ), the rest is xml parsing code, yes
10:39 < pkliczew> fromani, maybe we leak somewhere around parsing or sampling (maybe both)
10:39 < fromani> pkliczew: do you have numbers about the growth of these objects?
10:40 -!- amarchuk (Anton Marchukov): has joined #vdsm
10:40 < pkliczew> fromani, nope, I only configured the top 20
10:41 < fromani> pkliczew: but still, xzgrepping the numbers I can track an increase of InterfaceSample over the weekend
10:41 < pkliczew> fromani, but we can use dowser to see it now (after the crash)
10:41 < fromani> pkliczew: furthermore, I *do not* have network configured on my own scale test (my poor tiny dns server can't handle the load)
10:41 -!- bazulay (purple): has joined #vdsm
10:41 < fromani> so I may well have missed them
10:41 < pkliczew> I understand
10:41 < fromani> pkliczew: sure, this is a good lead eventually
10:42 < pkliczew> fromani, I do not see InterfaceSample on the UI at all
10:42 < pkliczew> fromani, checking for xml types now
10:46 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
10:51 -!- mode/#vdsm: by ChanServ
10:51 -!- danken1 (purple): has joined #vdsm
10:56 -!- bala (purple): has joined #vdsm
11:13 -!- bala: has quit [Remote host closed the connection]
11:15 -!- bala (purple): has joined #vdsm
11:34 -!- Humble: has quit [Ping timeout: 250 seconds]
11:35 -!- hchiramm: has quit [Ping timeout: 268 seconds]
11:43 < fromani> pkliczew: ok, in theory we should have len(network_link) * sampling_window_size InterfaceSamples, which leads us to 62 * 5 = 310
11:44 < fromani> pkliczew: so something's wrong there
11:44 < pkliczew> fromani, I am in a meeting, give me a sec
11:44 < fromani> pkliczew: np
11:46 -!- mskrivanek is now known as mskrivanek_away
11:47 -!- Humble (Humble Chirammal): has joined #vdsm
11:47 -!- hchiramm (Humble Chirammal): has joined #vdsm
12:04 -!- firemanxbr (Marcelo Barbosa): has joined #vdsm
12:13 < pkliczew> fromani, where can I see this code?
12:13 < fromani> pkliczew: vdsm/virt/sampling
12:13 < fromani> pkliczew: vdsm/virt/sampling.py
12:13 < pkliczew> fromani, looking
12:29 -!- ykaplan: has quit [Ping timeout: 252 seconds]
12:37 -!- apahim (Amador Pahim): has joined #vdsm
12:43 -!- apahim: has quit [Ping timeout: 265 seconds]
12:44 < pkliczew> fromani, I have an idea
12:44 < fromani> pkliczew: shoot
12:44 < pkliczew> fromani, can we speed up the sampling cycle?
12:45 < pkliczew> fromani, that would lead to faster leaking if this is the case
12:45 < pkliczew> fromani, what do you think?
12:49 -!- fromani: has quit [Ping timeout: 268 seconds]
12:52 -!- fromani (Francesco Romani): has joined #vdsm
12:55 -!- fromani: has quit [Client Quit]
12:55 -!- fromani (Francesco Romani): has joined #vdsm
12:55 < fromani> pkliczew: sorry, got disconnected... can you please repeat?
12:55 -!- apahim (Amador Pahim): has joined #vdsm
12:56 < pkliczew> fromani, can we speed up the sampling cycle?
12:56 < pkliczew> fromani, that would lead to faster leaking if this is the case
12:56 < pkliczew> fromani, what do you think?
12:56 < fromani> pkliczew: worth a shot. Trying.
12:56 < pkliczew> fromani, cool
12:57 -!- mskrivanek_away is now known as mskrivanek
13:03 -!- ykaplan (Yeela Kaplan): has joined #vdsm
13:04 -!- firemanxbr: has quit [Quit: Leaving]
13:08 < fromani> who can verify? https://gerrit.ovirt.org/#/c/47421/ we need it in the build
13:41 -!- ykaplan: has quit [Ping timeout: 260 seconds]
13:41 -!- tim (Tim): has joined #vdsm
13:42 -!- ykaplan (Yeela Kaplan): has joined #vdsm
13:44 < pkliczew> fromani, I started to look at the logs generated over the weekend
13:44 < pkliczew> fromani, when vdsm crashed there was this: http://ur1.ca/o356v
13:45 < fromani> pkliczew: looking
13:45 < fromani> pkliczew: seems like a clean shutdown.
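fromani's estimate of len(network_link) * sampling_window_size assumes each network interface keeps only its last few samples. The sketch below models that with one `collections.deque(maxlen=...)` per interface; `InterfaceSample`, `SampleWindow` and `run_cycles` here are hypothetical stand-ins, not the code in vdsm/virt/sampling.py. It also shows why pkliczew's idea of speeding up the sampling cycle is a useful test: a bounded window stays at 62 * 5 = 310 samples no matter how many cycles run, so a count that climbs faster when sampling runs faster means samples are being referenced from somewhere other than the window.

```python
import collections


class InterfaceSample(object):
    """Hypothetical stand-in for one per-interface statistics sample."""

    def __init__(self, name, rx, tx):
        self.name = name
        self.rx = rx
        self.tx = tx


class SampleWindow(object):
    """Keep only the most recent `size` samples per interface."""

    def __init__(self, size):
        self._size = size
        self._samples = collections.defaultdict(
            lambda: collections.deque(maxlen=self._size))

    def add(self, sample):
        # A full deque silently drops its oldest entry on append.
        self._samples[sample.name].append(sample)

    def total(self):
        return sum(len(window) for window in self._samples.values())


def run_cycles(window, interfaces, cycles):
    """Simulate `cycles` sampling passes over every interface."""
    for tick in range(cycles):
        for name in interfaces:
            window.add(InterfaceSample(name, rx=tick, tx=tick))


if __name__ == "__main__":
    links = ["vnet%d" % i for i in range(62)]
    window = SampleWindow(size=5)
    run_cycles(window, links, cycles=1000)
    print(window.total())  # 62 * 5 = 310, however many cycles ran
```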
13:45 < pkliczew> fromani, Eldad claims that there are 57 vms but the log says 73
13:45 < pkliczew> fromani, it is, but I am not sure about the number of threads
13:45 < fromani> pkliczew: 73 threads still to be terminated :)
13:46 < pkliczew> fromani, yes
13:46 < pkliczew> fromani, is it a thread per vm?
13:46 < fromani> VDSM with 57 VMs should have at least 57*2 + 60 threads
13:46 < fromani> pkliczew: in 3.5.x *two* threads per VM plus ~50 baseline
13:46 < pkliczew> fromani, oh ok
13:47 < pkliczew> fromani, browsing through the log I do not see other threads being killed
13:59 -!- sbonazzo is now known as sbonazzo|mtg
14:12 -!- ndarshan: has quit [Quit: Leaving]
14:19 -!- gshereme (Greg Sheremeta): has joined #vdsm
14:33 < pkliczew> fromani, can you please check your email
14:34 < pkliczew> fromani, there is an email about a similar issue which happened during testing of my ssl patch for python
14:34 < pkliczew> fromani, ?
14:35 < fromani> pkliczew: I just logged in to the other host
14:35 < pkliczew> fromani, ok
14:50 -!- dyasny (Dan Yasny): has joined #vdsm
15:04 -!- bazulay: has quit [Quit: Leaving.]
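fromani's rule of thumb for 3.5.x, two threads per VM plus a baseline of about 50, puts Eldad's 57 VMs at 2 * 57 + 50 = 164 threads (174 with the baseline of 60 quoted just before it). A small sketch for checking such an estimate against a live Python process might look like this; the per-VM and baseline defaults come from the chat, while grouping threads by stripping a trailing number from their names is an assumption about how threads are named, not something vdsm guarantees.

```python
import collections
import re
import threading


def expected_threads(num_vms, per_vm=2, baseline=50):
    """Rule-of-thumb thread count: `per_vm` threads per VM plus a baseline."""
    return num_vms * per_vm + baseline


def threads_by_prefix():
    """Count live threads grouped by name, ignoring a trailing number.

    For example "sampler-1" and "sampler-2" are both counted as "sampler".
    """
    counts = collections.Counter()
    for thread in threading.enumerate():
        prefix = re.sub(r"[-_ ]?\d+$", "", thread.name)
        counts[prefix] += 1
    return counts


if __name__ == "__main__":
    print(expected_threads(57))  # 164
    print(threading.active_count(), dict(threads_by_prefix()))
```

A large gap between the estimate and the live count, or one name group that keeps growing, would point at threads that are never joined.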
15:07 -!- gpadgett (Greg Padgett): has joined #vdsm
15:08 -!- sshnaidm: has quit [Ping timeout: 256 seconds]
15:10 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
15:11 -!- nsoffer (Nir Soffer): has joined #vdsm
15:24 -!- ibarkan: has quit [Ping timeout: 260 seconds]
15:34 -!- hchiramm: has quit [Ping timeout: 252 seconds]
15:34 -!- Humble: has quit [Ping timeout: 252 seconds]
15:51 -!- Humble (Humble Chirammal): has joined #vdsm
15:51 -!- hchiramm (Humble Chirammal): has joined #vdsm
16:00 -!- acanan (Aharon Canan): has joined #vdsm
16:05 -!- shubhendu: has quit [Ping timeout: 244 seconds]
16:07 -!- sshnaidm: has quit [Ping timeout: 256 seconds]
16:21 -!- oved: has quit [Ping timeout: 255 seconds]
16:28 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
16:33 < nsoffer> danken1, https://gerrit.ovirt.org/47078 is urgent, please look
16:45 -!- sshnaidm: has quit [Ping timeout: 256 seconds]
16:47 -!- sbonazzo|mtg: has quit [Quit: Leaving.]
16:57 -!- sshnaidm (Sergey (Sagi) Shnaidman): has joined #vdsm
17:14 -!- rmohr: has quit [Ping timeout: 252 seconds]
17:14 -!- acanan: has quit [Quit: Leaving]
17:15 -!- derez_: has quit [Quit: Leaving]
17:16 -!- mskrivanek is now known as mskrivanek_away
17:20 -!- bala: has quit [Ping timeout: 260 seconds]
17:30 -!- ishaby: has quit [Ping timeout: 260 seconds]
17:31 -!- derez_ (Daniel Erez): has joined #vdsm
17:39 -!- derez_: has quit [Quit: Leaving]
17:39 -!- mmirecki: has quit [Ping timeout: 244 seconds]
17:44 -!- danken1: has quit [Ping timeout: 256 seconds]
17:46 -!- tim: has quit [Ping timeout: 260 seconds]
17:56 -!- ykaplan: has quit [Remote host closed the connection]
18:00 -!- pkliczew: has quit [Ping timeout: 272 seconds]
18:29 -!- gshereme: has quit [Ping timeout: 264 seconds]
18:29 -!- mskrivanek_away: has quit [Ping timeout: 264 seconds]
18:41 -!- mskrivanek_away (mskrivan): has joined #vdsm
18:45 -!- gshereme (Greg Sheremeta): has joined #vdsm
19:24 -!- sshnaidm: has quit [Ping timeout: 256 seconds]
19:44 -!- nsoffer: has quit [Ping timeout: 252 seconds]
20:28 -!- nsoffer (Nir Soffer): has joined #vdsm
21:02 -!- hchiramm: has quit [Quit: Leaving]
21:18 -!- nsoffer: has quit [Ping timeout: 272 seconds]
21:18 -!- ybronhei (purple): has joined #vdsm
21:32 < fabiand> hey
21:32 < fabiand> anybody got a hint how I can run vdsm-tool with pdb?
21:32 < fabiand> ybronhei, maybe?
21:32 -!- fromani: has quit [Quit: Leaving]
21:37 < vxitch> hello, vdsmd won't start on one of my two hosts and my cluster is down as a result.
21:37 < vxitch> The host OS is RHEL 7. I am managing vdsmd through systemctl
21:37 < vxitch> here is the result of running `systemctl start vdsmd` http://hastebin.com/oseyoqozog.vbs
21:38 < vxitch> Any bit of help is greatly appreciated :)
22:29 -!- fabiand: has quit [Quit: Leaving]
23:00 -!- ybronhei: has quit [Quit: Leaving.]
23:01 -!- ybronhei (purple): has joined #vdsm
23:17 < vxitch> I find the following in supervdsm.log http://hastebin.com/fucahilaka.vhdl
--- Log closed Tue Oct 20 00:00:37 2015