#2569: Memory Leak

Type: BugItem
Feature:
Tags: blogoosfero
ScheduledFor: N/A
Assigned to: RodrigoSouto
Sites:
Priority: 10
Status: Pending

Description of the bug

Noosfero is showing performance issues with patterns associated with memory leaks. After restarting the server everything works fine, but after some time the average response time and the memory consumption increase indefinitely.

Using Oink (a per-request memory monitoring tool) I monitored the http://blogoosfero.cc server for 12 hours. Here are the results:
-- SUMMARY --
Worst Requests:
1. Fev 04 21:01:19, 261496 KB, search#articles
2. Fev 04 21:02:49, 246956 KB, search#articles
3. Fev 05 00:15:31, 165260 KB, content_viewer#view_page
4. Fev 04 21:27:53, 120112 KB, content_viewer#view_page
5. Fev 05 11:00:30, 95892 KB, content_viewer#view_page
6. Fev 05 11:06:21, 95536 KB, content_viewer#view_page
7. Fev 04 20:09:02, 91064 KB, content_viewer#view_page
8. Fev 05 11:37:53, 90128 KB, profile_editor#edit
9. Fev 05 01:30:32, 79372 KB, content_viewer#view_page

Worst Actions:
6, content_viewer#view_page
2, search#articles
1, profile_editor#edit

Aggregated Totals:
Action                     Max (KB)   Mean (KB)   Min (KB)   Total (KB)   Requests
content_viewer#view_page     165260      107872      79372       647236          6
search#articles              261496      254226     246956       508452          2
profile_editor#edit           90128       90128      90128        90128          1

The results clearly show that content_viewer#view_page and search#articles are consuming too much memory. Although content_viewer#view_page shows a larger total, search#articles has a larger per-request (mean) consumption.
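For reference, these numbers come from the oink log parser. A minimal sketch of the workflow, assuming the standard oink gem setup (the exact configuration running on blogoosfero.cc is not shown in this report):

# Gemfile
gem 'oink'

# environment config: Oink's Rack middleware logs the memory usage and
# controller#action of every request
config.middleware.use Oink::Middleware

# afterwards, aggregate the collected log; the threshold (in KB) and the
# log path here are illustrative
oink --threshold=75 log/oink.log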

Steps to reproduce

After obtaining these results on the production environment, I tried to reproduce the problem on my local machine. I used ab (ApacheBench) to simulate a load of 50 requests on the search#articles action.
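The invocation was roughly the following (reconstructed from the report below — 50 requests, concurrency 1, against a local instance on port 3000; the exact command line was not recorded):

ab -n 50 -c 1 http://127.0.0.1:3000/search/articles

ab reported: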
This is ApacheBench, Version 2.3 <$Revision: 1.16 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /search/articles
Document Length:        54158 bytes

Concurrency Level:      1
Time taken for tests:   269.097 seconds
Complete requests:      50
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      2728450 bytes
HTML transferred:       2707900 bytes
Requests per second:    0.19 [#/sec] (mean)
Time per request:       5381.945 [ms] (mean)
Time per request:       5381.945 [ms] (mean, across all concurrent requests)
Transfer rate:          9.90 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  2395 5382 1623.0   5267    8507
Waiting:     2361 5341 1626.3   5244    8482
Total:       2395 5382 1623.0   5267    8507

Percentage of the requests served within a certain time (ms)
  50%   5267
  66%   5997
  75%   6158
  80%   7346
  90%   7783
  95%   8156
  98%   8507
  99%   8507
 100%   8507 (longest request)

The high standard deviation and the percentage distribution confirm that the response time increases with each request. Bringing this standard deviation down should go together with fixing the memory leak.

For reference, here is the same test run against the home page:
This is ApacheBench, Version 2.3 <$Revision: 1.16 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /
Document Length:        32427 bytes

Concurrency Level:      1
Time taken for tests:   122.845 seconds
Complete requests:      50
Failed requests:        49
   (Connect: 0, Receive: 0, Length: 49, Exceptions: 0)
Write errors:           0
Keep-Alive requests:    0
Total transferred:      1643924 bytes
HTML transferred:       1623374 bytes
Requests per second:    0.41 [#/sec] (mean)
Time per request:       2456.896 [ms] (mean)
Time per request:       2456.896 [ms] (mean, across all concurrent requests)
Transfer rate:          13.07 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  1546 2457 360.6   2526    3151
Waiting:     1518 2419 354.5   2455    2940
Total:       1546 2457 360.6   2526    3151

Percentage of the requests served within a certain time (ms)
  50%   2526
  66%   2686
  75%   2745
  80%   2844
  90%   2884
  95%   2926
  98%   3151
  99%   3151
 100%   3151 (longest request)

This is the kind of standard deviation we are aiming for on search#articles.

I ran the same tests without the Solr server running (plain ./script/server instead of ./script/development) and obtained the same results. This suggests that Solr is not related to the memory leak.

After some laborious investigation, I discovered that (for some still mysterious reason) the patch below considerably improves memory consumption:
diff --git a/app/models/article.rb b/app/models/article.rb
index 57d3781..a8a6a1b 100644
--- a/app/models/article.rb
+++ b/app/models/article.rb
@@ -329,6 +329,10 @@ class Article < ActiveRecord::Base
     false
   end
 
+  def product?
+    false
+  end
+
   def has_posts?
     false
   end
diff --git a/app/models/product.rb b/app/models/product.rb
index 032c626..cc24a1c 100644
--- a/app/models/product.rb
+++ b/app/models/product.rb
@@ -46,6 +46,10 @@ class Product < ActiveRecord::Base
   include WhiteListFilter
   filter_iframes :description, :whitelist => lambda { enterprise && enterprise.environment && enterprise.environment.trusted_sites_for_ifra
 
+  def product?
+    true
+  end
+
   def name
     self[:name].blank? ? category_name : self[:name]
   end
diff --git a/app/views/search/_image.rhtml b/app/views/search/_image.rhtml
index be85c67..e2be3b5 100644
--- a/app/views/search/_image.rhtml
+++ b/app/views/search/_image.rhtml
@@ -35,14 +35,14 @@
         <div class="search-no-image"><span><%= _('No image') %></span></div>
       <% end %>
     </div>
-  <% elsif image.is_a? Product %>
+  <% elsif image.product? %>
     <% if image.image %>
   <div class="zoomable-image">
     <%= link_to '', product_path(image), :class => "search-image-pic",
         :style => 'background-image: url(%s)'% image.default_image(:thumb) %>
     <%= link_to content_tag(:span, _('Zoom in')), image.image.public_filename,
         :class => 'zoomify-image' %>
   </div>

And here are the results after this improvement:
This is ApacheBench, Version 2.3 <$Revision: 1.16 $>
Copyright 1996 Adam Twiss, Zeus Technology Ltd, http://www.zeustech.net/
Licensed to The Apache Software Foundation, http://www.apache.org/

Benchmarking 127.0.0.1 (be patient).....done


Server Software:        
Server Hostname:        127.0.0.1
Server Port:            3000

Document Path:          /search/articles
Document Length:        48287 bytes

Concurrency Level:      1
Time taken for tests:   185.624 seconds
Complete requests:      50
Failed requests:        0
Write errors:           0
Keep-Alive requests:    0
Total transferred:      2434900 bytes
HTML transferred:       2414350 bytes
Requests per second:    0.27 [#/sec] (mean)
Time per request:       3712.489 [ms] (mean)
Time per request:       3712.489 [ms] (mean, across all concurrent requests)
Transfer rate:          12.81 [Kbytes/sec] received

Connection Times (ms)
              min  mean[+/-sd] median   max
Connect:        0    0   0.0      0       0
Processing:  2569 3712 643.0   3759    5037
Waiting:     2493 3669 663.7   3732    5010
Total:       2569 3712 643.0   3759    5037

Percentage of the requests served within a certain time (ms)
  50%   3759
  66%   4141
  75%   4285
  80%   4364
  90%   4473
  95%   4700
  98%   5037
  99%   5037
 100%   5037 (longest request)

The memory leak pattern is still there, but it grows much more slowly now. Investigation continues…

Testing environment

http://blogoosfero.cc

http://social.stoa.usp.br

-- RodrigoSouto - 05 Feb 2013

This is very important research.

We've done some as well, but we checked versions lower than 0.39. What we saw was not a memory leak, but very high memory usage because of the way the HTML was being generated: the SQL queries were quite "stupid" in terms of optimization. To give an idea, just generating the categories menu used dozens of SQL queries! With some PostgreSQL experiments we reduced that to either 1 single (more complex) query or 3 very simple and straightforward queries.
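To illustrate the kind of pattern we found (a generic sketch with hypothetical model and helper names, not the actual Noosfero menu code): the naive version issues one query per category, while the optimized version fetches everything in a single query and groups it in Ruby.

# N+1 pattern: 1 query for the root categories plus 1 query per category for its children
Category.find(:all, :conditions => {:parent_id => nil}).each do |root|
  root.children.each { |child| render_menu_item(child) }
end

# Single query: fetch the whole tree once and group it in memory
all_categories = Category.find(:all)
children_of    = all_categories.group_by(&:parent_id)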

We've done the same for the SQL that renders the search results. Bráulio is now running the tests for the merge request. We stopped checking performance after Noosfero was upgraded to version 0.39 and you discovered these performance problems. Anyway, we expect to submit the merge request with the SQL optimizations.

I think you should check the table indexes (we were astonished to see that important fields were not indexed!) and the way the content HTML is built from the SQL queries.
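For instance, adding a missing index is a one-line migration (the table and column here are purely hypothetical, just to show the shape of the fix):

class AddIndexToArticlesOnProfileId < ActiveRecord::Migration
  def self.up
    add_index :articles, :profile_id
  end

  def self.down
    remove_index :articles, :profile_id
  end
end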

As I've said several times in the last 5 years, I also think that the whole Noosfero database should be analyzed and remodeled. For example, the "categories" table is, in my opinion, a monster: it lumps 3 extremely different kinds of data together, which makes SQL queries very hard to write. In my opinion, there should be at least 3 different tables: categories (the "recortes"); places (the geolocalized table); and product/service categories. They are semantically completely different, which forces us to create columns for one type of category that are never used by the others, and also to use this "data" column, which as a concept is in my opinion totally absurd (a serialized hash that cannot be indexed, so it performs terribly). So why not have 3 different tables?

-- DanielTygel - 06 Feb 2013

Indeed we have some major problems in how our queries are constructed (most of all caused by ActiveRecord and its easy ways to build queries but "crazy" ways to retrieve the data). But I don't think the problems are exactly in the database. The fact that there are too many "crazy" queries retrieving data does not imply that the database design is necessarily wrong. In our case, I think the problem is our naive belief that ActiveRecord retrieves things in the most optimized way (which it rarely does). Our database design might need some improvements, but I don't think it is to blame for this particular problem.

I'm looking forward to your patches improving our data retrieval! =D

-- RodrigoSouto - 13 Feb 2013

This AI is related to AI:2558.

-- RodrigoSouto - 18 Feb 2013

