Extreme load times - low disk perf?


(Rafael dos Santos Silva) #7

491ms to render the index is a lot.

Click on the 39 sql link to show what’s the query that’s slow, is the one with the red background.


(James Figgle) #8

This is the one that’s consistently red:

lib/freedom_patches/active_record_base.rb:7:in `exec_sql'
lib/sql_builder.rb:67:in `exec'
lib/sql_builder.rb:104:in `map_exec'
app/models/topic_tracking_state.rb:136:in `report'
app/controllers/application_controller.rb:345:in `preload_current_user_data'
app/controllers/application_controller.rb:199:in `preload_json'
lib/middleware/anonymous_cache.rb:129:in `call'
config/initializers/100-quiet_logger.rb:10:in `call_with_quiet_assets'
config/initializers/100-silence_logger.rb:26:in `call'
lib/middleware/request_tracker.rb:73:in `call'
lib/scheduler/defer.rb:85:in `process_client'
lib/middleware/unicorn_oobgc.rb:95:in `process_client'
SELECT
u.id AS user_id,
topics.id AS topic_id,
topics.created_at,
highest_post_number,
last_read_post_number,
c.id AS category_id,
tu.notification_level
FROM topics
JOIN users u on u.id = 2
JOIN user_stats AS us ON us.user_id = u.id
JOIN user_options AS uo ON uo.user_id = u.id
JOIN categories c ON c.id = topics.category_id
LEFT JOIN topic_users tu ON tu.topic_id = topics.id AND tu.user_id = u.id
WHERE u.id = 2 AND
topics.archetype <> 'private_message' AND
((1=0) OR ("topics"."deleted_at" IS NULL AND topics.created_at >= GREATEST(CASE
WHEN COALESCE(uo.new_topic_duration_minutes, 2880) = -1 THEN u.created_at
WHEN COALESCE(uo.new_topic_duration_minutes, 2880) = -2 THEN COALESCE(u.previous_visit_at,u.created_at)
ELSE ('2016-03-06 14:31:20.986157'::timestamp - INTERVAL '1 MINUTE' * COALESCE(uo.new_topic_duration_minutes, 2880))
END, us.new_since, '2016-03-01 23:25:20.000000') AND tu.last_read_post_number IS NULL AND COALESCE(tu.notification_level, 2) >= 2)) AND
(topics.visible OR u.admin OR u.moderator) AND
topics.deleted_at IS NULL AND
( NOT c.read_restricted OR u.admin OR category_id IN (
SELECT c2.id FROM categories c2
JOIN category_groups cg ON cg.category_id = c2.id
JOIN group_users gu ON gu.user_id = 2 AND cg.group_id = gu.group_id
WHERE c2.read_restricted )
)
AND NOT EXISTS( SELECT 1 FROM category_users cu
WHERE last_read_post_number IS NULL AND
cu.user_id = 2 AND
cu.category_id = topics.category_id AND
cu.notification_level = 0)


UNION ALL

SELECT
u.id AS user_id,
topics.id AS topic_id,
topics.created_at,
highest_post_number,
last_read_post_number,
c.id AS category_id,
tu.notification_level
FROM topics
JOIN users u on u.id = 2
JOIN user_stats AS us ON us.user_id = u.id
JOIN user_options AS uo ON uo.user_id = u.id
JOIN categories c ON c.id = topics.category_id
LEFT JOIN topic_users tu ON tu.topic_id = topics.id AND tu.user_id = u.id
WHERE u.id = 2 AND
topics.archetype <> 'private_message' AND
(("topics"."deleted_at" IS NULL AND tu.last_read_post_number < topics.highest_post_number AND COALESCE(tu.notification_level, 1) >= 2) OR (1=0)) AND
(topics.visible OR u.admin OR u.moderator) AND
topics.deleted_at IS NULL AND
( NOT c.read_restricted OR u.admin OR category_id IN (
SELECT c2.id FROM categories c2
JOIN category_groups cg ON cg.category_id = c2.id
JOIN group_users gu ON gu.user_id = 2 AND cg.group_id = gu.group_id
WHERE c2.read_restricted )
)
AND NOT EXISTS( SELECT 1 FROM category_users cu
WHERE last_read_post_number IS NULL AND
cu.user_id = 2 AND
cu.category_id = topics.category_id AND
cu.notification_level = 0)

frequently the category list queries like this one also show up a lighter red background, but never using as much time as the previous one:

app/models/category_list.rb:48:in `find_relevant_topics'
app/models/category_list.rb:17:in `initialize'
app/controllers/categories_controller.rb:22:in `new'
app/controllers/categories_controller.rb:22:in `index'
lib/middleware/anonymous_cache.rb:129:in `call'
config/initializers/100-quiet_logger.rb:10:in `call_with_quiet_assets'
config/initializers/100-silence_logger.rb:26:in `call'
lib/middleware/request_tracker.rb:73:in `call'
lib/scheduler/defer.rb:85:in `process_client'
lib/middleware/unicorn_oobgc.rb:95:in `process_client'
SELECT "topics".* FROM "topics" WHERE ("topics"."deleted_at" IS NULL) AND "topics"."id" IN (10320, 10322, 10, 8192, 9, 10364, 5291, 10273, 7, 9897, 8897, 4645, 10361, 9372, 9678, 9327, 8195, 5104, 10307, 10311, 10365, 6, 10314, 10319, 9395, 5, 4290, 10303, 3775, 10313, 7782, 4, 10027, 8504, 10116, 10286, 9988, 9851, 9964, 10317, 10316, 7614, 10321, 10324, 5961, 10212, 562, 10302, 8655, 10323, 9314, 4547, 10198, 10290, 10181, 3255, 9695, 9570, 10283, 10292, 5102, 10315, 10312, 996, 4004, 3849, 10141, 10309, 10326, 8652, 2296, 9841, 10304, 10299, 10271, 10325, 10024, 10310, 1276, 10228, 94, 10298, 9959, 10194, 2656)

(Rafael dos Santos Silva) #9

How many users and topics you have?

Looks like your database is slow, maybe a Vacuum can solve this.

Are your disks good?


(James Figgle) #10

3688 topics all time. 1419 users. not a lot. can get a good thousand replies on a topic pretty often though.

I’m not sure on the disk quality, but I think they’re not SSD. seeing a SCSI PERC 5/i on lshw. any way to get the database to cache more in memory? there’s plenty of room for it to never hit the disk.


(Dean Taylor) #11

This query has been reported here:


(Rafael dos Santos Silva) #12

The linux will cache and use all idle memory.

Can you run a VACUUM VERBOSE ANALYZE?

You have a very small topics and users table to get this timings…


(James Figgle) #13

what’s the best way to do that and capture the output?


(Rafael dos Santos Silva) #14

Are you using the docker install?

You need to enter the docker (launcher enter or launcher ssh) and then connect with pqsl. Are you familiar with those concepts?


(James Figgle) #15

sorry yes, I was unsure of the best way to write psql interactive output to file, but just used screen’s copy and paste function:
here’s the results for the topics table, which I assume is the one I should have performed it on:

INFO:  vacuuming "public.topics"
INFO:  scanned index "topics_pkey" to remove 467 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "idx_topics_front_page" to remove 467 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  scanned index "idx_topics_user_id_deleted_at" to remove 467 row versions
DETAIL:  CPU 0.01s/0.02u sec elapsed 0.05 sec.
INFO:  scanned index "index_topics_on_bumped_at" to remove 467 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "index_topics_on_created_at_and_visible" to remove 467 row versions
DETAIL:  CPU 0.01s/0.00u sec elapsed 0.02 sec.
INFO:  scanned index "index_topics_on_id_and_deleted_at" to remove 467 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.01 sec.
INFO:  scanned index "index_topics_on_lower_title" to remove 467 row versions
DETAIL:  CPU 0.00s/0.03u sec elapsed 0.03 sec.
INFO:  scanned index "index_topics_on_pinned_at" to remove 467 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "index_topics_on_pinned_globally" to remove 467 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "topics": removed 467 row versions in 126 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "topics_pkey" now contains 6031 row versions in 38 pages
DETAIL:  192 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_topics_front_page" now contains 6031 row versions in 180 pages
DETAIL:  467 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_topics_user_id_deleted_at" now contains 6012 row versions in 116 pages
DETAIL:  463 index row versions were removed.
37 index pages have been deleted, 36 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_topics_on_bumped_at" now contains 6031 row versions in 35 pages
DETAIL:  30 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_topics_on_created_at_and_visible" now contains 3691 row versions in 56 pages
DETAIL:  433 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_topics_on_id_and_deleted_at" now contains 6031 row versions in 54 pages
DETAIL:  113 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_topics_on_lower_title" now contains 6031 row versions in 200 pages
DETAIL:  467 index row versions were removed.
1 index pages have been deleted, 1 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_topics_on_pinned_at" now contains 5 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_topics_on_pinned_globally" now contains 0 row versions in 2 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "topics": found 576 removable, 5421 nonremovable row versions in 746 out of 925 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 10645 unused item pointers.
0 pages are entirely empty.
CPU 0.09s/0.19u sec elapsed 0.32 sec.
INFO:  vacuuming "pg_toast.pg_toast_40056"
INFO:  index "pg_toast_40056_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_40056": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.topics"
INFO:  "topics": scanned 925 of 925 pages, containing 6031 live rows and 0 dead rows; 6031 rows in sample, 6031 estimated total rows
VACUUM

(James Figgle) #16

In addition to topics, I did categories and category_users because they’re also referenced in the slow query. No improvement to page load afterwards, same query still causing issue.

INFO:  vacuuming "public.categories"
INFO:  scanned index "categories_pkey" to remove 26 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "index_categories_on_email_in" to remove 26 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "index_categories_on_topic_count" to remove 26 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "unique_index_categories_on_name" to remove 26 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "categories": removed 26 row versions in 2 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "categories_pkey" now contains 8 row versions in 2 pages
DETAIL:  26 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_categories_on_email_in" now contains 8 row versions in 2 pages
DETAIL:  26 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "index_categories_on_topic_count" now contains 8 row versions in 2 pages
DETAIL:  26 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "unique_index_categories_on_name" now contains 8 row versions in 2 pages
DETAIL:  26 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "categories": found 5 removable, 8 nonremovable row versions in 2 out of 2 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 85 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  vacuuming "pg_toast.pg_toast_40001"
INFO:  index "pg_toast_40001_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "pg_toast_40001": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.categories"
INFO:  "categories": scanned 2 of 2 pages, containing 8 live rows and 0 dead rows; 8 rows in sample, 8 estimated total rows
VACUUM
INFO:  vacuuming "public.category_users"
INFO:  scanned index "category_users_pkey" to remove 1 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "idx_category_users_u1" to remove 1 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  scanned index "idx_category_users_u2" to remove 1 row versions
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "category_users": removed 1 row versions in 1 pages
DETAIL:  CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "category_users_pkey" now contains 7 row versions in 2 pages
DETAIL:  1 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_category_users_u1" now contains 7 row versions in 2 pages
DETAIL:  1 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  index "idx_category_users_u2" now contains 7 row versions in 2 pages
DETAIL:  1 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  "category_users": found 1 removable, 7 nonremovable row versions in 1 out of 1 pages
DETAIL:  0 dead row versions cannot be removed yet.
There were 0 unused item pointers.
0 pages are entirely empty.
CPU 0.00s/0.00u sec elapsed 0.00 sec.
INFO:  analyzing "public.category_users"
INFO:  "category_users": scanned 1 of 1 pages, containing 7 live rows and 0 dead rows; 7 rows in sample, 7 estimated total rows
VACUUM

(Rafael dos Santos Silva) #17

Okay so this is trickier than expected:

First let’s get some more concrete info:

  1. Whats the CPU model? And what’s your host? VPS?
  • 28 Unicorns are a lot of unicorns, and they may put too much load on your (already slow) DB. Let’s try 10.

  • Let’s test this disk perf:

dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
hdparm -Tt /dev/sda

Also, at work I use a server to host Nginx (web), a server to host rails, redis & sidekiq (app) and a separate server to host PostgreSQL (DB). I think separating data and web containers would be a good start.


#18

He has 14 cores 28 threads. There’s only one xeon that have so many


(Rafael dos Santos Silva) #19

Many slow cores aren’t discourse type, but I don’t think this is whats holding his instance.


#20

It’s true, guy might find higher performance out of a dual-core g3258 running 4ghz+.


(James Figgle) #21

the host is a VM on a mostly idle master. here’s a sample of cpuinfo from the master:

processor	: 15
vendor_id	: GenuineIntel
cpu family	: 6
model		: 15
model name	: Intel(R) Xeon(R) CPU           X7350  @ 2.93GHz
stepping	: 11
microcode	: 187
cpu MHz		: 2933.337
cache size	: 4096 KB
physical id	: 6
siblings	: 4
core id		: 3
cpu cores	: 4
apicid		: 27
initial apicid	: 27
fpu		: yes
fpu_exception	: yes
cpuid level	: 10
wp		: yes
flags		: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx lm constant_tsc arch_perfmon pebs bts rep_good aperfmperf pni dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm dca lahf_lm dts tpr_shadow vnmi flexpriority
bogomips	: 5851.81
root@bbs:~# dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 9.53099 s, 56.3 MB/s

root@bbs:~# hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads:   1036 MB in  2.00 seconds = 516.76 MB/sec
 Timing buffered disk reads:  74 MB in  3.05 seconds =  24.26 MB/sec

bringing down unicorns to 10 seems to have helped some. i get some slow queries, but a few less of them.


#22

If you have 4 virtual cores allocated to the VM drop unicorns down to 8 or less. Is this ESXi? Be sure to properly configure the VM to utilize the cores in full. Why do you show 4 physical cores on the host, but mention 14 in the OP?


(Rafael dos Santos Silva) #23

This is a very bad disk perf :sadpanda:

Here are some Digital Ocean from today:

root@ubuntu-512mb-nyc3-01:~# dd bs=1M count=512 if=/dev/zero of=test conv=fdatasync
512+0 records in
512+0 records out
536870912 bytes (537 MB) copied, 1.09289 s, 491 MB/s

root@bbs:~# hdparm -Tt /dev/sda
/dev/sda:
 Timing cached reads:   15264MB in  2.00 seconds = 7637.886 MB/sec
 Timing buffered disk reads:  2340MB in  3.00 seconds =  779.43MB/sec

And some local pc with SSD:

536870912 bytes (537 MB) copied, 1.52775 s, 351 MB/s
Timing cached reads:   11434 MB in  2.00 seconds = 5720.61 MB/sec
Timing buffered disk reads:  760 MB in  3.00 seconds = 253.09 MB/sec

(James Figgle) #24

Wow that sucks. Guess I’m going to have to rack a box with some SSDs.


(Rafael dos Santos Silva) #25

This is a 10x or more performance gap.

Tomorrow I’ll compare with my non-SSD enterprise grade storage to get some numbers too!


(Rafael dos Santos Silva) #26

Here are the non-SSD entreprise storage results:

 Timing cached reads:   10562 MB in  2.00 seconds = 5283.99 MB/sec
 Timing buffered disk reads:  1012 MB in  3.00 seconds = 336.82 MB/sec