Skip to content

Not seeing huge performance gain - could be my query? #23

Closed
@pholly

Description

@pholly

Postgresql 9.6.4
Windows Server 2016 with 32GB ram / SSD

160K hits, Rum index query time: 80+ seconds

When querying a table partition with 500K rows for a search term that returns 160K hits I am not seeing a large performance increase using rum tsvector <=> vs gin ts_rank. tsvector column includes weights. Each document is pretty large, similar to Wikipedia. They are all legal documents.

Judging by explain (analyze, buffers) output, looks like majority of time is reading and not sorting. Could this be why I am not seeing performance gain? Also, output shows there's still a bitmap heap scan performed when searching the RUM index.

I clear shared buffers and restart Postgresql (thanks https://stackoverflow.com/a/43186594/812610) so there's nothing in memory.

Appreciate any help and thanks for putting RUM on github!

RUM tsvector column: tsv_natural_rum

Index: CREATE INDEX code_docs_fl_tsv_natural_rum_idx on code_docs_fl using rum (tsv_natural_rum rum_tsvector_ops)

Query:

SELECT id, node_id, 
            ts_headline('mcc_natural_config', title, query, 'highlightall=true') title, 
	    ts_headline('mcc_natural_config', content, query, 'maxfragments=1') fragment, 
            ancestors, product_id, product_name, client_name, state_abbr, 
 	    0 AS full_count, --window function for getting full count here but removed it for simplicity
            tsv_natural_rum <=> query AS rank 
FROM code_docs_fl, 
           (select ts_rewrite(plainto_tsquery('mcc_natural_config',  'code'), 'SELECT target, sub FROM 
                aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target') query) query
WHERE tsv_natural_rum @@ query
ORDER BY tsv_natural_rum <=> query
LIMIT 10 OFFSET 0

Result:

Count: 159,613
Limit  (cost=9279.80..9284.95 rows=10 width=435) (actual time=81780.797..82289.912 rows=10 loops=1)
  Buffers: shared hit=314429 read=190532
  ->  Result  (cost=9279.80..10576.57 rows=2518 width=435) (actual time=81780.786..82289.862 rows=10 loops=1)
        Buffers: shared hit=314429 read=190532
        ->  Sort  (cost=9279.80..9286.10 rows=2518 width=829) (actual time=81764.315..81764.391 rows=10 loops=1)
              Sort Key: ((code_docs_fl.tsv_natural_rum <=> (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text))))
              Sort Method: top-N heapsort  Memory: 35kB
              Buffers: shared hit=314405 read=190481
              ->  Nested Loop  (cost=43.52..9225.39 rows=2518 width=829) (actual time=174.173..80714.142 rows=159613 loops=1)
                    Buffers: shared hit=314402 read=190481
                    ->  Result  (cost=0.00..0.26 rows=1 width=32) (actual time=2.055..2.057 rows=1 loops=1)
                          Buffers: shared hit=56 read=4
                    ->  Bitmap Heap Scan on code_docs_fl  (cost=43.52..9193.65 rows=2518 width=821) (actual time=163.860..18193.512 rows=159613 loops=1)
                          Recheck Cond: (tsv_natural_rum @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                          Heap Blocks: exact=68933
                          Buffers: shared hit=4 read=69085
                          ->  Bitmap Index Scan on code_docs_fl_tsv_natural_rum_idx  (cost=0.00..42.89 rows=2518 width=0) (actual time=125.595..125.595 rows=159613 loops=1)
                                Index Cond: (tsv_natural_rum @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                                Buffers: shared read=156
Planning time: 276.156 ms
Execution time: 82299.866 ms

Gin TS_Rank_CD tsvector column: tsv_natural (exact same contents as tsv_natural_rum)

Index: gin index on column tsv_natural

Query:

SELECT id, node_id, 
            ts_headline('mcc_natural_config', title, query, 'highlightall=true') title, 
	    ts_headline('mcc_natural_config', content, query, 'maxfragments=1') fragment, 
            ancestors, product_id, product_name, client_name, state_abbr, 
 	    0 AS full_count, --window function for getting full count here but removed it for simplicity
            ts_rank_cd(tsv_natural, query, 4|1) AS rank 
FROM code_docs_fl, 
           (select ts_rewrite(plainto_tsquery('mcc_natural_config',  'code'), 'SELECT target, sub FROM 
                aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target') query) query
WHERE tsv_natural @@ query
ORDER BY rank DESC
LIMIT 10 OFFSET 0

Result:

Count: 159,613
Limit  (cost=10959.80..10964.95 rows=10 width=435) (actual time=104597.190..104597.762 rows=10 loops=1)
  Buffers: shared hit=415488 read=255421 dirtied=11754 written=3715
  ->  Result  (cost=10959.80..12256.57 rows=2518 width=435) (actual time=104597.180..104597.728 rows=10 loops=1)
        Buffers: shared hit=415488 read=255421 dirtied=11754 written=3715
        ->  Sort  (cost=10959.80..10966.10 rows=2518 width=829) (actual time=104597.046..104597.061 rows=10 loops=1)
              Sort Key: (ts_rank_cd(code_docs_fl.tsv_natural, (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)), 5)) DESC
              Sort Method: top-N heapsort  Memory: 27kB
              Buffers: shared hit=415488 read=255421 dirtied=11754 written=3715
              ->  Nested Loop  (cost=1723.52..10905.39 rows=2518 width=829) (actual time=535.603..103393.362 rows=159613 loops=1)
                    Buffers: shared hit=415485 read=255421 dirtied=11754 written=3715
                    ->  Result  (cost=0.00..0.26 rows=1 width=32) (actual time=1.971..1.972 rows=1 loops=1)
                          Buffers: shared hit=55 read=5
                    ->  Bitmap Heap Scan on code_docs_fl  (cost=1723.52..10873.65 rows=2518 width=976) (actual time=525.526..25184.048 rows=159613 loops=1)
                          Recheck Cond: (tsv_natural @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                          Heap Blocks: exact=119510
                          Buffers: shared hit=1 read=120034 dirtied=7722 written=1293
                          ->  Bitmap Index Scan on code_docs_fl_tsv_natural_idx  (cost=0.00..1722.89 rows=2518 width=0) (actual time=450.870..450.870 rows=322447 loops=1)
                                Index Cond: (tsv_natural @@ (ts_rewrite('''code'''::tsquery, 'SELECT target, sub FROM aliases_natural WHERE to_tsquery(''mcc_natural_config'',  ''code'') @> target'::text)))
                                Buffers: shared hit=1 read=524
Planning time: 255.426 ms
Execution time: 104601.713 ms

Metadata

Metadata

Assignees

No one assigned

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions