23.03.11 09:30, Adarsh Sharma написав(ла): 
 Thanks Chetan, here is the output of your updated query :
explain  select distinct(p.crawled_page_id) from page_content p where NOT EXISTS (select 1 from  clause2 c where c.source_id = p.crawled_page_id);
                                       QUERY PLAN                                       
 ---------------------------------------------------------------------------------------
  HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8)
    ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8)
          Hash Cond: (p.crawled_page_id = c.source_id)
          ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8)
          ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4)
                ->  Seq Scan on clause2 c  (cost=0.00..771182.96 rows=31876196 width=4)
 (6 rows)
 And my explain analyze output is :
                                                       QUERY PLAN                                                                 
 --------------------------------------------------------------------------------------------------------------------------------------------
  HashAggregate  (cost=1516749.47..1520576.06 rows=382659 width=8) (actual time=56666.181..56669.270 rows=72 loops=1)
    ->  Hash Anti Join  (cost=1294152.41..1515791.80 rows=383071 width=8) (actual time=45740.789..56665.816 rows=74 loops=1)
          Hash Cond: (p.crawled_page_id = c.source_id)
          ->  Seq Scan on page_content p  (cost=0.00..87132.17 rows=428817 width=8) (actual time=0.012..715.915 rows=428467 loops=1)
          ->  Hash  (cost=771182.96..771182.96 rows=31876196 width=4) (actual time=45310.524..45310.524 rows=31853083 loops=1)
                ->  Seq Scan on clause2 c  (cost=0.00..771182.96 rows=31876196 width=4) (actual time=0.055..23408.884 rows=31853083 loops=1)
  Total runtime: 56687.660 ms
 (7 rows)
 But Is there is any option to tune it further and one more thing output rows varies from 6 to 7.
 You need an index on source_id to prevent seq scan, like the next:
 CREATE INDEX idx_clause2_source_id
   ON clause2
   (source_id);
Best regards, Vitalii Tymchyshyn