
OutOfMemoryError #1190

Open
@TolstoyDotCom

I'm running a simple spider with about 800 MB of heap space, but after running for a day or so, it throws a series of OutOfMemoryErrors. Example:

Exception in thread "pool-1-thread-97" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-92" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-99" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-88" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-101" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-100" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-103" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-94" java.lang.OutOfMemoryError: Java heap space
Exception in thread "pool-1-thread-105" java.lang.OutOfMemoryError: Java heap space
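To check which heap limit the JVM actually picked up, and to watch usage climb over the day before the errors start, a small stdlib-only helper like the following could run alongside the spider. `HeapMonitor`, `heapSummary` and `start` are hypothetical names used only for this sketch; the figures come from `Runtime.getRuntime()` and are approximate, since they include garbage not yet collected.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical diagnostic helper, not part of WebMagic.
public class HeapMonitor {
    // One-line summary of heap usage in megabytes (approximate: includes uncollected garbage).
    static String heapSummary() {
        Runtime rt = Runtime.getRuntime();
        long mb = 1024L * 1024L;
        long used = (rt.totalMemory() - rt.freeMemory()) / mb;
        long max = rt.maxMemory() / mb; // reflects -Xmx, or the JVM default if none was given
        return "heap used=" + used + "MB max=" + max + "MB";
    }

    // Prints the summary to stderr every periodSeconds on a daemon thread,
    // so the monitor does not keep the JVM alive after the spider finishes.
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService ses = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "heap-monitor");
            t.setDaemon(true);
            return t;
        });
        ses.scheduleAtFixedRate(() -> System.err.println(heapSummary()),
                0, periodSeconds, TimeUnit.SECONDS);
        return ses;
    }

    public static void main(String[] args) {
        System.out.println(heapSummary());
    }
}
```

Calling `HeapMonitor.start( 60 )` at the top of `main` would print one line per minute; if `used` keeps rising steadily toward `max` over hours, that points to something accumulating rather than a single large page.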

Here's a simplified version of the code I'm using. In the real code the else branch is a little more complicated, but it only calls System.out.println; it doesn't write to a database or anything similar.

import java.util.List;

import us.codecraft.webmagic.*;
import us.codecraft.webmagic.processor.*;

public class App implements PageProcessor {
	private Site site = Site.me().setRetryTimes( 3 ).setSleepTime( 1000 );

	@Override
	public void process( Page page ) {
		// Queue every link found on the page for crawling
		List<String> links = page.getHtml().links().all();
		page.addTargetRequests( links );
		// Extract the content of <meta name="generator"> from the page head
		page.putField( "generator", page.getHtml().xpath( "/html/head/meta[@name=\"generator\"]/@content" ).toString() );
		String generator = page.getResultItems().get( "generator" );

		if ( generator == null ) {
			page.setSkip( true );
		}
		else {
			System.out.println( generator );
		}
	}

	@Override
	public Site getSite() {
		return site;
	}

	public static void main( String[] args ) {
		System.setProperty( "slf4j.internal.verbosity", "WARN" );
		Spider.create( new App() ).addUrl( "https://github.com/code4craft/webmagic/" ).thread( 5 ).run();
	}
}
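One possible cause, which I haven't confirmed: `process` passes every link on every page to `page.addTargetRequests( links )`, and as far as I can tell WebMagic's default in-memory scheduler keeps both the pending URL queue and the set of already-seen URLs on the heap, so both keep growing for as long as the crawl discovers new pages. The stdlib-only sketch below is not WebMagic code; `BoundedFrontier`, `offer`, `maxSeen` and the same-host rule are hypothetical, chosen only to illustrate restricting links to the seed host and capping how many distinct URLs are ever remembered. It uses `List.of`, so it assumes Java 9 or later.

```java
import java.net.URI;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not WebMagic's scheduler. It assumes heap growth comes
// from remembering every discovered URL, and bounds that by host and by count.
public class BoundedFrontier {
    private final String host;      // only links on the seed's host are queued
    private final int maxSeen;      // hard cap on the number of remembered URLs
    private final Set<String> seen = new HashSet<>();
    private final Deque<String> queue = new ArrayDeque<>();

    public BoundedFrontier(String seedUrl, int maxSeen) {
        this.host = URI.create(seedUrl).getHost();
        this.maxSeen = maxSeen;
        offer(seedUrl);
    }

    // Queues the URL only if it is on the seed host, not seen before, and under the cap.
    public boolean offer(String url) {
        String h;
        try {
            h = URI.create(url).getHost();
        } catch (IllegalArgumentException e) {
            return false; // malformed link: skip it
        }
        if (h == null || !h.equalsIgnoreCase(host)) return false;
        if (seen.size() >= maxSeen || !seen.add(url)) return false;
        queue.addLast(url);
        return true;
    }

    public void offerAll(List<String> urls) {
        for (String u : urls) offer(u);
    }

    public String poll() { return queue.pollFirst(); }   // next URL to fetch, or null
    public int seenCount() { return seen.size(); }
    public int pending() { return queue.size(); }

    public static void main(String[] args) {
        BoundedFrontier f = new BoundedFrontier("https://github.com/code4craft/webmagic/", 3);
        f.offerAll(List.of(
                "https://github.com/code4craft/webmagic/issues",
                "https://github.com/code4craft/webmagic/issues", // duplicate
                "https://example.com/elsewhere",                 // different host
                "https://github.com/code4craft/webmagic/pulls",
                "https://github.com/code4craft"));               // over the cap
        System.out.println(f.seenCount() + " seen, " + f.pending() + " pending");
    }
}
```

With a cap of 3 the example prints `3 seen, 3 pending`: the duplicate, the off-host link and the link past the cap are all rejected. In the spider itself, a similar effect might be had by filtering `links` before calling `page.addTargetRequests`, for example with `page.getHtml().links().regex( ... )`, though I haven't tested whether that alone would stop the errors.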
