This project implements three custom RDF extractors based on Stanford's CoreNLP library.
- CoreNLPMentionRDFExtractor: extracts named entity mentions, with the same output format as Stardog's entities extractor.
- CoreNLPEntityLinkerRDFExtractor: extracts entity mentions and links them to existing resources in a knowledge graph, with the same output format as Stardog's linker extractor.
- CoreNLPRelationRDFExtractor: extracts relations between named entity mentions.
For example, the sentence:
The Orioles are a professional baseball team based in Baltimore
will generate three triples:
entity:e435cd0347642bc7d2736155815a54e2 rdfs:label "Orioles"
entity:eb3cdb4e267d28feebb638711f8bd7b1 rdfs:label "Baltimore"
iri:e435cd0347642bc7d2736155815a54e2 relation:org:city_of_headquarters iri:eb3cdb4e267d28feebb638711f8bd7b1
- Download the latest release
- Add the jar to Stardog's classpath:
  - Copy it to server/ext or another folder in the server (e.g., server/dbms), OR
  - Point the environment variable STARDOG_EXT to its folder
- Restart the Stardog server
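The copy step above can be sketched as a small shell helper. This is only an illustration: the function name install_extractor, the jar file name, and the install path in the usage note are hypothetical placeholders, not names taken from this project or from Stardog.

```shell
# Hypothetical helper: copy an extractor jar into a folder on Stardog's classpath.
# Usage: install_extractor <path-to-jar> <destination-folder>
install_extractor() {
    jar="$1"
    dest="$2"
    # Fail early if the jar does not exist
    if [ ! -f "$jar" ]; then
        echo "jar not found: $jar" >&2
        return 1
    fi
    # Create the destination folder if needed, then copy the jar into it
    mkdir -p "$dest" && cp "$jar" "$dest/"
}
```

Assuming Stardog is installed under /opt/stardog and the downloaded jar is named corenlp-extractors.jar, the call would look like install_extractor ./corenlp-extractors.jar /opt/stardog/server/ext. The alternative is to leave the jar where it is and export STARDOG_EXT with the path of its folder before starting the server.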
CoreNLPMentionRDFExtractor, CoreNLPEntityLinkerRDFExtractor, and CoreNLPRelationRDFExtractor will be available as RDF extractors, accessible through the CLI, API, and HTTP interfaces.
For example, using the CLI, if you want to add a document to BITES and extract its entities:
stardog doc put --rdf-extractors CoreNLPMentionRDFExtractor myDatabase document.pdf
CoreNLP models can consume large amounts of system memory. If greeted with a "GC overhead limit exceeded" error when using any of the extractors, increase the amount of memory available to Stardog.
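One way to give the server more memory is to pass JVM heap options through an environment variable before starting it. The sketch below assumes the STARDOG_SERVER_JAVA_ARGS variable used by recent Stardog releases; the variable name may differ in older versions, and the 4g values are illustrative only, so check the documentation for your version and size the heap to your machine.

```shell
# Assumption: the server start script reads STARDOG_SERVER_JAVA_ARGS.
# The heap sizes below are example values, not recommendations.
export STARDOG_SERVER_JAVA_ARGS="-Xms4g -Xmx4g"
```

The variable has to be set in the environment that launches the server, so restart Stardog after changing it.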
- Tweak build.gradle to the language of your choice (e.g., change the CoreNLP dependency to models-spanish)
- Run gradlew clean fatjar for a single jar, or gradlew clean copyDeps for individual dependencies
- Add the files in build/libs to Stardog's classpath
- Restart the Stardog server