excerpt | categories | layout | title | created | permalink | ||
---|---|---|---|---|---|---|---|
<a href="http://www.csc.liv.ac.uk/~azaroth/">Rob Sanderson</a>
Using the TeraGrid [1] and the SRB DataGrid [2], we have sufficient
computational and storage facilities to run processing tasks that
would normally be prohibitively expensive. By integrating text and data
mining tools [3, 4] within the Cheshire3 [5] information architecture, we
can parse the natural language present in 20 million MARC records (the
University of California’s MELVYL collection) and extract information
for use by search and retrieval applications. In this talk, we’ll discuss
the results of applying new techniques to ‘old’ data.
|
|
post |
Library Text Mining |
1136872693 |
/conference/2006/sanderson/ |
Using the TeraGrid [1] and the SRB DataGrid [2], we have sufficient computational and storage facilities to run processing tasks that would normally be prohibitively expensive. By integrating text and data mining tools [3, 4] within the Cheshire3 [5] information architecture, we can parse the natural language present in 20 million MARC records (the University of California’s MELVYL collection) and extract information for use by search and retrieval applications. In this talk, we’ll discuss the results of applying new techniques to ‘old’ data.
1: http://www.teragrid.org
2: http://www.sdsc.edu/srb
3: http://www.ailab.si/orange
4: http://www-tsujii.is.s.u-tokyo.ac.jp/
5: http://www.cheshire3.org/
Rob Sanderson ([email protected])