2006-01-09-library-text-mining.md

---
excerpt: '<a href="http://www.csc.liv.ac.uk/~azaroth/">Rob Sanderson</a> Using the TeraGrid [1] and the SRB DataGrid [2], we have sufficient computational and storage facilities to run normally prohibitively expensive processing tasks. By integrating text and data mining tools [3][4] within the Cheshire3 [5] information architecture, we can parse the natural language present in 20 million MARC records (the University of California’s MELVYL collection) and extract information to provide to search/retrieve applications. In this talk, we’ll discuss the results of applying new techniques to ‘old’ data.'
categories:
- conferences
- code4lib 2006
layout: post
title: Library Text Mining
created: 1136872693
permalink: /conference/2006/sanderson/
---

Rob Sanderson

Using the TeraGrid [1] and the SRB DataGrid [2], we have sufficient computational and storage facilities to run normally prohibitively expensive processing tasks. By integrating text and data mining tools [3][4] within the Cheshire3 [5] information architecture, we can parse the natural language present in 20 million MARC records (the University of California’s MELVYL collection) and extract information to provide to search/retrieve applications. In this talk, we’ll discuss the results of applying new techniques to ‘old’ data.

1: http://www.teragrid.org
2: http://www.sdsc.edu/srb
3: http://www.ailab.si/orange
4: http://www-tsujii.is.s.u-tokyo.ac.jp/
5: http://www.cheshire3.org/
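
As a rough illustration of what extracting information from the natural language in catalogue records can look like at its simplest, here is a minimal Python sketch. It is not the Cheshire3 pipeline or the tooling used in the talk. The `RECORDS` data, the dictionary layout (MARC tag mapped to plain text), the stopword list, and the helper names `extract_terms` and `term_frequencies` are all assumptions made for the example. Real MARC records carry indicators and subfields that this ignores.

```python
import re
from collections import Counter

# Hypothetical, heavily simplified stand-ins for parsed MARC records:
# each maps a field tag to its text (245 = title, 520 = summary).
RECORDS = [
    {"245": "Text mining for digital libraries",
     "520": "A survey of text mining methods applied to library catalogues."},
    {"245": "Grid computing and data grids",
     "520": "Describes storage and computation on data grids."},
]

# Tiny illustrative stopword list; a real system would use a fuller one.
STOPWORDS = {"a", "and", "for", "of", "on", "the", "to"}


def extract_terms(record, tags=("245", "520")):
    """Lowercase and tokenize the chosen fields, dropping stopwords."""
    text = " ".join(record.get(tag, "") for tag in tags)
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def term_frequencies(records):
    """Count how often each extracted term occurs across all records."""
    counts = Counter()
    for record in records:
        counts.update(extract_terms(record))
    return counts


if __name__ == "__main__":
    for term, count in term_frequencies(RECORDS).most_common(5):
        print(term, count)
```

Term counts like these could feed an index for search/retrieve applications. At the scale of 20 million records the same per-record function would be distributed across many workers, which is where grid resources come in.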

Rob Sanderson, ([email protected])