The LSA_Toolkit is a robust, scalable, and efficient library that provides the capabilities needed to perform Latent Semantic Analysis (LSA) on a collection of data. The toolkit is implemented as a library with a C++ API, so it can be integrated into your existing applications or fronted by software customized to your specific needs. Ultimately, the LSA_Toolkit will span all of the LSA processing phases, from tokenization to final query processing and analysis. It is designed to be open-ended and configurable in its resource usage. Future release versions will support parallel grid computation, running on 1 to n processors as requested.
This version of the LSA_Toolkit performs the core computation for Latent Semantic Analysis: it takes the contents of a sparse matrix and performs a truncated singular value decomposition to produce an LSA Space, which represents the semantic clustering of the information in the input sparse matrix. Using the LSA_Toolkit at this point still requires knowledge of the precursor steps and weighting calculations needed to produce the input matrix, as well as the ability to use the resulting LSA Space for analysis. This version introduces low-level query processing functions, along with storage and loading of SparseMatrix and LSASpace objects. Other improvements are not externally visible to the end user, but they move the implementation toward the functionality planned for future release versions.
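To make the idea of a truncated singular value decomposition concrete, the following is a minimal sketch that finds only the largest singular value and its two singular vectors of a small sparse matrix by power iteration. This is the rank-1 case of the truncation, in which the matrix A is approximated by sigma * u * v^T. It does not use the LSA_Toolkit API and is not the algorithm the toolkit uses: the types SparseTriplets, Entry, and SingularTriple and the function leading_singular_triple are hypothetical names invented for this illustration.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration only; these are NOT the LSA_Toolkit API.
struct Entry {
    std::size_t row;
    std::size_t col;
    double value;
};

struct SparseTriplets {
    std::size_t rows;
    std::size_t cols;
    std::vector<Entry> entries;  // nonzero entries only
};

struct SingularTriple {
    double sigma;           // largest singular value
    std::vector<double> u;  // left singular vector, length rows
    std::vector<double> v;  // right singular vector, length cols
};

static double l2_norm(const std::vector<double>& x) {
    double sum = 0.0;
    for (double xi : x) sum += xi * xi;
    return std::sqrt(sum);
}

// Scales x to unit length in place and returns its original length.
static double normalize(std::vector<double>& x) {
    const double n = l2_norm(x);
    if (n > 0.0) {
        for (double& xi : x) xi /= n;
    }
    return n;
}

// Returns A * x, touching only the stored nonzero entries.
static std::vector<double> multiply(const SparseTriplets& a,
                                    const std::vector<double>& x) {
    std::vector<double> y(a.rows, 0.0);
    for (const Entry& e : a.entries) y[e.row] += e.value * x[e.col];
    return y;
}

// Returns A^T * x, touching only the stored nonzero entries.
static std::vector<double> multiply_transposed(const SparseTriplets& a,
                                               const std::vector<double>& x) {
    std::vector<double> y(a.cols, 0.0);
    for (const Entry& e : a.entries) y[e.col] += e.value * x[e.row];
    return y;
}

// Power iteration on A^T A. Assumes the all-ones starting vector is not
// orthogonal to the leading right singular vector.
SingularTriple leading_singular_triple(const SparseTriplets& a,
                                       int iterations = 200) {
    std::vector<double> v(a.cols, 1.0);
    normalize(v);
    std::vector<double> u(a.rows, 0.0);
    for (int i = 0; i < iterations; ++i) {
        u = multiply(a, v);
        if (normalize(u) == 0.0) break;  // A * v vanished; nothing to refine
        v = multiply_transposed(a, u);
        if (normalize(v) == 0.0) break;
    }
    // Recompute u and sigma from the final v so the triple is consistent.
    u = multiply(a, v);
    const double sigma = normalize(u);
    return {sigma, u, v};
}
```

A production implementation would compute the leading k singular triples at once, typically with a Lanczos-type method. The sketch only shows why the computation needs nothing more than sparse matrix-vector products.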
The LSA_Toolkit is packaged as a library to be utilized by your specific application code. The package consists of the following items:
You will be issued a site-specific license file, LSA_Toolkit.lic.
After these instructions are completed, you should be able to build against the LSA_Toolkit library. See the example.make file for a simple build using the library.
We use the Reprise License Manager (RLM) from Reprise Software for license key management.
If you will be using the license server to support floating licenses, the license key in LSA_Toolkit.lic will be locked to the machine that runs the license server. The LSA_Toolkit.lic file may be placed in the directory from which you start the rlm license server, or in another location defined in the RLM license environment (see the RLM End User Manual for more information).
You must also put a .lic file (for example, serverloc.lic) in the same directory as the executable you build. This file identifies where the license server can be found on your network, and it can consist of a single line:
HOST hostname hostid [port]
An example serverloc.lic file would be:
HOST colorado 00a0cc3c9ba5 5053
You may obtain the hostid for your machine by running the rlmutil program:
rlmutil rlmhostid
The port number defaults to 5053 unless you configure a different one in the HOST or ISV line of the LSA_Toolkit.lic file that the rlm license server is using.
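If you generate or check serverloc.lic files as part of a deployment script, a small checker along the following lines may help. It is only a sketch of the single-line format shown above (HOST hostname hostid [port]) with the 5053 default. HostLine and parse_host_line are hypothetical names that belong to neither RLM nor the LSA_Toolkit, and the sketch assumes the keyword is written exactly as HOST. RLM's own parsing rules are authoritative.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper for illustration; not part of RLM or the LSA_Toolkit.
struct HostLine {
    std::string hostname;
    std::string hostid;
    int port;
};

// Parses "HOST hostname hostid [port]". Returns false if the line does not
// match that shape; fills in port 5053 when the optional port is omitted.
bool parse_host_line(const std::string& line, HostLine& out) {
    std::istringstream in(line);
    std::string keyword;
    HostLine host{"", "", 5053};  // 5053 is the default port stated above

    if (!(in >> keyword >> host.hostname >> host.hostid) || keyword != "HOST") {
        return false;
    }

    int port = 0;
    if (in >> port) {
        if (port < 1 || port > 65535) return false;  // not a valid TCP port
        host.port = port;
    } else if (!in.eof()) {
        return false;  // a fourth field is present but is not a number
    }

    out = host;
    return true;
}
```

Called on the example line above, the function reports hostname colorado, hostid 00a0cc3c9ba5, and port 5053. Omitting the port field gives the same port through the default.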
In version 0.2 typical usage would be:
See the example.cpp file in the download package for an example program using the LSA_Toolkit.
Support is available from Small Bear Technologies, Inc. Please contact us for support options.