A Latent Semantic Analysis system automatically builds a mapping of meaning based on the information that it is given (see How does LSA Work). This mapping of meaning, or semantic space, can then be used in various ways to examine and gain understanding of a body of data.
There are five primary operations you can perform with an LSA space:
- Retrieval – retrieve items of information based on meaning, ranked according to semantic relevance
- Clustering – identify clusters of meaning within a single semantic space
- Comparison – compare items and clusters within a single semantic space, or compare multiple spaces with one another
- Interpretation – identify where a new item maps in one or more semantic spaces, and merge information from multiple semantic spaces
- Completion – use a semantic space to suggest the next word or a missing word in an information item
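As a rough illustration of what a semantic space looks like in practice, the sketch below builds one from a tiny corpus and then performs the Comparison and Clustering operations on it. Everything here is an assumption made for illustration: the five-document corpus is invented, NumPy's SVD stands in for a production LSA engine, raw word counts are used instead of a weighting scheme, and `k = 2` is chosen only because the toy corpus has two obvious topics. It is not a description of any particular product's implementation.

```python
import numpy as np

# Hypothetical toy corpus: three items about rail transport, two about baking.
docs = [
    "railroad trains freight",
    "locomotives freight steam",
    "railroad trains station",
    "bread flour oven",
    "bread flour yeast",
]

# Term-document count matrix: one row per term, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
row = {term: i for i, term in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        counts[row[word], j] += 1.0

# Truncated SVD: keep only the k largest singular values.
# k = 2 is an assumption that suits this two-topic toy corpus.
k = 2
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
doc_vectors = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional row per document


def cosine(a, b):
    """Cosine similarity between two vectors in the semantic space."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Comparison: pairwise similarity between all documents.
similarity = np.array([[cosine(a, b) for b in doc_vectors] for a in doc_vectors])

# Clustering (deliberately crude): label each document by its dominant dimension.
clusters = np.argmax(np.abs(doc_vectors), axis=1)

print(np.round(similarity, 2))
print(clusters)
```

In this toy space the second document lands close to the first and third even though it shares only one word with the first and none with the third, and the two baking documents form their own group. A real collection would normally call for term weighting, far more dimensions, and a proper clustering algorithm.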
A Simple Illustration
Given these five primary operations, consider some of the ways you could use Latent Semantic Analysis to evaluate the content of a library:
- Searching – You want to find items in the library that discuss a certain topic but want to be sure items of similar meaning are retrieved even if they don’t contain the search keywords.
(e.g., you search for “railroad trains” and get back relevant items discussing “locomotives” that don’t necessarily include the original search terms.)
- Indexing of multilingual collections – If the library consists of items in different languages, you would like to be able to search in one language but retrieve relevant items even if they are in a different language – but without having to translate all of the items into a single language beforehand.
- Content analysis – You could evaluate the content of the library and determine what major subjects are covered by the items in it. Are there specific concentrations or clusters of subject matter in the library, or is the content widely dispersed? Are there particular items that don’t fit with the main body of the library collection (outliers)?
- Evaluation of “fit” into an existing collection – If you are considering adding a new item to the library, does it fit in with the subject matter you already have or is it an outlier? Is it a near duplicate of something that is already in the library collection?
- Comparison of multiple collections – Considering two library collections, do they overlap in content or are they complementary? If you consider representations of your library at different times, how has its content changed over time?
- Correction of scanned documents – When you add scanned documents to your library, Optical Character Recognition (OCR) processing frequently introduces errors. You would like to recognize these errors and correct them by supplying the right word, chosen from the context of the data item, rather than introduce “dirty” data items into your collection.
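The searching scenario above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not a definitive implementation: the corpus is invented, NumPy's SVD stands in for a production LSA engine, raw word counts are used without weighting, `k = 2` fits only this toy data, and `fold_in` and `search` are hypothetical helper names. The query is projected into the space with the standard fold-in formula (query counts times the term vectors, divided by the singular values) and documents are ranked by cosine similarity.

```python
import numpy as np

# Hypothetical toy library: three rail-related items, two baking items.
docs = [
    "railroad trains freight",
    "locomotives freight steam",
    "railroad trains station",
    "bread flour oven",
    "bread flour yeast",
]

# Term-document count matrix: one row per term, one column per document.
vocab = sorted({word for doc in docs for word in doc.split()})
row = {term: i for i, term in enumerate(vocab)}
counts = np.zeros((len(vocab), len(docs)))
for j, doc in enumerate(docs):
    for word in doc.split():
        counts[row[word], j] += 1.0

k = 2  # assumption: two dimensions are enough for this two-topic corpus
U, s, Vt = np.linalg.svd(counts, full_matrices=False)
doc_vectors = Vt[:k].T  # unscaled rows of V_k, the convention that pairs with fold-in


def fold_in(text):
    """Project new text into the existing space: q_hat = q^T U_k S_k^-1."""
    q = np.zeros(len(vocab))
    for word in text.split():
        if word in row:  # words outside the vocabulary are ignored
            q[row[word]] += 1.0
    return (q @ U[:, :k]) / s[:k]


def search(text):
    """Rank all documents by cosine similarity to the folded-in query.

    A query with no known words maps to the zero vector, so the scores
    would be undefined; this sketch does not handle that case.
    """
    q = fold_in(text)
    norms = np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q)
    scores = doc_vectors @ q / norms
    order = np.argsort(-scores)
    return [(docs[i], float(scores[i])) for i in order]


for doc, score in search("railroad trains"):
    print(f"{score:.2f}  {doc}")
```

With this data, the item about locomotives scores as highly as the items that literally contain “railroad” and “trains”, because it shares a neighbor term with them, while the baking items score near zero. The same `fold_in` step is one way to approach the fit question: project a candidate item and check how close it lies to what the collection already holds.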
Existing Applications of LSA
– just to name a few…
LSA is often thought of only in terms of its application to Information Retrieval, but there are many other possibilities to consider. It has already been applied in the following areas:
- Contextual Ad Placement
- Recommender Systems
- Cross Language Information Retrieval
- Automated Grading
- Computer-guided training
- Legal discovery/e-discovery services
- Cognitive Science Research
- Repairing/cleaning data
- Job posting/resume matching
- Non-textual applications
- Personality profiles/compatibility analysis
The potential applications for LSA technology are limitless. Our clients are continually finding new ways to apply this technology.
Our focus at Small Bear Technologies is delivering the core technology in a scalable, reliable, easy-to-use package so that you can apply it in your domain of interest. The combination of your application knowledge and our LSA expertise creates the potential for powerful solutions.