Word Analyzer
Analyze the first words of sentences and build a frequency table. According to the paper linked under Link Summary, the number of distinct first words should be around 2000.
* The program takes a text file and analyzes its sentences
Rules for detecting the first word of a sentence:
1. First word in a paragraph (open question: how do you find a paragraph in a document-independent format?)
2. First word in a sentence (any word following a period, ?, !, etc.)
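The two rules above can be sketched in a few lines of Python. This is only a rough illustration, not a finished detector: it assumes plain text where paragraphs are separated by blank lines, and it treats every period, ? or ! followed by whitespace as the end of a sentence, so abbreviations such as "Dr." will be split incorrectly. The names first_words, PARAGRAPH_BREAK, SENTENCE_BREAK and WORD are made up for this example.

```python
import re

# Assumption: paragraphs are separated by one or more blank lines.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")
# Assumption: a sentence ends at '.', '?' or '!' followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.?!])\s+")
# Assumption: a word is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def first_words(text):
    """Return the first word of every sentence, in document order."""
    words = []
    for paragraph in PARAGRAPH_BREAK.split(text):
        # Rule 1: the start of a paragraph is always the start of a sentence.
        for sentence in SENTENCE_BREAK.split(paragraph.strip()):
            # Rule 2: whatever follows a terminator starts a new sentence.
            match = WORD.search(sentence)
            if match:
                words.append(match.group())
    return words
```

Splitting on blank lines is one possible answer to the paragraph question for plain text only; other formats (HTML, word processor files) would need their own paragraph finder in front of this function.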
Components:
* A simple text parser
* First word detector
* Associative table (automatically updates count for duplicate words)
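For the associative table, Python's collections.Counter already behaves the way the component list describes: adding a word that is already present just bumps its count. A minimal sketch, assuming the first words have already been extracted into a list and that "The" and "the" should count as the same word (lowercasing is my assumption, not something the paper requires). The function name build_frequency_table is hypothetical.

```python
from collections import Counter


def build_frequency_table(words):
    """Count words case-insensitively; a duplicate increments its count."""
    table = Counter()
    for word in words:
        table[word.lower()] += 1
    return table


# Example input: first words as a detector might return them.
table = build_frequency_table(["The", "It", "the", "We", "The"])

# Print the table from most to least frequent.
for word, count in table.most_common():
    print(f"{word}\t{count}")
```

len(table) gives the number of distinct first words, which is the figure to compare against the roughly 2000 mentioned above.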
Follow on projects:
1. Analyze n words at the beginning of a sentence
2. Analyze sentence lengths
3. Domain-specific implementations for different fields/industries
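The first two follow-on projects are small variations on the same idea. A rough sketch, with the same naive assumption that a sentence ends at a period, ? or ! followed by whitespace; the function names sentences_as_words, leading_ngrams and sentence_lengths are invented for illustration.

```python
import re
from collections import Counter

# Assumption: a sentence ends at '.', '?' or '!' followed by whitespace.
SENTENCE_BREAK = re.compile(r"(?<=[.?!])\s+")
# Assumption: a word is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def sentences_as_words(text):
    """Split text into sentences, each a list of lowercased words."""
    sentences = (WORD.findall(s.lower()) for s in SENTENCE_BREAK.split(text))
    return [words for words in sentences if words]


def leading_ngrams(text, n):
    """Count the first n words of each sentence that has at least n words."""
    return Counter(
        tuple(words[:n]) for words in sentences_as_words(text) if len(words) >= n
    )


def sentence_lengths(text):
    """Count how many sentences have each length, measured in words."""
    return Counter(len(words) for words in sentences_as_words(text))
```

With n set to 1, leading_ngrams reduces to the original first-word table, so the first-word analyzer is just a special case of project 1.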
Applications:
* Concept extraction
* Fact database for AIML
* Analysis of writing
* Signature detection
Link Summary:
http://www.idealliance.org/papers/extreme/proceedings/html/2007/Freese01/EML2007Freese01.html
Posted in nlp, parser, size:small on August 27th, 2007 by Dorai
