Word Analyzer

Analyze the first words of sentences and build a frequency table. According to the paper in the link summary below, the number of distinct words should be around 2000.

* The program takes a text file and analyzes its sentences

Rules for detecting the first word of a sentence:

1. The first word in a paragraph (open question: how do you find a paragraph in a way that is independent of the document format?)
2. The first word in a sentence (any word following a period, ?, !, etc.)
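The two rules above can be sketched in Python. This is only a rough sketch under stated assumptions, not a finished detector: it assumes plain text where paragraphs are separated by blank lines, and it treats any period, ? or ! followed by whitespace as a sentence end. The names `first_words`, `PARAGRAPH_BREAK`, `SENTENCE_END` and `WORD` are made up for illustration.

```python
import re

# Assumption: in plain text, a paragraph break is one or more blank lines.
PARAGRAPH_BREAK = re.compile(r"\n\s*\n")
# Assumption: a sentence ends at '.', '?' or '!' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.?!])\s+")
# Assumption: a word is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def first_words(text):
    """Return the first word of every sentence, lowercased, in document order."""
    words = []
    for paragraph in PARAGRAPH_BREAK.split(text):
        # Rule 1 is covered here: the first sentence of each paragraph
        # is simply the first item produced by the split below.
        for sentence in SENTENCE_END.split(paragraph.strip()):
            match = WORD.search(sentence)
            if match:
                words.append(match.group().lower())
    return words
```

Abbreviations such as "Dr." or "e.g." will be miscounted as sentence ends by this simple rule, so a real version would need an exception list or a smarter splitter.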

Components:

* A simple text parser
* First word detector
* Associative table (automatically updates count for duplicate words)
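The associative table could be as small as a `collections.Counter` from the Python standard library, which starts every missing key at zero, so duplicate words update their count automatically. The helper names `build_frequency_table` and `format_table` below are hypothetical, and the input is assumed to be the list of first words produced by the detector.

```python
from collections import Counter


def build_frequency_table(words):
    """Count how often each word occurs, ignoring case."""
    table = Counter()
    for word in words:
        # Missing keys start at 0, so no explicit "seen before?" check is needed.
        table[word.lower()] += 1
    return table


def format_table(table):
    """Render the table as tab-separated lines, most frequent word first."""
    # Ties are broken alphabetically so the output is stable.
    rows = sorted(table.items(), key=lambda item: (-item[1], item[0]))
    return "\n".join(f"{word}\t{count}" for word, count in rows)
```

With this in place, `len(table)` gives the number of distinct words, which is the figure to compare against the estimate of around 2000.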

Follow-on projects:

1. Analyze the first n words of each sentence
2. Analyze sentence lengths
3. Domain-specific implementations for different fields and industries
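The first two follow-on projects might look something like the sketch below. It reuses the same simplifying assumption as the basic analyzer (a sentence ends at a period, ? or ! followed by whitespace), and the function names `sentences_as_words`, `leading_ngrams` and `sentence_lengths` are invented for this example.

```python
import re
from collections import Counter

# Assumption: a sentence ends at '.', '?' or '!' followed by whitespace.
SENTENCE_END = re.compile(r"(?<=[.?!])\s+")
# Assumption: a word is a run of ASCII letters and apostrophes.
WORD = re.compile(r"[A-Za-z']+")


def sentences_as_words(text):
    """Split text into sentences, each a list of lowercased words."""
    result = []
    for sentence in SENTENCE_END.split(text.strip()):
        words = [w.lower() for w in WORD.findall(sentence)]
        if words:
            result.append(words)
    return result


def leading_ngrams(text, n):
    """Count the first n words of every sentence that has at least n words."""
    return Counter(
        tuple(words[:n]) for words in sentences_as_words(text) if len(words) >= n
    )


def sentence_lengths(text):
    """Count how many sentences there are of each length (in words)."""
    return Counter(len(words) for words in sentences_as_words(text))
```

Calling `leading_ngrams(text, 1)` reduces to the original first-word analysis, so the n-word version can be seen as a generalization of it.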

Applications:

* Concept extraction
* Fact database for AIML
* Analysis of writing
* Signature detection

Link Summary:

http://www.idealliance.org/papers/extreme/proceedings/html/2007/Freese01/EML2007Freese01.html

Posted in nlp, parser, size:small on August 27th, 2007 by Dorai