Downloads
Readme
= TactfulTokenizer
{}[http://badge.fury.io/rb/tactful_tokenizer] {}[https://travis-ci.org/zencephalon/Tactful_Tokenizer] {}[https://codeclimate.com/github/zencephalon/Tactful_Tokenizer] {}[https://coveralls.io/r/zencephalon/Tactful_Tokenizer?branch=release]
TactfulTokenizer is a Ruby library for high quality sentence tokenization. It uses a Naive Bayesian statistical model, and is based on Splitta[http://code.google.com/p/splitta/], but has support for ‘?’ and ‘!’ as well as primitive handling of XHTML markup. Better support for XHTML parsing is coming shortly.
Additionally supports unicode text tokenization.
== Usage
require “tactful_tokenizer” m = TactfulTokenizer::Model.new m.tokenize_text(“Here in the U.S. Senate we prefer to eat our friends. Is it easier that way? Yes. Maybe!”) #=> [“Here in the U.S. Senate we prefer to eat our friends.”, “Is it easier that way?”, “Yes.“, “Maybe!”]
The input text is expected to consist of paragraphs delimited by line breaks.
== Installation gem install tactful_tokenizer
== Author
Copyright (c) 2010 Matthew Bunday. All rights reserved. Released under the {GNU GPL v3}[http://www.gnu.org/licenses/gpl.html].