tactful_tokenizer

0.0.5 (last stable release 11 years ago)
Complexity Score: Low
Open Issues: 0
Dependent Projects: 2
Weekly Downloads (global): 40

License

  • GPL-3.0

    Readme

    = TactfulTokenizer

    TactfulTokenizer is a Ruby library for high-quality sentence tokenization. It uses a naive Bayes statistical model and is based on Splitta[http://code.google.com/p/splitta/], but adds support for ‘?’ and ‘!’ as well as primitive handling of XHTML markup. Better support for XHTML parsing is coming shortly.

    It also supports tokenization of Unicode text.

    == Usage

      require "tactful_tokenizer"
      m = TactfulTokenizer::Model.new
      m.tokenize_text("Here in the U.S. Senate we prefer to eat our friends. Is it easier that way? Yes. Maybe!")
      #=> ["Here in the U.S. Senate we prefer to eat our friends.", "Is it easier that way?", "Yes.", "Maybe!"]

    The input text is expected to consist of paragraphs delimited by line breaks.
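Because line breaks are treated as paragraph delimiters, hard-wrapped text may need to be unwrapped before it is tokenized. The following is a minimal sketch of one way to do that, assuming the input separates paragraphs with blank lines and wraps lines inside each paragraph. The helper name +unwrap_paragraphs+ is hypothetical and is not part of tactful_tokenizer.

```ruby
# Hypothetical helper (not part of tactful_tokenizer): collapses hard-wrapped
# lines so that every paragraph ends up on exactly one line.
# Assumption: paragraphs in the input are separated by blank lines.
def unwrap_paragraphs(text)
  text
    .split(/\r?\n[ \t]*\r?\n/)   # a blank line marks a paragraph boundary
    .map do |paragraph|
      # Join the wrapped lines of one paragraph with single spaces.
      paragraph.split(/\r?\n/).map(&:strip).reject(&:empty?).join(" ")
    end
    .reject(&:empty?)            # drop paragraphs that were only whitespace
    .join("\n")                  # one line break between paragraphs
end

wrapped = "Here in the U.S. Senate we prefer\nto eat our friends.\n\n" \
          "Is it easier that way?\nYes. Maybe!"

prepared = unwrap_paragraphs(wrapped)
# prepared now holds two lines, one per paragraph:
#   Here in the U.S. Senate we prefer to eat our friends.
#   Is it easier that way? Yes. Maybe!
```

The resulting string can then be passed to +tokenize_text+ as shown in the usage example. Text that already has one paragraph per line does not need this step.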

    == Installation

      gem install tactful_tokenizer

    == Author

    Copyright (c) 2010 Matthew Bunday. All rights reserved. Released under the {GNU GPL v3}[http://www.gnu.org/licenses/gpl.html].

    Dependencies

    No runtime dependency information found for this package.

    CVE Issues (Active): 0
    Scorecards Score: No Data
    Test Coverage: 87.00%
    Follows Semver: Yes
    Github Stars: 78
    Dependencies (total): 2
    Dependencies (Outdated): 2
    Dependencies (Deprecated): 0
    Threat Modelling: No
    Repo Audits: No


    4 Releases
