jschardet

3.1.4 (last stable release 6 months ago)

License

  • LGPL-2.1+

    JsChardet

    A JavaScript port of Python’s chardet (https://github.com/chardet/chardet).

    License

    LGPL

    How To Use It

    Node

    npm install jschardet
    
    var jschardet = require("jschardet")
    
    // "àíàçã" in UTF-8
    jschardet.detect("\xc3\xa0\xc3\xad\xc3\xa0\xc3\xa7\xc3\xa3")
    // { encoding: "UTF-8", confidence: 0.9690625 }
    
    // "次常用國字標準字體表" in Big5
    jschardet.detect("\xa6\xb8\xb1\x60\xa5\xce\xb0\xea\xa6\x72\xbc\xd0\xb7\xc7\xa6\x72\xc5\xe9\xaa\xed")
    // { encoding: "Big5", confidence: 0.99 }
    
    // "<string>Martin Kühl</string>" in windows-1252
    jschardet.detectAll("\x3c\x73\x74\x72\x69\x6e\x67\x3e\x4d\x61\x72\x74\x69\x6e\x20\x4b\xfc\x68\x6c\x3c\x2f\x73\x74\x72\x69\x6e\x67\x3e")
    // [
    //   {encoding: "windows-1252", confidence: 0.95},
    //   {encoding: "ISO-8859-2", confidence: 0.8796300205763055},
    //   {encoding: "SHIFT_JIS", confidence: 0.01}
    // ]
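
    The escaped string literals above are "binary strings": each character code stands for one byte of the encoded text. As a minimal sketch of how such a string relates to real bytes (for example, the contents of a file), the helper below builds one from a Node Buffer. The name toBinaryString is hypothetical and not part of jschardet.

```javascript
// Hypothetical helper (not part of jschardet): turn raw bytes into a
// "binary string" in which each character code equals one byte value.
function toBinaryString(bytes) {
  // Buffer.from accepts Buffers, Uint8Arrays, and arrays of byte values.
  return Buffer.from(bytes).toString("latin1");
}

// "àíàçã" (written with escapes) encoded as UTF-8 yields the same
// escaped string used in the first example above.
const utf8Bytes = Buffer.from("\u00e0\u00ed\u00e0\u00e7\u00e3", "utf8");
const binary = toBinaryString(utf8Bytes);
console.log(binary === "\xc3\xa0\xc3\xad\xc3\xa0\xc3\xa7\xc3\xa3"); // true
```

    The resulting string can then be passed to jschardet.detect in the same way as the escaped literals above.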
    

    Browser

    Copy and include jschardet.min.js in your web page.

    This library is also available on cdnjs at https://cdnjs.cloudflare.com/ajax/libs/jschardet/1.4.1/jschardet.min.js

    Options

    // See all information related to the confidence levels of each encoding.
    // This is useful to see why you're not getting the expected encoding.
    jschardet.enableDebug();
    
    // The default minimum accepted confidence level is 0.20, but sometimes this
    // is too strict, especially for files that consist mostly of numbers.
    // Set it to 0 to always get a result, or to any other value that works
    // for you.
    jschardet.detect(str, { minimumThreshold: 0 });
    
    // Restrict which encodings can be detected. This is useful when jschardet
    // gives a higher probability to encodings that you never use.
    jschardet.detect(str, { detectEncodings: ["UTF-8", "windows-1252"] });
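
    One way to picture these two options is as filters over a list of candidates: drop anything below a confidence floor, drop anything outside an allow-list, and keep the best of what remains. The sketch below applies that idea to data shaped like the detectAll output shown in the Node section. The names pickEncoding, allowed, and minConfidence are hypothetical, and this is not a description of how jschardet implements the options internally.

```javascript
// Hypothetical helper (not part of jschardet): choose one candidate from a
// list shaped like the detectAll() output shown in the Node section.
function pickEncoding(candidates, { allowed = null, minConfidence = 0.2 } = {}) {
  // Compare encoding names case-insensitively when an allow-list is given.
  const allowedSet = allowed && new Set(allowed.map((name) => name.toLowerCase()));
  const eligible = candidates.filter(
    (c) =>
      c.confidence >= minConfidence &&
      (!allowedSet || allowedSet.has(c.encoding.toLowerCase()))
  );
  // Highest confidence first; null when nothing passes the filters.
  eligible.sort((a, b) => b.confidence - a.confidence);
  return eligible.length > 0 ? eligible[0] : null;
}

// Sample data copied from the detectAll() example in the Node section.
const candidates = [
  { encoding: "windows-1252", confidence: 0.95 },
  { encoding: "ISO-8859-2", confidence: 0.8796300205763055 },
  { encoding: "SHIFT_JIS", confidence: 0.01 },
];

console.log(pickEncoding(candidates));
// { encoding: 'windows-1252', confidence: 0.95 }
console.log(pickEncoding(candidates, { allowed: ["ISO-8859-2", "SHIFT_JIS"] }));
// { encoding: 'ISO-8859-2', confidence: 0.8796300205763055 }
console.log(pickEncoding(candidates, { allowed: ["SHIFT_JIS"] }));
// null
```

    Post-filtering like this only reorders or discards candidates that were already produced; the built-in options are the better choice when they cover your case.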
    

    Supported Charsets

    • Big5, GB2312/GB18030, EUC-TW, HZ-GB-2312, and ISO-2022-CN (Traditional and Simplified Chinese)
    • EUC-JP, SHIFT_JIS, and ISO-2022-JP (Japanese)
    • EUC-KR and ISO-2022-KR (Korean)
    • KOI8-R, MacCyrillic, IBM855, IBM866, ISO-8859-5, and windows-1251 (Russian)
    • ISO-8859-2 and windows-1250 (Hungarian)
    • ISO-8859-5 and windows-1251 (Bulgarian)
    • windows-1252
    • ISO-8859-7 and windows-1253 (Greek)
    • ISO-8859-8 and windows-1255 (Visual and Logical Hebrew)
    • TIS-620 (Thai)
    • UTF-32 BE, LE, 3412-ordered, or 2143-ordered (with a BOM)
    • UTF-16 BE or LE (with a BOM)
    • UTF-8 (with or without a BOM)
    • ASCII
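
    Several of the entries above depend on a byte order mark (BOM) at the start of the data. As a rough illustration of what BOM-based detection involves, the sketch below checks the leading bytes against the standard BOM signatures. The function name sniffBom and the returned labels are hypothetical; this is not jschardet's implementation, and the labels may differ from the encoding names jschardet reports.

```javascript
// Illustrative BOM table. Four-byte signatures come first so that a
// UTF-32 LE BOM (FF FE 00 00) is not mistaken for a UTF-16 LE BOM (FF FE).
const BOMS = [
  { name: "UTF-32LE", bytes: [0xff, 0xfe, 0x00, 0x00] },
  { name: "UTF-32BE", bytes: [0x00, 0x00, 0xfe, 0xff] },
  { name: "UTF-32 (3412-ordered)", bytes: [0xfe, 0xff, 0x00, 0x00] },
  { name: "UTF-32 (2143-ordered)", bytes: [0x00, 0x00, 0xff, 0xfe] },
  { name: "UTF-8", bytes: [0xef, 0xbb, 0xbf] },
  { name: "UTF-16LE", bytes: [0xff, 0xfe] },
  { name: "UTF-16BE", bytes: [0xfe, 0xff] },
];

// Hypothetical helper: return a label for the BOM at the start of `bytes`
// (an array, Buffer, or Uint8Array), or null when no BOM is present.
function sniffBom(bytes) {
  for (const { name, bytes: signature } of BOMS) {
    const matches =
      bytes.length >= signature.length &&
      signature.every((value, index) => bytes[index] === value);
    if (matches) {
      return name;
    }
  }
  return null;
}

console.log(sniffBom([0xef, 0xbb, 0xbf, 0x41])); // UTF-8
console.log(sniffBom([0xff, 0xfe, 0x41, 0x00])); // UTF-16LE
console.log(sniffBom([0x41, 0x42]));             // null
```

    Data without a BOM needs statistical detection instead, which is where the confidence values returned by jschardet.detect come in.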

    Technical Information

    I haven’t been able to create tests to correctly detect:

    • ISO-2022-CN
    • windows-1250 in Hungarian
    • windows-1251 in Bulgarian
    • windows-1253 in Greek
    • EUC-CN

    Development

    Use npm run dist to update the distribution files. They’re available at https://github.com/aadsm/jschardet/tree/master/dist.

    Authors

    Ported from Python to JavaScript by António Afonso (https://github.com/aadsm/jschardet)

    Transformed into an npm package by Markus Ast (https://github.com/brainafk)
