hanzidentifier

Latest stable release: 1.3.0 (5 months ago)

================
Hanzi Identifier
================

.. image:: https://badge.fury.io/py/hanzidentifier.svg
    :target: https://pypi.org/project/hanzidentifier

.. image:: https://github.com/tsroten/hanzidentifier/actions/workflows/ci.yml/badge.svg
    :target: https://github.com/tsroten/hanzidentifier/actions/workflows/ci.yml

Hanzi Identifier is a simple Python module that identifies a string of text as having Simplified or Traditional characters.

* GitHub: https://github.com/tsroten/hanzidentifier
* Free software: MIT license

About
-----

Easy-to-use helper functions for identifying strings:

.. code:: python

    >>> import hanzidentifier
    >>> hanzidentifier.has_chinese('Hello my name is John.')
    False
    >>> hanzidentifier.is_simplified('John说：你好！')
    True
    >>> hanzidentifier.is_traditional('John說：你好！')
    True
    >>> hanzidentifier.has_chinese('Country in Simplified: 国家. Country in Traditional: 國家.')
    True
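
One plausible reading of the helper functions is as shorthand for checks on the five-way result of ``hanzidentifier.identify``, which is described below. The following is a minimal sketch under that assumption, not the library's source: the ``*_from`` function names are hypothetical, and they take a precomputed result, with plain strings standing in for the module's constants, so the snippet runs without the package installed.

.. code:: python

    # Hypothetical stand-ins for hanzidentifier.UNKNOWN, BOTH, TRADITIONAL,
    # SIMPLIFIED and MIXED; the real constants have their own values.
    UNKNOWN, BOTH, TRADITIONAL, SIMPLIFIED, MIXED = (
        "UNKNOWN", "BOTH", "TRADITIONAL", "SIMPLIFIED", "MIXED")


    def has_chinese_from(result: str) -> bool:
        # True whenever at least one recognized Chinese character was found.
        return result != UNKNOWN


    def is_simplified_from(result: str) -> bool:
        # Assumed mapping: text made only of shared characters also counts.
        return result in (SIMPLIFIED, BOTH)


    def is_traditional_from(result: str) -> bool:
        # Assumed mapping: text made only of shared characters also counts.
        return result in (TRADITIONAL, BOTH)


    # A BOTH result (e.g. '你好！') passes both checks.
    print(is_simplified_from(BOTH), is_traditional_from(BOTH))    # True True
    # A MIXED result passes neither.
    print(is_simplified_from(MIXED), is_traditional_from(MIXED))  # False False

Under this assumption, a string such as ``'你好！'`` that identifies as ``BOTH`` passes both ``is_simplified`` and ``is_traditional``; check the package source to confirm before relying on it.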

Here it is without the helper functions:

.. code:: python

    >>> hanzidentifier.identify('Hello my name is Thomas.') is hanzidentifier.UNKNOWN
    True
    >>> hanzidentifier.identify('Thomas 说：你好！') is hanzidentifier.SIMPLIFIED
    True
    >>> hanzidentifier.identify('Thomas 說：你好！') is hanzidentifier.TRADITIONAL
    True
    >>> hanzidentifier.identify('你好！') is hanzidentifier.BOTH
    True
    >>> hanzidentifier.identify('Country in Simplified: 国家. Country in Traditional: 國家.') is hanzidentifier.MIXED
    True

``hanzidentifier.identify`` has five possible return values:

* ``hanzidentifier.UNKNOWN``: the string contains no recognized Chinese characters.
* ``hanzidentifier.BOTH``: the string is compatible with both the Simplified and Traditional character systems.
* ``hanzidentifier.TRADITIONAL``: the string consists of Traditional characters.
* ``hanzidentifier.SIMPLIFIED``: the string consists of Simplified characters.
* ``hanzidentifier.MIXED``: the string contains both characters recognized only as Traditional and characters recognized only as Simplified.
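
The five categories can be derived with plain set operations. This is a minimal sketch, not Hanzi Identifier's implementation: the character tables are a hypothetical handful of characters taken from the examples above (the real module uses CC-CEDICT data), ``toy_identify`` is an illustrative name, and the return values are plain strings rather than the module's constants.

.. code:: python

    # Hypothetical toy tables: a few characters from the examples above,
    # not the CC-CEDICT data that Hanzi Identifier actually uses.
    TRADITIONAL_CHARS = {"說", "國", "你", "好", "家"}
    SIMPLIFIED_CHARS = {"说", "国", "你", "好", "家"}
    ALL_CHARS = TRADITIONAL_CHARS | SIMPLIFIED_CHARS
    SHARED_CHARS = TRADITIONAL_CHARS & SIMPLIFIED_CHARS

    # Plain strings standing in for the module's constants.
    UNKNOWN, BOTH, TRADITIONAL, SIMPLIFIED, MIXED = (
        "UNKNOWN", "BOTH", "TRADITIONAL", "SIMPLIFIED", "MIXED")


    def toy_identify(text: str) -> str:
        """Classify text into one of the five categories listed above."""
        chinese = set(text) & ALL_CHARS   # characters outside the tables are ignored
        if not chinese:
            return UNKNOWN
        if chinese <= SHARED_CHARS:       # every character is valid in both systems
            return BOTH
        if chinese <= TRADITIONAL_CHARS:  # Traditional-only characters, no Simplified-only ones
            return TRADITIONAL
        if chinese <= SIMPLIFIED_CHARS:   # Simplified-only characters, no Traditional-only ones
            return SIMPLIFIED
        return MIXED                      # both Simplified-only and Traditional-only characters


    samples = [
        "Hello my name is Thomas.",
        "Thomas 说：你好！",
        "Thomas 說：你好！",
        "你好！",
        "Country in Simplified: 国家. Country in Traditional: 國家.",
    ]
    for sample in samples:
        print(toy_identify(sample), "<-", sample)

With these toy tables, the samples give the same results as the ``identify`` examples above.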

Characters that aren’t found in CC-CEDICT are ignored when determining a string’s identity. Hanzi Identifier uses the CC-CEDICT data provided by `Zhon <https://github.com/tsroten/zhon>`_ to identify Chinese characters.
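
The filtering step can be pictured as keeping only the characters that appear in the dictionary. A minimal sketch, assuming a hypothetical ``KNOWN_HANZI`` set that stands in for the CC-CEDICT character data from Zhon; ``recognized`` is an illustrative helper, not part of Hanzi Identifier's API.

.. code:: python

    # Hypothetical stand-in for the CC-CEDICT character data from Zhon:
    # only the Chinese characters used in this README's examples.
    KNOWN_HANZI = {"说", "說", "国", "國", "你", "好", "家"}


    def recognized(text: str) -> list[str]:
        """Return, in order, the characters of text found in KNOWN_HANZI."""
        return [ch for ch in text if ch in KNOWN_HANZI]


    print(recognized("Thomas 说：你好！"))          # ['说', '你', '好']
    print(recognized("Hello my name is Thomas."))  # []

Latin letters, spaces, and full-width punctuation such as ``：`` and ``！`` are skipped; a Chinese character missing from the dictionary would be skipped the same way.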

Because the Traditional and Simplified Chinese character systems overlap, a string containing Simplified characters can identify as either ``hanzidentifier.SIMPLIFIED`` or ``hanzidentifier.BOTH``, depending on whether its characters are also Traditional characters.
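
The overlap can be made concrete by splitting the two character tables into shared and system-specific parts. A minimal sketch, assuming hypothetical toy tables built from the README's examples rather than the real CC-CEDICT data; ``simplified_markers`` is an illustrative helper, not a library function.

.. code:: python

    # Hypothetical toy tables built from this README's examples; the real
    # module derives its character sets from CC-CEDICT via Zhon.
    TRADITIONAL_CHARS = {"說", "國", "你", "好", "家"}
    SIMPLIFIED_CHARS = {"说", "国", "你", "好", "家"}

    shared = TRADITIONAL_CHARS & SIMPLIFIED_CHARS            # written the same in both
    simplified_only = SIMPLIFIED_CHARS - TRADITIONAL_CHARS   # e.g. 说, 国
    traditional_only = TRADITIONAL_CHARS - SIMPLIFIED_CHARS  # e.g. 說, 國


    def simplified_markers(text: str) -> set[str]:
        """Return the characters in text that occur only in the Simplified table."""
        return set(text) & simplified_only


    # '你好' uses only shared characters, so nothing marks it as Simplified and
    # it would identify as BOTH rather than SIMPLIFIED.
    print(simplified_markers("你好！"))           # set()
    # '说' appears only in the Simplified table, which tips this string to SIMPLIFIED.
    print(simplified_markers("Thomas 说：你好！"))  # {'说'}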

Getting Started
---------------

* Install Hanzi Identifier: ``$ pip install hanzidentifier``
* Report bugs and ask questions via `GitHub Issues <https://github.com/tsroten/hanzidentifier/issues>`_
* `Contribute features or bug fixes <https://github.com/tsroten/hanzidentifier/pulls>`_

Dependencies
------------

* ``zhon~=2.1``
