Offline tokenization produces empty token_strings #493

@yifanmai

When I run the example script from this doc: https://docs.cohere.com/reference/tokenize

response = co.tokenize(text="tokenize me! :D", model="command")

I get:

tokens=[10002, 2261, 2012, 8, 2792, 43] token_strings=[] meta=None

where token_strings is an empty array, even though the docs suggest that it should be non-empty. However, if I run:

response = co.tokenize(text="tokenize me! :D", model="command", offline=False)

I get the token_strings as expected:

tokens=[10002, 2261, 2012, 8, 2792, 43] token_strings=['token', 'ize', ' me', '!', ' :', 'D'] meta=ApiMeta(api_version=ApiMetaApiVersion(version='1', is_deprecated=None, is_experimental=None), billed_units=None, tokens=None, warnings=None)

It would be nice if token_strings could be supported for offline tokenization, so that the online and offline behavior is identical. I'll attach a pull request showing how this could be done.
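
To illustrate the idea, here is a minimal sketch of one way token_strings could be derived locally: decode each token ID on its own so the result lines up one-to-one with tokens. This is not the SDK's implementation. The helper name token_strings_from_ids, the decode callable, and the toy vocabulary are all hypothetical; the vocabulary is simply built from the online response above and stands in for whatever local tokenizer the offline path uses.

```python
from typing import Callable, Dict, List, Sequence


def token_strings_from_ids(
    tokens: Sequence[int],
    decode: Callable[[List[int]], str],
) -> List[str]:
    """Decode each token ID individually so the output aligns with `tokens`.

    `decode` is an assumption: any function that maps a list of token IDs
    to text, such as the decode method of a locally loaded tokenizer.
    """
    return [decode([token_id]) for token_id in tokens]


# Toy vocabulary copied from the online response in this issue.
# It stands in for a real local tokenizer and covers only these six IDs.
TOY_VOCAB: Dict[int, str] = {
    10002: "token",
    2261: "ize",
    2012: " me",
    8: "!",
    2792: " :",
    43: "D",
}


def toy_decode(ids: List[int]) -> str:
    """Hypothetical decoder: concatenate the vocabulary entries for `ids`."""
    return "".join(TOY_VOCAB[i] for i in ids)


tokens = [10002, 2261, 2012, 8, 2792, 43]
token_strings = token_strings_from_ids(tokens, toy_decode)
print(token_strings)
```

With the toy vocabulary, joining the resulting strings gives back the original text. With a real byte-level tokenizer, decoding IDs one at a time may not round-trip cleanly when a single character spans several tokens, so that case would need checking against the online output.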
