tokenizers: Fast, Consistent Tokenization of Natural Language Text
Convert natural language text into tokens. Includes tokenizers for shingled n-grams, skip n-grams, words, word stems, sentences, paragraphs, characters, shingled characters, lines, Penn Treebank, and regular expressions, as well as functions for counting characters, words, and sentences, and a function for splitting longer texts into separate documents, each with the same number of words. The tokenizers have a consistent interface, and the package is built on the 'stringi' and 'Rcpp' packages for fast yet correct tokenization in 'UTF-8'.
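A minimal sketch of that consistent interface (function names are those exported by the package; the sample sentence is invented for illustration):

```r
library(tokenizers)

text <- "The quick brown fox jumps over the lazy dog. It barked."

# Every tokenize_* function accepts a character vector and returns a
# list of character vectors, one element per input document.
tokenize_words(text)
tokenize_ngrams(text, n = 2)   # shingled bigrams such as "the quick"
tokenize_sentences(text)

# Companion counting functions are vectorized the same way.
count_words(text)

# chunk_text() splits a longer text into documents with the same number
# of words (a deliberately small chunk_size here so the effect is visible).
chunk_text(text, chunk_size = 5)
```

Because every tokenizer returns the same shape of output, downstream packages such as tidytext (listed under reverse imports below) can swap tokenizers without changing their own interfaces.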
| Version: | 0.3.0 |
| Depends: | R (≥ 3.1.3) |
| Imports: | stringi (≥ 1.0.1), Rcpp (≥ 0.12.3), SnowballC (≥ 0.5.1) |
| LinkingTo: | Rcpp |
| Suggests: | covr, knitr, rmarkdown, stopwords (≥ 0.9.0), testthat |
| Published: | 2022-12-22 |
| DOI: | 10.32614/CRAN.package.tokenizers |
| Author: | Lincoln Mullen [aut, cre], Os Keyes [ctb], Dmitriy Selivanov [ctb], Jeffrey Arnold [ctb], Kenneth Benoit [ctb] |
| Maintainer: | Lincoln Mullen <lincoln at lincolnmullen.com> |
| BugReports: | https://github.com/ropensci/tokenizers/issues |
| License: | MIT + file LICENSE |
| URL: | https://docs.ropensci.org/tokenizers/, https://github.com/ropensci/tokenizers |
| NeedsCompilation: | yes |
| Citation: | tokenizers citation info |
| Materials: | README, NEWS |
| In views: | NaturalLanguageProcessing |
| CRAN checks: | tokenizers results |
Reverse dependencies:
| Reverse imports: | blocking, covfefe, deeplr, DeepPINCS, DramaAnalysis, pdfsearch, proustr, rslp, textrecipes, tidypmc, tidytext, ttgsea, wactor, WhatsR |
| Reverse suggests: | edgarWebR, sumup, torchdatasets |
| Reverse enhances: | quanteda |
Linking:
Please use the canonical form https://CRAN.R-project.org/package=tokenizers to link to this page.