COS_TEXT                Cosine similarity for text documents
Count_Rows              Number of rows of a file
Doc2Vec                 Conversion of text documents to
                        word-vector-representation features ( Doc2Vec )
JACCARD_DICE            Jaccard or Dice similarity for text documents
TEXT_DOC_DISSIM         Dissimilarity calculation of text documents
big_tokenize_transform
                        String tokenization and transformation for big
                        data sets
bytes_converter         bytes converter of a text file ( KB, MB or GB )
cluster_frequency       Frequencies of an existing cluster object
cosine_distance         cosine distance of two character strings (each
                        string consists of more than one word)
dense_2sparse           convert a dense matrix to a sparse matrix
dice_distance           Dice similarity of words using n-grams
dims_of_word_vecs       dimensions of a word vectors file
levenshtein_distance    Levenshtein distance of two words
load_sparse_binary      load a sparse matrix in binary format
matrix_sparsity         sparsity percentage of a sparse matrix
read_characters         read a specific number of characters from a
                        text file
read_rows               read a specific number of rows from a text file
save_sparse_binary      save a sparse matrix in binary format
select_predictors       Exclude highly correlated predictors
sparse_Means            rowMeans and colMeans for a sparse matrix
sparse_Sums             rowSums and colSums for a sparse matrix
sparse_term_matrix      Term matrices and statistics (
                        document-term-matrix, term-document-matrix)
text_file_parser        text file parser
text_intersect          intersection of words or letters in tokenized
                        text
token_stats             token statistics
tokenize_transform_text
                        String tokenization and transformation (
                        character string or path to a file )
tokenize_transform_vec_docs
                        String tokenization and transformation ( vector
                        of documents )
utf_locale              utf-locale for the available languages
vocabulary_parser       returns the vocabulary counts for small or
                        medium files ( xml and other formats )
