NEWS
====

Versioning
----------

Releases will be numbered with the following semantic versioning format:

<major>.<minor>.<patch>

And constructed with the following guidelines:

* Breaking backward compatibility bumps the major (and resets the minor
  and patch)
* New additions without breaking backward compatibility bumps the minor
  (and resets the patch)
* Bug fixes and misc changes bumps the patch




textshape 1.7.2 - 1.7.3
----------------------------------------------------------------

BUG FIXES
  
* `split_token` would keep quote apostrophes with the leading/ending words of the 
  quote.  These are now stripped off as individual tokens.  Apostrophes are still  
  preserved inside of words (e.g., possessives and contractions)


textshape 1.6.1 - 1.7.1
----------------------------------------------------------------


BUG FIXES

`as_list` contained a bug because it relies on `apply` which can simplfy lists
  to a data.frame when the lists contain equal numbers of each eleement.

`split_run` relies on **stringi** which had a change in API that caused the 
  strings to no longer split correctly (a space was added at the end).  This has 
  been fixed.
  
* `split_token` would keep quote apostrophes with the leading/ending words of the 
  quote.  These are now stripped off as individual tokens.  Apostrophes are still  
  preserved inside of words (e.g., possessives and contractions)

NEW FEATURES

* `grab_index` added to grab from a start up to an ending index.  

* `grab_match` added to grab from a start up to an ending regex match.


IMPROVEMENTS

* `split_transcript` now has more accurate regex group capturing.



textshape 1.5.1 - 1.6.0
----------------------------------------------------------------

BUG FIXES

* `split_match_regex_to_transcript` gave warnings about `perl = TRUE` that were
  due to `gsub` with a `fixed = TRUE` clashing with the `perl = TRUE`.

NEW FEATURES

* `flatten` added for flattening nested, named lists into single tiered lists
  using the concatenated list/atomic vector names as the names of the single
  tiered list.

* `unnest_text` added to located and un-nest nested text columns in a data.frame.

IMPROVEMENTS

* `tidy_dtm`/`tidy_tdm` did not order unnamed matrices as expected (e.g.,
  `{1, 2, ..., 1}` was ordered as `{1, 10, 2, ...}`).  This has been corrected.




textshape 1.4.0 - 1.5.0
----------------------------------------------------------------

BUG FIXES

* split_sentence`, `split_word`, `split_token`, & `split_speaker` did not handle
  single row data.frames properly resulting in loss of data.  This has been fixed.

NEW FEATURES

* `split_sentence_token` added as a shortcut to split into sentences, add a
  sentence index, and then split into tokens and add a token index.

* `tidy_matrix` and `tidy_adjacency_matrix` added to provide easy tidy
  representations of these data types.

* `cluster_matrix` added for reordering the columns/rows of matrices via
  hierarchical clustering.

IMPROVEMENTS

* `split_sentence` now handles digit(s) + inch (in.) abbreviation if not
  followed by a capital letter.  Previously, this was split on.  Additionally,
  post script (p.s.) is no longer split on.



textshape 1.1.0 - 1.3.0
----------------------------------------------------------------

BUG FIXES

* `tidy_list` did not add `content.attribute.name` for lists of named vectors.


MINOR FEATURES

* `split_match_regex` added as a version of `split_match` with `regex = TRUE` by
  default.  This makes it easier to reason about what the function call is doing.

* `split_match_regex_to_transcript` added to directly split a text by a person
  regex and convert to a two column transcript of person and dialogue.


IMPROVEMENTS

* `tidy_list` now uses **data.table**'s `rbind` for lists of `data.frame`s.
  This means column ordering does not need to match and missing columns are
  automatically filled with `NA`s.

* `split_sentence` has better handling for the 'No.' abbreviation that
  distinguishes between 'No.' followed by digits (assumed to be and abbreviation)
  and when no digits follow (assumed to be a complete sentence).

* `split_sentence` has better handling for quoted material (i.e., a punctuation
  mark followed by single or double quotes that is not followed by a comma).

* `split_sentence` has better handling for single and double middle names
  presented as initials.

* `split_sentence` has better handling for abbreviated English units of measure.

CHANGES

* `combine.default` included element names by default.  This has been removed to
  include only the elements.


textshape 1.0.2
----------------------------------------------------------------

BUG FIXES

* `tidy_list` with a list of unnamed `data.frame`s resulted in an error (see
  issue #7).  This issue has been fixed.

* `split_word.data.frame` and `split_token.data.frame` both used an incorrect
  column naming of `sentence_id` for word and token index respectively.  These
  columns are now renamed to `word_id` and `token_id` respectively.

* `split_token` gets a more robust splitting algorithm.

NEW FEATURES

* `column_to_rownames` added to enable one to quickly add a column as rownames
  easily within a pipeline.  This is useful when turning a `data.frame` into a
  `matrix`.

* `tidy_list` picks up the ability to tidy a list of named vectors into three
  columns.

CHANGES

* `as.tibble` removed from all function arguments.  This was a nice interactive
  feature that made programming very difficult to reason about.  Having an
  environment dependent output would result in no adoption of the **textshape**
  package as a dependency.  Additionally, `set_output` and `tibble_output`,
  two complementary function have been removed without being deprecated.  The
  problem was so egregious and the package infant enough, that removal without
  deprecation was warranted.



textshape 1.0.1
----------------------------------------------------------------


NEW FEATURES

* Users can now globally select a **tibble** output rather than a **data.table**
  output for all functions that outputted a **data.table**.  This can be set
  globally via `set_output`.  If the user does not set the output type
  **textshape** tries to infer based on whether or not the user has **dplyr**
  loaded.  If **dplyr** is loaded then *tibble* is the default output.

* `set_output` and `tibble_output` added to globally set the output type
  (**tibble** or **data.table**) and to check/infer the desired output type.


textshape 1.0.0
----------------------------------------------------------------

CHANGES

* `bind_list`, `bind_table`, & `bind_vector` have been renamed to the more
  meaningful forms of `tidy_list`, `tidy_table`, & `tidy_vector`.  The former
  version are now deprecated.  This bumps the version to 1.0.0 as this is a
  major change that breaks backward compatibility.



textshape 0.1.0 - 0.2.0
----------------------------------------------------------------

NEW FEATURES

* `bind_list` added to `rbind` a `list` of named `data.frame`s or `vector`s.

* `split_transcript` added to split a transcript style vector (e.g.,
  `c("greg: Who me", "sarah: yes you!")` into a name and dialogue vector that is
  coerced to a `data.table`.

* `change_index` added  for extracting the indices of changes in runs within an
  atomic vector.  Pairs well with `split_index`.

* `bind_vector` added to `cbind` a named atomic `vector`'s names and values.

* `bind_table` added to `cbind` a `table`'s names and values.

* `duration` method for numeric vectors added as well as a `starts` and `ends`
  function for calculating start and end times from a numeric vector.

* `from_to` added to prepare speaker data for a network lot given the flowing
  nature of discourse.

* `tidy_dtm` & `tidy_tdm` added to convert a `DocumentTermMatrix`
  or `TermDocumentMatrix` into a tidied `data.frame`.

* `tidy_colo_dtm` & `tidy_colo_tdm` added to convert a `DocumentTermMatrix`
  or `TermDocumentMatrix` into a collocation matrix and then a tidied `data.frame`.

* `unique_pairs` added to compliment the output of `tidy_colo_dtm` &
  `tidy_colo_tdm`.  Enables the removal of duplicated collocating pairs caused
  by symmetrical mirroring of the upper and lower triangle of the collocation
  matrix.

CHANGES

* `split_index` now uses `change_index(x)` as the default when `x` is an atomic
  vector.


textshape 0.0.1
----------------------------------------------------------------

Tools that can be used to reshape text data.
