Continuous integration

  • Use stable pak (#85).

Continuous integration

  • Latest changes (#84).

Continuous integration

  • Install via R CMD INSTALL ., not pak (#83).

    • ci: Install via R CMD INSTALL ., not pak

    • ci: Bump version of upload-artifact action

    • ci: Use pkgdown branch

    • ci: Updates from duckdb

    • ci: Trigger run

Continuous integration

  • Install local package for pkgdown builds.

  • Improve support for protected branches with fledge.

  • Improve support for protected branches, without fledge.

  • Sync with latest developments.

  • Use v2 instead of master.

  • Inline action.

  • Use dev roxygen2 and decor.

  • Fix on Windows, tweak lock workflow.

  • Avoid checking bashisms on Windows.

  • Better commit message.

  • Bump versions, better default, consume custom matrix.

  • Recent updates.

  • Internal changes only.

Bug fixes

ml_replicate

  • Getting a replication algorithm now guarantee to w… (#81).
  • Fixed the internals of ml_replicate(). Getting a replication algorithm is now guaranteed to work even when the mlfit package is not attached in the current session. This could be a fatal issue when mlfit is called internally by another package. The root cause was that .get_int_fnc() used as.environment("package:mlfit"), which only works if mlfit is attached to the search path.
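
To illustrate the ml_replicate() entry above: as.environment("package:<name>") only succeeds when the package is attached to the search path, whereas asNamespace() also works when the package is merely loaded by another package. The following is a minimal base R sketch, not the actual mlfit code. It uses the tools package as a stand-in for mlfit, and get_internal_fn() and get_attached_fn() are hypothetical helpers introduced only for this example.

```r
# Hypothetical helper for illustration; not the actual mlfit implementation.
# asNamespace() resolves a package namespace whether or not the package
# is attached, so the lookup does not depend on the search path.
get_internal_fn <- function(name, pkg) {
  get(name, envir = asNamespace(pkg), mode = "function")
}

# Lookup through the search path; fails if the package is not attached.
get_attached_fn <- function(name, pkg) {
  get(name, envir = as.environment(paste0("package:", pkg)), mode = "function")
}

# "tools" ships with R but is usually not attached in a fresh session.
is_attached <- "package:tools" %in% search()

# The namespace lookup works in either case.
file_ext <- get_internal_fn("file_ext", "tools")
ext <- file_ext("weights.rds")

# The search-path lookup succeeds only if the package is attached.
attached_lookup_ok <- !inherits(
  try(get_attached_fn("file_ext", "tools"), silent = TRUE),
  "try-error"
)
```

In this sketch, attached_lookup_ok mirrors is_attached, while the namespace-based lookup does not depend on either. For exported functions, pkg::fun is another option that avoids relying on the search path.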

Chore

  • Styler.
  • Soft-deprecated the prior_weights argument. Instead, specify the name of the weight column in the reference sample to be used as prior weights via the prior_weight argument of special_field_names().
  • The group_controls and individual_controls arguments of ml_problem() now have NULL as their default value.
  • Fixed ml_problem() not allowing single-level control when the geo_hierarchy argument is not NULL. (#78, @walkerke)
  • It is now possible to use prior weights in a geo_hierarchy ml_problem(). Simply specify the name of the weight column in your reference sample using the prior_weight argument in special_field_names(). (#78, @walkerke)

doc

NEWS.md

  • Correct the latest news update.

news

  • Fixed inconsistent header levels and invalid release titles in NEWS.md (#76, @maelle)
  • Internal changes only.
  • Harmonize yaml formatting.

  • Revert changes to matrix section.

  • Reduce parallelism.

  • Also check dev on cran-* branches.

  • Update hash key for dev.

  • Remove R 3.3.

  • Fixed the length of the flat_weights field of an ml_fit object when there are one or more entries that correspond to zero-valued controls in its ml_problem object. (#60)
  • Upgraded to testthat 3rd edition.
  • Replaced deprecated functions.
  • Make a file check test more robust as per CRAN’s suggestion.
  • ml_problem() gains a geo_hierarchy argument, which lets the user specify a region and zone table for creating a list of ml_problem objects based on zones. See the README page for an example. Printing an ml_problem object also shows its zone, if one exists.
  • Add a person ID column to Ye’s example. (#57)
  • Fix the package’s URL (#56) and URLs used in the package.
  • Add a GPL-3 license file and the copyright holder role to Kirill.
  • Add more changes for first CRAN submission. These changes do not affect any functionality.
  • Add an example section to README.
  • Add ml_replicate() for replicating the reference sample of a fitted problem (#38).
  • Fix the error when a level is missing from the reference sample (#32, @asiripanich).
  • Fix test on Windows.
  • Avoid converting sparse matrix to full matrix.
  • If the controls contain values of zero for existing observations in the reference sample, the removal of these observations now works in all cases (#30).
  • Add overview in package documentation.
  • Add examples to all functions.
  • Explicitly document return values of ml_fit().
  • Set default maximum number of iterations for HIPF to 2000.
  • Convert documentation to Markdown.
  • Add “Driven by” and “Related work” sections to the README.
  • Use a sparse matrix for the flattened reference sample.
  • Status messages with verbose = TRUE are prepended with a time stamp.
  • Fail if NA group ID found.
  • Reorganized and renamed internal datasets.
  • Fitting result contains iterations and tol members (#28).
  • Fixed model matrix of “separate” type if only grand totals are given.
  • ml_fit() gains tol argument, which determines the success of a fitting operation.
  • ml_fit objects have new members success, rel_residuals, and flat_weighted_values (#28).
  • HIPF and IPU stop iterating if tolerance is reached.
  • IPU and HIPF abort iteration when the weights do not change measurably between two iterations (#27).
  • Features
    • New algorithms: HIPF (#2) and IPU.
  • Interface
    • New as.flat_ml_fit_problem() is used to coerce input for the ml_fit_ functions.
    • format() and print() methods for classes fitting_problem, flat_ml_fit_problem and ml_fit.
    • Flattened reference sample now contains observations in rows, and controls in columns (#26).
    • flatten_ml_fit_problem() gains new model_matrix_type argument that allows selecting an alternative model matrix building method where all cross-classifications are allocated to a column, regardless of overlaps. Flattened problems store the type of model matrix used; it is also shown by the format() and print() methods.
  • Improvements
    • Reference sample doesn’t need to be ordered by group ID anymore.
    • Remove individualsPerGroup special variable.
    • Allow problems with individual-only controls.
    • Check for correspondence of levels between sample and controls.
    • Check for NA values in controls.
  • Technical changes
    • Use grake package again for calibration, because the alternatives are worse: sampling uses too low a tolerance, survey forcibly loads MASS, and laeken could work but is unrelated (which is the reason grake was started in the first place).
    • Duplicate rows are kept in the reference sample.
    • Rename control_totals to target_values.
    • New toy_example() allows easier access to bundled examples, load with readRDS().
    • Move legacy format (IPAF) and related functions to data-raw directory.
    • Use factors internally.
  • Performance
  • Tests
    • Specific test for households with the same signature.
  • Documentation
    • Enhance example.
    • Include flat example problem (group size = 1 for all groups).
  • Cleanup
  • New functions compute_margins() and margins_to_df() for validation
  • Support specification of prior weights in construction of fitting problems
  • Use survey::grake() instead of grake::calibWeights().
  • Adapt to change of undocumented behavior in base R.
  • Don’t alter column names of controls if they are of type data.table (explicitly convert to data.frame)
  • Proper handling of corner cases (reference sample with one row, and grand total controls and dummy controls with only one category)
  • Allow character variables (in addition to factors) as control variables
  • Explicit error message if reference sample is not sorted
  • If name of count column in controls is not specified, it is determined automatically (with a message in verbose mode)
  • Expansion of weights loads Matrix package if necessary
  • Clarify documentation
  • Straighten out imports, use importFrom instead of ::
  • new functions fitting_problem, is.fitting_problem, special_field_names
  • all fitting functions now expect an object of class fitting_problem (as returned by the fitting_problem and import_IPAF_problem functions); former calls like ml_fit(ref_sample, controls, field_names) now need to be written as ml_fit(fitting_problem(...))
  • use grake package instead of laeken
  • new argument ginv to ml_fit_dss, passed down to calibWeights
  • fix example for ml_fit_dss
  • new function ml_fit_dss with an implementation very close to the paper by Deville et al. (1993); implementation in the laeken package

  • normalize weights to get rid of precision problems

  • allow partly uncontrolled attributes and controls without observations in the reference sample (with a warning, #24)

  • better error reporting for non-factor controls and existence of group ID column

  • improve warning and progress messages

  • return correct weights (regression introduced in mlfit 0.0.9)

  • rewrite transformation of weights using sparse matrices and a home-grown Moore-Penrose inverse for our (very special) transformation matrix (#17)

  • warn on missing observations for nonzero controls (#20)

  • ml_fit_entropy_o also returns flat weights

  • allow arbitrary order in control total tables (#19)

  • remove observations that correspond to zero-valued control totals, with warning; don’t warn if no corresponding observations need to be removed (#16)

  • support multiple controls at individual or group level, also detect conflicting control totals

  • support fitting one-dimensional problems (where only group-level controls are given)

  • new function flatten_ml_fit_problem: transform representation as returned by import_IPAF_result into a matrix, a control vector and a weights vector

  • function ml_fit_entropy_o: use BB::dfsane instead of BB::BBsolve for solving the optimization problem; rename argument BBsolve_args to dfsane_args

  • function ml_fit: new parameter verbose

  • aggregate identical household types, implement prior weights (so far only internally)

  • Add example for ml_fit (#11)

  • allow additional arguments for the algorithms; ml_fit_entropy_o now accepts a named list BBsolve_args that contains additional arguments to BB::BBsolve

  • Faster internal data preparation for ml_fit_entropy_o

  • Fix dependency issues (#13, #14)

  • Add example for ml_fit_entropy_o (#11)

  • Print more helpful error message if control totals and reference sample categories do not overlap (#11)

  • import_IPAF_results now returns a class of type IPAF_results
  • New functions ml_ipf and ml_ipf_entropy_o; the implementation does not yet return the same weights as the Python code
  • Convert control columns to factors
  • Fix importing configuration files with more than one control of any type and with comments in the control definition

  • New parameter config_name to import, defaults to config.xml

  • New parameter all_weights to import, which allows importing intermediate weights as well. The output format of import has changed: the weights for each algorithm are now always a list of weight vectors, even in the default case all_weights == FALSE (#5).
  • Import results of old Python code (#1).
  • Initial setup