Create an instance of a fitting problem

The ml_problem() function is the first step in fitting a reference sample to known control totals with mlfit. All algorithms (see ml_fit()) expect an object created by this function (or optionally processed with flatten_ml_fit_problem()).

Use the special_field_names() function to construct the field_names argument of ml_problem().

ml_problem(
  ref_sample,
  controls = list(individual = individual_controls, group = group_controls),
  field_names,
  individual_controls = NULL,
  group_controls = NULL,
  prior_weights = NULL,
  geo_hierarchy = NULL
)

is_ml_problem(x)

# S3 method for class 'ml_problem'
format(x, ...)

# S3 method for class 'ml_problem'
print(x, ...)

special_field_names(
  groupId,
  individualId,
  individualsPerGroup = NULL,
  count = NULL,
  zone = NULL,
  region = NULL,
  prior_weight = NULL
)

Arguments

ref_sample: The reference sample
controls: Control totals, by default initialized from the individual_controls and group_controls arguments
field_names: Names of special fields, constructed with special_field_names()
individual_controls, group_controls: Control totals at individual and group level, given as a list of data frames where each data frame defines a control
prior_weights: (Deprecated) Use special_field_names(prior_weight = '<column-name>') to specify the prior weight column in the ref_sample instead.
geo_hierarchy: A table that maps each region (the larger zoning level) to the zones (the smaller zoning level) it contains. The column name of the larger level should be specified in field_names as 'region', and that of the smaller level as 'zone'.
x: An object
...: Ignored.
groupId, individualId: Names of the columns that define the ID of the group and the ID of the individual, respectively
individualsPerGroup: Obsolete.
count: Name of the control total column in the control tables (by default, the first numeric column in each control is used).
region, zone: Names of the columns that define the region in the reference sample and the zone in the controls, respectively. A region is a larger area that contains more than one zone.
prior_weight: Name of the column in the reference sample that holds the prior (or design) weights at group level. By default, every group gets a prior weight of one, which corresponds to random sampling of groups.
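To make the shape of a geo_hierarchy table concrete, the base R sketch below builds a small one in which each row pairs a zone with the region that contains it. The column names REGION and ZONE and all values are hypothetical; in a real call, the names would be declared with special_field_names(region = "REGION", zone = "ZONE"). Grouping zones by region mirrors the Value section, which says a list of ml_problem objects is returned when geo_hierarchy is given (presumably one per region; that split is an assumption here).

```r
# Hypothetical geo_hierarchy table: one row per zone, plus the region containing it.
# REGION and ZONE are illustrative column names, not names required by mlfit.
geo_hierarchy <- data.frame(
  REGION = c(1, 1, 1, 2, 2),
  ZONE   = c(11, 12, 13, 21, 22)
)

# Zones grouped by region: each region (larger level) contains several zones
zones_by_region <- split(geo_hierarchy$ZONE, geo_hierarchy$REGION)
print(zones_by_region)

# Sanity check: a zone must not be assigned to more than one region
stopifnot(!anyDuplicated(geo_hierarchy$ZONE))
```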

Value

An object of class ml_problem (or, if geo_hierarchy is given, a list of such objects). Each object is essentially a named list with the following components:

refSample: The reference sample, a data.frame.
controls: A named list with two components, individual and group. Each contains a list of controls as data.frames.
fieldNames: A named list with the names of special fields.

is_ml_problem() returns a logical.

Examples

# Create example from Ye et al., 2009

# Provide reference sample
ye <- tibble::tribble(
  ~HHNR, ~PNR, ~APER, ~HH_VAR, ~P_VAR,
  1, 1, 3, 1, 1,
  1, 2, 3, 1, 2,
  1, 3, 3, 1, 3,
  2, 4, 2, 1, 1,
  2, 5, 2, 1, 3,
  3, 6, 3, 1, 1,
  3, 7, 3, 1, 1,
  3, 8, 3, 1, 2,
  4, 9, 3, 2, 1,
  4, 10, 3, 2, 3,
  4, 11, 3, 2, 3,
  5, 12, 3, 2, 2,
  5, 13, 3, 2, 2,
  5, 14, 3, 2, 3,
  6, 15, 2, 2, 1,
  6, 16, 2, 2, 2,
  7, 17, 5, 2, 1,
  7, 18, 5, 2, 1,
  7, 19, 5, 2, 2,
  7, 20, 5, 2, 3,
  7, 21, 5, 2, 3,
  8, 22, 2, 2, 1,
  8, 23, 2, 2, 2
)
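Before fitting, it can help to look at the unweighted margins of the reference sample, i.e. the totals obtained with the default prior weight of one per household. The base R sketch below rebuilds the same data with data.frame() instead of tibble::tribble() (the APER column is omitted because no control uses it) and counts households by HH_VAR and persons by P_VAR.

```r
# Base R rebuild of the Ye et al. (2009) reference sample above (APER omitted)
hh_size <- c(3, 2, 3, 3, 3, 2, 5, 2)  # persons in households HHNR = 1..8
ye <- data.frame(
  HHNR   = rep(1:8, times = hh_size),
  PNR    = 1:23,
  HH_VAR = rep(c(1, 1, 1, 2, 2, 2, 2, 2), times = hh_size),
  P_VAR  = c(1, 2, 3, 1, 3, 1, 1, 2, 1, 3, 3, 2, 2, 3, 1, 2, 1, 1, 2, 3, 3, 1, 2)
)

# Household-level attributes: keep the first row of each household
households <- ye[!duplicated(ye$HHNR), c("HHNR", "HH_VAR")]

# Unweighted margins (every household weighted by one)
hh_margin <- table(households$HH_VAR)  # households per HH_VAR category
p_margin  <- table(ye$P_VAR)           # persons per P_VAR category
print(hh_margin)
print(p_margin)
```

These counts (8 households, 23 persons) are far below the control totals, which is the gap the fitting algorithm closes by choosing one weight per household.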
ye
#> # A tibble: 23 × 5
#>     HHNR   PNR  APER HH_VAR P_VAR
#>    <dbl> <dbl> <dbl>  <dbl> <dbl>
#>  1     1     1     3      1     1
#>  2     1     2     3      1     2
#>  3     1     3     3      1     3
#>  4     2     4     2      1     1
#>  5     2     5     2      1     3
#>  6     3     6     3      1     1
#>  7     3     7     3      1     1
#>  8     3     8     3      1     2
#>  9     4     9     3      2     1
#> 10     4    10     3      2     3
#> # ℹ 13 more rows

# Specify control at household level
ye_hh <- tibble::tribble(
  ~HH_VAR, ~N,
  1,       35,
  2,       65
)
ye_hh
#> # A tibble: 2 × 2
#>   HH_VAR     N
#>    <dbl> <dbl>
#> 1      1    35
#> 2      2    65

# Specify control at person level
ye_ind <- tibble::tribble(
  ~P_VAR, ~N,
  1, 91,
  2, 65,
  3, 104
)
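Each control is a data frame whose category columns must match columns of the reference sample, plus one count column (here N). The base R sketch below is a hypothetical pre-flight check, not part of mlfit: check_control() verifies that the category columns exist in the sample, that every observed category has a control total, and returns the total implied by the control (100 households and 260 persons in this example).

```r
# Minimal rebuild of the reference sample columns referenced by the controls
ye <- data.frame(
  HH_VAR = rep(c(1, 1, 1, 2, 2, 2, 2, 2), times = c(3, 2, 3, 3, 3, 2, 5, 2)),
  P_VAR  = c(1, 2, 3, 1, 3, 1, 1, 2, 1, 3, 3, 2, 2, 3, 1, 2, 1, 1, 2, 3, 3, 1, 2)
)
ye_hh  <- data.frame(HH_VAR = c(1, 2), N = c(35, 65))
ye_ind <- data.frame(P_VAR = c(1, 2, 3), N = c(91, 65, 104))

# Hypothetical helper (not an mlfit function): checks each category column
# separately and returns the sum of the control totals.
check_control <- function(control, ref_sample, count = "N") {
  categories <- setdiff(names(control), count)
  stopifnot(all(categories %in% names(ref_sample)))
  for (col in categories) {
    missing <- setdiff(unique(ref_sample[[col]]), control[[col]])
    if (length(missing) > 0) {
      stop("Categories without control total in ", col, ": ",
           paste(missing, collapse = ", "))
    }
  }
  sum(control[[count]])
}

check_control(ye_hh, ye)   # total number of households implied by the group control
check_control(ye_ind, ye)  # total number of persons implied by the individual control
```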
ye_ind
#> # A tibble: 3 × 2
#>   P_VAR     N
#>   <dbl> <dbl>
#> 1     1    91
#> 2     2    65
#> 3     3   104

ye_problem <- ml_problem(
  ref_sample = ye,
  field_names = special_field_names(
    groupId = "HHNR", individualId = "PNR", count = "N"
  ),
  group_controls = list(ye_hh),
  individual_controls = list(ye_ind)
)
ye_problem
#> An object of class ml_problem
#>   Reference sample: 23 observations
#>   Control totals: 1 at individual, and 1 at group level

fit <- ml_fit_dss(ye_problem)
fit$weights
#>  [1]  8.937470  8.937470  8.937470 23.448579 23.448579  2.613950  2.613950
#>  [8]  2.613950 25.899223 25.899223 25.899223 14.347802 14.347802 14.347802
#> [15] 11.009562 11.009562  2.733852  2.733852  2.733852  2.733852  2.733852
#> [22] 11.009562 11.009562
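A quick way to check a fit is to recompute the control totals from the weights. The base R sketch below copies the weights printed above (one per row of ye, rounded to six decimals) and sums them by category: persons are summed directly for the individual control, while each household is counted once for the group control, since all members of a household carry the same weight in the output above. The data.frame() rebuild of ye only keeps the block self-contained.

```r
# Base R rebuild of the reference sample columns needed for the check
hh_size <- c(3, 2, 3, 3, 3, 2, 5, 2)
ye <- data.frame(
  HHNR   = rep(1:8, times = hh_size),
  HH_VAR = rep(c(1, 1, 1, 2, 2, 2, 2, 2), times = hh_size),
  P_VAR  = c(1, 2, 3, 1, 3, 1, 1, 2, 1, 3, 3, 2, 2, 3, 1, 2, 1, 1, 2, 3, 3, 1, 2)
)

# Weights as printed by fit$weights above (copied, hence rounded)
w <- c(8.937470, 8.937470, 8.937470, 23.448579, 23.448579, 2.613950, 2.613950,
       2.613950, 25.899223, 25.899223, 25.899223, 14.347802, 14.347802, 14.347802,
       11.009562, 11.009562, 2.733852, 2.733852, 2.733852, 2.733852, 2.733852,
       11.009562, 11.009562)

# Every member of a household has the same weight
stopifnot(all(tapply(w, ye$HHNR, function(x) length(unique(x))) == 1))

# Individual control: weighted number of persons per P_VAR category
ind_totals <- tapply(w, ye$P_VAR, sum)

# Group control: weighted number of households per HH_VAR category
first <- !duplicated(ye$HHNR)
grp_totals <- tapply(w[first], ye$HH_VAR[first], sum)

print(round(ind_totals, 3))  # compare with ye_ind$N
print(round(grp_totals, 3))  # compare with ye_hh$N
```

The recomputed totals agree with ye_ind and ye_hh up to the rounding of the printed weights. With mlfit loaded, fit$weights can be used directly instead of the copied vector.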