ml_problem() is the first step in fitting a reference sample to known control totals with mlfit. All fitting algorithms (see ml_fit()) expect an object created by this function, optionally processed with flatten_ml_fit_problem().

special_field_names() constructs the value for the field_names argument of ml_problem().

ml_problem(
  ref_sample,
  controls = list(individual = individual_controls, group = group_controls),
  field_names,
  individual_controls = NULL,
  group_controls = NULL,
  prior_weights = NULL,
  geo_hierarchy = NULL
)

is_ml_problem(x)

# S3 method for ml_problem
format(x, ...)

# S3 method for ml_problem
print(x, ...)

special_field_names(
  groupId,
  individualId,
  individualsPerGroup = NULL,
  count = NULL,
  zone = NULL,
  region = NULL,
  prior_weight = NULL
)

Arguments

ref_sample

The reference sample

controls

Control totals, by default initialized from the individual_controls and group_controls arguments

field_names

Names of special fields; construct them using special_field_names().

individual_controls, group_controls

Control totals at individual and group level, given as a list of data frames where each data frame defines a control

prior_weights

(Deprecated) Use special_field_names(prior_weight = '<column-name>') to specify the prior weight column in the ref_sample instead.

geo_hierarchy

A table that maps each area of a larger zoning level to the zones of a smaller zoning level that it contains. The column name of the larger level should be specified in field_names as 'region' and that of the smaller one as 'zone'.

x

An object

...

Ignored.

groupId, individualId

Name of the column that defines the ID of the group or the individual

individualsPerGroup

Obsolete.

count

Name of the control total column in the control tables (by default, the first numeric column in each control is used).

region, zone

Name of the column that defines the region of the reference sample or the zone of the controls. Note that a region is a larger area that contains more than one zone.

prior_weight

Name of the column in the reference sample that holds the prior (or design) weights, which apply at group level. By default, a vector of ones is used, which corresponds to random sampling of groups.
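
The geo_hierarchy and prior_weight arguments both refer to ordinary columns in ordinary tables, so it can help to see what such inputs might look like. The following is a minimal base R sketch, not taken from the package. The column names REGION, ZONE and W and all values are hypothetical, and the two checks encode assumptions drawn from the descriptions above (each zone belongs to one region; a group-level weight is constant within a group) rather than documented requirements.

```r
# Hypothetical region-to-zone mapping: two regions with two zones each.
# REGION and ZONE are illustrative names; they would be declared via
# special_field_names(region = "REGION", zone = "ZONE").
geo_hierarchy <- data.frame(
  REGION = c(1, 1, 2, 2),
  ZONE   = c(1, 2, 3, 4)
)

# Assumption: every zone appears once, i.e. belongs to exactly one region.
zone_has_one_region <- !any(duplicated(geo_hierarchy$ZONE))
zones_per_region <- table(geo_hierarchy$REGION)

# Hypothetical reference sample: three groups (HHNR), six individuals (PNR).
ref <- data.frame(
  HHNR = c(1, 1, 2, 2, 2, 3),
  PNR  = 1:6
)

# Hypothetical group-level prior weights, one value per group.
hh_w <- data.frame(HHNR = c(1, 2, 3), W = c(1.0, 2.5, 0.5))

# Attach the weight to every individual of the group. The column name "W"
# would be declared via special_field_names(prior_weight = "W").
ref <- merge(ref, hh_w, by = "HHNR")

# Assumption: a group-level weight takes a single value within each group.
constant_within_group <- all(
  tapply(ref$W, ref$HHNR, function(v) length(unique(v)) == 1)
)

zone_has_one_region
zones_per_region
constant_within_group
```

Storing the weight as a column of the reference sample, rather than as a separate vector, matches the replacement suggested for the deprecated prior_weights argument.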

Value

An object of class ml_problem or a list of them if geo_hierarchy was given, essentially a named list with the following components:

refSample

The reference sample, a data.frame.

controls

A named list with two components, individual and group. Each contains a list of controls as data.frames.

fieldNames

A named list with the names of special fields.

is_ml_problem() returns a logical.

Examples

# Create example from Ye et al., 2009

# Provide reference sample
ye <- tibble::tribble(
  ~HHNR, ~PNR, ~APER, ~HH_VAR, ~P_VAR,
  1, 1, 3, 1, 1,
  1, 2, 3, 1, 2,
  1, 3, 3, 1, 3,
  2, 4, 2, 1, 1,
  2, 5, 2, 1, 3,
  3, 6, 3, 1, 1,
  3, 7, 3, 1, 1,
  3, 8, 3, 1, 2,
  4, 9, 3, 2, 1,
  4, 10, 3, 2, 3,
  4, 11, 3, 2, 3,
  5, 12, 3, 2, 2,
  5, 13, 3, 2, 2,
  5, 14, 3, 2, 3,
  6, 15, 2, 2, 1,
  6, 16, 2, 2, 2,
  7, 17, 5, 2, 1,
  7, 18, 5, 2, 1,
  7, 19, 5, 2, 2,
  7, 20, 5, 2, 3,
  7, 21, 5, 2, 3,
  8, 22, 2, 2, 1,
  8, 23, 2, 2, 2
)
ye
#> # A tibble: 23 × 5
#>     HHNR   PNR  APER HH_VAR P_VAR
#>    <dbl> <dbl> <dbl>  <dbl> <dbl>
#>  1     1     1     3      1     1
#>  2     1     2     3      1     2
#>  3     1     3     3      1     3
#>  4     2     4     2      1     1
#>  5     2     5     2      1     3
#>  6     3     6     3      1     1
#>  7     3     7     3      1     1
#>  8     3     8     3      1     2
#>  9     4     9     3      2     1
#> 10     4    10     3      2     3
#> # ℹ 13 more rows

# Specify control at household level
ye_hh <- tibble::tribble(
  ~HH_VAR, ~N,
  1,       35,
  2,       65
)
ye_hh
#> # A tibble: 2 × 2
#>   HH_VAR     N
#>    <dbl> <dbl>
#> 1      1    35
#> 2      2    65

# Specify control at person level
ye_ind <- tibble::tribble(
  ~P_VAR, ~N,
  1, 91,
  2, 65,
  3, 104
)
ye_ind
#> # A tibble: 3 × 2
#>   P_VAR     N
#>   <dbl> <dbl>
#> 1     1    91
#> 2     2    65
#> 3     3   104

ye_problem <- ml_problem(
  ref_sample = ye,
  field_names = special_field_names(
    groupId = "HHNR", individualId = "PNR", count = "N"
  ),
  group_controls = list(ye_hh),
  individual_controls = list(ye_ind)
)
ye_problem
#> An object of class ml_problem
#>   Reference sample: 23 observations
#>   Control totals: 1 at individual, and 1 at group level

fit <- ml_fit_dss(ye_problem)
fit$weights
#>  [1]  8.937470  8.937470  8.937470 23.448579 23.448579  2.613950  2.613950
#>  [8]  2.613950 25.899223 25.899223 25.899223 14.347802 14.347802 14.347802
#> [15] 11.009562 11.009562  2.733852  2.733852  2.733852  2.733852  2.733852
#> [22] 11.009562 11.009562
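
One way to read the weights is to check that they reproduce the control totals. The sketch below uses base R only and does not call mlfit. The columns are copied from the reference sample ye and the weights from the printed fit$weights above, so the comparison is limited by the six printed decimals. Counting each household once through its first listed person is an assumption of this sketch, justified here because the printed weights are identical within each household.

```r
# Columns copied from the reference sample `ye`
hhnr   <- c(1, 1, 1, 2, 2, 3, 3, 3, 4, 4, 4, 5, 5, 5, 6, 6, 7, 7, 7, 7, 7, 8, 8)
hh_var <- c(rep(1, 8), rep(2, 15))
p_var  <- c(1, 2, 3, 1, 3, 1, 1, 2, 1, 3, 3, 2, 2, 3, 1, 2, 1, 1, 2, 3, 3, 1, 2)

# Weights copied from the printed output; in a live session use fit$weights
w <- c(
  8.937470, 8.937470, 8.937470, 23.448579, 23.448579, 2.613950, 2.613950,
  2.613950, 25.899223, 25.899223, 25.899223, 14.347802, 14.347802, 14.347802,
  11.009562, 11.009562, 2.733852, 2.733852, 2.733852, 2.733852, 2.733852,
  11.009562, 11.009562
)

# Person-level totals: sum the weights of all persons in each P_VAR category
ind_totals <- tapply(w, p_var, sum)

# Household-level totals: count each household once (first person per HHNR)
first_in_hh <- !duplicated(hhnr)
hh_totals <- tapply(w[first_in_hh], hh_var[first_in_hh], sum)

# Compare against the controls ye_ind (91, 65, 104) and ye_hh (35, 65)
round(ind_totals, 3)
round(hh_totals, 3)
```

Both sets of totals should agree with the N columns of ye_ind and ye_hh up to rounding of the printed weights.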