```
<- 31000000/1000
weight <- rnorm(31000000,50000,5000)
population <- sample(population,1000)
sample
<- (31000000/1000)*sum(sample)) (T_hat_X
```

`[1] 1.550859e+12`

These are my notes from the book. I don’t distinguish between direct quotes from the book and my own comments. For illustrative purposes, I replicate some of the analysis using the package using basic algebra. This is a work in progress.

Most of statistics is model-bused (think regression,MRP). The analysis of complex survey samples, in contrast, is usually design-based. The model represents the process; can be generalized to other situations. The goal of the design-based analysis is to estimate features of the fixed population.

The fundamental statistical concept in design-based inference is the probability sample or random sample. “Taking a random sample” of 1000 individuals means a sampling procedure when any subset of 1000people from the population is equally likely to be selected.

A stratified random sample is still a probability sample (e.g., taking a simple random sample of 20 people from each state).

What makes aprobability sample is the procedure for taking samples from a population.

- Every individual in the population must have a non-zero probability of ending up in the sample (written \(\pi_i\) for individual \(i\))
- The probability \(\pi_i\) must be known for every individual
- Every pair of individuals in the sample must have a non-zero probability of both ending up in the sample (\(\pi_{ij}\))
- The probability \(\pi_{ij}\) must be known for every pair that does end up in the sample.

The first two properties are necessary in order to get valid population estimates; the last two are necessary to work out the accuracy of the estimates. If individuals were sampled independently of each other the first two properties would guarantee the last two (which is the case in the surveys I work with).

The fundamental statistical idea behind all of design-based inference is that an individual sampled with a sampling probability of \(\pi_i\) represents \(1/\pi_i\) individuals in the population. The value \(1/\pi_i\) is called the sampling weight.

Given a sample of size n,the (Horvitz-Thompson) estimator \(\hat{T}_X\) for the population total \(T_X\) of \(X\) is \[\hat{T}_X=\sum_{i=1}^{n}\frac{1}{\pi_i}X_i\]

The variance estimate is

\[\mathrm{var}[\hat{T}_X] = \sum_{i,j} \left( \frac{X_i X_j}{\pi_{ij}} - \frac{X_i}{\pi_i} \frac{X_j}{\pi_j} \right)\] Knowing the formula for the variance estimator is less important to the applied user, but it is useful to note two things. The first is that the formula applies to any design, however complicated, where \(\pi_i\) and \(\pi_ij\) are known for the sampled observations. The second is that the formula depends on the pairwise sampling probabilities \(\pi_ij\), not just on the sampling weights; this is how correlations in the sampling design enter the computations.

Statisticians use the term ‘weight’ to mean at least three different things.

sampling weights: A sampling weight of 1000 means that the observation represents 1000 individuals in the population.

precision weights: A precision (or inverse-variance) weight of 1000 means that the observation has 1000 times lower variance than an observation with a weight of 1.

frequency weights: A frequency weight of 1000 means that the sample contains 1000 identical observations and space is being saved by using only one record in the data set to represent them.

In this book, weights are always sampling weights. They’ll all produce the same point estimates but not same standard errors.

A complex survey will not have the same standard errors for estimates as a simple random sample of the same size, but many sample size calculations are only conveniently available for simple random samples.

The design effect was defined by Kish (1965) as the ratio of a variance of an estimate in a complex sample to the variance of the same estimate in a simple random sample.

Design effects are greater than 1.0 when the sample is not simple random, implying that a larger sample size is needed.

The book uses some datasets. I think they are all available through the survey package.

The book uses R.

R is very nice, really!

The Horvitz-Thompson estimator of the population total of a variable X is

\[\hat{T}_X=\sum_{i=1}^{n}\frac{1}{\pi_i}X_i\]

A sampling weight \(\pi_i\) is a value assigned to each unit in a sample, representing the number of units in the total population that the sampled unit represents. Let’s say there are 31 million Canadian adults. Let’s say, implausibly, that income is a normal centered at 50k with sd of 5k.

```
<- 31000000/1000
weight <- rnorm(31000000,50000,5000)
population <- sample(population,1000)
sample
<- (31000000/1000)*sum(sample)) (T_hat_X
```

`[1] 1.550859e+12`

Which is a ok estimate of that true number

`<- sum(population)) (T_X `

`[1] 1.550036e+12`

The variance of this Horvitz-Thompson estimator is

\[\mathrm{var}[\bar{X}] = \frac{N - n}{N} \times N^2 \times \frac{\mathrm{var}[X]}{n}\]

`<- ((31000000-1000)/31000000)*31000000^2*(var(sample)/1000)) (var_T_hat_X `

`[1] 2.293811e+19`

For illustration purposes, you can get this estimator using simulations

```
<- function() (31000000/1000)*sum(sample(population,1000))
get_sum_one_sample var(replicate(100000,get_sum_one_sample()))
```

`[1] 2.40975e+19`

The standard error of our estimator

`sqrt(var_T_hat_X)`

`[1] 4789375170`

The first term in the variance formula is the inite population correction. When the population is large, you can drop it.

The population mean of X can be estimated by dividing the estimated total by the population size, N

`<- T_hat_X / 31000000) (mu_x `

`[1] 50027.7`

The variance estimate is obtained by dividing the variance estimate for the total by \(N^2\) (get the square root for standard error)

`sqrt(var_T_hat_X / 31000000^2)`

`[1] 154.496`

Which is equal to getting the standard error of the mean with the usual formula

`sd(sample)/sqrt(1000)`

`[1] 154.4985`

The Horvitz-Thompson estimator of the population size is the sum of the sampling weights, in large surveys where the population size is known, the sampling weights are adjusted so that the estimated population size equals the true sample size.

Confidence intervals for estimates are computed by using a Normal distribution for the estimate, ie, for a 95% confidence interval adding and subtracting 1.96 standard errors.

The Academic Performance Index is computed for all California schools based on standardised testing of students. The data sets contain information for all schools with at least 100 students and for various probability samples of the data. `?apisrs`

for more info.

`library(survey)`

`Loading required package: grid`

`Loading required package: Matrix`

`Loading required package: survival`

```
Attaching package: 'survey'
```

```
The following object is masked from 'package:graphics':
dotchart
```

```
data(api)
dim(apisrs)
```

`[1] 200 39`

The variable fpc in this data set contains the number 6194, the number of schools in California. `id=~1`

says that individual schools were sampled. `fpc=~fpc`

says that the variable called fpc in the data set contains the population size.

`<- svydesign(id=~1,fpc=~fpc,data=apisrs) srs_design `

We can replicate the estimates and standard errors by hand:

`svytotal(~enroll, srs_design)`

```
total SE
enroll 3621074 169520
```

`6194/200)*sum(apisrs$enroll) (`

`[1] 3621074`

`<- sqrt( ((6194-200)/6194)*6194^2*(var(apisrs$enroll)/200)) ) (svytotal_SE `

`[1] 169519.7`

`svymean(~enroll, srs_design)`

```
mean SE
enroll 584.61 27.368
```

`mean(apisrs$enroll)`

`[1] 584.61`

`sqrt(svytotal_SE^2 / (6194^2))`

`[1] 27.36837`

If the population size is not specified it is necessary to specify the sampling probabilities or sampling weights. If we have the sampling weight but not the population, the standard error are slightly larger.

`svytotal(~stype, srs_design)`

```
total SE
stypeE 4397.74 196.00
stypeH 774.25 142.85
stypeM 1022.01 160.33
```

`<- sqrt( ((6194-200)/6194)*6194^2*(var(as.numeric(apisrs$stype=="E"))/200)) ) (svytotal_SE_E `

`[1] 195.9953`

`<- sqrt( ((6194-200)/6194)*6194^2*(var(as.numeric(apisrs$stype=="H"))/200)) ) (svytotal_SE_H `

`[1] 142.8488`

`<- sqrt( ((6194-200)/6194)*6194^2*(var(as.numeric(apisrs$stype=="M"))/200)) ) (svytotal_SE_M `

`[1] 160.3255`

`svymean(~stype, srs_design)`

```
mean SE
stypeE 0.710 0.0316
stypeH 0.125 0.0231
stypeM 0.165 0.0259
```

`sqrt(svytotal_SE_E^2 / (6194^2))`

`[1] 0.03164276`

`sqrt(svytotal_SE_H^2 / (6194^2))`

`[1] 0.02306244`

`sqrt(svytotal_SE_M^2 / (6194^2))`

`[1] 0.025884`

Stratified sampling involves dividing the population up into groups called strata and drawing a separate probability sample from each one. Since a stratified sample is just a set of simple random samples from each stratum, the Horvitz-Thompson estimator of the total is just the sum of the estimated totals in each stratum and its variance is the sum of the estimated variances in each stratum.

```
<- svydesign(id=~1,strata=~stype,fpc=~fpc,data=apistrat)
strat_design svytotal(~enroll,strat_design)
```

```
total SE
enroll 3687178 114642
```

Off because use of stratas (treating like random sample)

`6194/200)*sum(apistrat$enroll) (`

`[1] 4624967`

Of course, this is ugly and impossible to read but the point is that it works: total is just the sum of the estimated totals in each stratum and its variance is the sum of the estimated variances in each stratum

```
$fpc[apistrat$stype=="E"][1]/sum(apistrat$stype=="E")) *
(apistratsum(apistrat$enroll[apistrat$stype=="E"]) +
$fpc[apistrat$stype=="H"][1]/sum(apistrat$stype=="H")) *
(apistratsum(apistrat$enroll[apistrat$stype=="H"]) +
$fpc[apistrat$stype=="M"][1]/sum(apistrat$stype=="M")) *
(apistratsum(apistrat$enroll[apistrat$stype=="M"])
```

`[1] 3687178`

```
sqrt( ((apistrat$fpc[apistrat$stype=="E"][1]-sum(apistrat$stype=="E"))/apistrat$fpc[apistrat$stype=="E"][1])*apistrat$fpc[apistrat$stype=="E"][1]^2*(var(apistrat$enroll[apistrat$stype=="E"])/sum(apistrat$stype=="E")) +
$fpc[apistrat$stype=="H"][1]-sum(apistrat$stype=="H"))/apistrat$fpc[apistrat$stype=="H"][1])*apistrat$fpc[apistrat$stype=="H"][1]^2*(var(apistrat$enroll[apistrat$stype=="H"])/sum(apistrat$stype=="H")) +
((apistrat$fpc[apistrat$stype=="M"][1]-sum(apistrat$stype=="M"))/apistrat$fpc[apistrat$stype=="M"][1])*apistrat$fpc[apistrat$stype=="M"][1]^2*(var(apistrat$enroll[apistrat$stype=="M"])/sum(apistrat$stype=="M"))) ((apistrat
```

`[1] 114641.7`

The standard error of a mean or any other population summary is, by definition, the standard deviation of that estimated summary across many independent samples of data.

```
<- rnorm(1000000)
population <- function(){sample(population,1000)}
draw_1000 <- draw_1000()
sample1 sd(sample1)/sqrt(1000)
```

`[1] 0.03156248`

`sd(replicate(1000,mean(draw_1000())))`

`[1] 0.03035286`

The replicate-weight approach to estimating standard errors computes the standard deviation of the estimated summary across many partially independent subsets of the one sample, and extrapolates from this to the standard deviation between completely independent samples. Replicate weight methods require more computation than using the Horvitz- Thompson estimator but are easier to generalize to statistics other than the mean and total (at least for software designers). Are given as example: Balanced repeated replication, Fay’s method, jackkknife and bootstrap.

An example with the bootstrap: take a sample of observations (or clusters) with replacement from each stratum and the sampling weight is multiplied by the number of times the observation appears in this sample.

```
<- svydesign(id=~1,strata=~stype,fpc=~fpc,data=apistrat)
strat_design svymean(~enroll,strat_design)
```

```
mean SE
enroll 595.28 18.509
```

```
<- apistrat$enroll[apistrat$stype=="E"]
enroll_E <- apistrat$fpc[apistrat$stype=="E"][1]
fpc_E <- apistrat$enroll[apistrat$stype=="H"]
enroll_H <- apistrat$fpc[apistrat$stype=="H"][1]
fpc_H <- apistrat$enroll[apistrat$stype=="M"]
enroll_M <- apistrat$fpc[apistrat$stype=="M"][1]
fpc_M
<- function() {(fpc_E*mean(sample(enroll_E,length(enroll_E),replace=TRUE)) +
draw1 *mean(sample(enroll_H,length(enroll_H),replace=TRUE)) +
fpc_H*mean(sample(enroll_M,length(enroll_M),replace=TRUE))) /
fpc_Msum(fpc_E+fpc_H+fpc_M)}
<- replicate(1000,draw1())
draws1000 mean(draws1000);sd(draws1000)
```

`[1] 596.3182`

`[1] 19.0782`

Very close. I love the bootstrap, it’s so intuitive.

Replicate-weight approach: computes the standard deviation of the estimated summary across many partially independent subsets of the one sample, and extrapolate from this to the standard deviation between completely independent samples. Let’s skip for now.

```
<- as.svrepdesign(strat_design,type='bootstrap',replicates=1000)
boot_design svymean(~enroll,boot_design)
```

```
mean SE
enroll 595.28 17.866
```

But we’ve already done something similar by hand, above.

Again, the most obvious advantage of designs with replicate weights in the survey package is that it is possible to compute standard errors for differences between subpopulations for arbitrary statistics; to perform an analysis that has not been implemented in the survey package.

Using replicate weights (or the bootstrap as I show) it is only necessary to write code to compute the weighted point estimates; this code is re-run on all the sets of replicate weights to estimate standard errors, which is usually easier than writing code for the linearization standard error estimates.

There are other notes, for example, the bootstrap is straightforward only when all the strata are large.

Estimators, such as the median or a regression coefficient, are obtained by solving equations that can be written as population totals or means. Standard errors for these estimates can be obtained with replicate weights. Alternatively we can use linearization, dicussed in Appendix.

The median is defined implicitly rather than explicitly: the estimated population size above and below the median are equal.

Bootstrapping them:

`<- foreign::read.dta("~/Downloads/adult.dta") # get on book website chis_adult `

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('sragef')
for 'srage_p' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('povgwd')
for 'povgwd_p' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw0' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw1' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw2' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw3' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw4' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw5' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw6' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw7' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw8' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw9' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw10' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw11' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw12' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw13' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw14' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw15' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw16' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw17' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw18' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw19' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw20' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw21' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw22' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw23' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw24' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw25' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw26' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw27' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw28' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw29' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw30' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw31' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw32' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw33' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw34' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw35' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw36' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw37' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw38' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw39' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw40' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw41' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw42' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw43' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw44' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw45' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw46' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw47' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw48' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw49' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw50' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw51' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw52' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw53' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw54' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw55' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw56' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw57' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw58' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw59' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw60' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw61' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw62' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw63' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw64' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw65' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw66' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw67' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw68' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw69' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw70' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw71' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw72' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw73' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw74' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw75' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw76' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw77' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw78' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw79' are missing
```

```
Warning in foreign::read.dta("~/Downloads/adult.dta"): value labels ('rakedf')
for 'rakedw80' are missing
```

```
<- svrepdesign(variables=chis_adult[,1:418] ,
chis repweights=chis_adult [,420:499] , weights=chis_adult[,419], combined.weights=TRUE, type="other", scale=1, rscales=1)
svyquantile(~bmi_p,design=chis,quantiles=c(0.25,0.5,0.75))
```

```
$bmi_p
quantile ci.2.5 ci.97.5 se
0.25 22.68 22.66 22.81 0.03767982
0.5 25.75 25.69 25.80 0.02763161
0.75 29.18 29.12 29.29 0.04270393
attr(,"hasci")
[1] TRUE
attr(,"class")
[1] "newsvyquantile"
```

```
<- sum(chis_adult$rakedw0)
sum_w <- order(chis_adult$bmi_p)
bmi_order <- which(cumsum(chis_adult$rakedw0[bmi_order])>(sum_w)/2)[1]
median_loc $bmi_p[bmi_order][median_loc] chis_adult
```

`[1] 25.75`

```
<- which(cumsum(chis_adult$rakedw0[bmi_order])>(sum_w)/4)[1]
median_loc $bmi_p[bmi_order][median_loc] chis_adult
```

`[1] 22.68`

```
<- function(){
boot <- chis_adult[,c("rakedw0","bmi_p")]
temp <- temp[sample(1:nrow(temp),replace=TRUE),]
temp_boot <- sum(temp_boot$rakedw0)
sum_w <- order(temp_boot$bmi_p)
bmi_order <- which(cumsum(temp_boot$rakedw0[bmi_order])>(sum_w)/2)[1]
median_loc $bmi_p[bmi_order][median_loc]
temp_boot
}<- replicate(500,boot())
v_means mean(v_means);sd(v_means)
```

`[1] 25.72364`

`[1] 0.04076844`

SE is bigger than the result from svyquantile() but actually in line with what’s in the book. Strange.

**I’m skipping many chapters. This is what I’m using most often.**

This is chapter 7.

Raking by hand:

`library(tidyverse)`

```
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.4
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.4.4 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.0
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ tidyr::expand() masks Matrix::expand()
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
✖ tidyr::pack() masks Matrix::pack()
✖ tidyr::unpack() masks Matrix::unpack()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
```

```
<- sample(c("M","F"),100,replace=TRUE)
gender <- sample(c("Y","O"),100,replace=TRUE)
age <- sample(c("HS","C","U"),100,replace=TRUE)
educ
<- data.frame(gender,age,educ)
df
<- svydesign(ids = ~1, data = df, weights = ~rep(1, nrow(df)))
design
<- data.frame(gender = c("M", "F"), Freq = c(40, 60))
gender_marg <- data.frame(age = c("Y", "O"), Freq = c(30, 70))
age_marg <- data.frame(educ = c("HS", "C", "U"), Freq = c(40, 40, 20))
educ_marg
<- rake(design,
raked_design sample.margins = list(~gender, ~age, ~educ),
population.margins = list(gender_marg, age_marg, educ_marg))
$weights <- weights(raked_design)
df<- df$weights
stored_w $weights <- 1
df<- df$weights
store_weights for (i in 1:100){
<- df %>%
mult_gender group_by(gender) %>%
summarise(sw=sum(weights)) %>%
left_join(gender_marg,"gender") %>% mutate(mult=Freq/sw)
<- df %>%
mult_age group_by(age) %>%
summarise(sw=sum(weights)) %>%
left_join(age_marg,"age") %>% mutate(mult=Freq/sw)
<- df %>%
mult_educ group_by(educ) %>%
summarise(sw=sum(weights)) %>%
left_join(educ_marg,"educ") %>% mutate(mult=Freq/sw)
<- df %>%
df left_join(mult_gender,"gender") %>%
left_join(mult_age,"age") %>%
left_join(mult_educ,"educ") %>%
mutate(weights=weights*mult.x*mult.y*mult) %>%
select(gender,age,educ,weights) %>%
mutate(weights=100*(weights/sum(weights)))
if (sum((df$weights-store_weights)^2)<0.00001){
break
}<- df$weights
store_weights
}
cor(stored_w,df$weights)
```

`[1] 1`

Standard errors: