
Problem description
In this problem set, you will replicate – as well as possible using publicly available data – key results on the Tennessee Student-Teacher Achievement Ratio (STAR) experiment presented in Krueger (1999).

The STAR experiment was a pioneering randomized study in the field of education, designed to estimate the effects of smaller classes in primary school. The study was implemented for a cohort of kindergartners in 1985-86 and ran for four years, until the original cohort of kindergartners was in 3rd grade.

Economists have a long tradition of trying to establish causal links between class size and student learning. Many studies in the field of education using non-experimental data, however, suggest there is little or no link between class size and student learning.

In this problem set, you will critically evaluate the experiment and estimate the causal relationship between class size and learning.

Description of the data

The database “Star13.dta” covers students who took part in the STAR experiment at some point between kindergarten and 3rd grade. It contains information on their birth year, race, assigned treatment, and learning outcomes. Further, information on their teachers’ backgrounds is available. The variables are:
id: Fictitious student identification number
sex: Sex (=1 if male, =2 if female)
race: Race of the student (=1 if white, =2 if black, =3 if Asian, =4 if Hispanic, =5 if American Indian, =6 if other)
mathk: Math score in the Stanford Achievement Test (SAT) in kindergarten
readk: Reading score in the Stanford Achievement Test (SAT) in kindergarten
math1 (math2, math3): SAT math scores in 1st (2nd, 3rd) grade
read1 (read2, read3): SAT reading scores in 1st (2nd, 3rd) grade
mathk_p (math1_p, math2_p, math3_p): Percentile rank in math scores in kindergarten (1st, 2nd, 3rd grade) (see Krueger, 1999, pp. 507/508)
readk_p (read1_p, read2_p, read3_p): Percentile rank in reading scores in kindergarten (1st, 2nd, 3rd grade) (see Krueger, 1999, pp. 507/508)
stark (star1, star2, star3): Indicator whether the student took part in the STAR experiment in kindergarten (1st, 2nd, 3rd grade) (=1 if yes, =2 if no)
ctypek (ctype1, ctype2, ctype3): Indicator of the treatment the student received in kindergarten (1st, 2nd, 3rd grade) (=1 if small class, =2 if regular class, =3 if regular class with aide)
csizek (csize1, csize2, csize3): Class size in kindergarten (1st, 2nd, 3rd grade)
sesk (ses1, ses2, ses3): Indicator of the student’s socioeconomic status in kindergarten (1st, 2nd, 3rd grade) (=1 if free lunch, =2 if non-free lunch)
attrition (attritionk, attrition1, attrition2): Attrition in each grade (k, 1, 2) as explained in note d to Table I in Krueger (1999)
yob: Year of birth
schidk (schid1, schid2, schid3): School id

In total, the data contain 11,598 student observations. Your estimates may differ from those presented in Krueger (1999) because you are using a public-use file rather than the original data. Qualitatively, however, your results should be consistent with Krueger’s.

Recoding missing values and creating additional variables
a. In some variables, missing values are coded as 9, 99, 999, etc. Recode these values to missing (“.”).
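A minimal sketch of the recoding in step a., written in plain Python for illustration (Stata is the natural tool for a .dta file). Each student is assumed to be stored as a dict of variable values, with None standing in for Stata’s missing value “.”. The sentinel codes assigned to each variable in the hypothetical MISSING_CODES mapping are placeholders: check the codebook, because a value such as 9 may be legitimate in some variables, so codes should be set per variable rather than globally.

```python
# Hypothetical mapping from variable name to the sentinel codes that mean
# "missing" in that variable. These codes are placeholders: check the
# codebook, since e.g. 9 could be a valid value in some variables.
MISSING_CODES = {
    "sesk": {9},
    "ctypek": {9},
    "csizek": {99},
    "mathk": {999},
}


def recode_missing(row, missing_codes):
    """Return a copy of `row` with sentinel codes replaced by None."""
    clean = dict(row)  # copy so the original record is left unchanged
    for var, codes in missing_codes.items():
        if clean.get(var) in codes:
            clean[var] = None
    return clean


rows = [
    {"id": 1, "sesk": 1, "ctypek": 2, "csizek": 22, "mathk": 480},
    {"id": 2, "sesk": 9, "ctypek": 9, "csizek": 99, "mathk": 999},
]
clean_rows = [recode_missing(r, MISSING_CODES) for r in rows]
print(clean_rows[1])
# {'id': 2, 'sesk': None, 'ctypek': None, 'csizek': None, 'mathk': None}
```

Keeping the codes per variable makes it easy to audit which sentinel was applied where before any dummies are built.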
b. When generating dummy variables, be sure that the dummies are coded as missing values when the original variable has a missing value. Create dummy variables that indicate

(i.) in each grade (kindergarten, 1, 2, 3) students receiving a free lunch (freelunchk, freelunch1, freelunch2, freelunch3).

(ii.) in each grade class type “small” (smallk, small1, small2, small3), class type “regular” (regulark, regular1, regular2, regular3), and class type “regular with aide” (regular_aidek, regular_aide1, regular_aide2, regular_aide3).

(iii.) students who entered STAR in kindergarten, in 1st, in 2nd, and in 3rd grade (enterk, enter1, enter2, enter3). A student enters in a certain grade if this is the first grade in which the student is observed in STAR. For this variable, it does not matter whether students remain in the experiment in subsequent years.

(iv.) white or Asian students (combine both categories, whiteasian).
(v.) girls (girl).
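The dummies in step b. can be sketched the same way, again in plain Python with one dict per student. The helper `dummy` returns None whenever the source variable is missing, so each dummy inherits missing values as required. The function names are hypothetical, and treating a missing star indicator as “not in STAR” when locating the entry grade is an assumption.

```python
GRADES = ["k", "1", "2", "3"]


def dummy(value, true_values):
    """Return 1 if value is in true_values, 0 otherwise, None if missing."""
    if value is None:
        return None
    return 1 if value in true_values else 0


def add_dummies(row):
    """Return a copy of a student record with the dummies from step b."""
    out = dict(row)
    for g in GRADES:
        # (i.) free lunch: ses<g> == 1 means free lunch.
        out["freelunch" + g] = dummy(row.get("ses" + g), {1})
        # (ii.) class type: 1 small, 2 regular, 3 regular with aide.
        ctype = row.get("ctype" + g)
        out["small" + g] = dummy(ctype, {1})
        out["regular" + g] = dummy(ctype, {2})
        out["regular_aide" + g] = dummy(ctype, {3})
    # (iii.) entry grade: first grade with star<g> == 1 (assumption: a
    # missing star indicator counts as "not in STAR" for that grade).
    first = next((g for g in GRADES if row.get("star" + g) == 1), None)
    for g in GRADES:
        # Every student in the file took part at some point, so `first`
        # should not be None; if it is, the enter dummies stay missing.
        out["enter" + g] = None if first is None else int(g == first)
    # (iv.) white (race 1) or Asian (race 3); (v.) girl (sex 2).
    out["whiteasian"] = dummy(row.get("race"), {1, 3})
    out["girl"] = dummy(row.get("sex"), {2})
    return out


student = {"sex": 2, "race": 3, "stark": 2, "star1": 1, "star2": 1,
           "star3": 2, "sesk": None, "ses1": 1, "ctype1": 1}
d = add_dummies(student)
print(d["enterk"], d["enter1"], d["small1"], d["freelunchk"])
# 0 1 1 None
```

This student entered in 1st grade, so enter1 is 1 even though star3 is 2; leaving the experiment later does not affect the enter dummies.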
c. Create an age variable containing the student’s age as of 31st December 1985 (age).
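Because the data record only the year of birth (yob), age on 31 December 1985 can be computed exactly in completed years: anyone born in year yob has had their 1985 birthday by that date, so age = 1985 - yob. A minimal sketch, with a hypothetical function name:

```python
def age_on_dec31_1985(yob):
    """Age in completed years on 31 December 1985; None if yob is missing."""
    if yob is None:
        return None
    # Anyone born in year `yob` has had their 1985 birthday by 31 December.
    return 1985 - yob


print(age_on_dec31_1985(1980), age_on_dec31_1985(None))  # 5 None
```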

Question 1: Summary statistics
a. Calculate the share of students receiving free lunch, the share of white or Asian students, the average age in 1985, the attrition rate, the average class size, and the share of girls, by the enter-variables generated above and by treatment status (small class, regular class, and regular class with aide).

Present your summary statistics for STAR participants in Table 1, structured as in Table I in Krueger (1999). You do not need to provide standard deviations.
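One way to fill Table 1 is to compute group means that skip missing values. The sketch below groups students by a caller-supplied key; the hypothetical helper `cohort_key` selects the students who entered in a given grade and splits them by their class type in that grade (ctype 1, 2, 3). Grouping each entry cohort by the class type in the entry grade is an assumption about how to mirror the layout of Krueger’s Table I, and the toy records are illustrative only.

```python
from collections import defaultdict


def group_means(rows, key, outcome_vars):
    """Mean of each outcome within each group, skipping missing (None) values.

    `key(row)` returns the group label, or None to leave the row out.
    """
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(lambda: defaultdict(int))
    for row in rows:
        group = key(row)
        if group is None:
            continue
        for var in outcome_vars:
            value = row.get(var)
            if value is not None:
                sums[group][var] += value
                counts[group][var] += 1
    return {
        group: {var: sums[group][var] / counts[group][var]
                for var in outcome_vars if counts[group][var] > 0}
        for group in counts
    }


def cohort_key(grade):
    """Group students who entered in `grade` by their class type in that grade."""
    def key(row):
        if row.get("enter" + grade) != 1:
            return None
        return row.get("ctype" + grade)
    return key


rows = [
    {"enterk": 1, "ctypek": 1, "freelunchk": 1, "girl": 0, "csizek": 15},
    {"enterk": 1, "ctypek": 1, "freelunchk": 0, "girl": 1, "csizek": 17},
    {"enterk": 1, "ctypek": 2, "freelunchk": None, "girl": 1, "csizek": 22},
    {"enterk": 0, "ctypek": None, "freelunchk": 1, "girl": 1},
]
panel_k = group_means(rows, cohort_key("k"), ["freelunchk", "girl", "csizek"])
print(panel_k[1])  # {'freelunchk': 0.5, 'girl': 0.5, 'csizek': 16.0}
```

Because missing values are skipped variable by variable, each mean can rest on a different number of students; reporting those counts alongside the means makes this visible.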

b. Calculate the average of the percentile ranks of the math and reading tests for every individual in each year (name the variables testk etc.). (Hint: If one subtest score is missing, use the percentile rank of the only available test, as in Krueger (1999), fn. 11.) Add the average values by the enter-variables and by treatment status to Table 1.
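The averaging rule in step b., including the fallback from the hint, amounts to a mean over whichever percentile ranks are available. In this sketch the names test1 to test3 are assumed to follow the testk pattern, the function names are hypothetical, and a student-grade with both scores missing gets a missing test score.

```python
def average_percentile(math_p, read_p):
    """Average of the math and reading percentile ranks for one grade.

    If one subtest is missing, the available one is used (Krueger, 1999,
    fn. 11); if both are missing, the result is missing (None).
    """
    available = [p for p in (math_p, read_p) if p is not None]
    if not available:
        return None
    return sum(available) / len(available)


def add_test_scores(row):
    """Add testk, test1, test2, test3 (assumed names) to a student record."""
    out = dict(row)
    for g in ["k", "1", "2", "3"]:
        out["test" + g] = average_percentile(row.get("math" + g + "_p"),
                                             row.get("read" + g + "_p"))
    return out


scores = add_test_scores({"mathk_p": 80, "readk_p": None,
                          "math1_p": 50, "read1_p": 70})
print(scores["testk"], scores["test1"], scores["test2"])  # 80.0 60.0 None
```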

c. Comment on the characteristics of students who entered STAR in kindergarten and were assigned to the “small class” treatment.
