Reviews from the Mental Measurements Yearbook of The Buros Center
Review #1
Review of the Assessment, Evaluation, and Programming System for Infants and Children, Third Edition (AEPS®-3) by CATHERINE A. FIORELLO, Professor of School Psychology, Temple University, Philadelphia, PA:
DESCRIPTION. The Assessment, Evaluation, and Programming System for Infants and Children, Third Edition (AEPS-3) is a linked system of assessment, goal/outcome development, teaching/intervention, and progress monitoring. It is the latest revision of a combined assessment and curriculum for ages birth to 6 years. It is designed to assess and intervene in eight broad developmental areas: Fine Motor, Gross Motor, Adaptive, Social-Emotional, Social-Communication, Cognitive, Literacy, and Math. The test authors note that the AEPS-3 can be used by a wide range of professionals who have training in child development and learning, such as early interventionists, special education teachers, and specialists, including psychologists. The authors recommend that a professional team implement the assessment and interventions. The measure also includes family support materials, which should be completed with assistance available from a professional.
Assessment data are collected through observation, interview, and direct testing. The test authors note that observations in the natural environment and interviews with families are the preferred methods of data gathering, with direct testing reserved for items that cannot otherwise be obtained.
The AEPS-3 Test can be used as a whole, or it can be used flexibly to target individual developmental areas, to assess at the level of goals rather than the more in-depth objectives, or to assess school readiness skills using a subset of items called Ready—Set. The test is designed to determine present levels of performance, develop meaningful goals, plan effective teaching or intervention, monitor performance over time, and determine eligibility for services. Although the assessment portion of the AEPS-3 is curriculum based, cutoff scores are provided for determining or corroborating eligibility for services.
The teaching/intervention materials are closely linked to the skills assessed by the test and are designed to be implemented within routines and activities at home or at school.
DEVELOPMENT. Development of the first edition was undertaken by a consortium of professionals seeking an alternative to standardized, norm-referenced tests for use with young children with disabilities. Initial development began in 1976 with conceptual and empirical work funded by the predecessor to the Office of Special Education Programs. This work yielded a very comprehensive but overly long assessment for ages birth to 2 years in 1980, which was revised and studied before being published as the AEPS for Birth to Three Years in 1993. The AEPS for Three to Six Years followed in 1996. Both assessment and curriculum were included.
The second edition, published in 2002, included psychometric data and cutoff scores to determine eligibility for services and added a comprehensive online system that generated a variety of reports from the individual to the state level.
Development of the third edition, which combined both age groups into one system for birth to age 6, began in 2008. Items and scoring were revised to eliminate redundancy and to clarify and streamline assessment. Additional items were added at the early and late developmental age ranges of the scale, and the Literacy and Math areas were separated out and given new emphasis. A subset of items for ages 4 and up focused on school readiness (cognitive, literacy, math, communication, and social skills) was developed as a separate abbreviated test, the Ready—Set. The curricular materials were re-organized around the three-tier systems of support: Tier 1—Universal Support, Tier 2—Focused Support, and Tier 3—Specialized Support.
TECHNICAL.
Standardization. The sample employed for development of the AEPS-3 eligibility cutoff scores consisted of 874 children ages 2 months through 6 years 11 months. The sample included 412 typically developing children and 462 children receiving services under IDEA. Programs in seven states that provided childcare or educational services to young children and were currently using the AEPS Test were contacted. Teachers or providers participated in online training and had to obtain 80% agreement to collect data. Test administrators used an online system to enter a code number, chronological age, status as typically developing or receiving services, and goal scores for the eight developmental areas.
The states chosen for field testing were primarily in the Midwest and the South. Demographic information about the children assessed was limited to age, gender, and status as typically developing/receiving services. It is not known whether the sample is representative of the U.S. population based on race, ethnicity, language background, or socioeconomic status. The test authors did not provide evidence to suggest that this measure is invariant across these groups. The number of children in each age group was not reported in the test manual. However, the test authors noted that the sizes of the 0- to 6-month and 7- to 12-month age groups were particularly small, at 16 and 39, respectively.
Reliability. Internal consistency for items in each developmental area was not reported in the test manual, but person reliability estimates resulting from IRT analyses were reported in the supplemental online information from a published study (Toland et al., 2022). Coefficients ranged from .74 (Fine Motor) to .91 (Adaptive), and all except Fine Motor and Math were above .80.
Interrater reliability analyses were conducted using 37 video clips of children engaging in typical activities in home and classroom settings. Teachers or providers (N = 116) completed online training, then scored the video clips using 68 AEPS-3 test items across all eight developmental areas and developmental levels. Participants’ scores were compared to the author/expert ratings of the video clips and yielded mean interrater agreement of 89%, with a range of 66% to 100%. Interrater agreement was not assessed for the separate developmental areas.
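The statistic reported in the interrater study is item-level percent agreement with an expert key. A minimal sketch of that calculation follows; the item scores below are made up for illustration and are not the study's data.

```python
def percent_agreement(rater_scores, expert_scores):
    """Percentage of items on which a rater matches the expert key."""
    assert len(rater_scores) == len(expert_scores)
    matches = sum(r == e for r, e in zip(rater_scores, expert_scores))
    return 100 * matches / len(expert_scores)

expert = [2, 1, 0, 2, 1, 2, 0, 1]  # hypothetical expert ratings for 8 items
rater = [2, 1, 0, 2, 0, 2, 0, 1]   # one participant's scores (7 of 8 match)
print(percent_agreement(rater, expert))  # 87.5
```

Averaging this percentage across all 116 participants would yield the mean agreement figure (89%) the study reports.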
Validity. In addition to the history of initial item development by the consortium of experts, the test authors used expert reviews to examine content validity evidence. Across developmental areas and for the overall test, they reviewed qualitative feedback and comments on individual items or perceived gaps and revised test content in author work groups by consensus.
To provide further evidence of validity, the test authors showed that AEPS-3 combined scores were positively correlated with age (r = .65 to .92). In addition, a small study in Kentucky (N = 50, 68% Caucasian) showed weak to moderate correlations between AEPS-3 area scores and area scores from the Battelle Developmental Inventory, 2nd Edition (r = .24 to .65, per Grisham et al., 2021). The full range of intercorrelations was not provided in the test manual and had to be obtained from the published study (Grisham et al., 2021); the intercorrelations did not show the expected pattern of consistently stronger correlations between similar areas.
Sensitivity and Specificity. Cutoff scores for special education eligibility were derived in 6-month intervals based on the field study data (N = 874). (Analyses were attempted with 3-month intervals, but some age groups were too small to support that.) Children were coded prior to testing as eligible or ineligible for services. Based on AEPS-3 results, children were identified as eligible if they obtained at least two area scores at or below the cutoff scores for their age group. Sensitivity ranged from 57% (61- to 66-month age group) to 100% (0- to 6-month and 7- to 12-month age groups), while specificity ranged from 0% (0- to 6-month and 7- to 12-month age groups) to 81% (67- to 72-month age group). Areas under the receiver operating characteristic curve (AUC) were statistically significant at the p < .05 level and were considered good to fair indicators of separability, with the exception of the 0- to 6-month and 7- to 12-month age groups, which had non-significant AUCs and were considered poor indicators of separability.
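The two-area decision rule and the resulting classification accuracy statistics can be sketched as follows. All cutoffs, area scores, and eligibility labels below are hypothetical, chosen only to show how sensitivity (true positives among the truly eligible) and specificity (true negatives among the truly ineligible) are computed from such a rule.

```python
def flag_eligible(area_scores, cutoffs, min_areas=2):
    """Flag a child if at least `min_areas` area scores fall at or
    below the age-group cutoffs (hypothetical values below)."""
    low = sum(1 for area, score in area_scores.items()
              if score <= cutoffs[area])
    return low >= min_areas

def sensitivity_specificity(records):
    """records: list of (flagged_by_test, truly_eligible) pairs."""
    tp = sum(1 for flag, truth in records if flag and truth)
    fn = sum(1 for flag, truth in records if not flag and truth)
    tn = sum(1 for flag, truth in records if not flag and not truth)
    fp = sum(1 for flag, truth in records if flag and not truth)
    sens = tp / (tp + fn) if tp + fn else float("nan")
    spec = tn / (tn + fp) if tn + fp else float("nan")
    return sens, spec

cutoffs = {"Fine Motor": 10, "Gross Motor": 12}  # hypothetical
children = [
    ({"Fine Motor": 8, "Gross Motor": 11}, True),    # two low areas: flagged (hit)
    ({"Fine Motor": 15, "Gross Motor": 9}, True),    # one low area: missed
    ({"Fine Motor": 20, "Gross Motor": 18}, False),  # no low areas: correct reject
    ({"Fine Motor": 9, "Gross Motor": 10}, False),   # two low areas: false positive
]
records = [(flag_eligible(scores, cutoffs), truth) for scores, truth in children]
sens, spec = sensitivity_specificity(records)
print(sens, spec)  # 0.5 0.5
```

The reported pattern (100% sensitivity with 0% specificity in the youngest groups) corresponds to the degenerate case where the cutoffs flag every child, which is why the AUCs for those groups were non-significant.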
COMMENTARY. The test authors provide in-depth information about the curriculum and its development. The test, as a curriculum-based, criterion-referenced measure, is comprehensive and clearly developmentally organized. It leads to clear teaching and intervention recommendations.
The portion of the test designed to determine eligibility for services through the use of age-based cutoff scores is technically inadequate for the task. The normative sample is not described in enough detail to determine the extent to which it is representative of the U.S. population. The youngest age groups, 0–6 months and 7–12 months, are too small to derive reliable and valid cutoff scores. The other age groups provide scores that are reliable enough to serve as screeners, but not for making high-stakes decisions. The rates of over- and under-identification are unacceptable for individual decision making.
SUMMARY. The AEPS-3 is a combined assessment and curriculum for children ages birth to 6 years. It covers eight developmental areas: Fine Motor, Gross Motor, Adaptive, Social-Emotional, Social-Communication, Cognitive, Literacy, and Math. The assessment is designed to be completed primarily through naturalistic observation and interviews, which is appropriate for a developmental assessment for this age group. Using the assessment as a criterion-referenced tool linked to the curriculum has high utility. The test manual does not provide sufficient evidence to support the use of eligibility cutoff scores. Even after consulting studies published by the test authors, no detailed descriptive information about the normative sample was found. The cutoff scores are based on an inadequate normative sample, and there is insufficient evidence of representativeness or fairness for all children. The scores are only reliable enough to use as a screener.