Glossary

 

Adaptive testing

A type of exam design in which different examinees see different items based in part on their response patterns (for example, an incorrect response might result in an examinee getting an easier item), and the number of items any given examinee sees may vary depending on how quickly the adaptive algorithm determines the examinee’s ability level. Contrast with fixed-form testing and linear on-the-fly testing.
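
A toy sketch of the idea (hypothetical item pool, selection rule, and ability update; not any particular operational algorithm): pick the unused item whose difficulty is closest to the current ability estimate, then nudge the estimate after each response.

    # Toy adaptive-testing loop, for illustration only.
    def run_adaptive_test(item_difficulties, answer_item, max_items=10, step=0.5):
        """Administer up to max_items items, choosing each one adaptively.

        item_difficulties: dict mapping item id -> difficulty estimate
        answer_item: callback(item_id) -> True if the examinee answers correctly
        """
        ability = 0.0                      # provisional ability estimate
        remaining = dict(item_difficulties)
        administered = []

        for _ in range(max_items):
            if not remaining:
                break
            # Select the unused item whose difficulty is closest to the estimate.
            item_id = min(remaining, key=lambda i: abs(remaining[i] - ability))
            del remaining[item_id]

            correct = answer_item(item_id)
            administered.append((item_id, correct))

            # Move the estimate up after a correct response, down after an incorrect one.
            ability += step if correct else -step

        return ability, administered

Operational adaptive tests typically use an IRT model (see Item Response Theory) to select items and estimate ability.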

Alternative (alt)

In a multiple-choice item, the statements from which a test-taker chooses the correct response (key). More commonly known as an option.

Analytic rating

A rating of an item response based on rating several components of the response independently (contrast with holistic rating).

Bias

The degree to which an exam favors one group of examinees over another based on characteristics unrelated to what the exam is measuring, for example by stereotyping groups of people or representing situations more familiar to one group than another.

Clue

A situation in which one item provides or suggests the correct response (key) to another item.

Cognitive level

A degree of cognitive complexity of a task, often labeled according to Bloom’s Taxonomy: Knowledge/Remembering, Comprehension/Understanding, Applying, Analyzing, Evaluating, Creating.

Compensatory

A type of rating or scoring in which strengths in one area can make up for deficiencies in another.

Constructed response

A type of item in which examinees enter a response. Includes items with predetermined keys, such as calculation items, and items that may be rated using a rubric, such as essays. Contrast with selected response.

Content area

A collection of related content that is measured on a test.

Criterion referencing

The interpretation of exam scores with respect to a set of criteria or standards for performance or proficiency. Typically used for certification and licensure testing, as well as competency-based educational testing. Contrast with norm referenced.

Cut score

The lowest possible score at which an examinee can be awarded a given level, for example the lowest possible passing score. Exam forms may have one cut score (pass-fail exams) or many (exams awarding letter grades or different proficiency levels).

Dimension

A type of domain to be measured, such as the skills being tested or the content areas to which skills might apply. A test can have multiple dimensions.

Discrimination

When referring to items, the degree to which the item differentiates examinees with the desired attribute from those without—high discrimination is a good thing in this context.

Distractor

An incorrect, yet plausible, option in a multiple-choice item. Sometimes called a foil.

Equating

A psychometric process of determining score equivalence between multiple forms of an exam, so that the results can be interpreted in the same way.
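
For illustration only, a sketch of one of the simplest approaches (linear equating, which assumes the forms differ only in average difficulty and spread); operational equating relies on carefully designed data collection and usually more sophisticated methods.

    from statistics import mean, stdev

    def linear_equate(score, new_form_scores, ref_form_scores):
        """Map a score on a new form onto the reference form's scale.

        Linear equating: standardize the score against the new form's
        distribution, then re-express it on the reference form's scale.
        (Toy example with made-up score lists.)
        """
        z = (score - mean(new_form_scores)) / stdev(new_form_scores)
        return z * stdev(ref_form_scores) + mean(ref_form_scores)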

Examinee

A person taking a test.

Fixed-form testing

A type of exam design in which forms (sets of items) are determined and published in advance, and all examinees getting a given form see the same items. Contrast with adaptive testing and linear on-the-fly (LOFT) testing.

Foil

An incorrect, yet plausible, option in a multiple-choice item. More commonly known as a distractor or distracter.

Form

A set of items that meet the specifications for a given exam. An exam may have multiple forms.

Formative assessment

An assessment whose purpose is to provide information that will inform instructional decisions such as whether to spend more time on a topic. Examinees often receive extensive feedback on their responses. Contrast with summative assessment.

Holistic rating

A rating of an item response based on an overall interpretation of an examinee response, not its constituent parts (contrast with analytic rating).

Item

An assessment task, such as a multiple-choice question and options, an essay prompt, or a group of labels to place on a diagram.

Item Response Theory

(IRT)

A psychometric method that calibrates items and examinees on a single ability scale.
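
For illustration only, a minimal sketch of one common IRT model (the Rasch, or one-parameter logistic, model; IRT includes other models as well), in which the probability of a correct response depends on the gap between the examinee's ability and the item's difficulty, both expressed on the same scale.

    import math

    def rasch_probability(ability, difficulty):
        """Rasch (1PL) model: P(correct) = exp(ability - difficulty) /
        (1 + exp(ability - difficulty)).

        Ability and difficulty sit on the same (logit) scale, which is
        what allows items and examinees to be calibrated together.
        """
        return 1.0 / (1.0 + math.exp(-(ability - difficulty)))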

Key

A correct response for an item; can be either the correct option(s) for a multiple-choice item, or the specified correct answer for a constructed-response item.

Linear on-the-fly testing

(LOFT)

An exam design in which all examinees see the same number of items, and the forms meet the same specifications, but the particular items selected for each examinee vary randomly. Contrast with adaptive testing and fixed-form testing.

Multiple-choice item

A type of selected response item in which examinees are presented with several options and are asked to select one or more correct ones.

Multiple selection item

A type of multiple-choice item in which examinees are asked to select more than one answer.

Non-compensatory

A type of rating or scoring in which failure to meet any one of the criteria for a level results in not being awarded that level, regardless of strengths on other criteria.
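
A minimal sketch of the contrast with compensatory scoring, using hypothetical component scores and cut points.

    def compensatory_pass(scores, total_cut):
        """Pass if the combined score clears the overall cut;
        a strong area can offset a weak one."""
        return sum(scores.values()) >= total_cut

    def non_compensatory_pass(scores, component_cuts):
        """Pass only if every component clears its own cut;
        strength elsewhere cannot make up for a weak area."""
        return all(scores[c] >= component_cuts[c] for c in component_cuts)

    # Hypothetical example: strong writing offsets weak math under the
    # compensatory rule but not under the non-compensatory rule.
    scores = {"math": 55, "writing": 90}
    print(compensatory_pass(scores, total_cut=140))                    # True
    print(non_compensatory_pass(scores, {"math": 60, "writing": 60}))  # False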

Norm referenced

The interpretation of exam scores with respect to a group of examinees, also known as “grading on a curve.” Used for determining ranking within a large group.

Objective scoring

Evaluation of examinee responses that can be done without using human judgment; acceptable responses are spelled out in advance (contrast with subjective rating).

Option

In a multiple-choice item, the statements from which a test-taker chooses the correct response (key). Also known as an alternative (alt).

Parallel construction

In multiple-choice options, the use of identical or equivalent syntactic constructions in corresponding clauses or phrases, or the use of concepts in the same semantic area.

Grammatical: options are the same part of speech, clause type, or sentence structure

Content: all options address the content of the stem (question) in the same way, for example all dealing with reasons for an action, or all dealing with outcomes of an action

Plausible distractor

An incorrect option (distractor) that seems credible and could reasonably be perceived as the correct answer (key) by the test-taker.

Psychometrics

A field of study concerned with the theory and technique of educational measurement, chiefly the objective measurement of examinees' skills, knowledge, abilities, and educational achievement.

Rater

A person who evaluates an examinee’s response and assigns a rating, grade, or score to it.

Rating key

Information about how to determine which responses are correct.

Rating rubric

Guidelines to raters indicating criteria for demonstrating achievement of different levels of ability or performance.

Reliability

The degree to which exam results are consistent.

Selected response

A type of item in which all the response information is provided, such as multiple-choice items, and the examinees’ task is to select or arrange the information. Contrast with constructed response.

Specific determiner

A term that, by itself, tends to indicate that a statement is correct or incorrect. For example, absolute terms such as all, never, and always tend to be associated with incorrect statements and may cue the examinee that the option with the term is a distractor, while words indicating exceptions such as some, seldom, and generally tend to be associated with correct statements and may signal a key. Such terms should be avoided in writing items.

Standard-setting

A process for determining the cut score(s) for an exam. Typically content experts evaluate all the items according to descriptions of what is expected for different performance levels, and psychometricians analyze the evaluations to arrive at a numerical cut score.

Stem

The part of the item that contains the task for the examinee.

Stimulus

Material used as the basis for examinee tasks, such as a reading passage or image.

Subjective rating

Using expert judgment to determine the correctness and/or quality of examinee responses (contrast with objective scoring).

Summative assessment

An assessment whose purpose is to provide information about examinees’ level of ability or achievement so that decisions can be made regarding examinees’ placement or the granting of credit or credentials. Feedback to examinees is typically not given. Contrast with formative assessment.

Trick question

An item that requires examinees to use skills and abilities other than the ones it is intended to measure. Such items should be avoided.

Validity

The degree to which the use of an exam’s score for a particular purpose is warranted.

Window dressing

Irrelevant information in an item that distracts examinees from the task.

Institute for Credentialing Excellence
Council for Adult and Experiential Learning
Competency-Based Education Network
National College Testing Association
Open Education Resource Foundation
National Council on Measurement in Education