The Snijders-Oomen nonverbal intelligence tests:
general intelligence tests or tests for learning potential?
P.J. Tellegen & J.A. Laros
Introduction
Traditional tests for general
intelligence (GI-tests) like the Stanford-Binet and the Wechsler
intelligence tests are criticized by advocates of Learning Potential
tests (LP-tests) on the point that these tests measure the end result of
prior learning rather than learning potential. By merely reflecting the
end result of prior learning GI-tests would underestimate the learning
ability of persons who have had fewer opportunities to acquire the
knowledge and skills to perform well in a test situation. In particular,
members of ethnic minorities, persons from lower socioeconomic
background and persons with learning problems would be at a disadvantage
when tested with a GI-test. A related point of criticism is that
GI-tests provide no information on the growth in performance to be
expected given optimal learning conditions. As a result these tests
would not discriminate sufficiently between mentally retarded and
learning disabled children.
Common to LP-tests is the inclusion of training in the test design,
either as a separate phase or incorporated in a single test
administration. The aim of the training is to eliminate differences due
to previous educational or cultural opportunities and to optimize the
learning conditions. Although research on learning potential has been
going on for several decades (see Guthke, chapter 3 in this volume),
practical instruments for general assessment have only recently become
available.
GI-tests have also been criticized on the basis of their contents by
advocates of culture fair intelligence tests. Because the tests often
make an appeal to specific language skills, both in test contents and
instructions, these tests would place members of cultural minority
groups at a disadvantage. This argument also applies to persons with
hearing, speech and language problems. For all these groups, low
performance on a GI-test might primarily reflect poor verbal knowledge
instead of poor reasoning or learning ability. This criticism of
GI-tests has led to the development of nonverbal intelligence tests
which aim at minimizing the reliance on acquired knowledge and verbal
ability, such as Raven's Progressive Matrices (Raven, 1938) and
Cattell's Culture Fair Intelligence Test (Cattell, 1950).
In the early forties, Snijders-Oomen (1943) constructed a nonverbal
intelligence scale (SON) intended for the assessment of deaf children.
Intelligence was defined by her in terms of learning ability; the extent
to which children could profit from instruction at school. The SON-test
developed by Snijders-Oomen was the first test that covered a wide area
of intelligence without being dependent on the use of language. The
scale has been revised several times and is especially suited for the
intelligence assessment of immigrant children and children with hearing,
speech and language problems. In the context of learning potential
tests a closer look at a nonverbal test like the SON can be valuable in
showing that explicit training is not the only alternative to general
intelligence tests for 'fair' testing of special groups.
In this chapter the latest revision of the SON-test, the SON-R 5.5-17,
will be described. After a short summary of the history of the SON-tests
and the characteristics of the SON-R, the most important psychometric
qualities and research results with hearing and deaf subjects will be
reviewed. In the discussion special attention will be given to the
similarities and differences of the SON-R compared to general
intelligence tests and learning potential tests.
History of the SON-Tests
In her work with children at an
institute for the deaf Snijders-Oomen was confronted with problems of
assessing the learning ability of children who were severely handicapped
in their language development. General intelligence tests were not
suited for this purpose due to reliance on verbal skills, while
nonverbal tests at that time consisted mainly of performance tests
related to spatial abilities (like mazes, form boards, mosaics). After
extensive experimentation with existing and newly developed tasks she
constructed a test series which also included nonverbal subtests related
to abstract and concrete reasoning. Capacities for abstraction and
combination were considered especially important for the ability to
participate in the educational system (Snijders-Oomen, 1943, pp. 25-28).
Mental age norms were constructed for deaf children from 4 to 14 years
of age.
In the subsequent revision the test series was expanded and standardized
for deaf and hearing children from 3 to 17 years (Snijders &
Snijders-Oomen, 1970). With the second revision, different series of
tests were developed for younger and older children respectively, the
SON 2.5-7 (Snijders & Snijders-Oomen, 1976), and the SSON for the ages
of 7 to 17 years (Starren, 1978). The latest revision, published in
1989, is the SON-R 5.5-17 (Snijders, Tellegen & Laros, 1989; Laros &
Tellegen, 1991). A new revision of the test for the preschool age group
will be published in 1994. Common to the revisions of the SON-tests is
the primary goal to examine a broad spectrum of intelligence without
being dependent on language. Due to the nonverbal character of the
SON-tests, the test materials can be used internationally without
modifications; the manual of the SON-R has been published in English,
German and Dutch.
The SON-R 5.5-17
Composition of the SON-R 5.5-17
In sequence of administration the test series consists of the following
7 subtests:
- Categories: The subject is
shown three drawings of objects or situations that have something in
common. The subject has to discover the concept underlying the three
pictures and is required to choose, from five alternatives, those
two drawings which depict the same concept. The difficulty of the
items is related to the degree of abstraction of the underlying
concept. For example, in an easy item the concept is 'fruit' and in
one of the most difficult items the concept is 'art'.
- Mosaics: Various mosaic
patterns, presented in a booklet, have to be copied by the subject
using nine red/white squares. There are six different sorts of
squares. With the easy items, only two sorts are used while all six
sorts are used with the difficult items.
- Hidden Pictures: A certain
search object (for instance a kite) is hidden fifteen times in a
drawing. The size and the position of the hidden object vary.
After focusing on the search object, the subject has to indicate the
places where it is hidden.
- Patterns: In the middle of
a repeating pattern of one or two lines a part is left out. The
subject has to draw the missing part of the lines in such a way that
the pattern is repeated in a consistent way. The difficulty of the
items is related to the number of lines, the complexity of the line
pattern and the size of the missing part.
- Situations: The subject is
shown a picture of a concrete situation in which one or more parts
are missing. The subject has to choose the correct parts from a
number of alternatives in order to make the situation logically
coherent.
- Analogies: The items
consist of geometrical figures with the problem format A:B=C:D. The
subject is required to discover the principle behind the
transformation A:B and apply it to figure C. Figure D is not
presented and has to be selected from four alternatives. The
difficulty of the items is related to the number and the complexity
of the transformations.
- Stories: The subject is
shown a number of cards that together form a story. The subject is
given the cards in an incorrect sequence and is required to order
them in a logical time sequence. The number of cards that are
presented varies from four to seven.
The diversity in tasks and testing
materials has the advantage of making the test administration attractive
for the subjects. Categories, Situations and Analogies are multiple
choice tests; the remaining four tests are so-called 'action' tests. In
the action tests the solution has to be sought in an active manner which
makes observation of behaviour possible. Although no observation system
is provided with the SON-R, and no data regarding the reliability and
validity of observations were collected, many users of the SON-tests
appreciate the possibilities for behaviour observation. This is the main
reason why the SON-'58 remained in use after the publication of the SSON,
in which all subtests were in multiple-choice form.
One can divide the SON-R into four types of tests according to their
contents: abstract reasoning tests (Categories and Analogies), concrete
reasoning tests (Situations and Stories), spatial tests (Mosaics and
Patterns) and perceptual tests (Hidden Pictures). The abstract reasoning
tests are based on relationships that are not bound by time and place; a
principle of order has to be derived from the presented material and
applied to new material. For nonverbal testing of abstract reasoning,
classification tests and analogy tests are widely used. In the concrete
reasoning tests the objective is to bring about a realistic time-space
connection between objects. Emphasizing either the spatial dimension or
the time dimension leads to two different test types. In the so-called
completion tests (Situations), the task is to bring about an imperative
simultaneous connection between objects within a spatial whole. In the
other type (Stories), the object is to place different scenes of an
event in the correct time sequence. The concrete reasoning tests show an
affinity to tests for social intelligence in which insight in social
relationships and behaviour is emphasized. In the spatial tests a
relationship between parts of an abstract figure has to be established.
Mosaics is a widely known test-type which was included in the earlier
SON-tests; the new subtest Patterns is especially developed for the
SON-R. In the perceptual test, Hidden Pictures, one must discover a
certain figure hidden in an ambiguous stimulus pattern. This subtest,
which is also new for the SON-tests, represents the factor 'flexibility
of closure', differentiated by Thurstone.
In contrast to the earlier versions of the SON-tests, the SON-R does not
include short-term memory-span tests. As Estes (1982) notes, the way
information is organized and retrieved from long-term memory seems much
more relevant than short-term memory in assessing the ability of
children to succeed in school, where virtually all instruction is
presumably intended to deal with long-term memory for the material
learned. Although we tried to develop alternatives for the short-term
memory tests, it was found to be too complicated and time consuming to
integrate nonverbal subtests concerning long-term memory and memory
strategies in the SON-R.
Construction of the subtests
The subtests of the SON-R have been systematically constructed on the
basis of a theory of item difficulty. The intention of such a theory is
to cover the most important factors that contribute to the difficulty of
items in a subtest. With the help of such a theory items can be ordered
as subsequent, logical steps in the mastery of a specific problem type.
A theory which is successful in explaining the progressive difficulty in
a subtest has two important advantages. In the first place, it creates
the possibility of designing items with a certain degree of difficulty,
and of performing a systematic test construction. Secondly, one obtains
a rational basis for interpreting failure at a certain level of
difficulty. Especially for the subtests Mosaics, Patterns and Analogies
we succeeded in developing effective theories of item difficulty. The
unidimensional scaling model developed by Mokken (1971) was used in
selecting the items. In terms of the model, the subtests are reasonably
strong scales with H-coefficients close to .50.
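As a rough illustration of what an H-coefficient summarizes, the sketch below computes Loevinger's scalability coefficient H, on which Mokken's model is based, for a matrix of dichotomous item scores. The function name `scale_h` and the toy data are illustrative assumptions; they are not taken from the SON-R construction research.

```python
from itertools import combinations
from typing import List


def scale_h(scores: List[List[int]]) -> float:
    """Loevinger's H for a persons-by-items matrix of 0/1 scores.

    H is the sum of the observed inter-item covariances divided by the
    sum of the maximum covariances attainable given the proportions
    correct of the items.
    """
    n_persons = len(scores)
    n_items = len(scores[0])
    # Proportion of persons solving each item.
    p = [sum(row[i] for row in scores) / n_persons for i in range(n_items)]
    observed = 0.0
    maximum = 0.0
    for i, j in combinations(range(n_items), 2):
        # Proportion of persons solving both items of the pair.
        p_ij = sum(row[i] * row[j] for row in scores) / n_persons
        observed += p_ij - p[i] * p[j]
        # The joint proportion can never exceed the smaller marginal.
        maximum += min(p[i], p[j]) - p[i] * p[j]
    return observed / maximum


# Toy data (hypothetical): items ordered from easy to difficult.
guttman = [[1, 1, 1], [1, 1, 0], [1, 0, 0], [0, 0, 0]]
noisy = guttman + [[0, 1, 0]]  # one subject fails the easy item but solves a harder one

print(scale_h(guttman))  # perfect cumulative pattern
print(scale_h(noisy))    # lower, because of the ordering violation
```

A perfectly cumulative score pattern yields H = 1; every violation of the item ordering lowers the coefficient.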
Test administration
Like most intelligence tests for children, the SON-R is individually
administered. Group-administration is less suited for nonverbal
instructions and for motivating young subjects, and would exclude
behaviour observation. The role of time scoring is kept to a minimum. In
this sense, the SON-R is a typical power test; there is a large
variation in the difficulty of the items, while there is sufficient time
for solving each item. The time needed to administer the SON-R varies
from 1 to 2 hours with an average of 1.5 hours. There is a shortened
version of the SON-R, consisting of four subtests: Categories, Mosaics,
Situations and Analogies. The administration of this shortened version
takes about three quarters of an hour.
For the subtests of the SON-R there are verbal and nonverbal
instructions which have been made as equivalent as possible. Nonverbal
instruction forms the point of departure, verbal parts are added as
accompaniment and not as supplementary information. The two sets are not
intended for use as two exclusive alternatives but they give, in a
different form, essentially the same information. With deaf and hearing
disabled children one can often use an intermediate form by combining
the nonverbal instructions with (parts of) the verbal instructions. In
practice, the choice between the two procedures is generally not a
problem; one adjusts to the form of communication the subject is used
to.
In two important aspects the SON-R distinguishes itself from traditional
intelligence tests with regard to the test procedure: firstly, by giving
feedback to the subject and, secondly, by the use of an adaptive
procedure in presenting test items. It is tradition in intelligence
testing not to give feedback on whether the subject's answer is right or
wrong. This tradition is broken in the SON-R because we think that such
behaviour is not natural. When no reaction is allowed following an
answer, the examiner's attitude can be interpreted by a subject as
indifference or, erroneously, as an indication that the answer was
correct. In the SON-R, the subject is told whether the answer was
correct or incorrect following each item. However, this does not include
an explanation of why an answer is incorrect. One of the advantages of
giving feedback is that the subject has the opportunity to change his
problem solving strategy. Also, when a subject has interpreted the
instructions incorrectly, feedback offers the opportunity to adjust.
The second important difference between the test procedure of the SON-R and
common test procedures concerns adaptive testing. In intelligence tests
for children with a wide age range, the difficulty of the test items has
to be very divergent. Presentation of all items to every subject is
troublesome for a number of reasons. In the first place, this would
greatly extend the duration of the test. In the second place, it is
frustrating for young or less intelligent subjects to be required to
solve many items that are too difficult, while the motivation of older
and more intelligent subjects is reduced when they are required to solve
many problems that are too easy. A practical solution, often followed,
consists of presenting all items in order of difficulty and applying a
discontinuation rule. However, this procedure does not result in
eliminating items that are too easy for a specific subject, and the
procedure has the effect that the items on which the subject fails often
occur in successive order, which can be highly frustrating. In recent
years, adaptive test procedures have been developed which restrict the
presentation to those items that are most suited for the specific
subject. These adaptive procedures have the goal of effectively limiting
the number of items to be administered with relatively little loss of
reliability (Weiss, 1982). With computerized testing, these procedures
can easily be implemented; with non-computerized testing, there are
great practical difficulties for the examiner, both in selection and
presentation of the most informative items. The SON-R uses an effective
adaptive test procedure by dividing the subtests into either two or
three parallel series of about 10 items. The difficulty increases
relatively fast in the series. The first series of items serves to
estimate the subject's general level of performance. The series is
broken off after two errors. Those items in the following series that
can most effectively improve and refine the measurement are administered
by skipping easy items and by stopping again after two errors. This way,
the administration is determined by the subject's individual performance
and the presentation is limited to the most relevant items. For the
examiner this method has the advantage of presenting the items within a
series in a fixed sequence. Thus, searching in the test booklet for the
item which has to be presented next, takes place only at the beginning
of a new series. For the subject, it is motivating that relatively easy
items are presented after two errors.
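The logic of the procedure can be sketched in a few lines of code. The sketch is a simplification under stated assumptions: the function names are hypothetical, and the entry rule for a following series (`entry_point`, which starts a fixed number of items below the level reached in the previous series) is invented for illustration only; the actual entry rules are specified in the SON-R manual and are not reproduced here. Skipped easy items are credited as correct and items not reached are scored as incorrect.

```python
from typing import Callable, List, Tuple


def administer_series(
    n_items: int,
    solves: Callable[[int], bool],
    start: int = 0,
    max_errors: int = 2,
) -> Tuple[List[int], List[int]]:
    """Present the items of one series in fixed order, from `start` onward.

    The series is broken off after `max_errors` errors. Items skipped
    before `start` are scored 1; items not reached are scored 0.
    Returns (item scores, indices of the items actually presented).
    """
    scores = [1] * start + [0] * (n_items - start)
    presented: List[int] = []
    errors = 0
    for item in range(start, n_items):
        presented.append(item)
        if solves(item):
            scores[item] = 1
        else:
            errors += 1
            if errors == max_errors:
                break
    return scores, presented


def entry_point(previous_scores: List[int], back_off: int = 2) -> int:
    """HYPOTHETICAL entry rule: start a few items below the level reached."""
    return max(0, sum(previous_scores) - back_off)


def subject(item: int) -> bool:
    """Hypothetical subject who solves the six easiest items of a series."""
    return item < 6


first_scores, first_presented = administer_series(10, subject)
start = entry_point(first_scores)
second_scores, second_presented = administer_series(10, subject, start=start)

print(first_presented)   # items presented in the first series
print(second_presented)  # items presented in the second series
```

In this example the second series begins with two items that the subject can solve, which corresponds to the motivating effect described above, and fewer items are presented than in the first series.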
Psychometric characteristics
Standardization
The standardization of the test scores of the SON-R is based on a
nationwide sample of 1350 subjects varying in age from 6 to 14 years.
Per age group the sample consisted of 150 subjects and was stratified
according to sex, educational type and demographic variables. The
population was restricted to persons residing in The Netherlands for at
least one year who were not suffering from severe physical or mental
handicaps.
From 6 to 14 years, test performance strongly increases with age; 66% of
the variance of the raw total score is explained by age. To make
comparisons between subjects of different ages possible, a
standardization of test scores dependent on age is required. In
practice, such standardizations are often performed on the separate age
samples. For the SON-R a model has been developed in which the
cumulative proportions of the raw scores in the nine age groups are
simultaneously fitted as a higher order function of raw score and age.
This method yields population estimates of the score distributions which
are more reliable (by combining the information of all the age groups),
more consistent (by imposing constraints on the form of the functions),
and which can be computed for any specific age. By using this model it
was possible to extrapolate the age norms to 5;6 and to 17;0 years.
Dependent on the age of the subject, the raw subtest scores are
normalized and standardized, thus reflecting the relative position of an
individual compared to persons of the same age. The total score on the
test is based on the sum of the standardized subtest scores. In the
SON-R manual, norm tables for 38 age groups are presented. Even more
accurate norms are obtained by using the computer program which is
supplied with the test. The program computes norms based on the exact
age of the subject.
Reliability and generalizability
The reliability of the standardized subtest scores depends on the
correlations between the item responses. Since, with the adaptive
procedure, almost 50% of the items are not actually administered, the
correlations between the item scores are systematically and artificially
enhanced because easy items that are not administered get a score of '1'
and difficult items that are not administered get a score of '0'. As a result, the usual
formulas will overestimate the reliability. A separate study has been
conducted to achieve unbiased estimates. In this study the subtests were
(almost) completely administered. Score patterns of the complete
administration were compared to score patterns computed as if the
adaptive procedure had been applied. It appeared that as a result of the
adaptive procedure the correlations between the subtests decreased,
while the computed alpha coefficient was higher compared to the alpha of
the complete administration. The outcome indicates that (averaged over
subtests and age groups) the actual reliability with the adaptive
procedure is .10 lower than computed by coefficient alpha. The actual
reliability of the adaptive procedure is .05 lower than the reliability
with complete subtest administration. The corrected reliability of the
subtests of the SON-R is .76 on the average. The most reliable subtests
are Mosaics, Patterns and Analogies.
In classical test theory, reliability refers to the stability of
hypothetical independent repeated measurements (see Lord & Novick, 1968).
In the theory of generalizability the items are considered to be a
sample from a domain of comparable items and the internal consistency of
the item scores indicates how valid it is to generalize from the outcome
of the sample to the entire item domain (Cronbach, Rajaratnam & Glaser,
1963; Nunnally, 1978). For homogeneous item sets, both approaches are
almost equivalent. For the total score on an intelligence test that is
composed of several subtests, all partly measuring separate components,
an important distinction between reliability and generalizability can be
made. With the reliability of the total score (stratified alpha;
Nunnally, 1978, p. 246), the possibilities for generalization remain
restricted to the specific contents of the subtests. For the
interpretation of individual outcomes it will be more relevant to
generalize to the entire domain of comparable subtests, and to consider
the subtests as a restricted sample of the domain that is important for
the assessment of intelligence. In the latter case, the number of
subtests and the mean correlation between the subtests determine the
coefficient of generalizability. This can be computed by the usual
coefficient alpha in which the subtests are the unit of analysis
(Nunnally, 1978, p. 212).
For the SON-R, the reliability of the total score (alpha stratified) is
.93. The generalizability of the total score (alpha) increases from .81
at six years to .88 at fourteen years, with a mean value of .85. For the
shortened version of the SON-R, the reliability has a value of .90 and
the generalizability has a mean value of .77.
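The dependence of the generalizability coefficient on the number of subtests and their mean intercorrelation can be checked with a small computation. The sketch uses the standardized form of coefficient alpha, which assumes equal subtest variances (a reasonable simplification for standardized subtest scores); the function name is hypothetical. The input values are the mean correlations between the seven subtests reported in this chapter for six-year-olds (.38) and fourteen-year-olds (.51).

```python
def generalizability(n_subtests: int, mean_r: float) -> float:
    """Standardized coefficient alpha with subtests as the unit of analysis."""
    return n_subtests * mean_r / (1 + (n_subtests - 1) * mean_r)


# Seven subtests, mean intercorrelation at six and at fourteen years.
for age, mean_r in [(6, 0.38), (14, 0.51)]:
    print(age, round(generalizability(7, mean_r), 2))
```

Rounded to two decimals, the results agree with the values of .81 and .88 given above for these two ages.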
Stability through time
Test-retest research has not yet been carried out with the SON-R. In
the research with the SON-R with deaf subjects, test results on earlier
versions of the SON were available for 434 subjects. The mean
correlation of the SON-R with earlier versions of the SON is .76, and is
related to the age at administration of the first test and to the lapse
of time between the two administrations. As is the case in American
research on general intelligence tests, stability increases with age and
with shorter time intervals (Bayley, 1949).
Internal relationships
The correlations between the standardized subtest scores steadily
increase with age. The mean value is .38 at six years and .51 at
fourteen years. Correcting for the unreliability of the test outcomes,
the correlations increase from .52 to .68 with a mean value of .61.
Although the test scores cohere to an important degree, multiple
correlations per subtest with the six other subtests show that a
substantial part of the reliable variance per subtest is unique and
cannot be explained by the other subtests (averaged over the subtests,
this percentage is 47% at six years, and 32% at 14 years).
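The correction for unreliability mentioned above is the classical correction for attenuation. The worked lines below are a consistency check under the simplifying assumption that both subtests of a pair have the same reliability; the actual computations presumably used the reliability of each subtest in each age group.

```latex
% Correction for attenuation of the correlation between subtests x and y
r'_{xy} = \frac{r_{xy}}{\sqrt{r_{xx}\, r_{yy}}}

% Assuming equal reliabilities r_{xx} = r_{yy} = \rho, this reduces to
r'_{xy} = \frac{r_{xy}}{\rho}
\quad\Longrightarrow\quad
\rho = \frac{r_{xy}}{r'_{xy}}

% Implied mean subtest reliability at six and at fourteen years
\rho_{6} \approx \frac{.38}{.52} \approx .73
\qquad
\rho_{14} \approx \frac{.51}{.68} = .75
```

Both implied values are close to the mean corrected subtest reliability of .76 reported earlier.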
To investigate whether the interrelations can be explained by
uncorrelated components, and whether these components correspond to the
division of the subtests in tests for concrete reasoning, tests for
abstract reasoning, spatial tests and perceptual tests, principal
components analysis has been performed on the correlation matrix after
correction for attenuation. The dominance of the first component is
quite strong; it is the only component with an eigenvalue greater than
one and the percentage of explained true score variance is 59% at 6
years and 72% at 14 years. This indicates that the subdivision of the
subtests into four categories is not of major importance. For
six-year-olds, the loadings on the first four varimax-rotated components confirm
to a great extent the above mentioned categorization of the subtests;
for the older subjects, most subtests have high loadings on several
components. The structural characteristics of different groups
(hearing/deaf, native/immigrant) are highly similar.
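The size of the first component follows almost directly from the mean corrected correlation. As an approximation, assume that all k = 7 subtests correlate equally, with the mean corrected values of .52 (six years) and .68 (fourteen years) reported above; this equal-correlation assumption is ours and serves only as a plausibility check.

```latex
% Correlation matrix with all off-diagonal elements equal to \bar{r}:
% one eigenvalue \lambda_1 and (k-1) identical smaller eigenvalues
\lambda_1 = 1 + (k-1)\,\bar{r}
\qquad
\lambda_2 = \dots = \lambda_k = 1 - \bar{r} < 1

% Proportion of variance explained by the first component
\frac{\lambda_1}{k} = \frac{1 + (k-1)\,\bar{r}}{k}

% Six years:      (1 + 6 \times .52) / 7 = 4.12 / 7 \approx .59
% Fourteen years: (1 + 6 \times .68) / 7 = 5.08 / 7 \approx .73
```

The approximation reproduces the reported 59% at six years and comes close to the reported 72% at fourteen years, and it shows why only one eigenvalue exceeds one.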
Presentation and interpretation of the results
Given the above-mentioned
characteristics, the total score of the SON-R provides a reliable and
generalizable indication of nonverbal intelligence. The subtest scores
add information concerning specific abilities. For the interpretation of
the standardized subtest scores and the IQ scores, reliability is taken
into account in two different ways, namely by representing the scores as
norm scores and as latent scores. For both types of scores, the basis of
the standardization is that the distribution of true scores has a
population mean of 100 and a standard deviation of 15. The norm score
and the latent score are different approaches to estimating the true
score and they are used for different purposes.
The norm score is defined as the sum of the standardized true score and
the error of measurement. This unbiased estimate of the true score is
used for hypothesis testing, research on groups, and for computation of
the total test score. Although the more common standard scores (with
standardized observed scores instead of standardized true scores) are
also unbiased estimates of their true scores, these true scores do not
have a fixed distribution (the standard deviation is dependent on the
reliability) which means that for standard scores on different
(sub)tests there is no sensible basis for the comparison of their true
scores.
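The difference between the two kinds of scores can be made explicit with the definitions of classical test theory. The symbols (N for the norm score, S for the standard score, T for the true score, E for the error of measurement, \rho for the reliability) are introduced here for illustration, and the reliability of .76 in the example is the mean subtest reliability reported earlier.

```latex
% Norm score: the true scores have a fixed distribution
N = T + E, \qquad \mu_T = 100, \qquad \sigma_T = 15

% Reliability is the ratio of true score variance to observed variance
\rho = \frac{\sigma_T^2}{\sigma_N^2}
\quad\Longrightarrow\quad
\sigma_N = \frac{15}{\sqrt{\rho}}

% Standard score: the observed scores have a fixed distribution,
% so the standard deviation of the true scores depends on \rho
\sigma_S = 15
\quad\Longrightarrow\quad
\sigma_{T_S} = 15\sqrt{\rho}

% Example with \rho = .76:
% \sigma_N \approx 17.2 \qquad \sigma_{T_S} \approx 13.1
```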
The latent score is the estimate of the true score computed by means of
linear regression. In combination with the accompanying probability
interval of the true score, the latent score is best suited for
individual interpretation of the test results and for intra-individual
comparison of subtest scores. The latent scores are presented
graphically on the scoring form. In the computation of the latent
subtest scores, the correlations between the subtests are used to
improve the prediction of the true score of each subtest. To predict the
true score of a specific subtest, the performance on the other
subtests also enters the multiple regression equation. The problem of
exaggeration of intra-individual differences in the profile is thereby
avoided.
Latent scores for the IQ are computed in two ways, denoted as specific
IQ and as generalized IQ. The regression of the specific IQ is based on
the reliability of the total score; the regression of the generalized IQ
is based on the coefficient of generalizability. The latter score is the
estimated performance on the entire domain of comparable intelligence
tests, and is best suited for interpretation of the test result as level
of intelligence.
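The difference between the specific and the generalized IQ can be illustrated with a simple regression estimate of the true score. The sketch rests on several assumptions that are ours: a single predictor is used (the total norm score), the score is expressed on a scale on which the score to be estimated has a mean of 100 and a standard deviation of 15, and an 80% probability interval is chosen arbitrarily. The function names are hypothetical, and the computations in the SON-R manual and computer program may differ in detail. The coefficients .93 and .85 are the reliability and the mean generalizability of the total score reported earlier.

```python
import math
from statistics import NormalDist
from typing import Tuple


def latent_score(norm_score: float, coefficient: float, mean: float = 100.0) -> float:
    """Regression estimate of the true score: the observed deviation
    from the mean is shrunk in proportion to the coefficient."""
    return mean + coefficient * (norm_score - mean)


def probability_interval(
    norm_score: float,
    coefficient: float,
    level: float = 0.80,   # assumed interval width, not taken from the manual
    true_sd: float = 15.0,
) -> Tuple[float, float]:
    """Interval around the latent score, based on the standard error of
    estimate true_sd * sqrt(1 - coefficient)."""
    estimate = latent_score(norm_score, coefficient)
    se = true_sd * math.sqrt(1.0 - coefficient)
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return estimate - z * se, estimate + z * se


norm_iq = 120.0  # hypothetical total norm score
specific_iq = latent_score(norm_iq, 0.93)      # based on the reliability
generalized_iq = latent_score(norm_iq, 0.85)   # based on the generalizability

print(round(specific_iq, 1), probability_interval(norm_iq, 0.93))
print(round(generalized_iq, 1), probability_interval(norm_iq, 0.85))
```

Because the coefficient of generalizability is lower than the reliability, the generalized IQ is regressed more strongly towards the mean and has a wider probability interval than the specific IQ.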
Next to these scores which take reliability into account, some
descriptive characteristics of the test results are also presented. The
reference age is given for the subtests and for the total score; it
represents the age at which a specific test result corresponds with a
standardized score of 100. This 'mental' age makes interpretation from a
developmental viewpoint possible. The total score is also presented as a
standard IQ (fixed population standard deviation of 15) with the
corresponding percentile scores for the general hearing population and
for the population of the deaf. In contrast to earlier versions of the
SON, no separate subtest-norms are computed for the deaf.
Compared to intelligence tests that only present standard scores (which
do not take measurement errors, and errors of generalization into
account) the SON-R offers several extra possibilities for a
psychometrically sound interpretation of test results. These
possibilities require additional work when using the norm tables, but
when using the computer program all results are automatically computed
and printed.
Validity
In the standardization research with
the hearing subjects, data have been collected to substantiate the
validity of the test. Separate research has been performed with deaf
children to develop supplementary test norms and to further validate the
test for this particular group. The main findings will be summarized
below.
Sex differences
Between boys and girls, there is no difference in mean IQ scores. A
significant relation (p<.01) with sex is found only for Mosaics; girls
score somewhat lower than boys on this subtest.
Socio-cultural factors
For hearing as well as for deaf native Dutch subjects, there is a
relatively strong association between occupational level of the parents
and the IQ scores. The mean difference between children of unskilled
workers and professionals (the two extremes for six categories of
socioeconomic status) is about 15 IQ points. Of the 7 subtests,
Analogies shows the strongest relation with occupational level.
In the research with hearing subjects, substantial differences in test
performance on the SON-R exist between immigrant children (based on
country of origin of the parents) and native Dutch children. The mean IQ
score for the Moroccan and Turkish children is 84, compared with a mean
score of 100.5 for the native Dutch children. The lag of the other
immigrant children is small (mean IQ is 99). Comparable differences
occur in the deaf research group, except that the lag of deaf children
from Surinam and The Netherlands Antilles is also considerable. For deaf
and hearing subjects, ethnic differences in performance concern all
subtests, but are most pronounced for Mosaics and Analogies. For neither
the hearing nor the deaf immigrant children does a relation exist
between the number of years residing in The Netherlands and the test scores.
This indicates that lack of knowledge of the Dutch language is not an
important cause of their lower results. The differences between native
and immigrant children can for a great part be explained by differences
in socioeconomic status of the parents, as most parents of immigrant
children belong to the lower occupational levels. The difference between
native and immigrant children decreases by about one third after
controlling for socioeconomic status.
Educational variables
Because school achievement is strongly related to intelligence, and
prediction of school success is an important goal of intelligence
assessment, the relationship with school career is one of the most
direct indications of the validity of an intelligence test. For the
SON-R, the relationship of test performance with school career has been
examined by stepwise multiple regression for three indicators which
appear to play a different role at different ages. These indicators are
differentiation to type of school (like special education, general
education), grade repetition, and report marks.
In primary education, the relation of school type with the IQ scores is
limited; the difference between pupils of special education and general
education is considerable (16 IQ points), but relatively few pupils are
in special schools. Grade repetition relates strongly to the IQ scores.
A relatively large group of pupils in primary education have repeated
one or more grades and they have a lag in IQ scores of almost 19 points.
Report marks also add to the explained variance of the IQ scores; for
the younger group of primary education this is 10% and for the older
group it is 16%. The correlations of the IQ score with school subjects
like language, arithmetic, and history/geography are of the same order.
The multiple correlation of the different indicators of the school
career with the IQ scores is .54 in the age group of 7-9 years, and .60
in the age group of 10-11 years. For the children in secondary
education, the multiple correlation increases to .63. For these children
the relation is almost completely determined by the differentiation into
school type; grade repetition and report marks add little to the
explained variance.
In many primary schools a school achievement test is administered at the
end of the sixth grade. Scores on this test were available for 49
subjects. The correlation with the SON-R IQ is .66. The correlations
with the different parts of the achievement test (language, arithmetic
and information processing) are largely similar.
Other intelligence tests
A group of 36 children from an outpatient psychiatric clinic has been
tested with the SON-R, the WISC-R (Vander Steene et al., 1991) and the
Raven-SPM. The distributions of IQ scores on the SON-R (m=97.1; sd=16.4)
and the WISC-R (m=96.1; sd=16.0) are highly similar (Nieuwenhuys, 1991).
The scores on the Raven (based on English norms) have the same mean but
a smaller standard deviation. The correlation of the SON-R with the
WISC-R is .80; the correlation with the verbal part of the WISC-R is .65
and the correlation with the performance part is .79. Both SON-R and
WISC-R correlate .71 with the Raven.
Performance of deaf children
Starting with the first version, the deaf population has received
special attention in the SON-tests. In addition to the nationwide sample
of hearing subjects, almost the complete population of deaf pupils aged
6-14 years at the Institutes for the Deaf and the Schools for the
Partially Hearing, with a hearing loss of at least 90 dB, has been
examined with the SON-R.
The total group of 768 deaf children has a mean IQ of 90. The difference
with hearing children is reduced to 8.5 points when we control for the
proportion of immigrants, which is four times as large as in the hearing
group. After controlling for occupational level, the difference between
the native deaf and hearing subjects becomes 7.7 points. Further
analysis shows that this lag in performance of the deaf children is
strongly related to the presence of multiply handicapped children in the
deaf population (about 25%). Several causes of deafness, such as
complications during pregnancy and birth, and meningitis and
encephalitis, can also be the cause of mental retardation. Excluding the
multiply handicapped children, the lag of the deaf children is 4 IQ
points, which is mainly related to the subtests for abstract reasoning.
The multiple correlation of the presence of multiple handicaps and the
teacher's evaluation of intellectual insight with the IQ scores is .63
and this increases to .66
by also including specific evaluations of cognitive handicaps,
communicative handicaps and accuracy. The IQ scores correlate .49 with
the STADO-R, a written language test for the deaf (de Haan & Tellegen,
1986). This test consists of four parts: synonyms, word order,
idiom, and prepositions-conjunctions.
Discussion
The SON-tests have been developed as
an alternative to general intelligence tests for the assessment of
cognitive functioning of various groups of children who are handicapped
in the area of verbal communication. With the latest revision, the
SON-R, this has resulted in a test series which deviates from general
intelligence tests in contents and in administrative procedures. In this
section we will compare the SON-R both with GI-tests and with LP-tests.
The main difference between GI-tests and LP-tests is the help which is
offered to the subject. In GI-tests items are presented only once, often
with minimal instruction, and no training and feedback are given during
test administration. With LP-tests help is given in the form of extended
instructions, feedback, and training at the level at which the subject
fails to succeed. The score on the LP-test reflects test performance as
a result of the interactive help procedure.
Although no formal training is given in the SON-R there are several
elements of the administration that facilitate learning opportunities
during testing. These elements are: (a) the several examples given with
each subtest, (b) the feedback which informs the subject whether the
answer is correct and (c) the adaptive procedure by which easier items
are presented after some failures. In this respect the SON-R shares
important aspects of the testing procedure with LP-tests. The element of
training is even more pronounced in the SON-test for preschool children.
In the SON 2.5-7 extensive feedback is given to the child after each
failure by presenting the correct solution.
A second consideration for the comparison of tests relates to test
contents and the specific abilities that are measured. Most general
intelligence tests consist of a verbal and a performance scale. The
verbal part, which also includes quantitative reasoning tasks,
emphasizes crystallized abilities which are greatly influenced by
schooling and also by more general experiences outside of school
(Thorndike, Hagen & Sattler, 1986, p. 4). The performance part is more
related to spatial-visualization abilities. Subtests that focus on
fluid-reasoning abilities, like analogies, classification and series
completion are included in either the verbal or the performance part,
depending on whether the elements of the items are verbal or figural.
The SON-R only contains subtests with a nonverbal content thereby
excluding subtests specifically aimed at measuring verbal ability and
quantitative reasoning. However, the composition of the SON-R in terms
of intelligence factors is wider than the performance part of most
GI-tests since it is less dominated by spatial tests. Four of the seven
SON-R subtests are fluid-reasoning tests in a nonverbal form.
Research on learning potential has largely been carried out
with existing verbal and nonverbal subtests. For instance, Schroots
(1979) used subtests from the Leiden Diagnostic Test (Schroots & Alphen
de Veer, 1976); Hamers and Ruijssenaars (1984) used four subtests from
several intelligence tests; Spelberg (1987) used subtests from the SON-R
for a testing-the-limits procedure and Resing (1990) used two subtests
from the RAKIT (Bleichrodt, Drenth, Zaal & Resing, 1984). LP-tests are
not characterized by the specific contents of the tests but by the
inclusion of training in the testing procedure. When the goal of these
tests is to eliminate differences due to prior opportunities, nonverbal
tests and training procedures which do not require verbal skills seem to
be appropriate. Such a nonverbal LP-test for general use, the Learning
potential test for Ethnic Minorities (LEM), has recently been published
in the Netherlands (Hamers, Hessels & van Luit, 1991). In Table 1 we
have summarized the main differences between the SON-R, the WISC-R, as
an example of a GI-test, and the LEM, as a somewhat special example of an
LP-test.
Table 1: Comparison of the WISC-R, the SON-R and the LEM
|                    | WISC-R               | SON-R                | LEM        |
| test contents      | verbal and nonverbal | nonverbal            | nonverbal  |
| instructions       | verbal               | verbal and nonverbal | nonverbal  |
| examples           | limited              | extended             | extended   |
| feedback           | none                 | simple               | extended   |
| adaptive procedure | age dependent        | individual           | individual |
As this table suggests, there is a greater correspondence between the
SON-R and the LEM than between the SON-R and the WISC-R. With regard to
the possibilities for learning offered during test administration, the
SON-R takes a position in between the two other tests. With regard to
test contents, the SON-R is more similar to the LEM. Although we
classified the LEM as nonverbal, two subtests are related to verbal
ability, but they do not make use of meaningful words. One subtest
measures the learning of relations between meaningless words and objects
and the other measures memory of series of syllables. However, they are
nonverbal in the sense that they are not dependent on knowledge of a
specific language.
The analysis, thus far, of the different tests leads to the conclusion
that the question 'SON-R, a general intelligence test or a test for
learning potential?' is too simplistic; for a classification of tests
more dimensions are needed. One dimension of ordering tests concerns the
possibilities for learning during administration. On this dimension
LP-tests score high although there is a great diversity in the amount of
help and the type of training that is being offered. Traditional
intelligence tests score low on this dimension and the position of the
SON-R is somewhere in between. A second dimension concerns the use of a
specific language in instructions and test materials. Nonverbal tests
like the SON-R, LEM and the Raven aim at minimizing this aspect. A
third, and very complex, dimension concerns the different cognitive
aspects that are represented by the test, like verbal-, spatial- and
reasoning abilities and memory, and the extent to which the measurement
of these abilities depends on knowledge learned at school and/or the
cultural environment. Not only between, but also within the domains of
nonverbal tests, GI-tests and LP-tests, there are great variations in
test composition. However, nonverbal tests are more restricted since
they do not directly measure crystallized verbal ability.
The differentiation between intelligence tests is also reflected in
definitions of intelligence. The aspects of knowledge, problem solving
and ability to learn are stressed to different degrees, both in
definitions and in tests. Which test is 'the best' can only be
determined for specific situations on an empirical basis by looking at
the validity with regard to relevant theoretical and practical
questions. However, the comparison of tests is a very complex matter.
Between separate studies, populations and criterion measures differ, and
even within a single study it can be difficult to differentiate between
the effects of reliability and the multiple factors related to content
and administration on the test scores. When,
for example, immigrant children score higher on test A than on test B
this might be the result of differences in reliability of the tests
(when standard scores are used) and not result from differences in
contents and procedures.
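The reliability effect mentioned above can be illustrated with a small calculation. This is a minimal sketch under stated assumptions, not an analysis taken from the SON-R research: it assumes classical test theory, two tests that measure the same true score, measurement error that is unrelated to group membership, and a norm scale with mean 100 and standard deviation 15. The function name, the gap of one true-score standard deviation and the two reliabilities are hypothetical.

```python
import math


def observed_group_mean(true_gap_sd: float, reliability: float,
                        norm_mean: float = 100.0, norm_sd: float = 15.0) -> float:
    """Expected standard-score mean of a group whose true-score mean lies
    `true_gap_sd` true-score standard deviations from the population mean.

    Under classical test theory the observed SD equals the true SD divided by
    sqrt(reliability), so a gap expressed in observed-score SDs shrinks by the
    factor sqrt(reliability).
    """
    if not 0.0 < reliability <= 1.0:
        raise ValueError("reliability must lie in the interval (0, 1]")
    return norm_mean + norm_sd * true_gap_sd * math.sqrt(reliability)


# Hypothetical values chosen only for illustration.
true_gap = -1.0  # group lies one true-score SD below the population mean
mean_a = observed_group_mean(true_gap, reliability=0.80)  # less reliable test A
mean_b = observed_group_mean(true_gap, reliability=0.95)  # more reliable test B
print(f"test A: {mean_a:.1f}, test B: {mean_b:.1f}")
```

With these hypothetical figures the group scores about one point higher on the less reliable test A, although contents and procedures are identical by assumption.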
In our opinion, the research results with the SON-R indicate that the
test is a useful instrument for the nonverbal examination of children's
intelligence, with high reliability and ample indications of the
validity. The variety of tasks and test materials is stimulating for the
subject and the adaptive procedure avoids repeated presentation of
excessively difficult items. An objection to a nonverbal test like the
SON-R might be that the concept of intelligence is substantially
narrowed by the exclusion of verbal ability tests. However, by including
tests for concrete and abstract reasoning - areas that often have a
verbal form in general intelligence tests - the contents of the SON-R
are not limited to typical performance tests. Although the test can be
administered without using language, this does not exclude the
importance of verbal abilities for the evaluation of intelligence with
the SON-R, as is illustrated by the correlations of the test with report
marks and tests for language skills. Verbal intelligence tests often
require specific knowledge learned in school. When the main object of
using a test is to make predictions concerning school achievement, the
absence of verbal tests in the SON-R might reduce its predictive power.
If, however, the goal of intelligence assessment is to distinguish
between possible causes of poor school performance, a test that is not
dependent on specific knowledge is more appropriate. In such cases use
of the SON-R is not only indicated for special groups such as deaf and
immigrant children, but also suited for children with no specific
problems in the areas of language and communication.
References
Bayley, N. (1949). Consistency and variability in the growth of
intelligence from birth to eighteen years. Journal of Genetic Psychology,
75, 165-196.
Bleichrodt, N., Drenth, P.J.D., Zaal, J.N. & Resing, W.C.M. (1984). Revisie
Amsterdamse Kinder Intelligentie Test, Handleiding [Revision Amsterdam Child
Intelligence Test, Manual]. Lisse: Swets & Zeitlinger.
Cattell, R.B. (1950). Handbook for the individual or group Culture Fair
Intelligence Test. Scale I. Champaign, Ill: I.P.A.T.
Cronbach, L.J., Rajaratnam, N. & Gleser, G.C. (1963). Theory of
generalizability: a liberalization of reliability theory. British Journal of
Statistical Psychology, 16, 137-163.
Estes, W.K. (1982). Learning, memory and intelligence. In R.J. Sternberg (Ed.),
Handbook of human Intelligence. Cambridge: Cambridge University Press.
Haan, N. de & Tellegen, P.J. (1986). De herziening van een schriftelijke
taaltest voor doven [The revision of a written language test for the deaf].
Groningen: Internal report, Department of Personality Psychology,
HB-86-828-SW.
Hamers, J.H.M., Hessels, M.G.P. & Luit, J.E.H. van (1991). Leertest voor
Etnische Minderheden, Handleiding [Learning test for Ethnic Minorities,
Manual]. Lisse: Swets & Zeitlinger.
Hamers, J.H.M. & Ruijssenaars, A.J.J.M. (1984). Leergeschiktheid en
Leertests [Learning potential and learning potential tests]. Lisse: Swets &
Zeitlinger (2nd edition 1986).
Laros, J.A. & Tellegen, P.J. (1991). Construction and validation of the
SON-R 5.5-17, the Snijders-Oomen non-verbal intelligence test. Groningen:
Wolters-Noordhoff.
Lord, F.M. & Novick, M.R. (1968). Statistical theories of mental test
scores. Reading, Mass: Addison-Wesley.
Mokken, R.J. (1971). A theory and procedure for scale-analysis. The Hague:
Mouton.
Nieuwenhuys, M. (1991). Een vergelijkingsonderzoek SON-R, WISC-R en Raven-SPM
[A comparative study of SON-R, WISC-R and Raven-SPM]. Amsterdam: Internal
report, Department of Developmental Psychology.
Nunnally, J.C. (1978). Psychometric Theory. New York: McGraw-Hill (2nd
edition).
Raven, J.C. (1938). Progressive Matrices: A perceptual test of intelligence,
1938, individual form. London: Lewis.
Resing, W.C.M. (1990). Intelligentie en leerpotentieel [Intelligence and
learning potential]. Lisse: Swets & Zeitlinger.
Schroots, J.J.F. & Alphen de Veer, R.J. van (1976). Leidse Diagnostische
Test: Handleiding [Leiden Diagnostic Test: Manual]. Lisse: Swets &
Zeitlinger.
Schroots, J.J.F. (1979). Cognitieve ontwikkeling, leervermogen en
schoolprestaties [Cognitive development, learning ability and school
achievements]. Lisse: Swets & Zeitlinger.
Snijders-Oomen, N. (1943). Intelligentieonderzoek van doofstomme kinderen
[The examination of intelligence with deaf-mute children]. Nijmegen:
Berkhout.
Snijders, J.Th. & Snijders-Oomen, N. (1970). Snijders-Oomen Non-verbal
Intelligence Scale: SON-'58. Groningen: Wolters-Noordhoff.
Snijders, J.Th. & Snijders-Oomen, N. (1976). Snijders-Oomen Non-verbal
Intelligence Scale SON 2.5-7. Groningen: Wolters-Noordhoff.
Snijders, J.Th., Tellegen, P.J. & Laros, J.A. (1989). Snijders-Oomen
Non-verbal intelligence test: SON-R 5.5-17. Manual and research report.
Groningen: Wolters-Noordhoff.
Spelberg, H.C. (1987). Grenzentesten [Testing the limits]. Groningen:
Stichting Kinderstudies.
Starren, J. (1978). De ontwikkeling van een nieuwe versie van de SON voor
7-17 jarigen. Verantwoording en handleiding [The development of a new
version of the SON for 7-17 year olds. Manual and Research Report].
Groningen: Wolters-Noordhoff.
Thorndike, R.L., Hagen, E.P. & Sattler, J.M. (1986). The Stanford-Binet
intelligence scale: Fourth edition technical manual. Chicago: The Riverside
Publishing Company.
Vander Steene, G., Haassen, P.P. van, Bruyn, E.E.J. de, Coetsier, P., Pijl,
Y.J., Poortinga, Y.H., Spelberg, H.C. & Stinissen, J. (1991). WISC-R,
Nederlandstalige uitgave: Verantwoording. [WISC-R, Dutch language edition:
Research Report]. Lisse: Swets & Zeitlinger.
Wechsler, D. (1974). Wechsler Intelligence Scale For Children - Revised. New
York: The Psychological Corporation.
Weiss, D.J. (1982). Improving measurement quality and efficiency with
adaptive testing. Applied Psychological Measurement, 6, 473-492.
(source:
http://www.testresearch.nl/sonroe/learnpote.html)