Norm-Referenced: In this system, learners are evaluated by comparing them with one another, and their grades take shape relative to each other...

Maybe this has happened to you: at the end of the term you earn an 18 (out of 20) in a course, which is a good grade, but when you look at everyone else's grades your jaw drops and you say, "I got an 18 and so did so-and-so...!!" At that point there is no room for objection and no evidence to point to... Unfortunately, the evaluation system of Iran's educational system rests on exactly this foundation. In this approach, the personal opinions and subjective impressions of the teacher or professor carry the most weight, and as you all know, wherever personal opinion decides, there is no standard and no attention to the needs of the learner (the very person the educational system was created for, and toward whom all of its attention is supposed to be directed). The result of such an approach is surely nothing but resentment and learners' discouragement from studying and doing research in their own field. I personally have suffered a great deal under this method...

Criterion-Referenced means the learner is assessed against their own knowledge, not against anyone else. This system checks how well the learner has mastered a particular topic or skill. Personal taste has no place here, and there is a defined standard and criterion for everything. It is the best and most widely accepted approach and, from a measurement standpoint, the most reliable.

This was an introduction to the English article below, which describes these two systems in detail.


Norm-Referenced Achievement Tests
 
Human beings make tests. They decide what topics to include on the test, what kinds of questions to ask, and what the correct answers are, as well as how to use test scores. Tests can be made to compare students to each other (norm-referenced tests) or to see whether students have mastered a body of knowledge (criterion or standards-referenced tests). This fact sheet explains what NRTs are, their limitations and flaws, and how they affect schools.
 
 

What are norm-referenced tests?
 


Norm-referenced tests (NRTs) compare a person's score against the scores of a group of people who have already taken the same exam, called the "norming group." When you see scores in the paper which report a school's scores as a percentage -- "the Lincoln school ranked at the 49th percentile" -- or when you see your child's score reported that way -- "Jamal scored at the 63rd percentile" -- the test is usually an NRT.


Most achievement NRTs are multiple-choice tests. Some also include open-ended, short-answer questions. The questions on these tests mainly reflect the content of nationally-used textbooks, not the local curriculum. This means that students may be tested on things your local schools or state education department decided were not so important and therefore were not taught.
 
Commercial, national, norm-referenced "achievement" tests include the California Achievement Test (CAT); Comprehensive Test of Basic Skills (CTBS), which includes the "Terra Nova"; Iowa Test of Basic Skills (ITBS) and Tests of Achievement and Proficiency (TAP); Metropolitan Achievement Test (MAT); and Stanford Achievement Test (SAT, not to be confused with the college admissions SAT). "IQ," "cognitive ability," "school readiness," and developmental screening tests are also NRTs.
 



Creating the bell curve.
 
NRTs are designed to "rank-order" test takers -- that is, to compare students' scores. A commercial norm-referenced test does not compare all the students who take the test in a given year. Instead, test-makers select a sample from the target student population (say, ninth graders). The test is "normed" on this sample, which is supposed to fairly represent the entire target population (all ninth graders in the nation). Students' scores are then reported in relation to the scores of this "norming" group.

To make comparing easier, testmakers create exams in which the results end up looking at least somewhat like a bell-shaped curve (the "normal" curve). Testmakers make the test so that most students will score near the middle, and only a few will score low (the left side of the curve) or high (the right side of the curve).
 
Scores are usually reported as percentile ranks. The scores range from 1st percentile to 99th percentile, with the average student score set at the 50th percentile. If Jamal scored at the 63rd percentile, it means he scored higher than 63% of the test takers in the norming group. Scores also can be reported as "grade equivalents," "stanines," and "normal curve equivalents."
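To make the percentile idea concrete, here is a minimal sketch, not taken from the fact sheet, of how a percentile rank could be computed against a norming sample. The scores and the percentile_rank helper below are invented for illustration; real publishers use far larger samples, scaled scores, and their own rules for handling ties.

```python
# Minimal sketch (assumed, not from the fact sheet): percentile rank against a norming sample.
# Real test publishers use much larger samples, scaled scores, and their own tie-handling rules.

def percentile_rank(raw_score, norming_group):
    """Percent of norming-group scores that fall strictly below the given raw score."""
    below = sum(1 for s in norming_group if s < raw_score)
    return round(100 * below / len(norming_group))

# A made-up norming sample of raw scores (number of questions answered correctly).
norming_group = [12, 15, 18, 18, 20, 21, 21, 22, 23, 25, 26, 28, 30, 31, 34, 36]

print(percentile_rank(25, norming_group))  # -> 56, i.e. roughly the 56th percentile
```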
 

One more question right or wrong can cause a big change in the student's score. In some cases, having one more correct answer can cause a student's reported percentile score to jump more than ten points. It is very important to know how much difference in the percentile rank would be caused by getting one or two more questions right.
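As a rough, hypothetical illustration of why one answer can matter so much (the numbers below are assumptions, not data from any real test): when raw scores pile up near the middle of a bell curve, a single extra correct answer passes a large share of the norming group.

```python
# Hypothetical illustration (numbers invented, not from any real test): near the middle of a
# bell curve, one extra correct answer can move a percentile rank by more than ten points.
from math import erf, sqrt

def normal_percentile(x, mean, sd):
    """Approximate percentile of x under a normal distribution with the given mean and sd."""
    return 100 * 0.5 * (1 + erf((x - mean) / (sd * sqrt(2))))

mean, sd = 30, 3  # assumed: about 30 of 50 questions correct on average, most scores bunched nearby
for raw in (30, 31):
    print(raw, round(normal_percentile(raw, mean, sd), 1))
# 30 -> 50.0, 31 -> 63.1: one more correct answer jumps the rank by about 13 points
```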
 
In making an NRT, it is often more important to choose questions that sort people along the curve than it is to make sure that the content covered by the test is adequate. The tests sometimes emphasize small and meaningless differences among test-takers. Since the tests are made to sort students, most of the things everyone knows are not tested. Questions may be obscure or tricky, in order to help rank-order the test-takers.

Tests can be biased. Some questions may favor one kind of student or another for reasons that have nothing to do with the subject area being tested. Non-school knowledge that is more commonly learned by middle or upper class children is often included in tests. To help make the bell curve, testmakers usually eliminate questions that students with low overall scores might get right but those with high overall scores get wrong. Thus, most questions which favor minority groups are eliminated.
 
NRTs usually have to be completed within a time limit. Some students do not finish, even if they know the material. This can be particularly unfair to students whose first language is not English or who have learning disabilities. This "speededness" is one way testmakers sort people out.
 
How accurate is that test score?

The items on the test are only a sample of the whole subject area. There are often thousands of questions that could be asked, but tests may have just a few dozen questions. A test score is therefore an estimate of how well the student would do if she could be asked all the possible questions.

All tests have "measurement error." No test is perfectly reliable. A score that appears as an absolute number -- say, Jamal's 63 -- really is an estimate. For example, Jamal's "true score" is probably between 56 and 70, but it could be even further off. Sometimes results are reported in "score bands," which show the range within which a test-taker's "true score" probably lies.
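The fact sheet does not show the arithmetic behind these bands, but a common psychometric convention, assumed here for illustration, is to report roughly one or two standard errors of measurement (SEM) on either side of the observed score; Jamal's band of 56 to 70 corresponds to about 63 ± 7 on that kind of scale.

```latex
% Conventional psychometric definitions (an assumption here, not stated in the article):
% SD = standard deviation of the score scale, r = test reliability, z = confidence multiplier.
\mathrm{SEM} = SD\,\sqrt{1 - r},
\qquad
\text{score band} \approx X_{\text{observed}} \pm z \cdot \mathrm{SEM}
```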


There are many other possible causes of measurement error. A student can be having a bad day. Test-taking conditions often are not the same from place to place (they are not adequately "standardized"). Different versions of the same test are in fact not exactly equivalent.


Sub-scores on tests are even less precise. This is mostly because there are often very few items on the sub-test. A score band for Juanita's math sub-test might show that her score is between the 33rd and 99th percentile because only a handful of questions were asked.
 

Scores for young children are much less reliable than for older students. This is because young children's moods and attention are more variable. Also, young children develop quickly and unevenly, so even an accurate score today could be wrong next month.
 

What do score increases mean?
 
If your child's or your school's score goes up on a norm-referenced test, does that mean she knows more or the school is better? Maybe yes, maybe not. Schools cannot teach everything. They teach some facts, some procedures, some concepts, some skills -- but not others. Often, schools focus most on what is tested and stop teaching many things that are not tested. When scores go up, it does not mean the students know more; it means they know more of what is on that test.


For example, history achievement test "A" could have a question on Bacon's Rebellion (a rebellion by Black slaves and White indentured servants against the plantation owners in colonial Virginia). Once teachers know Bacon's Rebellion is covered on the exam, they are more likely to teach about it. But if those same students are given history test "B," which does not ask about Bacon's Rebellion but does ask about Shays' Rebellion, which the teacher has not taught, the students will not score as well.
 

Teaching to the test explains why scores usually go down when a new test is used. A district or state usually uses an NRT for five to ten years. Each year, scores go up as teachers become familiar with what is on the test. When a new test is used, the scores suddenly drop. The students don't know less; it is just that different things are now being tested.
 

 
 
Can all the children score above average?
 

 
 

Politicians often call for all students to score above the national average. This is not possible.
 

NRTs are constructed so that half the population is below the mid-point or average score. Expecting all students to be above the fiftieth percentile is like expecting all teams in a basketball league to win more than half their games. However, because the tests are used for years and because schools teach to them, there are times when far more than half the students score above average.
 


Why use norm-referenced tests?


To compare students, it is often easiest to use a norm-referenced test because they were created to rank test-takers. If there are limited places (such as in a "Gifted and Talented" program) and choices have to be made, it is tempting to use a test constructed to rank students, even if the ranking is not very meaningful and keeps out some qualified children.

NRTs are a quick snapshot of some of the things most people expect students to learn. They are relatively cheap and easy to administer. If they were only used as one additional piece of information and not much importance was put on them, they would not be much of a problem.
 

The dangers of using norm-referenced tests
 

Many mistakes can be made by relying on test scores to make educational decisions. Every major maker of NRTs tells schools not to use them as the basis for making decisions about retention, graduation, or placement. The testmakers know that their tests are not good enough to use that way.

The testing profession, in its Standards for Educational and Psychological Testing, states, "In elementary or secondary education, a decision or characterization that will have a major impact on a test taker should not automatically be made on the basis of a single test score."
 
Any one test can only measure a limited part of a subject area or a limited range of important human abilities. A "reading" test may measure only some particular reading "skills," not a full range of the ability to understand and use texts. Multiple-choice math tests can measure skill in computation or solving routine problems, but they are not good for assessing whether students can reason mathematically and apply their knowledge to new, real-world problems.

Most NRTs focus too heavily on memorization and routine procedures. Multiple-choice and short-answer questions do not measure most knowledge that students need to do well in college, qualify for good jobs, or be active and informed citizens. Tests like these cannot show whether a student can write a research paper, use history to help understand current events, understand the impact of science on society, or debate important issues. They don't test problem-solving, decision-making, judgment, or social skills.

Tests often cause teachers to overemphasize memorization and de-emphasize thinking and application of knowledge. Since the tests are very limited, teaching to them narrows instruction and weakens curriculum. Making test score gains the definition of "improvement" often guarantees that schooling becomes test coaching. As a result, students are deprived of the quality education they deserve.

Norm-referenced tests also can lower academic expectations. NRTs support the idea that learning or intelligence fits a bell curve. If educators believe it, they are more likely to have low expectations of students who score below average.
 

 

Schools should not use NRTs
 

 
 

The damage caused by using NRTs is far greater than any possible benefits the tests provide. The main purpose of NRTs is to rank and sort students, not to determine whether students have learned the material they have been taught. They do not measure anywhere near enough of what students should learn. They have very harmful effects on curriculum and instruction. In the end, they provide a distorted view of learning that then causes damage to teaching and learning.
 
 
 
Criterion- and Standards-Referenced Tests
 

Criterion-referenced tests (CRTs) are intended to measure how well a person has learned a specific body of knowledge and skills. Multiple-choice tests most people take to get a driver's license and on-the-road driving tests are both examples of criterion-referenced tests. As on most other CRTs, it is possible for everyone to earn a passing score if they know about driving rules and if they drive reasonably well.
 

In contrast, norm-referenced tests (NRTs) are made to compare test takers to each other. On an NRT driving test, test-takers would be compared as to who knew most or least about driving rules or who drove better or worse. Scores would be reported as a percentile rank with half scoring above and half below the mid-point (see the NRT fact sheet above).
 

In education, CRTs usually are made to determine whether a student has learned the material taught in a specific grade or course. An algebra CRT would include questions based on what was supposed to be taught in algebra classes. It would not include geometry questions or more advanced algebra than was in the curriculum. Almost all students who took algebra could pass this test if they were taught well, studied enough, and the test was well made.
 

On a standardized CRT (one taken by students in many schools), the passing or "cut-off" score is usually set by a committee of experts, while in a classroom the teacher sets the passing score. In both cases, deciding the passing score is subjective, not objective. Sometimes cut scores have been set in a way that maximizes the number of low income or minority students who fail the test. A small change in the cut score would not change the meaning of the test but would greatly increase minority pass rates.
 

Some CRTs, such as many state tests, are not based on a specific curriculum, but on a more general idea of what students might be taught. Therefore, they may not match the curriculum. For example, a state grade 10 math test might include areas of math which some students have not studied.
 
Standards-Referenced Tests
 

A recent variation of criterion-referenced testing is "standards-referenced testing" or "standards-based assessment." Many states and districts have adopted content standards (or "curriculum frameworks") which describe what students should know and be able to do in different subjects at various grade levels. They also have performance standards that define how much of the content standards students should know to reach the "basic" or "proficient" or "advanced" level in the subject area. Tests are then based on the standards and the results are reported in terms of these "levels," which, of course, represent human judgment. In some states, performance standards have been steadily increased, so that students continually have to know more to meet the same level.
 

Educators often disagree about the quality of a given set of standards. Standards are supposed to cover the important knowledge and skills students should learn -- they define the "big picture." State standards should be well-written and reasonable. Some state standards have been criticized for including too much, for being too vague, for being ridiculously difficult, for undermining higher quality local curriculum and instruction, and for taking sides in educational and political controversies. If the standards are flawed or limited, tests based on them also will be. In any event, standards enforced by state tests will have -- and are meant to have -- a strong impact on local curriculum and instruction.
 

Even if standards are of high quality, it is important to know how well a particular test actually matches the standards. In particular, are all the important parts of the standards measured by the test? Often, many important topics or skills are not assessed.
 

A major reason for this is that most state exams still rely almost entirely on multiple-choice and short-answer questions. Such tests cannot measure many important kinds of learning, such as the ability to conduct and report on a science experiment, to analyze and interpret information to present a reasonable explanation of the causes of the Civil War, to do an art project or a research paper, or to engage in serious discussion or make a public presentation (see fact sheet on multiple-choice tests). A few standards-based exams have gone beyond multiple-choice and short-answer, but even then they may not be balanced or complete measures of the standards.
 
CRTs and NRTs
 

Sometimes one kind of test is used for two purposes at the same time. In addition to ranking test takers in relation to a national sample of students, an NRT might be used to decide if students have learned the content they were taught. A CRT might be used to assess mastery and to rank students or schools based on their scores. In many states, students have to pass either an NRT or a CRT to obtain a diploma or be promoted. This is a serious misuse of tests. Because schools serving wealthier students usually score higher than other schools, ranking often just compares schools based on community wealth. This practice offers no real help for schools to improve.
 

NRTs are designed to sort and rank students "on the curve," not to see if they met a standard or criterion. Therefore, NRTs should not be used to assess whether students have met standards. However, in some states or districts an NRT is used to measure student learning in relation to standards. Specific cut-off scores on the NRT are then chosen (usually by a committee) to separate levels of achievement on the standards. In some cases, a CRT is made using technical procedures developed for NRTs, causing the CRT to sort students in ways that are inappropriate for standards-based decisions.
 

Sometimes the NRT is changed to more closely fit the state standards and to report standards-referenced scores. As a result, a state could report that 35 percent of its students were proficient according to state standards (depending, of course, on where the cut-off score is set), but that 60 percent of its students were above the national average score on the norm-referenced test. Adapting an NRT also means that while everything on the test is in the standards, much of what is in the standards is not in the tests.
 
Conclusion
 

If standardized tests are used at all, CRTs make more sense for schools than do NRTs. However, they should be based on relevant, high-quality standards and curriculum and should make the least possible use of multiple-choice and short-answer questions. As with all tests, CRTs and NRTs, no matter what they are called, should not control curriculum and instruction, and important decisions about students, teachers or schools should not be based solely or automatically on test scores.

Source