Friday, September 14, 2007

The Meaninglessness of Numerical Grading

The following is adapted from my web page on grading.

In 2005, St. Lawrence changed its grading system, from the grades of 4.0, 3.5, 3.0, 2.5, etc. (0.5-interval grading) to grades of 4.0, 3.75, 3.5, 3.25, 3.0, 2.75, etc. (0.25-interval grading).

It was the students who wanted the finer gradations. They said that they wanted the grades to more accurately reflect their performance in their courses. The faculty passed this proposal (but not without debate, and not without some faculty arguing in a very different direction). The new grading system went into effect in the 2005-2006 academic year.

It is interesting to note that this change is not really just a refinement of an existing system. The two systems are in fact different enough that it is inappropriate to think that the Grade Point Averages (GPAs) computed in each system can be directly compared.

The following table shows how GPAs change when the same grades are rounded in different ways. It covers the kinds of rounding historically used at St. Lawrence University: the first column holds the raw grades, the remaining columns hold those grades rounded to each interval, and the last row is the average of each column. Other schools often use +/- systems, which numerically convert to grades such as 3.3, 3.7, 4.0. What my little table shows is that it is dubious to compare GPAs on 4.0 grading scales if the systems of rounding are different.

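| | Actual | Rounded to 0.25 | Rounded to 0.5 | Rounded to 1.0 |
|---|---|---|---|---|
| Grade | 3.35 | 3.25 | 3.5 | 3 |
| Grade | 2.8 | 2.75 | 3 | 3 |
| Grade | 3.6 | 3.5 | 3.5 | 4 |
| Grade | 2.3 | 2.25 | 2.5 | 2 |
| GPA (average) | 3.0125 | 2.9375 | 3.125 | 3 |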


Note that not all sets of grades would necessarily always round down on .25 intervals and up on .5 intervals. What is interesting is just that the GPAs are different. Two students with the same grades would have different GPAs depending on whether they started before or after the change in grading intervals – and yet those students who came in the midst of the change have both kinds of grades averaged together, as if averaging these incommensurable scales is legitimate!
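For readers who want to check the arithmetic, here is a minimal Python sketch that rounds the four raw grades from the table to each interval and averages them. It rests on two assumptions not stated in the post: every course carries equal weight, and grades are rounded to the nearest multiple of the interval. The function names `round_to_interval` and `gpa` are illustrative only.

```python
from fractions import Fraction


def round_to_interval(grade, interval):
    """Round a grade to the nearest multiple of `interval`, using exact fractions."""
    g, step = Fraction(grade), Fraction(interval)
    # Python's round() sends exact ties to the even multiple. None of the
    # sample grades is a tie, so the tie rule does not affect these results.
    return round(g / step) * step


def gpa(grades):
    """Unweighted mean (assumes every course carries equal credit)."""
    return sum(grades) / len(grades)


# Raw grades from the "Actual" column of the table.
raw = [Fraction(s) for s in ("3.35", "2.8", "3.6", "2.3")]

print("raw ", float(gpa(raw)))                      # 3.0125
for interval in ("0.25", "0.5", "1"):
    rounded = [round_to_interval(g, interval) for g in raw]
    print(interval, float(gpa(rounded)))            # 2.9375, 3.125, 3.0
```

Swapping in a different list of raw grades shows the point made above: the direction of the shift depends on the particular grades, but the averages under the different intervals will generally not agree.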

Here is a table showing what happens when you average together the grades of students graded under both systems. Imagine five hypothetical students who earn exactly the same raw grades in their courses every year (the grades from the table above) but who arrive at different times relative to the change: Student 1 spends all four years under the old 0.5-interval system, Student 5 spends all four years under the new 0.25-interval system, and Students 2 through 4 switch partway through their time here. This table shows the differences in their final GPAs at the end of their four years (the yearly GPAs are taken from the table above):


| | Yr 1 GPA | Yr 2 GPA | Yr 3 GPA | Yr 4 GPA | Final GPA |
|---|---|---|---|---|---|
| Student 1 | 3.125 | 3.125 | 3.125 | 3.125 | 3.125 |
| Student 2 | 3.125 | 3.125 | 3.125 | 2.9375 | 3.078125 |
| Student 3 | 3.125 | 3.125 | 2.9375 | 2.9375 | 3.03125 |
| Student 4 | 3.125 | 2.9375 | 2.9375 | 2.9375 | 2.984375 |
| Student 5 | 2.9375 | 2.9375 | 2.9375 | 2.9375 | 2.9375 |



In this case, the student lucky enough to have arrived before the change has the highest GPA. The student unlucky enough to have spent all four years under the new grading system has the lowest. Again, it is not the case that this change results in lower GPAs for all students -- the point is that the very same raw grades average out to different GPAs depending on the grading system. Worse, these GPAs are then compared to those of students from schools that may use the .3/.7 intervals (plus/minus grading) -- an altogether different system, but because it is also a 4.0 scale we think it is essentially the same!
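The final GPAs for the five hypothetical students can be reproduced with a short Python sketch. It assumes each year counts equally toward the final GPA, which is how the figures in the table work out. The names `OLD`, `NEW`, and `final_gpa` are made up for this illustration.

```python
from fractions import Fraction

OLD = Fraction("3.125")    # yearly GPA with grades rounded to 0.5 intervals
NEW = Fraction("2.9375")   # yearly GPA with grades rounded to 0.25 intervals


def final_gpa(years_old, total_years=4):
    """Average of yearly GPAs: `years_old` years under the old system, then the new one.

    Assumes every year carries equal weight in the final average.
    """
    yearly = [OLD] * years_old + [NEW] * (total_years - years_old)
    return sum(yearly) / total_years


# Students 1 through 5 spend 4, 3, 2, 1, and 0 years under the old system.
for student, years_old in enumerate((4, 3, 2, 1, 0), start=1):
    print(f"Student {student}: {float(final_gpa(years_old))}")
# Student 1: 3.125
# Student 2: 3.078125
# Student 3: 3.03125
# Student 4: 2.984375
# Student 5: 2.9375
```

Each additional year under the 0.25-interval system lowers the final figure by the same amount for this set of grades, even though the underlying raw grades never change.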

We place a lot of faith in these numbers, yet they are not in fact even computed in a mathematically responsible way. Can we really say that the GPA has a stable and unambiguous meaning?

I wish we could switch to a system of grading that does not convert grades to numbers. My favorite option is to use a high-pass, pass, fail system. The "high pass" would be a grade specifically to indicate that the student did so well that you regard that student as having graduate school potential. After all, the main distinctions we wish to make are whether the student should pass, and whether the student has worked with the material so well that you would recommend them for more advanced study in that field.

But I would also be content with a return to A, B, C, D, F grading: no pluses or minuses, and no attempt to convert these grades to numbers. The meanings here are excellent, good, satisfactory, low-pass, and fail.

When I criticize the grading system, people often leap to the assumption that I want to replace it with narrative evaluations of all students. But that is not true. I see the value of our having a shorthand way of representing the quality of students' work. What I object to is converting grades to numbers, averaging these numbers, and tying so much to this average (sometimes taken out to the thousandths place) when this use of numbers is not warranted and is thus highly misleading.
