Putting Quacquarelli Symonds’ university rankings methodology to the test

Faisal Wali

Queen's Tower, Imperial College London

Of late, the New Asia Republic has been involved in a flurry of exchanges with staff of Quacquarelli Symonds (QS). It started with my article titled “Why the QS World Universities rankings should be ditched”.

In my initial piece, I addressed QS’ latest rankings of universities within the discipline of medicine, and argued that if QS wanted to rank such universities, it would have been better to rank only those that produce medical graduates.

My reasoning proceeds from the standpoint of a prospective medical student using a ranking list to select the schools he wishes to attend. I contended that including institutions that merely collaborate with medical schools on teaching and research adds unnecessary noise to the list.

This is especially so when such institutions do not themselves admit and matriculate medical students. In the latest correspondence, QS has since indicated that students would be “ill-advised” to “make their university choice solely on the basis of rankings”.

The company further added that “where a university performs well in our medicine rankings, this does not necessarily imply that it is active in the particular area of medicine in which a given candidate using the rankings is interested”. It recommends basing applications on numerous sources of information, which is a reasonable step.

Ranking may seem a futile exercise to some – the main argument is that every educational institution included has its own strengths, orientation, purpose and goals. However, if ranking must be done, we have to be sure we are comparing apples with apples and oranges with oranges.

For the discipline of medicine, it is worth looking at how US News does it. Its ranking methodology has its flaws, but one thing it gets right is comparing apples with apples. For instance, it ranks only medical schools with complete information on admission scores, and ranks them separately for research and for primary care.

The medical schools ranked by US News are those that admit students and eventually produce medical graduates. Interestingly, US News also ranks hospitals, and does so by specialty; teaching hospitals affiliated with medical schools are included in those rankings.

Thus, the leaf QS should take out of US News’ book is that if it really must rank, it should compare only those that are comparable. And where it is impossible to meaningfully rank entities with different orientations, specialties, enrolments and staffing together, it is worthwhile to create a separate list for each.

For example, when US News ranked hospitals, it took into consideration that different hospitals have strengths in different medical specialties. Thus, it chose to rank hospitals by specialty instead.

Next, I pointed out the subjective nature of academic peer rating, specifically with regard to sourcing or surveying feedback from academics. Academic peer rating takes up 40% of QS’ rank score.

But first, let me elaborate on surveying and sampling methodology. When we survey a population of individuals, for example students or academics, the way to go about it is to obtain a random sample that more or less reflects the trends of the larger population.

If the sampling of the population is done appropriately, the results I get will be reflective of the population at large. So, if I conduct a first-round survey and identify a certain trend, then repeat the survey, I should be able to reproduce that trend.

Conversely, poor sampling methodology undermines the reproducibility of results. If the first round of sampling draws respondents whose views represent one end of the spectrum, and the second round draws respondents whose views represent the other end, then the whole survey exercise is futile and tells us nothing about the population. The sampling method was problematic to begin with.

Of course, it is possible even with proper sampling methodology to obtain survey results that are non-reproducible and that, if the survey is repeated, give a different trend. This is where the subjectivity I was talking about kicks in. It could be that our survey topic produces varying responses that do not form a clear trend.

That means every individual has his own opinion, which may differ from that of his counterparts. Thus, by subjective we mean two things: first, every respondent holds his own opinion, which could differ from others’; and second, even with proper sampling, repeating the survey may yield a different trend.
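To make this concrete, here is a minimal simulation of the sampling point (my own illustration, not QS’ actual survey procedure; the population, ratings and sample sizes are all invented). Two rounds of random sampling reproduce roughly the same average rating, while two rounds of biased sampling land at opposite ends of the spectrum:

```python
import random

# Hypothetical population: each academic rates a university from 1 (poor)
# to 5 (excellent); true average opinion is about 3.0, with genuine diversity.
random.seed(0)
population = [random.gauss(3.0, 1.0) for _ in range(100_000)]

def survey(sample):
    """Average rating among a sample of respondents."""
    return sum(sample) / len(sample)

# Proper random sampling: two independent rounds give nearly the same trend.
r1 = survey(random.sample(population, 500))
r2 = survey(random.sample(population, 500))
print(f"random sampling: round 1 = {r1:.2f}, round 2 = {r2:.2f}")

# Biased sampling: round 1 reaches only enthusiasts, round 2 only sceptics.
enthusiasts = [v for v in population if v > 3.5]
sceptics = [v for v in population if v < 2.5]
r1 = survey(random.sample(enthusiasts, 500))
r2 = survey(random.sample(sceptics, 500))
print(f"biased sampling: round 1 = {r1:.2f}, round 2 = {r2:.2f}")
```

The first pair of numbers agrees to within sampling error; the second pair does not, even though the underlying population never changed. This is the sense in which irreproducible results point to a problematic sampling method.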

That was why I pointed out findings by Fred L. Bookstein, Horst Seidler, Martin Fieder and Georg Winckler, published in the journal Scientometrics, that the academic peer ratings in QS’ rankings demonstrated unacceptably high fluctuations. This raises the question of whether the academic peer ratings from QS’ surveys are reproducible.

In response, QS gave additional details of its reputation survey – it specifically asked academics which universities are producing the best research in their field of expertise. QS’ reasoning is that it is an academic’s job to be in the know about institutions producing the best research, so academics are in a good position to judge what constitutes high-quality research.

This is not an unreasonable assumption for QS to make – academics are in a good position to judge which institutions are doing the best research, and their feedback should be valuable. However, one must also realise that academia is vast, with many fields and sub-fields, and academics whose research interests are narrow.

Furthermore, different institutions have varying strengths in different sub-fields. If QS asks an academic who does research on vaccines which universities are top for medical research, one has to be open to the possibility that he will name the leading institutions in vaccine research.

This is what we call opinion bias. Because an academic usually concentrates on one area of research interest, he may be biased in his judgement towards institutions that are leading the way in that area. Thus, one thing for QS to consider is the possibility of opinion bias when surveying academics.

Given the high weightage given to academic peer feedback, opinion bias can skew the rankings. The subjectivity I spoke about earlier is therefore opinion bias, phrased in another way.

Another possible source of erroneous judgement is, again, the tragedy of specialisation. If we ask the same academic who does research on vaccines which institutions are top in psychiatric medicine, chances are we will not glean an accurate answer.

An academic specialising in his narrow sub-field of interest may be able to tell you which institutions lead in that sub-field. However, ask him anything outside it, and chances are you will get an erroneous reply.

QS mounted a strong defence of its approach of surveying academics in comparison with Shanghai Jiao Tong University’s Academic Ranking of World Universities (ARWU), stating that its approach allows it “to be more receptive to trends and developments” which involve changes “such as major new pieces of research, or the migration of high-quality faculty from one university to another.”

QS further added that such developments are “reflected in our annual survey more immediately than in long-term citation-based measures such as those favoured in the ARWU.”

At first glance, QS’ defence of its academic survey method seems pretty strong, and the method appears more in touch with the latest developments than ARWU’s ranking methodology. However, to approach this topic properly, we have to understand a little about what it takes to do research.

There are times when an investigator engages in a piece of promising research, building on a paper he published earlier, only to find that it comes to naught. A very good example is in the field of clinical pharmacology.

A pharmacologist may find that a potential drug works perfectly well in cells and animal models. He publishes a paper, and many labs think this could be the latest trend in drug research. However, when the drug is tested on humans, it fails miserably. And people usually don’t publish experiments that fail. The initial discovery becomes a one-hit wonder that dies down quickly.

In addition, the nature of research is such that one can work on a topic without any result, and hence without publishing a paper. In some cases it can take many years to make a breakthrough.

Furthermore, anyone familiar with the process of publishing research papers would know that a paper is screened by the journal’s editors. If there are issues with the way the research was conducted, the paper will be rejected.

While QS’ survey of academics may track the latest developments in a field, the company must also understand that not all trendy research developments produce anything meaningful. Some lead down an illusory trail, where something that appeared meaningful turns out to be meaningless after all. No paper gets published and the topic fades into oblivion.

The company should also realise that movement of faculty does not always translate into meaningful research at the new institution. Again, this has to do with the nature of research, where one can work on a topic without any breakthrough.

Lastly, research is also subject to peer review when the researcher submits his paper to a journal. Thus, while a survey of academics can track the latest trends in research and the movement of faculty, it is premature to conclude that such activities will bear any fruit for the intellectual community.

This is why the only way to judge the end product of research is through the measures adopted by ARWU – publications and citations.

Interestingly, in QS’ latest reply, it was willing to concede that the employer review in its rankings possesses a subjective element, though one that carries a lower weight. The point is that the same can be said of academic peer review, which also has its subjective element.

It is interesting, in light of QS’ revelation, that the 40% component of the ranking is scored from academics’ opinions on which universities are doing the best research. Thus, it appears that research quality and achievements take up a large portion of QS’ rankings.

With this knowledge, it appears that QS’ focus on research, in the form of the 40% academic peer survey plus research citations (the remaining 20%), makes up 60% of the rankings. If anything, this vindicates ARWU’s focus on research achievements.

ARWU measures achievements in various fields by Nobel Prizes and Fields Medals won, publications in top journals, and highly cited researchers. The key difference between QS and ARWU is that the latter measures tangible achievements in research, which are more objective key performance indicators (KPIs).

The former, on the other hand, chooses to measure research achievement mainly by a survey of academics, and the rest by research citations.
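To illustrate how this weighting plays out, here is a minimal sketch of a weighted composite score in the style described above (the 40/20 research split follows the figures discussed in this article; the remaining 40% bucket and all institution scores are invented placeholders, not QS’ actual data):

```python
# Indicator weights: the research-related terms sum to 60% as discussed above.
WEIGHTS = {
    "academic_survey": 0.40,  # peer opinion of research quality
    "citations": 0.20,        # bibliometric measure of research output
    "other": 0.40,            # employer review, staffing ratios, etc. (invented bucket)
}

def overall_score(scores: dict) -> float:
    """Weighted sum of indicator scores, each on a 0-100 scale."""
    return sum(WEIGHTS[k] * scores[k] for k in WEIGHTS)

# A hypothetical institution that surveys well but cites poorly:
print(overall_score({"academic_survey": 90.0, "citations": 70.0, "other": 80.0}))
# 0.4*90 + 0.2*70 + 0.4*80 = 82.0
```

Note that the survey term carries twice the weight of the citations term, so any opinion bias in the survey moves the composite twice as far as an equal-sized error in the bibliometric measure.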

Earlier on, I highlighted the disparity between the QS and ARWU rankings for medicine. The National University of Singapore (NUS, 18th) is ranked above the University of Pennsylvania (21st) and Cornell University (23rd) in the QS rankings. In ARWU, NUS did not make the top 100.

There is other evidence of disparity between the QS and ARWU rankings, specifically in the area of medicine. With no disrespect intended to either institution, the University of Auckland occupied 39th position in the QS rankings, while Switzerland’s University of Basel occupied the 101–150 band.

However, in the ARWU, the University of Auckland did not make the top 100, and the University of Basel occupied 46th position in the world.

In an ideal world, academics’ opinions, when asked which institution is producing the best research, should correlate with research achievements. Good, original research gets you Nobel Prizes (possibly), publications in top journals, and status as a highly cited researcher. However, given the findings by Bookstein and colleagues on the fluctuating nature of QS’ academic peer review, we know this is not the case.

Assuming that QS was rigorous in its sampling methodology, the only other explanation is opinion bias. Would it not then be better to use objective KPIs, as adopted by ARWU, that are free of opinion bias?
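One simple way to probe this, sketched below with invented scores (my illustration; neither QS nor ARWU has published such an analysis), is to compute a Spearman rank correlation between the survey-based component and a citation-based measure for the same institutions. A coefficient near 1 would support the surveys; a low or year-to-year unstable one would point to opinion bias:

```python
def spearman_rho(xs, ys):
    """Spearman rank correlation for paired score lists without ties."""
    n = len(xs)
    rank = lambda vs: {v: i for i, v in enumerate(sorted(vs, reverse=True), 1)}
    rx, ry = rank(xs), rank(ys)
    d2 = sum((rx[x] - ry[y]) ** 2 for x, y in zip(xs, ys))
    return 1 - 6 * d2 / (n * (n * n - 1))

# Invented scores for five institutions, paired by position in each list.
survey_scores = [92.1, 88.4, 85.0, 79.3, 71.2]    # academic peer component
citation_scores = [70.5, 91.0, 66.2, 88.8, 59.4]  # citation-based component
print(f"rho = {spearman_rho(survey_scores, citation_scores):.2f}")  # rho = 0.50
```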

QS has responded to the charge of volatility by claiming its rankings are less volatile than the Financial Times’ ranking of business schools, the Times Good University Guide and The Guardian’s university guide.

My response is that QS is an education and study-abroad company. Since education is its bread and butter, a more rigorous ranking methodology should be expected of it if it is going to rank universities. The primary focus of the Financial Times, The Times and The Guardian is media to begin with, not the education business.

Obviously, QS’ rankings should fluctuate less than those of the other three publications. That is no surprise.
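For readers who want to examine volatility claims themselves, one simple measure, again my own illustration rather than the metric used by QS or by Bookstein and colleagues, is the mean absolute change in an institution’s rank between consecutive years:

```python
def mean_rank_shift(prev: dict, curr: dict) -> float:
    """Average absolute rank change over institutions present in both years."""
    common = prev.keys() & curr.keys()
    return sum(abs(prev[u] - curr[u]) for u in common) / len(common)

# Invented two-year snippet of a ranking table.
ranks_2010 = {"Uni A": 1, "Uni B": 2, "Uni C": 3, "Uni D": 4}
ranks_2011 = {"Uni A": 2, "Uni B": 1, "Uni C": 6, "Uni D": 4}
print(mean_rank_shift(ranks_2010, ranks_2011))  # (1 + 1 + 3 + 0) / 4 = 1.25
```

Of course, comparing this number across ranking systems only makes sense if the lists cover comparable entities, which brings us back to apples and apples.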

However, the fact remains that if we want to compile a ranking list, we must first be sure that we are ranking entities that are comparable. Second, we must question whether the ranking methodology incorporates a large subjective element, in this case opinion bias. If there is a high level of subjectivity and bias, the end result will be distorted.

What, then, is the use of a ranking exercise whose methodology leads to a distorted result?

Faisal’s view has been emailed to the Press Office of Quacquarelli Symonds

Attached below is a defence by Quacquarelli Symonds’ representatives of its rankings methodology.

The QS medicine rankings incorporate both undergraduate and postgraduate courses, and since MIT and Caltech both admit students in postgraduate medical sciences programs they are eligible for our rankings. While we did discuss the possibility of excluding these institutions on the basis that you mention prior to compiling the rankings, the criteria for inclusion and methodology for all of our rankings is made clear on our website. The salient point here is that we are not proposing that students make their university choice solely on the basis of rankings (whether those of QS or anyone else) – that would be ill advised. Where a university performs well in our medicine rankings, this does not necessarily imply that it is active in the particular area of medicine in which a given candidate using the rankings is interested – we advise prospective students to use these rankings as a preliminary short listing device, and base their eventual application and program decisions on numerous sources of information.

You state that the use of peer review data is excessively ‘subjective’ but you give no details of what exactly you mean by this claim. To clarify: the QS academic reputation survey specifically asks academics which universities are producing the best research in their field of expertise at a given time. Quite simply, it is part of an academic’s job to have an informed view on this question. It may be subjective in the sense that it depends on a given academic’s judgement of what constitutes high-quality research, but academics are by definition those best qualified to make this judgement. Far from being vague and subjective, we are asking experts to give us the benefit of their expertise, not merely asking for their opinion on an institution’s general reputation. Your criticism of the survey suggests that you may be unaware of its specific nature.

The reason we mention that this allows us to be more receptive to trends and developments is that changes such as major new pieces of research, or the migration of high-quality faculty from one university to another, are likely to be reflected in our annual survey more immediately than in long-term citation-based measures such as those favoured in the ARWU. We would be more willing to concede a subjective element in our (lower weighted) employer review, which asks global graduate recruiters which universities have in their experience produced the best graduate employees. But this subjectivity serves a clear purpose for our target audience of students – it tells them about the perception of universities in the eyes of those who will be making a judgement as to the merits of the qualification they have gained.

This relates to an important difference between the various QS exercises and the ARWU. The ARWU was originally a government-initiated exercise aimed at benchmarking the scientific research performance of Chinese universities in relation to those elsewhere in the world. While research is clearly an important function of a university, our exercises try to balance this with other factors that are of interest to prospective students, such as teaching commitment, the perception of a university among employers, and its international make-up. You criticize our rankings, but you make no mention of the fact that the ARWU practically excludes less citation-driven academic fields such as the arts and humanities and social sciences. You state that the QS rankings are an ‘inferior exercise’ to the ARWU, yet to a prospective student in the arts and humanities and social sciences our rankings offer useful information, whereas the ARWU offers virtually none (nor does it pretend to).

We do not dispute the validity of the ARWU in the areas that it measures but its reach is arguably limited and its audience very specific. The QS and ARWU exercises are not mutually exclusive, but provide different types of information that will be more or less useful depending on the aims and priorities of the person using it.

As to your comments regarding volatility, we believe you have been misguided here. The level of yearly volatility in the QS rankings is lower than the Financial Times business school ranking, as well as the national rankings produced in the Times Good University Guide and The Guardian. Given that there are far fewer variables involved in comparing institutions within the same national system, this reflects well on the stability of our methods.