Using Data, Not Proxies, to Understand Scholarship

  • I, myself, am a student of Indian dialects.
  • Are you? Do you know Colonel Pickering, the author of Spoken Sanskrit?
  • I am Colonel Pickering. Who are you?
  • I’m Henry Higgins, author of Higgins’ Universal Alphabet.
  • I came from India to meet you!
  • I was going to India to meet you!

    My Fair Lady (film), 1964

Much of the way science (and many other fields) is conducted and evaluated today was shaped by foundations laid in Europe during the Scientific Revolution of the 1600s. Through a network of societies and academies, groups of scholars would gather or correspond by letter to debate and evaluate each other’s work (peer review). Once reviewed, the work might then be published or otherwise shared with the public. One’s reputation was built through sustained affiliation with these groups and their members, and social and scholarly circles were often intermingled. Membership or fellowship in a society or academy enabled one to take part in these activities, and it was granted through nomination by those already included. In other words, you needed to be known and promoted by an “insider” to join this elite group, which was exclusively male until the mid-1900s.

Scholarship review models haven’t changed

This system persists hundreds of years later, modified only slightly to accommodate modern scale. There is still a set of elite institutions that drive much of the recognized scholarship. An insider likely no longer sponsors one’s membership in these institutions directly; instead, the individual’s scholarship determines inclusion. Peer review is still conducted by a small group of established researchers who are considered experts in the field, again on the basis of their scholarship. This all sounds like a meritocracy: we want to promote outstanding scholarship. However, given the volume of research done today, the community can’t evaluate each piece of work, scholarly contribution, or theory on its individual merits, as was done when scientific letters were read and discussed at the regular meetings of the Royal Society in the 1660s and 1670s[1]. Instead, we have established a set of secondary evaluations, or proxies, that provide a shortcut for evaluation and (theoretically) let us normalize the value we place on individual scholarly contributions. Because these proxies directly shape how we evaluate an individual’s scholarship and reputation, what they consider and assess has serious consequences for individual reputations and for the progress of knowledge.

Why proxies are a problem

These proxies are largely based on the systems established hundreds of years earlier: membership in a society as selected by esteemed colleagues, reputation based on how much others built on already-reviewed work, and gated opportunities for publication and sponsored support. Unfortunately, the proxies also captured the values and biases in place at that time, ones that prioritized scholarship from specific populations, affiliation with certain institutions, publication in certain journals, and formally peer-reviewed journal publication over other forms of contribution. These proxies perpetuate their encoded biases while obscuring them, making those biases harder to recognize and change. For example, it may once have been a useful proxy to prioritize contributions from a handful of elite institutions because, at the time, many (but not all) of the valuable contributions came from individuals who were affiliated with these institutions. The proxy simplifies this evaluation for us, but in simplifying it misses too much. “Contributions from institution A are great” is not the same as “many of the people who make great contributions are from institution A.” The proxy encodes the first statement when only the second is true. At the time, the two statements may have been close to equivalent. But now, significant contributions come from many more institutions, and the established preference for institution A emphasizes where the work was done more than the work itself.

Attempts have been made to mitigate the biases systemically encoded by these proxies, through measures like blinded reviews and submissions, programs to train members of excluded populations and bring them into these elite groups, and tweaks to metrics that broaden the limited scope of the proxies. But these miss the point. These proxies will always encode the simplified system of 300 years ago. They can’t begin to account for the breadth of contribution, thought, areas of discovery, and assessment of value present in today’s world. Surely we can do better. Fortunately, our ability to accurately capture, catalog, and analyze vast amounts of data has also advanced since then.

Data is better

Through more complete and accurate primary data, we can understand the source and progression of discovery more directly. Richer metadata, including persistent identifiers, has made it possible to record both the sources of contributions and their outcomes. As a result, we can consider all of the elements that drive a discovery’s progress and evolution, including the people involved, the hypotheses tested, the data collected, and the tools used. By returning to the primary factors of scholarship rather than relying on proxies, we can be more agile and flexible about how we assess impact, whether that impact lies in providing societal benefit, advancing knowledge, or accomplishing a specific goal. Our ideas about which kinds of impact matter may change. But a new system that focuses on data about contribution and influence, rather than reinforcing an outdated set of proxies, would provide the flexibility needed to encourage and reward the complete, complex ecosystem of scholarship, to understand the world in which we live, and to address the challenges that we face.

Building a new way forward

So I challenge the entrepreneurs who target, and the leaders who rely on, scholarship, innovation, and the assessment of research impact. Can we capture the innovation impact of a gathering to help decide the most impactful uses of our time? Can we quantify the role a piece of equipment plays in scholarship to estimate the potential return on investment of new equipment? Can we understand how a particular place and time ignited a scholarly renaissance so that we might try to recreate its magic? Let’s take a fresh look at how scholarship and its impact are measured. Let’s ensure that the metadata describing scholarship outputs and activity is open and available. Let’s take aim at the simplified proxies that encode the “who you know” reputation culture of the 1700s. Instead, let’s employ sophisticated data capture and analysis tools that help us understand the many varied contributions needed to address big problems, while broadening who participates in the innovation of tomorrow.

_Image: Minutes of ordinary meetings of the Royal Society including papers submitted, gifts received and notes on experiments performed, CC BY 4.0 via Wikimedia Commons._


  1. Rusnock, Andrea. “Correspondence Networks and the Royal Society, 1700-1750.” The British Journal for the History of Science 32, no. 2 (1999): 155-69. Accessed May 14, 2021. http://www.jstor.org/stable/4028081.