Retrospective h-index - A simple R function

The inspiration behind this post comes from an ongoing 'citation analysis' side project, in which I wanted to calculate a researcher's h-index retrospectively from their Google Scholar profile. Easy peasy, except for the following two issues:

  1. The internet tells me no readily available solution currently exists
  2. ChatGPT was not quite able to solve it either

While ChatGPT was able to efficiently turn my R function into Shiny code, it failed to provide the solution I needed.

Yet, the journey to get this work published was a bit bumpy, which happened to be an instance of early-career setback (broadly defined), an almost-unheard-of phenomenon until recently.

Motivation

In particular, I wanted to achieve the following 5-step workflow, for which ChatGPT was not super helpful:

Step I - Get a data frame of publications and citation history for a given Google Scholar ID

Step II - Slice citation history to remove publications beyond the given window

Step III - Correct the total cites per article in the reduced data frame (by default, each article in the reduced data frame still carries its all-time citation count rather than the count accumulated within the given window)

Step IV - On the corrected reduced data frame, calculate the retrospective h-index using the h-index formula

Step V - Output the current and retrospective h-index
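
For reference, the h-index formula used in Step IV can be written out as below. The notation is my own shorthand rather than anything Google Scholar defines: $c_{i,y}$ is the number of citations article $i$ received in year $y$, $T$ is the last year of the window, and $c_{(1)}(T) \ge c_{(2)}(T) \ge \dots \ge c_{(n)}(T)$ are the windowed counts sorted in decreasing order over the $n$ articles published by year $T$.

$$
c_i(T) = \sum_{y \le T} c_{i,y},
\qquad
h(T) = \max \{\, k : c_{(k)}(T) \ge k \,\},
$$

with $h(T) = 0$ when no article satisfies the condition. As a small worked example, windowed counts of $10, 3, 1$ give $h = 2$, because the second-ranked article has at least 2 citations but the third-ranked article has fewer than 3.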

Solution

Cutting to the chase, here is the simple R function that achieves this objective.
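
A minimal sketch of the five steps, not necessarily identical to the function referred to above. It assumes the `scholar` package, whose `get_publications()` and `get_article_cite_history()` return data frames with `pubid`, `cites`, `year` columns and `year`, `cites` columns respectively. The names `h_index`, `retro_h_index`, `cutoff_year`, `fetch_pubs` and `fetch_history` are mine, chosen for illustration.

```r
# Sketch only, under the assumptions stated above.

# h-index of a numeric vector of per-article citation counts.
h_index <- function(cites) {
  cites <- sort(cites[!is.na(cites)], decreasing = TRUE)
  # After sorting, cites[k] >= k holds exactly for k = 1..h.
  sum(cites >= seq_along(cites))
}

# The fetchers are arguments so they can be swapped for cached or stubbed
# versions; the defaults are only evaluated if no replacement is supplied.
retro_h_index <- function(id, cutoff_year,
                          fetch_pubs = scholar::get_publications,
                          fetch_history = scholar::get_article_cite_history) {
  # Step I: one row per publication for this Google Scholar ID.
  pubs <- fetch_pubs(id)
  current_h <- h_index(pubs$cites)

  # Step II: drop publications that appeared after the window.
  keep <- !is.na(pubs$year) & pubs$year <= cutoff_year
  pubs <- pubs[keep, , drop = FALSE]

  # Step III: replace all-time counts with counts accumulated up to the cutoff.
  # Note: this makes one request per remaining article.
  retro_cites <- vapply(as.character(pubs$pubid), function(pubid) {
    hist <- fetch_history(id, pubid)
    sum(hist$cites[hist$year <= cutoff_year])
  }, numeric(1))

  # Steps IV and V: retrospective h-index, returned next to the current one.
  c(current = current_h, retro = h_index(retro_cites))
}
```

A call would look like `retro_h_index("SCHOLAR_ID", 2015)`, where `"SCHOLAR_ID"` is a placeholder for the ID in a profile URL. Here the current h-index is computed from the publication table instead of being read from the profile page, which saves one request but assumes the table is complete.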

Next Steps

Unfortunately, Google limits how frequently you can query the Google Scholar server, making the code inefficient for a Shiny app. Nevertheless, a work-in-progress Shiny app is also available and open to improvements.
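
One possible way to soften the rate limit is to pause between requests and reuse results already fetched in the same session. The wrapper below is a hypothetical sketch in base R, not part of the Shiny app. `make_polite` and `delay` are illustrative names, the 2-second default is an arbitrary guess rather than a documented limit, and the wrapper assumes it is called with at least one argument and that all arguments are short strings such as IDs.

```r
# Hypothetical helper: wrap any fetch function with a cache and a pause.
make_polite <- function(fetch, delay = 2) {
  cache <- new.env(parent = emptyenv())
  function(...) {
    # Build a cache key from the (string) arguments.
    key <- paste(c(...), collapse = "\r")
    if (!exists(key, envir = cache, inherits = FALSE)) {
      Sys.sleep(delay)                       # pause before each real request
      assign(key, fetch(...), envir = cache)
    }
    get(key, envir = cache, inherits = FALSE)
  }
}
```

Wrapping a per-article fetcher this way means a repeated lookup for the same article costs no extra request. It does not make the first pass over a long publication list any faster.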

Himel Mallick, PhD, FASA
Principal Investigator

Applied statistician with broad research interests in biomedical and applied data science, working on problems in machine learning and computational biology.
