The Largest Vocabulary in Hip Hop

"Literary elites love to rep Shakespeare's vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words and arguably had the largest vocabulary, ever.

I decided to compare this data point against the most famous artists in hip hop. I used each artist's first 35,000 lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake.

35,000 words covers 3–5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don't have enough official material to be included (e.g., Biggie, Kendrick Lamar). As a benchmark, I included data points for Shakespeare and Herman Melville, using the same approach (35,000 words across several plays for Shakespeare, first 35,000 of Moby Dick).

I used a research methodology called token analysis to determine each artist's vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin' vs. pimpin), they're removed from the dataset. It still isn't perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses.

It's still directionally interesting. Of the 85 artists in the dataset, let's take a look at who is on top."

(Matt Daniels, May 2014)
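
The counting approach described in the excerpt can be sketched roughly as follows. This is a minimal illustration, not Daniels's actual code: the function name `unique_word_count`, the lowercasing step, and the rule that a token is any run of letters or digits are all assumptions. Only the 35,000-word cap, the removal of apostrophes, and the rule that each distinct word form counts once come from the quoted text.

```python
import re

def unique_word_count(text, limit=35_000):
    """Count distinct word forms among the first `limit` words of `text`."""
    # Remove straight and curly apostrophes so "pimpin'" and "pimpin" match.
    # Lowercasing is an assumption; the excerpt does not mention case.
    cleaned = re.sub(r"['\u2019]", "", text.lower())
    # Assumed tokenizer: a word is any run of letters or digits.
    tokens = re.findall(r"[a-z0-9]+", cleaned)
    # Keep only the first `limit` words, then count each form once.
    return len(set(tokens[:limit]))

# No stemming is applied, so inflected forms stay distinct.
print(unique_word_count("pimps pimp pimping pimpin"))
```

Under these assumptions a hyphenated name would split into two tokens, which is one of several places where a real transcription pipeline would need its own decisions.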




Simon Perkins
15 MARCH 2012

Seeking quality in criterion referenced assessment

"Norm and criterion referenced assessment are two distinctly different methods of awarding grades that express quite different values about teaching, learning and student achievement. Norm referenced assessment, or 'grading on the curve' as it is commonly known, places groups of students into predetermined bands of achievements. Students compete for limited numbers of grades within these bands which range between fail and excellence. This form of grading speaks to traditional and rather antiquated notions of 'academic rigour' and 'maintaining standards'. It says very little about the nature or quality of teaching and learning, or the learning outcomes of students. Grading is formulaic and the procedure for calculating a final grade is largely invisible to students.

Criterion referenced assessment has been widely adopted in recent times because it seeks a fairer and more accountable assessment regime than norm referencing. Students are measured against identified standards of achievement rather than being ranked against each other. In criterion referenced assessment the quality of achievement is not dependent on how well others in the cohort have performed, but on how well the individual student has performed as measured against specific criteria and standards. Underlying this grading scheme is a concern for accountability regarding the qualities and achievements of students, transparency and negotiability in the process by which grades are awarded, an acknowledgement of subjectivity and the exercise of professional judgement in marking."

(Lee Dunn, Sharon Parry and Chris Morgan, 2002)
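
The contrast between the two grading schemes can be made concrete with a small sketch. Everything specific here is hypothetical and chosen only for illustration: the function names, the cutoff scores, the band labels, and the share of the cohort in each band do not come from the excerpt.

```python
def criterion_grades(scores, cutoffs):
    """Grade each score against fixed standards, independent of the cohort."""
    # cutoffs: list of (minimum_score, grade), highest minimum first.
    grades = []
    for score in scores:
        grade = "Fail"
        for minimum, label in cutoffs:
            if score >= minimum:
                grade = label
                break
        grades.append(grade)
    return grades

def norm_grades(scores, bands):
    """Grade by rank: each band holds a predetermined share of the cohort."""
    # bands: list of (grade, fraction_of_cohort), best band first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    grades = [None] * len(scores)
    start, cumulative = 0, 0.0
    for label, fraction in bands:
        cumulative += fraction
        end = round(cumulative * len(scores))
        for i in order[start:end]:
            grades[i] = label
        start = end
    # Anyone left over through rounding falls into the lowest band.
    for i in order[start:]:
        grades[i] = bands[-1][0]
    return grades

scores = [92, 88, 85, 81]
# Hypothetical standards: 80 or above is Excellence, 50 or above is Pass.
print(criterion_grades(scores, [(80, "Excellence"), (50, "Pass")]))
# Hypothetical curve: top 25% Excellence, middle 50% Pass, bottom 25% Fail.
print(norm_grades(scores, [("Excellence", 0.25), ("Pass", 0.5), ("Fail", 0.25)]))
```

With the same four scores, the criterion version awards every student the top grade because each one meets the standard, while the curve version fails the lowest-ranked student regardless of the score achieved.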


Simon Perkins

