The Largest Vocabulary in Hip hop

"Literary elites love to rep Shakespeare's vocabulary: across his entire corpus, he uses 28,829 words, suggesting he knew over 100,000 words and arguably had the largest vocabulary, ever.

I decided to compare this data point against the most famous artists in hip hop. I used each artist's first 35,000 words of lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake.

35,000 words covers 3–5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don't have enough official material to be included (e.g., Biggie, Kendrick Lamar). As a benchmark, I included data points for Shakespeare and Herman Melville, using the same approach (35,000 words across several plays for Shakespeare, first 35,000 of Moby Dick).

I used a research methodology called token analysis to determine each artist's vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin' vs. pimpin), they're removed from the dataset. It still isn't perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses.

It's still directionally interesting. Of the 85 artists in the dataset, let's take a look at who is on top."

(Matt Daniels, May 2014)
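
The counting approach Daniels describes can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not his actual code: the function name `unique_word_count`, the lowercasing step, and the rule that a word is any run of letters or digits are all choices made here. The source specifies only the 35,000-word window, the removal of apostrophes, and that each distinct spelling counts once.

```python
import re


def unique_word_count(text, limit=35000):
    """Count distinct word forms among the first `limit` words of `text`."""
    # Assumption: fold case so "Pimp" and "pimp" count as the same word.
    lowered = text.lower()
    # Remove straight and curly apostrophes, so "pimpin'" becomes "pimpin".
    cleaned = lowered.replace("'", "").replace("\u2019", "")
    # Assumption: a word is any run of ASCII letters or digits.
    tokens = re.findall(r"[a-z0-9]+", cleaned)
    # Keep only the first `limit` words so every artist is measured
    # over a sample of the same size.
    window = tokens[:limit]
    # No stemming: "pimp", "pimps" and "pimping" stay distinct.
    return len(set(window))


sample = "Pimp pimps pimping pimpin' pimpin"
print(unique_word_count(sample))
```

Because nothing is stemmed, the four spellings in the sample remain four unique words, while the two variants of "pimpin" collapse into one once the apostrophe is removed.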




The Adventure of English: the evolution of the English language

"The Adventure of English is a British television series (ITV) on the history of the English presented by Melvyn Bragg as well as a companion book, also written by Bragg. The series ran in 2003.

The series and the book are cast as an adventure story, or the biography of English as if it were a living being, covering the history of the language from its modest beginnings around 500 AD as a minor Germanic dialect to its rise as a truly established global language.

In the television series, Bragg explains the origins and spelling of many words based on the times in which they were introduced into the growing language that would eventually become modern English."

[Complete eight-part series available on YouTube, distributed by Maxwell's collection Pty Limited, Australia]



Revisiting Craft 2: Tools of Craftsmanship

"To McCullough, computer animation, geometric modeling, spatial databases–in general, all forms of media production or design–can be said to be 'crafted' when creators 'use limited software capacities resourcefully, imaginatively, and in compensation for the inadequacies of prepackaged, hard–coded operations' (21).... Again, as Sennett suggests, we 'assert our own individuality' against the prepackaged, predetermined processes and limitations of the tools we're using. Craftsmanship, says aesthetic historian David Pye, is 'workmanship using any kind of technique or apparatus, in which the quality of the result is not predetermined, but depends on the judgment [sic], dexterity and care which the maker exercises as he works' (45).

'Workmanship engages us with both functional and aesthetic qualities. It conveys a specific relation between form and content, such that the form realizes the content, in a manner that is enriched by the idiosyncrasies of the medium' (McCullough p.203). '[E]ach medium,' McCullough says, 'is distinguished by particular vocabulary, constructions, and modifiers, and these together establish within it a limited but rich set of possibilities' (McCullough p.230). Similarly, each methodology, or each research resource, has its own particular vocabulary, constructions, modifiers, obligations, and limitations. We need to choose our tools with these potentially enriching, and just as potentially debilitating, idiosyncrasies in mind. Do we need advanced software, or will iMovie suffice? Do we need to record a focus group in video – or will the presence of the camera compromise my rapport with my interviewee? Will an audio recording be more appropriate? Do we need to conduct primary interviews if others have already documented extensive interviews with these same subjects? Do we need to conduct extensive, long-term fieldwork – or can we accomplish everything in a short, well-planned research trip? How do I match my problem or project to the most appropriate tool?"

(Shannon Mattern, Words in Space)

Malcolm McCullough, Abstracting Craft: The Practiced Digital Hand (Cambridge, MA: MIT Press, 1996).




A comparable dichotomy between metaphor and metonymy

"Roman Jakobson found a comparable dichotomy between metaphor and metonymy in his seminal paper, 'Two Aspects of Language and Two Types of Aphasic Disturbances,' published in his monograph, Fundamentals of Language (Mouton & Co––Gravenhage, 1956). Here Jakobson discussed two types of aphasia based on complementary disorders in comprehending language: (a) a similarity disorder whereby one primarily depends on syntactic context to draw words into use (pp. 63–64); and (b) a contiguity disorder whereby one's style becomes a telegraphic 'word heap' without much, if any, evidence of syntax (pp. 71–72). According to Jakobson, two faculties are thus involved in the use of language: (a) selection in the choice of words to express an idea (metaphoric); and (b) the combination of words, again to express an idea (metonymic). Elaborate sentences without a particularly impressive vocabulary (for example in the prose of Henry James) illustrates the similarity disorder, while big vocabulary in loosely constructed sentences (for example in the prose of James Joyce) illustrates the contiguity disorder. Joyce heaped together his words with apparent abandonment, while James strenuously belaboured his syntax to produce exactly the right effect––an effect he found difficult to articulate with words alone as opposed to their combination in intricate sentences. An inferior choice of words, Jakobson claimed, is at the sacrifice of metaphor, whereas an inferior combination of words is at the sacrifice of metonymy (p. 76)."

(Edward Jayne)

Jakobson, R. and Halle, M. (1971). "Fundamentals of Language". The Hague/Paris: Mouton.

1). Edward Jayne. "The Metaphor–Metonymy Binarism"


Folksonomies: improving tagging technique

"Here are some of the techniques used by professionals:

Universe – knowing the complete vocabulary, so you know what categories are available

Synonyms – knowing that one of the meanings of ultrasound is the same as sonography.

Hierarchy – a Volvo is a kind of car, is a kind of transportation device.

So here are some ideas for how we could improve folksonomy software to make us better at this, without involving any editors.

Suggest tags for me. A Google Suggest-style interface will help familiarize people with the universe of existing tags, so you can use an existing tag rather than invent your own, when the existing tag applies equally well. It would also reduce typos and inconsistencies, like 'blog' vs. 'blogs', and it might serve as inspiration to get past the obvious tags. The pool of tags to suggest from could be a weighted list of my own tags, my friends' tags, all tags, and tags other people have already used for this link.

Find synonyms automatically. In the browsing interface, Flickr is pretty good about showing related tags. Why not show these related tags when I am tagging a photo, thus making it easy for me to just add the ones that apply? They could even do a quick lookup on WordNet for more synonyms. Since the related tags in the browsing interface feed off of tags used on the same images on the input side, this would also help make strong links stronger.

Help me know what tags other people use. When doing both the Google Suggest and the synonyms above, show the most used tags in a larger size than less used tags. There is value in people using the same tag for the same thing, and we want to encourage that, without in any way preventing people from choosing a different tag if they want to.

Infer hierarchy from the tags. I have a habit of using multiword tags, so instead of saying 'socialsoftware' like you're supposed to on delicious, I say 'social software', which really makes it two separate tags. That's not necessarily a bad thing, though. If this habit is generally applied, we could look at how many links that are tagged with 'social' are also tagged 'software', and maybe infer that 'social' is frequently used in conjunction with 'software', and thus might imply a special kind of software (or the other way around, that software is a special kind of social), thus offering the combined tag 'social software' to contain links that are tagged with both. A different example would be items tagged 'volvo car'. If most of the time something is tagged 'volvo', it is also tagged 'car', we might infer that volvo is a kind of car.

Make it easy to adjust tags on old content. If the above and other ideas work, people's tagging skills should improve over time. So why not augment the browsing interface so that it's very easy for me to add or remove tags from my images or links right there, e.g. from a list of suggested tags on the page? I'm sure that sometimes, someone would use it. Another incentive to retag my content: if I'm searching for a link on Buenos Aires, but the link wasn't tagged with 'buenosaires', so I find it under 'argentina', say, it should be very easy to add the 'buenosaires' tag to that item."

(Lars Pind, 23 January 2005)
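
Pind's idea of inferring hierarchy from tags that appear together can be illustrated with a small co-occurrence count. The sketch below is one possible reading, not something taken from his post: the function name `infer_hierarchy`, the 0.8 threshold, the `min_count` cutoff and the sample data are all hypothetical. The asymmetry test (most 'volvo' items are also 'car', but most 'car' items are not 'volvo') is an added assumption, used to decide which tag is the narrower one.

```python
from collections import Counter


def infer_hierarchy(tagged_items, threshold=0.8, min_count=2):
    """Return (narrow_tag, broad_tag) pairs suggested by co-occurrence."""
    tag_counts = Counter()
    pair_counts = Counter()
    for tags in tagged_items:
        unique = set(tags)
        # Count each tag once per item.
        tag_counts.update(unique)
        # Count every ordered pair of distinct tags on the same item.
        for a in unique:
            for b in unique:
                if a != b:
                    pair_counts[(a, b)] += 1

    relations = []
    for (child, parent), together in pair_counts.items():
        # Ignore tags seen too rarely to say anything about (assumption).
        if tag_counts[child] < min_count:
            continue
        share_of_child = together / tag_counts[child]
        share_of_parent = together / tag_counts[parent]
        # "Most of the time something is tagged child, it is also tagged
        # parent", while the reverse does not hold.
        if share_of_child >= threshold and share_of_parent < threshold:
            relations.append((child, parent))
    return sorted(relations)


# Hypothetical tagged links, for illustration only.
items = [
    {"volvo", "car"},
    {"volvo", "car"},
    {"volvo", "car"},
    {"ford", "car"},
    {"ford", "car"},
    {"car"},
]
print(infer_hierarchy(items))
```

With this sample the function suggests that 'ford' and 'volvo' are each a kind of 'car'. Two tags that almost always appear together in both directions, as 'social' and 'software' might, produce no relation under this rule; that matches the ambiguity Pind notes about which of the two would be the special case.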


