The Old English Scrabble Project
Galvanised by something I don’t now remember, I decided to see if there was such a thing as Latin Scrabble. Sure enough, the Internet provided: it does exist, and you can play it online here or buy your own set of tiles (US$25 plus shipping) from a bloke called Sean at the Toronto University Centre for Medieval Studies, here.
Having established that you can play Scrabble in at least one dead language, I investigated the existence status of Old English (aka Anglo-Saxon) Scrabble. It doesn’t exist. I’m not sure whether this constitutes a surprise or not: on the one hand, the number of people interested in OE is far, far smaller than that of people interested in (say) Latin; on the other hand, there is almost nothing, no matter how obscure or geeky, that has not been done by somebody, somewhere . . . to which the rest of this post is testament, I suppose.
I’m generally bad at bringing things I think would be cool and awesome into existence, as is attested by the fact that my folder of abandoned, never-finished literary efforts contains 1,901 files that take up forty megabytes between them. That said, when the cool awesome thing can be satisfactorily realised without a) costing money or b) taking up enough time for me to get bored with it, the hit rate is much higher than otherwise. I am pleased to announce that I produced the first viable set of Old English Scrabble in under two days for no cost, and now you can too!
Scrabble values aren’t assigned at random. The reason that, in a normal English Scrabble set, there are lots of Es worth only one point and only one Z, worth ten, is, obviously enough, that English uses E an awful lot and Z far, far less. All the other letters fall somewhere between those two in terms of frequency of usage, and are given values accordingly, with less common letters being worth more points.
You determine the exact frequencies involved by getting a large sample of written English and, well, counting the letters. These days, we do it by computer; back in the Thirties, when Alfred Butts made the first Scrabble set, he had to do it by hand, painstakingly counting the occurrences of each letter in articles from the New York Times.
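These days the counting takes a few lines of code. As a minimal sketch (Python, standard library only; `letter_frequencies` is a hypothetical helper, not part of any tool mentioned in this post), this tallies each letter's share of a text. It lowercases everything and counts only alphabetic characters, which in Python includes æ, þ and ð:

```python
from collections import Counter


def letter_frequencies(text: str) -> dict[str, float]:
    """Return each letter's share of all the letters in `text`, as a percentage."""
    # str.isalpha() is Unicode-aware, so æ, þ and ð count as letters;
    # spaces, digits and punctuation are skipped.
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        return {}
    counts = Counter(letters)
    total = len(letters)
    # most_common() lists letters from most to least frequent.
    return {letter: 100 * n / total for letter, n in counts.most_common()}


if __name__ == "__main__":
    for letter, pct in letter_frequencies("Hwæt! We Gardena in geardagum").items():
        print(f"{letter}: {pct:.2f}%")
```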
Historically, the main application of frequency analysis has been in cryptography – careful letter-counting is the way to break monoalphabetic substitution ciphers (like a Caesar shift) or simple-ish polyalphabetic ones (like the Vigenère cipher). It’s also been used in the literary world to help determine questions of authorship: works by different authors have subtly but measurably different letter distributions, and it’s sometimes possible to identify forgeries or misattributed works by comparing such distributions.
Neither of these things is really relevant to the study of the Anglo-Saxon period. They didn’t use cryptography, as far as we know – in an era when few people could read at all, merely writing something down would be pretty safe, and writing in Latin practically impenetrable – and while questions of authorship are important, letter frequencies aren’t that helpful for the texts of an age when manuscripts were copied by hand (with all the mistakes you would expect) and scribes tended to respell the texts they copied in their own accent (before standardised spelling, people generally spelt things how they said them – which, considering the wild proliferation of accents in the British Isles, led to a lot of variation).
Consequently, as far as I can find, there’s no published research on Old English letter frequencies – to be honest, it would be a bit useless. So, before I could start assigning Scrabble values to Old English letters, I had to do the maths myself.
The first thing to do was assemble a corpus. Corpus is from the Latin for ‘body’, and means just that – a body of work, a big collection of text that linguists and other bods put together in order to do statistics on. (The stem is corpor-, as in ‘corporal’, ‘corporation’, ‘incorporate’, and so on.) Big English corpora run to ninety million words and more – for comparison, Proust’s seven-volume À la Recherche du Temps Perdu, a book famous for being insanely, ridiculously long, is only about 1.5 million.
Frequency analysis corpora, however, do not need to be anywhere near as big – you want representative, rather than comprehensive – and in any case there isn’t actually that much Old English in existence. I compiled my corpus from the excellent Internet library hosted by Georgetown University, and got together a decent selection without too much trouble. The texts used in the final analysis were the following:
Psalms 52-100 (from the Paris Psalter)
Dream of the Rood
The Lord’s Prayer I, II, III
The Order of the World
The Ruin (intact sections)
The Wife’s Lament
Wulf and Eadwacer
The Battle of Maldon
The Battle of Brunanburh
Caedmon’s Hymn (Northumbrian)
Caedmon’s Hymn (West Saxon)
Sermo Lupi ad Anglos
Riddles 1-20 (exc. 19)
Generating the corpus wasn’t without its problems. As observers familiar with the Anglo-Saxon period may already have noticed, the list of texts above is almost entirely poetry, with only Archbishop Wulfstan’s Sermo Lupi ad Anglos representing the prose side of things – and the Sermo employs a lot of the conventions of poetry. Poetry in any language has historically tended to use a noticeably, if not drastically, different vocabulary from prose, to the point that Wordsworth’s proposal to write in the voice of the ‘common man’ was a rather radical idea. Old English poetry is no exception: it’s full of words that are attested only rarely, or sometimes only once, which were most likely not in everyday use. It’s entirely possible, even probable, that the letter-distribution of Anglo-Saxon prose would be different to that of poetry, and that therefore the frequencies I came up with aren’t representative of Old English as a whole.
There’s Old English prose about, of course, but it seems to be a deal harder to get hold of for free on the Internet, which was the main criterion in my choice of texts. (If I were writing this up for a journal it’d be a different story.) One major prose text that is freely available is the Anglo-Saxon Genesis (the book of the Bible, not to be confused with the poems of the same name); I did try including the Genesis in the corpus, but took it out again after it became clear that it was distorting the results. Genesis is full of proper names and numbers – this is the part of the Bible notorious for its long genealogies, after all. What with those names being Hebraic, and the numbers being expressed in Roman numerals, the stats were getting skewed by extraneous j’s, z’s, and v’s – letters that wouldn’t normally exist in Old English at all.
The final corpus, minus Genesis, came to 47,725 words, or 230,618 characters with the punctuation and spaces stripped out. This seemed like a respectable size for a corpus, so I started calculating. I used a little piece of freeware called, baldly enough, FreqAnalysis, found through Google and downloadable from here.
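The exact rules FreqAnalysis uses for counting aren't described here, so the following is only a stand-in. Assuming a word is any whitespace-separated token containing at least one letter, the two headline figures (words, and characters with punctuation and spaces stripped out) could be computed like this; `load_corpus`, `corpus_stats` and the idea of one UTF-8 text file per poem are all hypothetical:

```python
from pathlib import Path


def load_corpus(paths: list[str]) -> str:
    """Join several UTF-8 text files (hypothetically, one per poem) into one string."""
    return "\n".join(Path(p).read_text(encoding="utf-8") for p in paths)


def corpus_stats(text: str) -> tuple[int, int]:
    """Return (words, letters) for a corpus string."""
    # Assumption: a word is any whitespace-separated token containing a letter,
    # so a free-standing dash or stray punctuation mark is not counted.
    words = [tok for tok in text.split() if any(ch.isalpha() for ch in tok)]
    # Letters only: punctuation, digits and whitespace are all stripped out.
    letters = sum(ch.isalpha() for ch in text)
    return len(words), letters


if __name__ == "__main__":
    print(corpus_stats("Hwæt! We Gardena – in geardagum"))
```

Editorial brackets such as the [c] in kynin[c]g are simply skipped as punctuation by this approach, which may or may not match what FreqAnalysis does.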
The corpus I put together, and therefore the Scrabble set that came out of it, uses the following twenty-four letters:
a b c d e f g h i l m n o p r s t u w x y æ þ ð
This was not quite as obvious as might be thought. Making a Modern English Scrabble set is easy: take all the letters in the English language, take out the ones not used for spelling native words (e.g. you won’t find ñ outside a few Spanish imports) and then frequency-analyse the rest. In Old English it’s not quite so simple: even once I’d evicted j, v, and z, which were only used in Biblical proper names, there were still other things I had to decide whether to include or leave out.
- Length-marks. Old English vowels (a e i o u æ y) can be long or short, and are pronounced differently depending on length. As with Latin, it’s possible to intuit the length of a vowel from spelling, grammar and metrical position once you’re semi-fluent; however, most people aren’t, and so sometimes (as with Latin) Old English texts are printed with diacritics – usually a macron or an acute accent – distinguishing long vowels. However, as this is modern editorial practice rather than a feature of Old English itself, I have not taken length-marks into account – in Old English Scrabble, the a-tile should be played for both long and short a, and so on.
- The question of wynn. In Old English manuscripts, the w-sound is represented in two ways: sometimes by uu (the origin of the name and shape of our modern w) and sometimes by another letter, sufficiently obscure that it’s not in most character-sets, which looked a bit like a p and was called wynn. Because it’s so easily confusable with p, modern editions of Old English nearly all replace wynns with w’s for ease of reading. In this case, I’ve followed the modern practice – mainly because you’d struggle to find many online Old English texts that use the wynn, and so constructing a viable corpus would be hard. There’s also the practical point that Scrabble tiles – especially handwritten ones, as these are – need to be easy to tell apart, and given that there’s already one Old English letter persistently confused with p (the thorn, þ), adding another didn’t seem like that great an idea.
- The question of yogh. The yogh is a Middle English letter that looks a bit like a 3 moved down half a line, or an ornamental z. Its pronunciation in Middle English is somewhere between y, g, and the sound made by ch in Scots or German (/x/, for those readers familiar with IPA). The letter-form does appear in Old English, but only as a variant of g; it doesn’t appear to have become phonetically distinct until some time after the Anglo-Saxons. I’ve left it out.
- The question of k. Unlike j, v, and z, k does occasionally appear in Old English outside of Biblical texts. Most of the very few instances of it that I can track down appear to be in proper names (e.g. the Anglo-Saxon Chronicle spells ‘Canterbury’ with a K on a couple of occasions), which are of course inadmissible in Scrabble; the few that snuck into the corpus are all in the same word, kynin[c]g ‘king’, usually spelt cyning. As this appears to be an isolated spelling anomaly (very isolated: there are fourteen k’s in 230,000 characters), and it doesn’t seem worth including an extra letter for the sake of one word, I’ve left it out.
- Thorn and eth. Old English used two letters for the sound we now spell as th: þ, called thorn, and ð, called eth. Nowadays they are regarded as signifying different sounds: þ is used for the th-sound in ‘thorn’, ð for the th-sound in ‘then’. It’s likely that spoken Old English also made this distinction (Wikipedia claims it as fact, but doesn’t give a citation) but – whether it did or not – the written language mostly doesn’t, and treats þ and ð as interchangeable: the king in Beowulf, Hrothgar, is spelt both Hroðgar and Hroþgar, the villainous Unferth as both Unferð and Unferþ.
A more extreme example is the Old English conjunction meaning ‘until’, which shows up in the corpus sixty-two times with its medial th-sound spelt four different ways: oþþæt 23 times, oððæt 14 times, oðþæt 22 times and oþðæt 3. There doesn’t appear to be any particular rhyme or reason to the distribution of these variations – they don’t cluster together within texts, and seem to be randomly distributed between texts. (They might cluster by dialect, but figuring out whether this is the case or not is more research than I am willing to do in the name of Scrabble.) Beowulf features all four.
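Tallies like these are easy to reproduce with a regular expression. A minimal sketch, assuming plain UTF-8 text and that the conjunction is always written as one word (a spaced-out oþ þæt would be missed); `until_spellings` is a hypothetical helper, not part of any tool used above:

```python
import re
from collections import Counter

# o, then exactly two th-letters (þ or ð in any combination), then æt,
# matched as a whole word; \b works here because þ, ð and æ count as
# word characters in Python's Unicode-aware regular expressions.
UNTIL = re.compile(r"\bo[þð]{2}æt\b")


def until_spellings(text: str) -> Counter:
    """Count each spelling of oþþæt 'until' in `text`."""
    return Counter(m.group(0) for m in UNTIL.finditer(text.lower()))


if __name__ == "__main__":
    tally = until_spellings("oþþæt oððæt Oðþæt oþðæt oþþæt")
    print(tally, sum(tally.values()))
```

Summing the counter's values gives the overall total, which is how the four variant counts add up to the sixty-two occurrences.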
In Latin, u and v are interchangeable, as are i and j. The Latin Scrabble set’s solution is to include only one letter of each pair, featuring only v’s and i’s. It would be possible to do something similar for Old English Scrabble, and lump occurrences of þ and ð together under one or the other; in the case of a professionally produced set, with printed tiles, it might well also be cheaper to produce four þ tiles rather than two of each. However, given that production costs weren’t a factor in the making of this prototype, and that I feel part of the point of Old English Scrabble would be to showcase its ridiculously wide variety of spelling, I decided to leave both of them in, but to treat them as interchangeable for all gameplay purposes.
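Put together, the alphabet decisions amount to a normalisation step applied before counting. A sketch under stated assumptions (the input is Unicode text whose length-marks are macrons or acute accents that decompose under NFD; `normalise` is a hypothetical name): it folds wynn into w and yogh into g, strips length-marks, and keeps only the twenty-four letters of the set. Þ and ð stay distinct, because the set counts them separately even though play treats them as one.

```python
import unicodedata

# The twenty-four letters of the Old English set.
ALPHABET = set("abcdefghilmnoprstuwxyæþð")

# Wynn is written as w and yogh as g, as in most modern editions.
LETTER_SWAPS = str.maketrans({"ƿ": "w", "ȝ": "g"})


def normalise(text: str) -> str:
    """Reduce text to the letters that count towards tile frequencies."""
    text = text.lower().translate(LETTER_SWAPS)
    # NFD splits accented letters into base letter + combining mark
    # (ā -> a + macron, ǣ -> æ + macron, á -> a + acute); dropping the marks
    # means long and short vowels are counted on the same tile.
    decomposed = unicodedata.normalize("NFD", text)
    unmarked = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Anything outside the set (spaces, punctuation, j, k, v, z) is dropped.
    return "".join(ch for ch in unmarked if ch in ALPHABET)


if __name__ == "__main__":
    print(normalise("Hwǣt! Ƿē Gār-Dena"))
```

Because it simply drops anything outside the set, the output is only good for letter counting, not for producing playable words. The decomposition step would also strip the editorial dots some editions put on ċ and ġ.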
The final distribution
With my corpus complete and my alphabet sorted out, I was able to run the frequency-analysis program for the final time and see what it gave me. The final figures, which I used as a baseline for determining letter numbers and values, were as follows:
From these percentages I did a bit of judicious juggling and rounding to obtain my set of tile numbers and values. The rounding is necessary because you obviously can’t have three-quarters of an e tile; the juggling, because once you round 7.91% a’s up to eight tiles, that means you have to leave one out somewhere else. The whole process is also slightly complicated by having to map percentages, fractions of 100, onto only 98 tiles – can’t leave out the blanks! (Some languages ignore this and have 102 tiles including blanks. I feel 100 is a much neater number.)
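The juggling here was done by hand. One systematic way to get a similar effect, offered as a swapped-in technique rather than the method actually used, is the largest-remainder (Hamilton) method familiar from seat apportionment: give each letter the whole-number part of its exact share, then hand the leftover tiles to the letters with the biggest fractional parts. `allocate_tiles` is a hypothetical name, and the sketch ignores the extra constraint, visible in the final set, that every letter gets at least one tile:

```python
import math


def allocate_tiles(percentages: dict[str, float], total_tiles: int = 98) -> dict[str, int]:
    """Split `total_tiles` between letters in proportion to their percentages.

    Largest-remainder (Hamilton) method: each letter first gets the whole-number
    part of its exact share; the tiles left over go, one each, to the letters
    whose exact shares had the largest fractional parts.
    """
    total_pct = sum(percentages.values())
    exact = {k: pct * total_tiles / total_pct for k, pct in percentages.items()}
    tiles = {k: math.floor(share) for k, share in exact.items()}
    leftover = total_tiles - sum(tiles.values())
    by_remainder = sorted(exact, key=lambda k: exact[k] - tiles[k], reverse=True)
    for k in by_remainder[:leftover]:
        tiles[k] += 1
    return tiles


if __name__ == "__main__":
    print(allocate_tiles({"a": 7.91, "rest": 92.09}))
```

With a 7.91% share out of 98 tiles, a's exact share is about 7.75, and `allocate_tiles({"a": 7.91, "rest": 92.09})` gives a its eight tiles, as described above.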
The final values came out as:
e: 14 tiles worth 1 point
n: 9 tiles worth 1 point
a: 8 tiles worth 1 point
o, r: 6 tiles worth 1 point
d, s: 5 tiles worth 1 point
i, l, g, h: 4 tiles worth 1 point
t, m, w, f: 3 tiles worth 2 points
æ, u, c: 3 tiles worth 3 points
þ, ð, y: 2 tiles worth 4 points
b: 1 tile worth 5 points
p: 1 tile worth 8 points
x: 1 tile worth 10 points
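With the values fixed, a word's face value is just a sum. A minimal sketch, ignoring premium squares and blanks; `LETTER_VALUES` and `face_value` are hypothetical names, and the dictionary simply transcribes the list above:

```python
# Point values transcribed from the list above.
LETTER_VALUES = {
    **dict.fromkeys("enaordsilgh", 1),
    **dict.fromkeys("tmwf", 2),
    **dict.fromkeys("æuc", 3),
    **dict.fromkeys("þðy", 4),
    "b": 5,
    "p": 8,
    "x": 10,
}


def face_value(word: str) -> int:
    """Sum the tile values of a word, ignoring premium squares and blanks.

    Raises KeyError for any letter that isn't in the set (j, k, v, z, ...).
    """
    return sum(LETTER_VALUES[ch] for ch in word.lower())


if __name__ == "__main__":
    for word in ["cyning", "oþþæt", "oððæt"]:
        print(word, face_value(word))
```

For example, cyning scores 3 + 4 + 1 + 1 + 1 + 1 = 11, and every spelling of oþþæt scores the same 14, since þ and ð carry equal values.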
I cut them out of medium-thickness corrugated cardboard from the side of an ordinary cardboard box, and wrote the letters and values on in felt-tip; I didn’t make a board as I already had a perfectly usable Modern English one. For people interested in constructing their own sets in a similarly low-budget fashion, a Scrabble tile is 19mm to a side; if you have a new-style board, as I do, with small raised lines between the spaces, you can get away with 2cm tiles.
I stopped there, for the moment. I am however kind of tempted to buy a cheap Scrabble set and do it over properly, altering the board into Old English and everything. To which end, I here provide translations should anybody else wish to do this for themselves. (All translations done by me. I can’t vouch for their accuracy other than to say I have checked them against a dictionary and my Magic Grammar Sheet, and I came top of the class in my Old English translation exam.)
- The title. Other-language versions of Scrabble seem simply to use a ‘Scrabble in [language]’ format: Welsh is Scrabble yn Gymraeg, Spanish is Scrabble: Edición en Español. Following this line, Old English Scrabble would be Scrabble on Englisc (Bosworth-Toller gives examples of both the accusative and dative being used for ‘in English’; I’ve gone with accusative). If, on the other hand, you wanted to translate the title outright, it comes from the verb ‘to scrabble’ and, insofar as single-word titles have grammar, seems to be an imperative. (You! Scrabble!) Old English doesn’t have an equivalent verb, but you could easily make one up – scrablian, perhaps – and give it an imperative ending. Anyone for a game of Scrabla?
- The board. I’ve never been sure whether the ‘double’ and ‘triple’ designations were imperatives or not – the difference is between ‘this letter counts double’ (adjective or adverb, hard to tell) and ‘double this letter’ (imperative). The French edition uses the adjectival forms, giving you ‘lettre double’ and ‘mot triple’. On that model, you would be labelling an Old English board with the following: word twifeald, double word; stæf twifeald, double letter; word þrifeald, triple word; stæf þrifeald, triple letter.
And now . . .
I have a viable-seeming letter set, but there’s only one way to find out whether the game is actually playable – actually playing it. With this in mind I will be testing it out on medievalists of my acquaintance as soon as I can (which won’t be for at least a few weeks as term hasn’t started yet, but soon, hopefully). If it turns out that the current letter set is unbalanced, I’ll have to rejig, and will probably post about that as well.
With real play in mind, there have to be rules, obviously.
The rules (may be modified as proves necessary)
- Basic rules of play are the same as Modern English Scrabble.
- All inflections of a word are permissible plays, including all possible combinations of number, gender, case, tense, voice, person, and so on. Regular but not specifically attested forms are played at the discretion of the group.
- For attested words, all attested spellings are legitimate, though you may get dirty looks for particularly outlandish ones.
- Eth (ð) and thorn (þ) are to be considered interchangeable.
- Final rulings on permissible spellings are to be taken from An Anglo-Saxon Dictionary, ed. J. Bosworth and T. N. Toller, as revised. (I am aware that the Dictionary of Old English, ed. A. Crandell Amos et al., supersedes Bosworth-Toller, but as it currently only goes to G it is less than ideal for Scrabble purposes.)
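Since eth and thorn are interchangeable in play, a word checker can fold one into the other before looking anything up. A minimal sketch: the tiny word list stands in for forms taken from Bosworth-Toller, a real list would need every attested spelling and inflection the rules above allow, and all names here are hypothetical:

```python
from collections.abc import Iterable


def fold_th(word: str) -> str:
    """Lowercase a word and map eth to thorn, so ð/þ variants compare equal."""
    return word.lower().replace("ð", "þ")


def build_lexicon(forms: Iterable[str]) -> set[str]:
    """Store every permitted form in folded shape for quick lookup."""
    return {fold_th(form) for form in forms}


def is_allowed(play: str, lexicon: set[str]) -> bool:
    """True if the played word matches a permitted form, ignoring ð/þ."""
    return fold_th(play) in lexicon


if __name__ == "__main__":
    # Hypothetical stand-in for forms looked up in Bosworth-Toller.
    lexicon = build_lexicon(["oþþæt", "cyning", "wyrd"])
    for play in ["oðþæt", "OÐÐÆT", "kyning"]:
        print(play, is_allowed(play, lexicon))
```

Folding at both ends (when building the list and when checking a play) means the list itself can keep whatever spelling the dictionary prints.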