One of the things I have said off and on is that a computer is a robot without legs. Now I am trying to give legs to a computer.

——

I have not played Scrabble since I was a kid. To get to know a command called grep a little better, I tried to explore how it could be used to an advantage in the game. What would one need? A list of words and a way to search it. Most Linux systems already ship with a word list, and that file can easily be copied to the home directory.

$ cp /usr/share/dict/words .

You might want to add to the word list later. That can be done with a simple text editor such as nano or vim. For now, how can we search through the file? The grep command will work perfectly. So how do you use it? Let’s say we want to find words that end in “ed”, so we know what letters can go in front. We use the “$” sign to show that we want something at the end of the word.

$ grep 'ed$' words


Alfred
Americanized
Anglicized
Appleseed
Brailled
Englished
Ethelred
Fed
Fred
Frenched

This generates a rather long list, so we might want to paginate the output.

$ grep 'ed$' words | less

or we might want to save the list to a file we can edit and peruse for later use.

$ grep 'ed$' words > wordfile

That is nice. Now let’s do letters at the beginning of the word. We use the “^” sign to show that we want something at the beginning of the word.

$ grep '^th' words


thank
thanked
thankful
thankfuller
thankfullest
thankfully
thankfulness
thankfulness’s

You can combine the two anchors to find words with both the beginning and the ending we want. The “*” means “zero or more of whatever comes just before it”, and “[]” lists the letter or letters allowed at that spot. In this case we want words that start with “z”, end in “ed”, and have nothing in between except, optionally, some “o”s followed by some “m”s.

$ grep '^zo*[m]*ed$' words

zed
zoomed
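To see what the “*” is really doing, here is a small sketch that runs the same pattern against a handful of sample words fed in on standard input (the sample list is made up for illustration; it is not the real word file). It also shows “.*”, which is the usual way to say “any letters at all in between”.

```shell
# Hypothetical sample words, used instead of the real word list.
sample='zed
zoomed
zoned
zombie'

# o* and [m]* each allow zero or more repeats of that one letter,
# so the n in "zoned" is not allowed.
strict=$(printf '%s\n' "$sample" | grep '^zo*[m]*ed$')

# .* allows any run of characters between the z and the ed.
loose=$(printf '%s\n' "$sample" | grep '^z.*ed$')

printf 'strict:\n%s\nloose:\n%s\n' "$strict" "$loose"
```

The strict pattern keeps “zed” and “zoomed”; the loose one also lets “zoned” through.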

You can also use “[]” at the start of the pattern to find any words that begin with either of two letters.

$ grep '^[xz]' words


xylem’s
xylophone
xylophone’s
xylophones
xylophonist
xylophonists
z
zanied
zanier
zanies
zaniest

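The same approach helps with crossword puzzles. Say you need a seven-letter word that starts with “m”, has “t” as its fourth letter, and ends in “s”:

$ grep '^m..t..s$' words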
mantels
mantles
martyrs
masters
matters
mentors
misters
mittens
mortals
mortars
mottoes
mouthes
mutters
mystics

Blow by blow:

* ^ – The caret (shift-6) says “this is the beginning of the line”. Without it, grep would also find words like “fundamentals”.
* $ – The dollar sign is the same thing, only for the end of the line. Without it, you’d also get words like “mattresses”.
* . – The period means “any character here”. One, and one only, character will match here.
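The effect of each anchor is easy to check on a tiny list. This is only a sketch with a made-up sample (four words passed on standard input rather than the real word file), dropping one anchor at a time:

```shell
# Hypothetical sample words, used instead of the real word list.
sample='masters
fundamentals
mattresses
mat'

# Both anchors: exactly seven characters, m _ _ t _ _ s.
both=$(printf '%s\n' "$sample" | grep '^m..t..s$')

# No caret: the pattern only has to match at the end of the word.
no_caret=$(printf '%s\n' "$sample" | grep 'm..t..s$')

# No dollar sign: the pattern only has to match at the start of the word.
no_dollar=$(printf '%s\n' "$sample" | grep '^m..t..s')

printf 'both: %s\n' $both
printf 'no caret: %s\n' $no_caret
printf 'no dollar: %s\n' $no_dollar
```

With both anchors only “masters” survives; without the caret “fundamentals” sneaks in, and without the dollar sign “mattresses” does.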

But suppose you’re dealing with a game besides a crossword puzzle, like Scrabble for instance, and you’re limited by more constraints than in a crossword. You might want to ‘hook’ (Scrabble lingo for ‘add letters to the beginning or end of a word to form more words’). So, let’s see how many words end in “are”.

$ grep 'are$' words | wc -l
43
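As a side note, grep can do the counting itself with the -c option, which saves the pipe to wc. A minimal sketch on a made-up four-word sample (not the real word file):

```shell
# Hypothetical sample words, used instead of the real word list.
sample='bare
care
cart
stare'

# Count matching lines by piping to wc -l ...
piped=$(printf '%s\n' "$sample" | grep 'are$' | wc -l)

# ... or let grep count them directly.
direct=$(printf '%s\n' "$sample" | grep -c 'are$')

echo "wc -l says $piped, grep -c says $direct"
```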

Well, those are good odds. But we hit the edge of the board with some of them (I peeked). So, we need words that are seven letters or fewer which end in “are”. “^....are$” would get all of the seven-letter words, but not the shorter ones. The solution is rather cryptic this time:

$ grep '^.\{1,4\}are$' words
airfare
aware
bare
beware
blare
care
compare
dare
declare
ensnare
fanfare
fare
flare
glare
hare
mare
pare
prepare
rare
scare
share
snare
spare
square
stare
unaware
ware
warfare
welfare

…but we’ve met the caret, dollar sign, and period before, so really the new part is the \{1,4\}. This says “match as few as one, and as many as four, repetitions of the previous character”. The number range goes inside curly braces, which then have to be escaped with backslashes (does anybody know why, class?). In grep’s basic regular expressions an unescaped brace is just a literal brace. And since the previous character is a period, which matches any character, we’ve found all the words shorter than eight letters which end in “are”.
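If the backslashes bother you, grep’s -E option switches to extended regular expressions, where the braces work unescaped. The sketch below runs both spellings on a small made-up sample (not the real word file) and should give the same result either way:

```shell
# Hypothetical sample words, used instead of the real word list.
sample='are
bare
aware
compare
nightmare'

# Basic regular expression: the braces must be escaped.
bre=$(printf '%s\n' "$sample" | grep '^.\{1,4\}are$')

# Extended regular expression (-E): the braces are written plainly.
ere=$(printf '%s\n' "$sample" | grep -E '^.{1,4}are$')

printf '%s\n' "$bre"
[ "$bre" = "$ere" ] && echo "same result both ways"
```

“are” is left out because at least one character is required in front, and “nightmare” because six characters is more than four.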

This is all well and good, but we only have so many letters to work with in Scrabble at one time. Say that our current rack has the letters “C F T W A B M”. Can we limit it to only words which use those letters?

$ grep '^[cftwabm]\{1,4\}are$' words
aware
bare
care
fare
mare
ware

Ah, now we’re getting somewhere! The [] square brackets give the set of acceptable characters. Another way to use them is to express a range (e.g. [0-9]), but that’s hardly the usual case in word games.
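One catch: the bracket set only says which letters are allowed, not how many of each tile you hold, so grep would happily accept a word that needs the same tile twice. Here is a rough sketch of a follow-up check in plain shell. The function name fits_rack and the test inputs are my own inventions for illustration, and blank tiles are ignored.

```shell
# fits_rack LETTERS RACK  (hypothetical helper)
# Succeeds only if every letter in LETTERS can be taken from RACK,
# using each tile at most once.
fits_rack() {
    fr_letters=$1
    fr_rack=$2
    while [ -n "$fr_letters" ]; do
        fr_rest=${fr_letters#?}            # everything after the first letter
        fr_ch=${fr_letters%"$fr_rest"}     # the first letter itself
        case $fr_rack in
            *"$fr_ch"*)
                # Remove one copy of that tile from the rack.
                fr_rack=${fr_rack%%"$fr_ch"*}${fr_rack#*"$fr_ch"} ;;
            *)
                return 1 ;;
        esac
        fr_letters=$fr_rest
    done
    return 0
}

rack=cftwabm
# "aw" and "b" are the letters added in front of "are" for aware and bare;
# "tat" is a made-up input that needs two t tiles.
for prefix in aw b tat; do
    if fits_rack "$prefix" "$rack"; then
        echo "$prefix: playable"
    else
        echo "$prefix: not playable"
    fi
done
```

Only the letters you add yourself are checked, since the “are” is already on the board.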

The only limitation is the imagination. If you are playing Scrabble, let your partners know what you are doing to prevent any hurt feelings. I like to use this same idea for crossword puzzles also.

Update:

If you want to combine two or more word lists to have a greater vocabulary to work from, that is fine also (words.new must be a plain ASCII text file):

$ mv words words.old
$ cat words.old words.new > words.temp
$ sort -u words.temp > words

Count the words in the list:

$ wc -l words

You should now have a larger word file!
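One thing worth checking when merging: “sort -u” (or “sort | uniq”) keeps one copy of each word, whereas “uniq -u” prints only the lines that never repeat, so it throws away any word that appears in both lists. A quick sketch with a made-up four-line sample shows the difference:

```shell
# Hypothetical merged list in which "apple" appears twice.
sample='apple
zebra
apple
mango'

# Keeps one copy of every word.
merged=$(printf '%s\n' "$sample" | sort -u)

# Keeps only words that were never duplicated, so "apple" disappears.
singles=$(printf '%s\n' "$sample" | sort | uniq -u)

printf 'sort -u:\n%s\nuniq -u:\n%s\n' "$merged" "$singles"
```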

One other hint:
One zip file I downloaded of words had the words in files separated by first letter. I needed them in one list. No problem.

$ cat A.DIC > words.new
$ cat B.DIC >> words.new
$ cat C.DIC >> words.new
….

In bash, brace expansion shortens this to:

$ cat {A..Z}.DIC > words.new

The files had an unneeded carriage return (^M) at the end of each line that needed to be removed, so I loaded words.new into vim. Type the ^M in the substitution as Ctrl-V followed by Ctrl-M, then save and exit with “:x”.

$ vim words.new
:%s/^M//
:x
$ _

and that took care of that. (Surprised I did not need a “”).
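If you would rather not open an editor, the tr command can strip the carriage returns from the command line. This is just a sketch: it builds a small throwaway file with DOS-style line endings in a temporary directory (the file names and contents are made up) and cleans it.

```shell
# Work in a throwaway directory so nothing real is touched.
tmpdir=$(mktemp -d)

# Hypothetical two-word list with carriage returns before each newline.
printf 'apple\r\nzebra\r\n' > "$tmpdir/words.new"

# Delete every carriage return and write the result to a new file.
tr -d '\r' < "$tmpdir/words.new" > "$tmpdir/words.clean"

cleaned=$(cat "$tmpdir/words.clean")
printf '%s\n' "$cleaned"

rm -r "$tmpdir"
```

Note that tr reads and writes different files here; redirecting the output back onto the input file would empty it.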

If you like anagrams, or wonder what words you can make from a set of letters, try the “an” command.

$ sudo apt-get update
$ sudo apt-get install an

$ an "instructables"
inscrutable st
inscrutable t’s
inscrutable ts
inscrutable t s
inscrutable t s
….
….
….

If you want a limited list of only 10 options, try:

$ an -n 10 "instructables"
inscrutable st
inscrutable t’s
inscrutable ts
inscrutable t s
inscrutable t s
inscrutable t s
inscrutable t s
incurables st t
incurables t’s t
incurables ts t
