Sunday, 14 February 2010

Anagrams

Another programming challenge is to compute anagrams from a dictionary and output the longest sets of anagrams first. We submitted the following elegant 3-line F# solution:

let words = System.IO.File.ReadAllLines @"unixdict.txt" [|for _, words in Seq.groupBy (Array.ofSeq >> Array.sort) words -> Array.ofSeq words|] |> Array.sortBy (fun s -> -Seq.length s)

This tiny program produces the following sets of anagrams in only half second:

val it : string [] [] = [|[|"alger"; "glare"; "lager"; "large"; "regal"|]; [|"caret"; "carte"; "cater"; "crate"; "trace"|]; [|"abel"; "able"; "bale"; "bela"; "elba"|]; [|"elan"; "lane"; "lean"; "lena"; "neal"|]; [|"angel"; "angle"; "galen"; "glean"; "lange"|]; [|"evil"; "levi"; "live"; "veil"; "vile"|]; [|"aden"; "dane"; "dean"; "edna"|]; [|"emit"; "item"; "mite"; "time"|]; [|"pare"; "pear"; "rape"; "reap"|]; [|"esprit"; "priest"; "sprite"; "stripe"|]; [|"hare"; "hear"; "hera"; "rhea"|]; [|"resin"; "rinse"; "risen"; "siren"|]; [|"keats"; "skate"; "stake"; "steak"|]; [|"lascar"; "rascal"; "sacral"; "scalar"|]; [|"amen"; "mane"; "mean"; "name"|]; [|"nepal"; "panel"; "penal"; "plane"|]; [|"lemon"; "melon"; "menlo"; "monel"|]; [|"least"; "slate"; "stale"; "steal"|]; [|"leap"; "pale"; "peal"; "plea"|]; [|"ames"; "mesa"; "same"; "seam"|]; [|"leapt"; "petal"; "plate"; "pleat"|]; [|"lien"; "line"; "neil"; "nile"|]; [|"abet"; "bate"; "beat"; "beta"|]; [|"mate"; "meat"; "tame"; "team"|]; [|"beard"; "bread"; "debar"; "debra"|]; [|"lament"; "mantel"; "mantle"; "mental"|]; [|"aires"; "aries"; "arise"; "raise"|]; [|"enol"; "leon"; "lone"; "noel"|]; [|"cereus"; "recuse"; "rescue"; "secure"|]; [|"manor"; "moran"; "norma"; "roman"|]; [|"latus"; "sault"; "talus"; "tulsa"|]; [|"diet"; "edit"; "tide"; "tied"|]; [|"lima"; "mail"; "mali"; "mila"|]; [|"are"; "ear"; "era"; "rae"|]; [|"apt"; "pat"; "pta"; "tap"|]; [|"dare"; "dear"; "erda"; "read"|]; [|"ate"; "eat"; "eta"; "tea"|]; [|"ant"; "nat"; "tan"|]; [|"yates"; "yeast"; "yeats"|]; [|"nerve"; "never"; "verne"|]; [|"ether"; "there"; "three"|]; [|"dave"; "vade"; "veda"|]; [|"earn"; "near"; "rena"|]; [|"brag"; "garb"; "grab"|]; [|"magneto"; "megaton"; "montage"|]; [|"earnest"; "eastern"; "nearest"|]; [|"den"; "end"; "ned"|]; [|"lair"; "liar"; "rail"|]; [|"alton"; "talon"; "tonal"|]; [|"now"; "own"; "won"|]; [|"earth"; "hater"; "heart"|]; [|"dire"; "reid"; "ride"|]; [|"bard"; "brad"; "drab"|]; [|"bare"; "bear"; "brae"|]; [|"earthen"; "hearten"; "teheran"|]; [|"kale"; "lake"; "leak"|]; [|"arnold"; "roland"; "ronald"|]; [|"cpu"; "cup"; "puc"|]; [|"earthy"; "hearty"; "thayer"|]; [|"abut"; "tabu"; "tuba"|]; [|"lame"; "male"; "meal"|]; [|"hank"; "kahn"; "khan"|]; [|"riot"; "tori"; "trio"|]; [|"dater"; "trade"; "tread"|]; [|"rite"; "tier"; "tire"|]; [|"army"; "mary"; "myra"|]; [|"alert"; "alter"; "later"|]; [|"acts"; "cast"; "scat"|]; [|"carven"; "cavern"; "craven"|]; [|"grate"; "great"; "greta"|]; [|"rosa"; "soar"; "sora"|]; [|"argot"; "gator"; "groat"|]; [|"demo"; "dome"; "mode"|]; [|"klein"; "kline"; "liken"|]; [|"lisa"; "sail"; "sial"|]; [|"cruel"; "lucre"; "ulcer"|]; [|"along"; "anglo"; "logan"|]; [|"listen"; "silent"; "tinsel"|]; [|"list"; "silt"; "slit"|]; [|"iran"; "nair"; "rain"|]; [|"baird"; "braid"; "rabid"|]; [|"dearth"; "hatred"; "thread"|]; [|"ape"; "epa"; "pea"|]; [|"eros"; "rose"; "sore"|]; [|"ares"; "sear"; "sera"|]; [|"acm"; "cam"; "mac"|]; [|"gas"; "gsa"; "sag"|]; [|"acme"; "came"; "mace"|]; [|"part"; "rapt"; "trap"|]; [|"argon"; "groan"; "organ"|]; [|"ester"; "steer"; "terse"|]; [|"acre"; "care"; "race"|]; [|"brain"; "brian"; "rabin"|]; [|"parse"; "spare"; "spear"|]; [|"bin"; "ibn"; "nib"|]; [|"result"; "rustle"; "ulster"|]; [|"armco"; "macro"; "marco"|]; [|"janos"; "jason"; "jonas"|]; [|"ante"; "nate"; "neat"|]; [|"goer"; "gore"; "ogre"|]; ...|]

This is one of the shortest and fastest of all the solutions on Rosetta Code. For example, the Haskell solution is more than twice as long at 8 lines of code.

7 comments:

Paddy3118 said...

Could you possibly update your RC code to use the wordlist mentioned in the task?

Thanks.

Flying Frog Consultancy Ltd. said...

@Paddy3118: Our F# code already used the wordlist mentioned in the task.

Paddy3118 said...

Hi again,
The correct answer is that given by most of the other examples such as Haskel, Common Lisp, Factor, Groovy, Python, ...

Might you have interpreted the task goal differently from others, and if so, how?

Thanks, Paddy.

Paddy3118 said...

Aha, What you have failed to do is print just the largest sets of anagrams. But then, it seems strange that you missed the obvious huge size of your output compared to the output of other examples when you did your comparison?

- Paddy.

Anonymous said...

Unfortunately your comment on Haskell's line count ignores the point that the Haskell code gives the right answer and the F# code does not.

Flying Frog Consultancy Ltd. said...

@Paddy: You're quite right that I missed the size of my output because my code prints all sets of anagrams and not just the largest.

@Anonymous: I have uploaded a correct solution that is still substantially shorter than the Haskell.

surikator said...

Comparing such short programs in number of lines is not accurate. If I write a one line program in OCaml and then write the same exact program with an additional line containing ";;", I will have just DOUBLED the size of my program.

You should compare number of characters, possibly renaming all variables to 1 character identifiers.