2

For fun I am trying to solve the Wordle related problem Matt Parker talks about in the linked video from his channel Standup Maths in Haskell. Basically: find 5, 5-letter words that do not share any letters in common. Thus using 25 different alphabet letters in total (for English).

It took Matt a month of computer time in Python. He freely admits his code was not at all optimized and he shows a viewer submitted program that solved it in 15 minutes on his laptop.

Matt Parker Wordle Video

Here is my naive Haskell code, and note I tried to use parallel computation, but I am sure I don't know what I'm doing there because I'm only seeing one core active. I am not sure it will take a month to run, but after a few hours, my laptop was a toaster and I admitted I am not good at optimizing.

I borrowed an idea from https://www.youtube.com/watch?v=947Ewgue4DM where he encodes the words as 26 bit binary. However, I did not use his mechanism for reducing the size of the comparison lists where he skips ahead to the first possible match. I was not able to figure that out in Haskell.

Also, I did as Matt does and remove anagrams from the word list and then at the end, spit out all the anagrams a given binary encoding corresponds to.

Also, all of the impactful inefficiency is in the unique5Combos function. Meaning, the rest of the code is not efficient either, but it doesn't have much impact. I was able to get pairs of 5 letter words with no duplicate letters for the entire corpus in under 2 seconds. Getting 3 words took a lot longer (not sure how long, I went to get lunch and it was done when I returned), and getting 4 and 5 words unknown time to complete.

Lastly, I am running this on a 8 year old laptop with Arch Linux. So I'm not expecting blazing fast, but I am sure my simple code could be better at limiting the size of data to evaluate and evaluating faster.

I welcome any suggestions. Many thanks in advance.

module Main where

import Data.Bits ( Bits(testBit, (.&.), bit) ) 
import Data.Char ( ord, chr, intToDigit, toUpper ) 
import Numeric (showIntAtBase) 
import Data.List ( nub, sort ) 
import Control.Monad ( guard )  
import Control.Parallel.Strategies ( parListChunk, rdeepseq, using )

main :: IO () 
main = do
        let fn = "/usr/share/dict/words"
        w5 <- filter (\w -> length w == 5 
                        && nub w == w
                        && all (\c -> c `elem` ['A'..'Z']) w)  
                <$> lines
                <$> map toUpper 
                <$> readFile fn 
        let e5 = (sort . nub . map bitEncode) w5
        let e25 = unique5Combos e5 `using` parListChunk 1000000 rdeepseq
        let answer = (map . map) (\x -> anagrams (bitDecode x) w5) e25
        print $ answer

unique5Combos :: [Int] -> [[Int]] 
unique5Combos es = do
                   a <- es
                   b <- es
                   c <- es
                   d <- es
                   e <- es
                   let abcde = [a,b,c,d,e] 
                   guard (isSorted abcde)
                   guard ((foldr1 (.&.) abcde) == 0)
                   return abcde

isSorted :: [Int] -> Bool 
isSorted = and . (zipWith (<) <*> tail)

bitEncode :: [Char] -> Int 
bitEncode xs = sum [bit (90 - ord x) | x <- xs]

bitDecode :: Bits a => a -> [Char] 
bitDecode xs = [chr (90 - n) | n <- [25,24..0], testBit xs n]

anagrams :: [Char] -> [[Char]] -> [[Char]] 
anagrams xs w5 = [w | w <- w5, (sort xs) == (sort w)]
amalloy
  • 89,153
  • 8
  • 140
  • 205
dsagman
  • 481
  • 1
  • 8
  • This looks very closely related to https://stackoverflow.com/q/73385955/625403. You might find some useful inspiration there. Most obviously your 5-deep list comprehension followed by a `guard` that the numbers turned out to be sorted is very expensive, compared to iterating in a way that doesn't bother to try non-sorted orders at all. – amalloy Aug 25 '22 at 23:26
  • @amalloy, thank you. It is very similar, it is the code from the other video I linked. Apologies for not finding that thread myself. I am trying the approaches from there using Data.List tails rather than the sort. Seems to be performing better. I will update after some more research. – dsagman Aug 26 '22 at 01:13
  • It would be interesting if someone that was an expert in Haskell actually did all the optimizations and parallelisations. From that video the JAVA version has been optimized to run in under 1 second and if you go to the gist, you see that the C and Rust versions are even faster. – matt Aug 26 '22 at 01:56
  • I have rolled back your edits. We don't do "Update" edits on Stack Overflow that change the substance of the question, because they make existing comments or answers confusing. If a comment or Answer answers your original question, and you now have a new question, please ask that as a new Question. You can find your edited text in the edit history of this question, if that's useful for your new question. – amalloy Aug 26 '22 at 18:17
  • Got it, thanks. Didn't know. Won't make the same error again. – dsagman Aug 26 '22 at 21:53

0 Answers0