Hello Wordle

Basics of how Wordle works

Wordle is a word game where you get 6 tries to guess a word that changes every day. For every attempt you enter you get three types of hints;

green if the letter is in the solution and in the right place, yellow if the letter is in the solution but in the wrong place and finally gray if there is no more of the letter in the solution.

Attempts at writing the algorithm

The rules are pretty simple so writing the an algorithm should've been easy. My first thought was creating dictionaries for every type of hint and adding letters with their positions to the dictionary, so for every letter in a guess if the letter is green, the letter along with it's position will be added to the dictionary, same for yellow and gray except for the position which won't matter in grays.

Then simply go over every word in a list of words and remove the word if it doesn't have all green and yellow letters, if it has a gray letter, or if it has a green or a yellow letter in a position it's not supposed to be in.

def to_colors(guess: str, hints: dict[int, str]) -> tuple[list]:
    green = dict()
    yellow = dict()
    gray = set()

    for i, hint in hints.items():
        letter = guess[i]
        if hint == True: green[i] = letter
        elif hint == False:
            value = yellow.get(letter, [])
            yellow[letter] = value + [i]
        elif hint == None: gray.add(letter)

    return (green, yellow, gray)


def eliminate(guess: str, words: list, green: dict, yellow: dict, gray: set) -> list:
    retained_words = copy(words)
    counter_green = Counter(green.values())

    def check_green():
        for position, letter in green.items():
            if guess[position] != letter:
                retained_words.remove(word)
                return True
        return False

    def check_yellow():
        for letter, positions in yellow.items():
            if guess.count(letter) < len(positions):
                for position in positions:
                    if guess[position] == letter:
                        retained_words.remove(word)
                        return True
        return False

    def check_gray():
        for letter in gray:
            greens = counter_green.get(letter) if counter_green.get(letter) != None else 0
            yellows = len(yellow.get(letter)) if yellow.get(letter) != None else 0

            if letter in guess:
                if guess.count(letter) < greens + yellows:
                    retained_words.remove(word)
                    return True
        return False
        
    
    for word in words:
        if check_green(): continue
        if check_yellow(): continue
        if check_gray(): continue

    return retained_words

That approach (other than being written terribly) has a few issues; firstly it doesn't account for the fact that the solution can have the same letter appearing multiple times and secondly that a gray hint means that there's no more of the letter in the word not that the letter simply isn't present in the word.

This way took an average of 14 guesses to get to the solution if it didn't already eliminate the solution itself from the possibilities. After this point I had a few ways of rewriting the code, but it was really hard to get something that ran fairly quick and wouldn't crash my computer due to the the amount of memory the nested loops took.

How the Algorithm works

Finally I did find a way that was good enough, each letter has 4 properties; known positions, impossible positions, minimum count and count frozen. The three types of hints could easily be converted into these properties.

@dataclasses.dataclass
class LetterData:
    """Class for storing each letter's match data"""

    known_positions: set = dataclasses.field(default_factory=set)
    impossible_positions: set = dataclasses.field(default_factory=set)
    min_count: int = 0
    count_frozen: bool = False

Then I'd have to make a function that took each guess and it's hints and converted them to those properties.

def build_letters_data(letters: dict[str, LetterData], guess: str, hints: dict):
    """Take the hints from Wordle and update each letter's LetterData object"""

    yellows = {}

    for i, hint in hints.items():
        letter = guess[i]

        # green - letter is in the word and in the right position
        if hint is True:
            letters[letter].known_positions.add(i)
        # yellow - letter is in the word but not in the right position
        elif hint is False:
            letters[letter].impossible_positions.add(i)
            yellows[letter] = yellows.get(letter, 0) + 1
        # gray - there is no more of the letter in the word
        elif hint is None:
            letters[letter].count_frozen = True

        # min count of the letter: positions in green + positions in yellow
        letters[letter].min_count = len(
            letters[letter].known_positions) + yellows.get(letter, 0)

Now based on the letter data for each letter another function would go through every word in a set of words and check if every letter in the word matches these properties and if not it's removed from the set. This was a bit more complicated for the yellow hints but it could be done.

def eliminate(possible_words: set, guess: str, letters: dict):
    """Check every word in wordslist and remove it if it doesn't meet the requirements"""

    retained_words = copy.copy(possible_words)

    for word in possible_words:
        for i, letter in enumerate(word):

            if i in letters[guess[i]].known_positions and word[i] != guess[i]:
                retained_words.remove(word)
                break
            elif i in letters[letter].impossible_positions:
                retained_words.remove(word)
                break
            elif letters[letter].count_frozen is True:
                if word.count(letter) != letters[letter].min_count:
                    retained_words.remove(word)
                    break

        # check if all the yellows are present in the word in their possible positions
        if word not in retained_words: continue
        for letter in letters.keys():

            if not letters[letter].impossible_positions: continue
            elif letter not in word:
                retained_words.remove(word)
                break
            elif word.count(letter) < letters[letter].min_count:
                retained_words.remove(word)
                break

    return retained_words

Now after all of this a word can be picked from the possible words to be the new guess and everything would be done all over again to get every next guess until the guess is the solution (all letters are green).

Optimizations

In the possible words you could just pick any word at random to be your new guess which would be fine, mathematically it should get to the solution in a maximum of 5 attempts if you follow all the hints correctly therefore beating the game every time.

But I had to try and do better so every guess entered would now be added to another set and every guess guessed would be compared against every possible word and a score would be generated of how unique the word is from the ones guessed till now. There was again another problem with this; the most unique words would also be the words that have a lot of repeating letters for example if the guess was stack, both dizzy and blink would be share the same ratio of comparison with it even though blink is clearly a better guess so they don't really give us much additional information.

To maximize the information gained you'd also have to account for those cases, so then I used SequnceMatcher to get ratios between a possible word and all the guesses and also subtracted the total letters in the word (5) from the unique letters and added that along with the ratio to get a score, then I choose the word with the smallest score to be the new guess.

def choose_word(guesses: set, possible_words: set, randomize: bool = False):
    """Get an optimized choice of a word to be the next guess from the possible words"""

    if randomize: return random.choice(list(possible_words))
    comparison = {}
    # The best next guess seems to be the one that differs most from the previous guesses
    # since this diversifys the letters used therefore maximizing the hints received

    for word in possible_words:
        for guess in guesses:
            # get the similarity between the word and the guess
            comparison[word] = comparison.get(word, 0) + SequenceMatcher(
                None, word, guess).ratio()
        # number of total letters - number of unique letters
        comparison[word] = comparison.get(word, 0) + len(word) - len(set(word))

    return min(comparison, key=comparison.get)

After this and a few more functions I finally had an efficient algorithm to guess any given word, I just had to get it to actually work on the Wordle website which using Microsoft's Playwright module and a few more days I did, and I also managed to do the same for Absurdle (a wordle spinoff) and here's all the code. You can run it yourself, or instead run wordguesser.py and give it your own word to guess if you don't want it to spoil today's Wordle for you.

mp4

Wordle Solver Recording.mp4

2.6MB
The algorithm solving Wordle

The "Not a word in list" errors are caused as the word list I'm using (from the Moby project) has way more words than Wordle's word list, I choose to keep it this way as it doesn't affect the way my algorithm works and keeps the code modular and immune to any changes made to either Wordle or Absurdle's word lists, even though it means it will run slightly slower.


Write a comment ...

Write a comment ...