Using a Hammer to Turn A Screw

Mitchell Allen

30 Apr 2025 — 5 min read

Do not try this at home

When LLMs Meet Letter Insertion: A Case Study

I wanted to know whether ChatGPT could help me brainstorm a list of word pairs in which the second word is the first word with a single letter inserted between its first and second letters, such as FIGHT → FLIGHT.

What happened next surprised me, taught me something fundamental about how LLMs work, and might change how you approach your own puzzle design. Let's break it down...

The Setup: A Seemingly Simple Challenge

Letter insertion puzzles are perfect for puzzle books and activity collections. They're straightforward and engaging.

Here are the rules for the puzzle type I was working with:

Start with a base word (like FIGHT)
Insert exactly one letter immediately after the first letter (here, L)
Create a new, legitimate word (FLIGHT)

To get ChatGPT started, I provided these examples:

FIGHT → F[L]IGHT
BEACH → B[R]EACH
FARE → F[L]ARE
SLAT → S[P]LAT
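
The rule can be written down as a short check. This is a minimal sketch in Python, not something from my ChatGPT session; the function name `is_second_position_insertion` is my own illustrative choice, and it only tests the shape of the pair, not whether either word is a real word.

```python
def is_second_position_insertion(base: str, result: str) -> bool:
    """Return True if `result` is `base` with one letter inserted after its first letter."""
    base, result = base.upper(), result.upper()
    # The base needs a first and a second letter, and the result is exactly one longer.
    if len(base) < 2 or len(result) != len(base) + 1:
        return False
    # Same first letter, an inserted letter at index 1, then the rest of the base unchanged.
    return (
        result[0] == base[0]
        and result[1].isalpha()
        and result[2:] == base[1:]
    )


examples = [("FIGHT", "FLIGHT"), ("BEACH", "BREACH"), ("FARE", "FLARE"), ("SLAT", "SPLAT")]
for base, result in examples:
    print(base, "->", result, is_second_position_insertion(base, result))
```

All four of my examples pass this check.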

This pattern is elementary - something a third-grader could understand. Insert one letter, make a new word. What could go wrong?

ChatGPT's Responses: The Unexpected Journey

I prompted ChatGPT with those four examples and asked it to generate more puzzles following the same pattern. After each failed attempt, I followed up with refinements, admonishments and even a request to refine my prompt:

[Screenshots: Session 001.png through Session 005.png]

Here is ChatGPT's nonsensical response (notice how quickly it descends into gibberish):

Let's analyze this step-by-step:

Analysis of the Pattern

In the original examples, a single letter is inserted specifically in the second position of the original word, resulting in a valid word:

FIGHT → FLIGHT (insert "L" after "F")

BEACH → BREACH (insert "R" after "B")

FARE → FLARE (insert "L" after "F")

SLAT → SPLAT (insert "P" after "S")

This creates a clear rule:

Insert exactly one new letter between the first and second letters of the original word.

Ensure the result is a valid word.

My Suggestions vs. the Pattern

Correct pairs:

CART → CHART (insert "H")

RATE → SLATE (insert "L")

Incorrect pairs:

PLANE → PLANT: The new letter was added in the wrong position.

SPOT → SPORT: Same issue; the new letter wasn't placed in the second position.

TRIM → TRAMP: Doesn't follow the second-position insertion rule.

GRIP → GRAPE: The inserted letter wasn’t placed in the second position.

MEAL → METAL: Again, wrong placement of the new letter.

BLADE → BLAME: Letter placement doesn’t follow the rule.

PILE → SPINE: Completely off-pattern.

BRIDE → PRIDE: No second-position insertion.

Additional Incorrect Pairs:

FIRM → FIIRM: Invalid word.

PINE → PIANE: Invalid word.

RACK → RACER: Wrong letter placement.

Meta-Cognition: Why Did Errors Occur?

Misunderstanding: Initially, I didn't focus on the strict "second-position" insertion rule.

Overlooking validity: Many of my suggestions didn't result in valid words.

Prompt ambiguity: The task could benefit from explicitly defining the rules, particularly the position of insertion and the requirement for valid words.

Improved Prompt for Correct Output

To generate accurate pairs, here's a clearer prompt:

"Create word pairs where the second word is derived from the first by inserting a single letter in the second position (between the first and second letters of the original word). Ensure that the resulting word is valid."

Correct Suggestions Following the Rule

Now applying the clarified rules:

FIGHT → FLIGHT

BEACH → BREACH

FARE → FLARE

SLAT → SPLAT

CART → CHART

RATE → SLATE

GRAB → GRACE

PLAN → PLANE

SCAN → SCENE

TRAP → TRACE

These pairs follow the pattern perfectly!

Of the six new pairs ChatGPT generated after regurgitating my four examples, only CART → CHART actually followed the pattern. Yet the AI was absolutely convinced it had nailed it.
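
You don't have to take my word for it. Here is a minimal sketch that runs ChatGPT's six new pairs through the rule; the helper name `follows_rule` is hypothetical, and it checks letter positions only, not dictionary validity.

```python
def follows_rule(base, result):
    """True if `result` equals `base` with one letter inserted after the first letter."""
    return (
        len(base) >= 2
        and len(result) == len(base) + 1
        and result[0] == base[0]      # first letter unchanged
        and result[2:] == base[1:]    # everything after the insertion unchanged
    )


# The six new pairs from ChatGPT's "Correct Suggestions" list.
chatgpt_pairs = [
    ("CART", "CHART"),
    ("RATE", "SLATE"),
    ("GRAB", "GRACE"),
    ("PLAN", "PLANE"),
    ("SCAN", "SCENE"),
    ("TRAP", "TRACE"),
]

passing = [pair for pair in chatgpt_pairs if follows_rule(*pair)]
print(passing)  # [('CART', 'CHART')]
```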

To understand why this happens, we need to look at how AI actually processes text.

Looking Under the Hood: Why ChatGPT Struggles

ChatGPT doesn't see text the way we do. Instead of working with individual letters, it processes chunks of text called "tokens."

Think of it like this: where you see individual letters in a word like F-I-G-H-T, ChatGPT may see the whole word as one unit, or as a couple of multi-letter pieces. It's not designed to manipulate individual characters but to predict which chunks of text are likely to follow others.

This is why ChatGPT can memorize and repeat your examples (FIGHT → FLIGHT) but struggles to truly understand the letter-insertion rule. It's like trying to do needlepoint while wearing oven mitts – the tool just isn't designed for that level of precision.
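
To make the idea concrete, here is a toy illustration. The vocabulary, the ID numbers, and the greedy longest-match splitting are all assumptions invented for this sketch; this is not ChatGPT's real tokenizer, which uses a much larger learned vocabulary and may split these words differently.

```python
# Hypothetical chunks and IDs, made up for illustration only.
TOY_VOCAB = {
    "FIGHT": 101, "FL": 102, "IGHT": 103,
    "F": 1, "L": 2, "I": 3, "G": 4, "H": 5, "T": 6,
}


def toy_tokenize(text):
    """Split `text` into chunks from TOY_VOCAB, always taking the longest match."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest remaining slice first, then shorter ones.
        for j in range(len(text), i, -1):
            if text[i:j] in TOY_VOCAB:
                tokens.append(text[i:j])
                i = j
                break
        else:
            raise ValueError(f"no token for {text[i]!r}")
    return tokens


for word in ("FIGHT", "FLIGHT"):
    chunks = toy_tokenize(word)
    print(word, chunks, [TOY_VOCAB[c] for c in chunks])
```

In this toy setup FIGHT becomes the single ID 101 while FLIGHT becomes 102 and 103. Nothing in those numbers says "the same word with an L slipped in", which is the kind of gap the model has to bridge.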

Why This Matters for Puzzle Designers

Why should you care about this AI quirk as a puzzle designer?

Understanding tool limitations is crucial for working efficiently. Here's what this means:

ChatGPT is phenomenal for certain puzzle types – theme generation, crossword clues, trivia questions, riddles, and story-based puzzles.
It's terrible for character-level puzzles – letter insertions, deletions, anagrams (except when citing famous examples), and other puzzles requiring precise letter manipulation.
Its confidence doesn't match its competence – ChatGPT will confidently present incorrect examples.
Few-shot examples help but don't solve everything – The AI can mimic patterns it sees but struggles with character-level rules.

For letter-based puzzles, better tools include:

Word list databases with filtering capabilities
Traditional word puzzle construction software
Your own brain (still champion at these tasks!)
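
As a rough idea of what "a word list with filtering" looks like in practice, here is a minimal sketch. The function name `find_insertion_pairs` and the tiny `SAMPLE_WORDS` list are hypothetical stand-ins; in real use you would feed it a full word list you trust.

```python
import string


def find_insertion_pairs(words):
    """Return (base, result) pairs where result is base plus one letter after its first letter."""
    vocabulary = {w.upper() for w in words}
    pairs = []
    for base in sorted(vocabulary):
        if len(base) < 2:
            continue  # need a first and a second letter to insert between
        for letter in string.ascii_uppercase:
            candidate = base[0] + letter + base[1:]
            # Keep the pair only if the candidate is itself in the word list.
            if candidate in vocabulary:
                pairs.append((base, candidate))
    return pairs


# Tiny stand-in word list for demonstration.
SAMPLE_WORDS = [
    "fight", "flight", "beach", "breach", "fare", "flare",
    "slat", "splat", "cart", "chart", "plane", "plant",
]

for base, result in find_insertion_pairs(SAMPLE_WORDS):
    print(base, "->", result)
```

Because every candidate is looked up in the word list, a pair like PLANE → PLANT never gets through: the filter builds words letter by letter instead of guessing.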

The Right Tool for the Right Job

ChatGPT is like a hammer. It's fantastic for pounding nails (generating ideas, writing descriptions, creating themes). But if you use it to turn a screw (manipulate specific letters), it will make a mess.

Here's a practical workflow leveraging the strengths of both AI and traditional methods:

Use ChatGPT for:
- Brainstorming puzzle themes
- Creating puzzle instructions
- Writing engaging clues
- Developing story elements
- Generating category ideas

Use traditional tools for:
- Letter manipulation puzzles
- Word list generation with specific patterns
- Precise formatting requirements
- Grid-based puzzles
- Anything requiring character-level precision

This hybrid approach helps create puzzle books faster while maintaining quality and originality.

Key Takeaways for Puzzle Designers

✓ Understand AI limitations before planning your puzzle creation
✓ Don't trust ChatGPT's confidence when it comes to letter manipulation
✓ Create a hybrid workflow using the right tools for each task
✓ Use AI for ideation and traditional tools for implementation
✓ Always verify AI-generated puzzles before including them in your designs

The puzzle design field grows more sophisticated every day. Understanding these nuances gives you an edge – you won't waste time trying to force AI to do things it wasn't designed for.

Efficient creation means more puzzles and better designs.

A Final Thought

It's ironic that AI, which can write essays and answer complex questions, stumbles on something as elementary as adding a letter to a word. This is actually empowering for puzzle designers – our human creativity and understanding of language patterns still have unique value.

Use ChatGPT for what it's good at, but don't expect it to handle the character-level magic that makes word puzzles delightful.

Knowing which tool to use for which job is the mark of a true craftsperson.
