The premise of this project was simple: investigate whether a machine-learning-inspired probabilistic approach could automatically generate new, playable Pac-Man levels.
Instead of hand-crafting every map, I wanted a system that could look at original retro layouts, learn why they worked, and spin up brand-new ones on demand.
The Strategy: Spatial Markov Chains
A standard Markov Chain is usually used for sequences, predicting the next state based on the previous one. To make this work for a 2D grid-based arcade map, the generator had to predict what tile should appear at a specific position based on its local neighbors.
Left to its own devices, a purely local model loses track of the global structure and starts putting ghost chambers in random corners. To fix this, I engineered an enhanced contextual state built from five features:
- The tile to the left
- The tile directly above
- The upper-left diagonal tile
- The horizontal region (divided into thirds)
- The vertical region (divided into thirds)
By adding those horizontal and vertical regions, the generator learned global context: top regions tend to contain long horizontal wall structures, center regions need a dedicated ghost chamber, and lower regions should be more open for player traversal.
// The DNA of a tile position (C#)
markovState += leftTile + upperTile + upperLeftTile + yThird + xThird;
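To make the five features concrete, here is a minimal sketch of how such a state key could be assembled. It is not the project's actual code: the `char[,]` map indexed as `[y, x]`, the `'#'` marker for out-of-bounds neighbors, the `'|'` separator, and the names `TileContext`, `Third`, and `BuildState` are all assumptions made for illustration.

```csharp
using System;

public static class TileContext
{
    // Maps a coordinate to region 0, 1, or 2 by splitting the axis into thirds.
    // Integer division keeps the result in range for 0 <= index < length.
    public static int Third(int index, int length)
    {
        return index * 3 / length;
    }

    // Builds the five-feature state key for position (x, y).
    // Out-of-bounds neighbors use a hypothetical border marker '#'.
    public static string BuildState(char[,] map, int x, int y)
    {
        int height = map.GetLength(0);
        int width = map.GetLength(1);

        char left = x > 0 ? map[y, x - 1] : '#';
        char upper = y > 0 ? map[y - 1, x] : '#';
        char upperLeft = (x > 0 && y > 0) ? map[y - 1, x - 1] : '#';

        // Separators (an assumption) keep the features from running together.
        return $"{left}|{upper}|{upperLeft}|{Third(y, height)}|{Third(x, width)}";
    }
}
```

Because the key only reads tiles to the left and above, a generator using it would need to fill the grid row by row, left to right, so those neighbors already exist when each tile is chosen.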
During training, the system crunched existing maps, counted which tiles followed which states, and normalized those numbers into valid probability distributions. When generating a level, it samples these probabilities tile-by-tile, introducing controlled randomness without creating utter chaos.
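The count, normalize, and sample loop described above might look roughly like the sketch below. The class name `MarkovTileModel`, its method names, and the use of string state keys with `char` tiles are assumptions for illustration; the project's real data structures may differ.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class MarkovTileModel
{
    // state key -> (tile -> raw count)
    private readonly Dictionary<string, Dictionary<char, int>> counts =
        new Dictionary<string, Dictionary<char, int>>();

    // Training step: record that `tile` appeared in context `state`.
    public void Observe(string state, char tile)
    {
        if (!counts.TryGetValue(state, out var tileCounts))
        {
            tileCounts = new Dictionary<char, int>();
            counts[state] = tileCounts;
        }
        tileCounts.TryGetValue(tile, out int current); // current is 0 when absent
        tileCounts[tile] = current + 1;
    }

    // Normalizes the counts for one state into probabilities that sum to 1.
    // Throws KeyNotFoundException for a state never seen in training.
    public Dictionary<char, double> Distribution(string state)
    {
        var tileCounts = counts[state];
        double total = tileCounts.Values.Sum();
        return tileCounts.ToDictionary(kv => kv.Key, kv => kv.Value / total);
    }

    // Generation step: draw one tile by walking the cumulative distribution.
    public char Sample(string state, Random rng)
    {
        double roll = rng.NextDouble();
        double cumulative = 0.0;
        char last = '\0';
        foreach (var kv in Distribution(state))
        {
            cumulative += kv.Value;
            last = kv.Key;
            if (roll < cumulative) return kv.Key;
        }
        return last; // guards against floating-point rounding
    }
}
```

Passing in a seeded `Random` makes a generated level reproducible, which can help when debugging a specific layout.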
The Core Challenges
- The Balancing Act: If you give the state representation too much information, almost every state becomes unique and hyper-specific, so each one is backed by very few training examples. During generation, the system then hits "unseen states" that never occurred in training and falls back to default behavior. I had to carefully balance specificity with generalizability.
- Local vs. Global: Because Markov Chains only model local spatial relationships, they don't inherently understand high-level concepts like "is there a valid path from point A to point B?"
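One common way to soften the unseen-state problem is to back off from the most specific key to progressively coarser ones before giving up. The write-up only says the system falls back to default behavior, so the sketch below is an assumed alternative and not the project's method; `BackoffLookup` and `FirstKnownState` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class BackoffLookup
{
    // Returns the first candidate key that was seen in training, trying the
    // most specific key first, or null when none of them is known.
    public static string FirstKnownState(ISet<string> seenStates, params string[] candidates)
    {
        foreach (string key in candidates)
        {
            if (seenStates.Contains(key)) return key;
        }
        return null;
    }
}
```

A caller might pass the full five-feature key first and a neighbors-only key second, treating a `null` result as the signal to use the default tile.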
The final evaluated models generalized well, reaching a test score of ~0.798. This suggests the system was capturing meaningful structural patterns rather than just memorizing and copying the training maps.
Going Rogue: The Rule-Based Generator
Alongside the learned Markov model, I implemented an extra-credit procedural generator that took the opposite approach: strict, handcrafted rules.
Instead of relying on probabilities, this generator guarantees a classic arcade feel by using clean, hardcoded layout constraints:
- Perfect Symmetry: It automatically mirrors left-side structures over to the right side, ensuring visual balance.
- Spawn Safety: It uses explicitly reserved zones to completely ban obstacles from spawning anywhere near Pac-Man or the central ghost chamber.
- Dynamic Layouts: While the rules are strict, the dimensions, pillar counts, and corridor lengths are completely randomized on every single run.
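The symmetry and reserved-zone rules can be illustrated with a small sketch. The `char[,]` map indexed as `[y, x]`, the rectangular shape of a reserved zone, and the names `SymmetricLayout`, `MirrorLeftToRight`, and `IsReserved` are assumptions for illustration only.

```csharp
using System;

public static class SymmetricLayout
{
    // Copies every tile in the left half onto its mirrored column on the right.
    // With an odd width, the middle column is left untouched.
    public static void MirrorLeftToRight(char[,] map)
    {
        int height = map.GetLength(0);
        int width = map.GetLength(1);
        for (int y = 0; y < height; y++)
        {
            for (int x = 0; x < width / 2; x++)
            {
                map[y, width - 1 - x] = map[y, x];
            }
        }
    }

    // True when (x, y) lies inside the reserved rectangle; a generator would
    // skip obstacle placement there.
    public static bool IsReserved(int x, int y, int zoneX, int zoneY, int zoneWidth, int zoneHeight)
    {
        return x >= zoneX && x < zoneX + zoneWidth
            && y >= zoneY && y < zoneY + zoneHeight;
    }
}
```

Generating only the left half and mirroring it afterwards also halves the amount of random placement work per level.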
Future Improvements
While the current generator builds some incredibly nostalgic layouts, there are a few things I'd love to add next:
- Pathfinding & Connectivity Validation: Adding an automated pathfinding check would guarantee every single pellet is actually reachable. Right now, the Markov model could theoretically wall off a pellet or trap a ghost behind a solid wall.
- Incremental Complexity Scaling: Tying the probability matrices directly to game difficulty, generating tighter, more claustrophobic corridors as the player advances to higher levels.
- Hybrid Generation: Using the rule-based generator to lay down a perfectly symmetric, safe base layout, and letting the Markov Chain fill in the creative details.
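The connectivity check from the first item could be a breadth-first flood fill, sketched below under a few assumptions: the map is a `char[,]` indexed as `[y, x]`, walls are encoded as `'W'`, movement is 4-directional, and wrap-around tunnels are ignored. `Connectivity` and `AllOpenTilesReachable` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;

public static class Connectivity
{
    // Returns true when every non-wall tile can be reached from (startX, startY)
    // using 4-directional moves. The wall character is an assumed encoding.
    public static bool AllOpenTilesReachable(char[,] map, int startX, int startY, char wall = 'W')
    {
        int height = map.GetLength(0);
        int width = map.GetLength(1);
        if (map[startY, startX] == wall) return false;

        var visited = new bool[height, width];
        var queue = new Queue<(int x, int y)>();
        queue.Enqueue((startX, startY));
        visited[startY, startX] = true;

        int[] dx = { 1, -1, 0, 0 };
        int[] dy = { 0, 0, 1, -1 };

        while (queue.Count > 0)
        {
            var (x, y) = queue.Dequeue();
            for (int i = 0; i < 4; i++)
            {
                int nx = x + dx[i], ny = y + dy[i];
                if (nx < 0 || ny < 0 || nx >= width || ny >= height) continue;
                if (visited[ny, nx] || map[ny, nx] == wall) continue;
                visited[ny, nx] = true;
                queue.Enqueue((nx, ny));
            }
        }

        // Any open tile the flood fill never touched is cut off.
        for (int y = 0; y < height; y++)
            for (int x = 0; x < width; x++)
                if (map[y, x] != wall && !visited[y, x]) return false;
        return true;
    }
}
```

A generator could run this on each candidate level starting from Pac-Man's spawn tile and regenerate whenever it returns false.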
The Takeaway
Local statistical models can generate surprisingly believable, highly replayable game content if you design the underlying state representation with enough care.
The project is fully functional inside Unity, showing that you don't always need a heavy neural network to generate smart content. Sometimes, a well-tuned probability matrix and some clever spatial logic are all it takes to keep the ghosts moving.