Inside the Machine: How a CPU Works
The Central Processing Unit (CPU) is the brain of the computer. It doesn't think like a human; instead, it follows a strict routine. Its job is simple: get an instruction, understand it, and do it billions of times every second.
The Instruction Cycle
At its core, a CPU repeats a basic loop called the Fetch-Decode-Execute cycle. This is driven by the internal clock (measured in GHz).
1. Fetch
The CPU has a special register called the Program Counter, which holds the address of the next instruction. The CPU asks the memory (RAM), "What's at this address?" and pulls that raw instruction code into the Instruction Register.
2. Decode
The raw 1s and 0s (binary) don't mean anything until they are interpreted. The Control Unit decodes this binary. It might translate 10110001 into "Add the numbers in Register A and Register B" This is where the specific language of the CPU (Instruction Set Architecture or ISA) matters.
3. Execute
Now that the CPU knows what to do, it acts.
ALU(Arithmetic Logic Unit): If it's math (add, subtract) or logic (AND,XOR), theALUcrunches the numbers.Memory Operations: It might move data betweenregistersandRAM.Control Flow: It might jump to a different part of the program (changing theProgram Counter).
Pipelining
If the CPU did steps 1, 2, and 3 strictly one after another, parts of the circuitry would sit idle. To fly through instructions, modern CPUs use Pipelining.
- While Instruction A is
Executing... - Instruction B is
Decoded... - And Instruction C is
Fetched.
This ensures that every part of the CPU is busy every single clock cycle.
Superscalar Architecture
Pipelining is great, but what if we could do two things at once?
Superscalar processors have duplicate functional units (like multiple ALUs). Instead of just overlapping the stages (pipelining), the CPU physically executes multiple instructions in the exact same cycle, provided they don't depend on each other.
The scheduler analyzes the stream of instructions to find ones that don't depend on each other. It then assigns these independent instructions to available execution units, allowing them to run side-by-side in the same clock cycle.
The Memory Hierarchy
Modern CPUs are significantly faster than main memory (RAM). If the processor had to fetch data from RAM for every operation, it would spend the majority of its cycles idle, waiting for data to arrive.
To solve this, we use a hierarchy of caches:
Registers(Instant): Tiny storage right inside the CPU. Access takes ~1 cycle.L1 Cache(Very Fast): Small, extremely fast memory built into the core. ~4 cycles.L2/L3 Cache(Fast): Larger but slightly slower caches shared between cores. ~10-40 cycles.Main Memory/RAM(Slow): Massive storage but far away. ~200 cycles.SSD/Disk (Glacial): Millions of cycles.
The goal is to keep the data that the CPU needs in the L1 cache, ensuring the processor never has to stall.