Monday, May 20, 2024

Maybe machine (part 2)

At the end of 6.004, we had a contest for improving the performance of the Maybe machine. There was an 8-bit ALU included in the chip set we were given, so the logical thing to do was to incorporate it into the machine in place of the shift register. This would naturally improve the performance of arithmetic operations, but there were other reasons that the Maybe machine was slow.

One of the rules of the contest was that we were not allowed to change the clock speed. But the Maybe machine had a four phase clock. The first phase would address the device driving the bus, the second phase would connect it to the bus, the third phase would clock the receiving device, and the fourth phase would keep the data asserted to obey the hold time on the logic.

The four phase clock was technically unnecessary. We had built it in this manner to illustrate set-up and hold times for the bus, but the TTL chips we were using were specifically designed to be used on an open collector bus with no hold time, so you could actually clock the bus on every other phase rather than divide by four and still be within valid hardware specs. The fundamental clock speed was not changed, but the machine would run twice as fast.

The microcode word on the Maybe machine had four parts. The first part loaded the shift register from memory. The second part operated on the shift register. The third part wrote the shift register back to memory, and the fourth part was the next microcode address. But much of a typical microinstruction was wasted time. More often than not, the shift register was loaded from the last location it was written to, so it already contained the data it needed. The load cycle was often unnecessary. If you just left the data in the shift register, it would already be there for the next instruction. A memory to memory move involved a no-op on the shift register, which took clock cycles. Advancing to the next instruction took more clock cycles.

Instead of a four part microinstruction, I made a one-part microinstruction. True, this meant that I would occasionally need more microinstructions, but each one was smaller, and I could omit the no-ops, so I could accomplish the same amount in fewer clock cycles. I eliminated the next-instruction field by replacing the microcode address register with a counter.

The original Maybe machine had a two-address microcode — each instruction would specify a separate load and store address. Most people retained the wide microcode and could specify each input to the ALU independently. But I wanted to squeeze each microinstruction into 16 bits, so I turned the machine into a one-address machine with an accumulator. This made my microcode one quarter of the size, but it made it difficult to find enough bits in the microcode to accomplish all the tasks it needed to do. I didn’t have enough bits to directly address the accumulator, so I attached the accumulator to the output of the ALU. One ALU input came from the bus, the other was fed back from the accumulator. To load the accumulator, you had to pass the data through the ALU.

There weren’t enough bits for jump instructions, but I did a hack: I incremented the instruction address half way through the instruction cycle. This way, you could read the bits of the next instruction while you were still executing the current instruction. So you could do an unconditional jump by placing the target in the next instruction. There still weren’t enough bits for a conditional jump, so the microcode address space ended up being partitioned into sixteen segments and you could only conditionally jump within the same segment. To jump between segments you had to use an unconditional jump.

I obviously had to do a complete rewrite of the microcode. My friend Bob Baldwin helped me out by writing an assembler for the new microinstruction format.

The new microcode ran substantially faster than the original. When the contest came, my machine was clearly faster than the competition. Unfortunately, it crashed on the third benchmark and I ended up disqualified. I think it was a bug involving switching between the segments of microcode, but I never had the opportunity to debug it fully. I learned a hard lesson about keeping things simple.

No comments: