Wednesday, April 10, 2013

Nasty details

John Cowan wanted more detail.  If device drivers and bus cycles are not up your alley, I suggest you skip this post.  Seriously.


Still here?  You were warned.

For various historic reasons, the LMI Lambda was built as a NuBus card set. The NuBus interface could be driven from the memory interface, and this microcode shows how:
((MD) a-map-scratch-block)      ;following inst gives maps time to settle.
 ((M-1) DPB C-PDL-BUFFER-POINTER-POP (BYTE-FIELD 8 24.) A-1)  ;FULL 32 BIT NUBUS ADR
 ((L2-MAP-CONTROL) (a-constant 1464))    ;no caching.
 ((L2-MAP-PHYSICAL-PAGE) LDB M-1 (BYTE-FIELD 22. 10.) A-ZERO)
 ((VMA-START-READ) LDB (BYTE-FIELD 8 2) M-1 a-map-scratch-block)
The first line loads the MD (memory data) register with the virtual address of the "scratch block". The second line pops the desired NuBus address off the stack. The third and fourth lines write the Level 2 maps (the maps are addressed by MD) to address NuBus space. The final line initiates the bus cycle, and the result can be read from MD when it arrives.

With some argument checking and boxing of the result, this is surfaced to the Lisp world as a primitive instruction called %NUBUS-READ:
(DEFMIC %NUBUS-READ 761 (NUBUS-SLOT SLOT-BYTE-ADR) T)
                                ;SLOT is really the high 8 bits.
                                ;the "top F" can be supplied via slot,
    ;avoiding bignums.
(SETF (DOCUMENTATION '%NUBUS-READ 'FUNCTION)
  "NUBUS-SLOT is top 8 bits of physical address (usually the top 4 bits are 1's).
SLOT-BYTE-ADR is the lower 24 bits of physical address.")
;;arglist = (NUBUS-SLOT SLOT-BYTE-ADR)
If there was a debug card installed on your machine, you could find out which NuBus slot it was in by reading the system configuration, and you could talk to it by calling %NUBUS-READ and %NUBUS-WRITE:
(defun assure-debug-board ()
  (when (null *my-debug-board*)
    (let ((slot (find-a-debug-board)))
      (setq *my-debug-board* (logior #xF0 slot)))))     ;put it in slot space

(defun read-debug-board (address)
  (%nubus-read *my-debug-board* address))

(defun write-debug-board (address data)
  (%nubus-write *my-debug-board* address data))
And if you know the offsets of the control registers on the card
(defregister debug-mode-register              *my-debug-board* #xFFF7FC :read-write)
(defregister debug-address-register           *my-debug-board* #xFFF7F8 :write-only)
(defregister debug-data-start-transfer        *my-debug-board* #xFFF7F4 :write-only)
(defregister debug-response-register          *my-debug-board* #xFFF7F4 :read-only)
(defregister debug-control-start-transfer     *my-debug-board* #xFFF7F0 :write-only)
(defregister debug-status-response            *my-debug-board* #xFFF7F0 :read-only)
(defregister debug-nubus-analyzer-ram-pointer *my-debug-board* #xFFF7EC :read-write)
(defregister debug-nubus-analyzer-ram-data    *my-debug-board* #xFFF7E8 :read-write)
(defregister debug-nubus-analyzer-ram-control *my-debug-board* #xFFF7E4 :read-write)
(defregister debug-nubus-analyzer-function    *my-debug-board* #xFFF7E0 :write-only)
(defregister debug-test-register              *my-debug-board* #xFFF7C4 :read-write)
and what the bit fields in the registers are
(defconstant %%mode-init                 (byte 1. 0.))
(defconstant %%mode-master               (byte 1. 1.))
(defconstant %%mode-led                  (byte 1. 2.))
(defconstant %%debug-mode-loopback       (byte 1. 3.))
(defconstant %%debug-mode-speed          (byte 1. 4.))
(defconstant %%debug-mode-txwait         (byte 1. 6.))
(defconstant %%debug-mode-response-ready (byte 1. 7.))

(defconstant %%debug-control-start-bit (byte 1. 0.))
(defconstant %%debug-control-ack-bit   (byte 1. 1.))
(defconstant %%debug-control-tm-bits   (byte 2. 2.))
(defconstant %%debug-control-axr-bit   (byte 1. 4.))

(defconstant %%debug-response-start-bit (byte 1. 0.))
(defconstant %%debug-response-ack-bit   (byte 1. 1.))
(defconstant %%debug-response-tm-bits   (byte 2. 2.))
(defconstant %%debug-response-axr-bit   (byte 1. 4.))
and some good values to stuff in
(defconstant $$init 1.)

(defconstant $$disable-mastership 0)
(defconstant $$enable-mastership  1)

(defconstant $$led-off 0)
(defconstant $$led-on  1)

(defconstant $$disable-loopback 0)
(defconstant $$enable-loopback  1)

(defconstant $$slow-transfers 0)
(defconstant $$fast-transfers 1)

(defconstant $$transmitter-idle 0)
(defconstant $$transmitter-busy 1)

(defconstant $$no-response    0)
(defconstant $$response-ready 1)
it is pretty straightforward to run the debug card. Here are some examples
(defun reset-debug-board ()
  (setf (debug-mode-register) (dpb $$init %%mode-init 0)))

(defun blink-debug-light ()
  (loop
    (sleep .5)
    (setf (debug-mode-register)
          (dpb $$led-on %%mode-led (debug-mode-register)))
    (sleep .5)
    (setf (debug-mode-register)
          (dpb $$led-off %%mode-led (debug-mode-register)))))

(defun reset-nubus-analyzer ()
  (setf (debug-nubus-analyzer-function)
        (dpb $$disable-analyzer %%nubus-analyzer-enable 0)))

(defun board-idle? ()
  (= $$transmitter-idle (ldb %%debug-mode-txwait (debug-mode-register))))
The debug card in the Lambda was connected via a serial cable to the debug slave card in the K machine test rack. Writing to the data, addr, and control registers on the card would cause the slave to perform bus cycles on the test rack.
(defun debug-read-word (addr)
  (write-debug-addr addr)
  (write-debug-control #x01)
  (wait-for-debug-response))

(defun debug-write-word (addr data)
  (write-debug-data data)
  (write-debug-addr addr)
  (write-debug-control #x09)
  (wait-for-debug-response))
Now that we can do bus cycles on the test rack, we can talk to other cards on the test rack. Some of the data and control registers of the K machine can be read directly from the bus.
(defvar k-mem-addr   nil "K Processor - Memory address base")
(defvar k-io-addr    nil "K Processor - I/O address base")
(defvar k-mode-addr  nil "K Processor - Mode register address")
(defvar k-hptr-addr  nil "K Processor - History RAM pointer address")
(defvar k-hram-addr  nil "K Processor - History RAM data address")
(defvar k-pc-addr    nil "K Processor - Program Counter address")
(defvar k-mmfio-addr nil "K Processor - MMFIO bus address")
(defvar k-spy0-addr  nil "K Processor - Low spy Instruction Register address")
(defvar k-spy1-addr  nil "K Processor - Hi spy Instruction Register address")
(defvar k-spyc-addr  nil "K Processor - Spy command register address")
(defvar k-int-addr   nil "K Processor - NUBUS interrupt register base address")
Of particular interest here is the Spy Command register. This can be written with one of these values
(defconstant $$spy-command-stop                0.)
(defconstant $$spy-command-run                 1.)
(defconstant $$spy-command-step                2.)
(defconstant $$spy-command-reload-instruction  3.)      ;called ICLOAD in spy-pal
(defconstant $$spy-command-clear-opc-clock     4.)
(defconstant $$spy-command-set-opc-clock       5.)
(defconstant $$spy-command-clear-spy-mode      6.)
(defconstant $$spy-command-set-spy-mode        7.)
(defconstant $$spy-command-stepmode-full-clock 8.)      ;called clr-stepmode
(defconstant $$spy-command-set-stepmode 9.)             ;       set-stepmode
The Spy Command register allows you to start, stop, reset, and debug the machine externally. For example,
(defun k-stop ()
  "Stop the processor clocks, and init spy modes"
  (k-spy-cmd $$spy-command-stop)
  (k-spy-cmd $$spy-command-stepmode-full-clock)
  (k-spy-cmd $$spy-command-set-spy-mode)     ; set spy mode
  (k-spy-cmd $$spy-command-clear-opc-clock)     ; clear opc clk
  (setq k-run-flag nil)
  (k-read-spy-pc))

(defun k-run ()
  "Start the processor running"
  (k-spy-cmd $$spy-command-stop)
  (k-spy-cmd $$spy-command-clear-spy-mode)
  (k-spy-cmd $$spy-command-set-opc-clock)
  (k-spy-cmd $$spy-command-reload-instruction)
  (k-spy-cmd $$spy-command-run)
  (setq k-run-flag t)
  (k-read-spy-pc))
Especially useful was this one:
(defun k-step ()
  "Step the processor one clock cycle"
  (k-stop-if-running)
  (k-spy-cmd $$spy-command-stop)
  (k-spy-cmd $$spy-command-clear-spy-mode)
  (k-spy-cmd $$spy-command-set-opc-clock)
  (k-spy-cmd $$spy-command-reload-instruction)
  (k-spy-cmd $$spy-command-step)
  (k-read-spy-pc))

You may have noticed that it takes multiple local NuBus cycles to instruct the debug card to issue a single remote cycle, and it takes multiple remote cycles to drive the K hardware through the spy interface. It was fast enough for debugging, but it is much faster to put the K processor in the same rack as the LMI Lambda and get rid of the debug board altogether. We used the debug board early on so we could turn off the test rack and remove the hardware.

Still here? I salute your intestinal fortitude. The above code was selected from the files at https://code.google.com/p/jrm-code-project/source/browse/#svn%2Ftrunk%2Fkmachine

Thursday, April 4, 2013

Esoterica

I occasionally get email about things I never thought I would have to remember.  Recently, I was asked how the LMI K-machine booted.  The short answer is this:

In the K-machine schematics, in the NuBus Spy Data Paths, there is a switch near the upper right labeled AUTOBOOT.  If it is on, the machine boots from the boot prom.  Otherwise, the machine powers up in a halted state and some other device on the bus is expected to boot it externally.

Writing is hard.  My original post went into detail about we used a serial cable from an LMI Lambda to a NuBus debug card in the test rack.  You could write the K-machine memory from the NuBus to load a bootable program, then you could single step the machine through the debug path.

(Ref:  The code for the K-machine remote debugger.)

The nitty gritty details are there in the code.

Monday, March 18, 2013

About nothing

Here's a way to quickly compute the leftmost digit of a number:

(define (leftmost-digit radix n)
  (if (> radix n)
      n
      (let ((leftmost-digit-pair (leftmost-digit (* radix radix) n)))
 (if (> radix leftmost-digit-pair)
     leftmost-digit-pair
     (quotient leftmost-digit-pair radix)))))

It's harder to compute the remaining digits.

Monday, January 14, 2013

Memory management

My college professors constantly encouraged us to "Go back to first principles."

Consider a computing task that runs for some amount of time and then halts.  If a task dynamically allocates more memory than is available, it must re-use some (or crash!)  This is irrespective of the means of re-use, whether manual deallocation as in malloc/free or automatic deallocation with a garbage collector.

The amount of allocated memory at any point in the program is the difference between the total amount allocated up to that point and the total amount deallocated up to that same point.

allocationfinal = allocationtotal - freetotal

or

freetotal = allocationtotal - allocationfinal

If we free memory in discrete amounts, then the average amount freed each time is simply the total amount freed divided by the number of times we freed memory (by definition). The average amount retained each time is easily computed by subtracting the amount freed from the memory size. This is all simple arithmetic.

In particular,

freetotal / deallocation count = freemean (by definition)

and at deallocation,

memory size - freemean = retainedmean (ditto)

The total amount of memory that a task uses might vary when the task runs at different times and different memory settings. A task could use reflection to adjust memory consumption according to resources, or the task might change resource consumption because of external circumstances such as time or memory alignment. But we expect that simple deterministic tasks will consume resources in a repeatable manner. If that is true, then the total allocation and the amount of reachable storage at any time should not depend upon the amount of memory. If you reduce the amount of memory, you'll just need to recycle it that much more to make up the difference.

For a fixed amount of freed memory, the deallocation count times freemean is constant. So for some task that frees a certain amount of memory, the product of the deallocation count and freemean will lie on a hyperbola. The product of the deallocation count and the memory size will not lie exactly on a hyperbola, but it will be pretty close, especially if the memory size is quite a bit larger than retainedmean. Again, this is simply the consequence of arithmetic. Whether we use a garbage collector to deallocate or some other memory management technique doesn't matter.

In the Economics of Garbage Collection, Singer and Jones investigate the issue of garbage collection from a microeconomics background. They introduce what they call the allocation curve of the program. A benchmark program is run with several different heap sizes and the number of garbage collections is counted for each run. The red curves in this figure show the results:
We see the expected pseudo-hyperbolas. (The blue lines are elasticity, which is not relevant here.)

Singer and Jones created these charts by empirically measuring execution of benchmark programs. But it is pointless to "measure" an arithmetic relationship. The positions of the points on the charts are deterministic located at the point where the deallocation count times the amount freed is constant. If a point does not fall on the expected line, this can only be because the total allocation or the retainedmean have changed. One of the desiderata of a good GC benchmark is having the total allocation and retainedmean be invariant across different heap sizes. The benchmarks used by Singer and Jones are not invariant across different heap sizes (or the pseudo-hyperbolas would be perfect), but the variations are generally small, so the charts are pretty close.

If we partition our allocation into "small" and "large" allocations, then we will have a pseudo-hyperbola for each case. Singer and Jones's figure 5 illustrates this:

Singer and Jones note that some benchmarks have a pronounced "knee" in the curve. In this blogpost I show how the appearance of a "knee" is an artifact of presentation, and not a property of the data.

Singer and Jones note that the upper extreme of the curve is the point at which the amount of memory a program has available is equal to what the program needs. No deallocation need be done. The extreme lower end is the point where the amount of memory given is just under the maximum amount of live storage needed. Singer and Jones note that the curve approaches an asymptote, but they do not identify the asymptote as the value of retainedmean. (Of course the curve does not reach the asymptote unless the peak memory usage is the same as the mean.)

In this post, I plot the GC count vs. the memory size for various runs of MIT Scheme. The plot is in log-log space, so the product of the GC count and memory size falls on a line rather than a hyperbola. The axes are swapped from the convention of Singer and Jones, so the asymptote is vertical rather than horizontal. A non-zero value of retainedmean displaces the hyperbola to the right and makes the upper end approach a vertical asymptote rather than intersect the axis. The blue "unadjusted" line is simply a plot of GC count * memory size. The green "adjusted" line is GC count * (memory size - retainedmean).


Monday, August 20, 2012

A little puzzle

I set my cell phone display-up on a flat surface (a table) and I turn on Google Maps.  I rotate the phone so that it "points north" (that is, the phone is on its back, the display is facing up, the top edge of the display is facing north, the left and right edges are parallel to rotation of the earth).

On Google Maps there is a little arrow-shaped icon that represents the position and orientation of the phone.  Naturally, it points at the top of the map, which is at the top of the display, and therefore the little arrow points north as well.

If I now rotate my phone slightly clockwise, what does the arrow do?

      a) move in the same direction as the phone (clockwise)
      b) move in the opposite direction as the phone (counter-clockwise)
      c) not rotate

and how fast?

    a) twice the speed as the phone rotates
    b) exactly the same speed as the phone
    c) half the speed as the phone rotates
    d) it doesn't rotate

Try to solve this in your head, then check with your phone.

Thursday, July 26, 2012

The ability to write nested loops is not a sufficient condition of employment

But it ought to be a necessary one.

Suppose you were helping me write a TicTacToe program. I have this Java code so far:
package com.vaporware.tictactoe;

import com.vaporware.common.logging.FormattingLogger;

class Board {
  Piece cells [][];

  Board() {
    this.cells = new Piece [3][3];
  }

  boolean xWins () {
    return wins(Player.X);
  }

  boolean oWins () {
    return wins(Player.O);
  }

  boolean wins (Player player) {
    // FINISH ME!
    return false;
  }
}
Assume further that there is a getOwner() method on a Piece that returns a Player.
Can we finish the wins (Player player) method?

Tuesday, June 5, 2012

Curious

(define (fmod numerator denominator)
  (- numerator
     (* (floor (/ numerator denominator))
 denominator)))

(define (test-fmod numerator denominator)
  (let ((answer (fmod numerator denominator)))
    (if (> answer denominator)
 (begin (display "Whoops: ")
        (display (list numerator denominator answer))
        (newline)))))

1 ]=> (do ((i 0 (+ i 1))) ((>= i 1000)) (test-fmod (random 1.0) (random 1.0)))

; Value: #t

1 ]=> (test-fmod .59 .01)
Whoops: (.59 .01 1.0000000000000009e-2)
;Unspecified return value