What has happened so far: my PDP-11/34 has a CPU problem, and I am trying to identify the error with the help of DEC diagnostics.

I have already found the diagnostics in the database: the ones needed are FKAA, FKAB, FKAC and FKTH.

I was also able to run them, simply by booting XXDP 2.5 from an RL02 disk.

FKAC !

Here diagnostics FKAA ("basic instruction test") and FKAB ("trap test") run without errors.

But FKAC exposed an error in the extended instruction set (EIS) of the PDP-11/34. These are some complex instructions like MUL, DIV and advanced SHIFTs, which are not implemented in the original PDP-11/20 design.

See the whole log from booting RL02 up to the error message:

@DL
XXDP-SM SMALL MONITOR - XXDP V2.4
REVISION: D0
BOOTED FROM DL0
16KW OF MEMORY
UNIBUS SYSTEM

RESTART ADDRESS: 072010
TYPE "H" FOR HELP

.R FKAC??
FKACA0.BIC

001006   000001
001150   000003
001312   000005
001454   000007
001616   000011
001762   000013
001006   000001
... and so on ...

The listing

Hmm, these error messages are not exactly "self-explanatory" ...

To understand them, the assembler listing for FKAC must be consulted.  It is in file "MAINDEC-11-DFKAC-A-D_1134-EIS-Instruction-Tests_Dec75.pdf".
Some hints are found in section "Error":

fkac errprintout

So the first error message

001006   000001

means: "Error detected by test at address 1006".
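To get an overview of a longer run, the printout can be split into its two octal fields with a few lines of Python. This is only a sketch: the function names are made up for illustration, and the second column is carried through uninterpreted, because its meaning has to be taken from the "Error" section of the listing.

```python
def parse_error_line(line):
    """Split one FKAC printout line into two integers (both fields are octal)."""
    first, second = line.split()
    return int(first, 8), int(second, 8)


def distinct_addresses(lines):
    """Return the reported addresses in order of first appearance."""
    seen = []
    for line in lines:
        addr, _ = parse_error_line(line)
        if addr not in seen:
            seen.append(addr)
    return seen


# Lines copied from the console log above.
log = """001006   000001
001150   000003
001312   000005
001454   000007
001616   000011
001762   000013
001006   000001"""

for addr in distinct_addresses(log.splitlines()):
    print(f"error reported by the call at {addr:06o}")
```

Each address printed this way is a place to look up in the listing.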

Well, let's search the listing for address 1006:

fkac code1006

You see a line

        001006   004767 015426   JSR PC,$HLT

This calls the subroutine which prints the error message. The listing has good comments, and you learn a lot from them. Apparently the test of the "ASH" instruction (flexible "arithmetic shift left/right") failed.
The CPU status flags after an ASH seem to differ from the expected values.
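The two words "004767 015426" can be decoded by hand to double-check that the line really is a call. The sketch below assumes the standard PDP-11 encoding: JSR is 004RDD, and destination 67 is PC-relative mode, so the target is the PC after fetching the index word plus that word. The function name is hypothetical, and the resulting address for $HLT is calculated, not read from the listing, so it should be verified there.

```python
def decode_jsr_pc_relative(addr, opcode, index_word):
    """Decode a JSR with a PC-relative destination; return (link register, target)."""
    if opcode & 0o177000 != 0o004000:
        raise ValueError("not a JSR instruction")
    link_reg = (opcode >> 6) & 0o7   # register that receives the return address
    mode = (opcode >> 3) & 0o7       # destination addressing mode
    reg = opcode & 0o7               # destination register
    if (mode, reg) != (6, 7):
        raise ValueError("destination is not PC-relative (mode 6, register 7)")
    pc_after_fetch = (addr + 4) & 0o177777   # opcode word + index word consumed
    target = (pc_after_fetch + index_word) & 0o177777
    return link_reg, target


link_reg, target = decode_jsr_pc_relative(0o1006, 0o004767, 0o015426)
print(f"JSR R{link_reg}, target {target:06o}")
```

With the values from the listing line this gives link register 7 (the PC) and a target of 016440.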

Stabilize the error

I repeated the test several times, to rule out a temporary flaw. I also reseated both CPU boards, M8266 and M8265, several times to make sure the board connections were good. And I swapped boards until I had narrowed the error location down to the M8265 "data path" board. The error was there every time, so we nailed it.

DEC field service now would've simply replaced the M8265. But we accept the challenge and dig deeper!

Begin fixing

If you want to fix PDP-11 electronics, you need to search for malfunctioning chips with a logic analyzer.
Board schematics are usually available in the "FMPS" (Field Maintenance Print Set) documents for the failing hardware. The first place to search is bitsavers.org again.

The next step is to convert the observable error (here: the error printout) into a trigger condition for the logic analyzer. This can be very easy ... or almost impossible!

Triggering on a code address?

The first idea may be to connect your logic analyzer ("LA") to the UNIBUS address lines and trigger on the occurrence of the value "001006" there. At that moment the failing diagnostic statement has just been executed, and the signals captured by the LA probes would show the history leading up to the error in the TTL circuits on the M8265.

I found that triggering on UNIBUS addresses does not work well, for several reasons:

  • you need a lot of LA probes for the 18 UNIBUS address lines and the MSYN/SSYN signals .. at least 20!
  • some diagnostics are relocated by the memory management unit. So what you see on UNIBUS are physical addresses with no simple relation to the logical addresses in the assembler listing.
  • On the UNIBUS you monitor program code fetches, but not the code execution. On PDP-11 this is usually the same (no pipeline!), but you can get fooled (for example, if a program moves itself).
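The relocation problem in the second point can be made concrete with a small sketch of the PDP-11 address translation: the top three bits of the 16-bit address select one of eight Page Address Registers (PARs), and the PAR value, counted in 64-byte blocks, is added to the remaining 13 bits. The PAR contents below are invented for illustration; the real values depend on how the diagnostic programs the memory management unit.

```python
def relocate(virtual_addr, par):
    """Translate a 16-bit program address to an 18-bit UNIBUS address."""
    page = (virtual_addr >> 13) & 0o7          # bits 15..13 select the PAR
    displacement = virtual_addr & 0o17777      # bits 12..0 are kept as offset
    return ((par[page] << 6) + displacement) & 0o777777


# Hypothetical PAR set: page 0 relocated upwards by 020000 bytes.
par = [0o200, 0, 0, 0, 0, 0, 0, 0]

print(f"{relocate(0o1006, par):06o}")   # address to trigger on, not 001006
```

With this made-up mapping the instruction at 001006 in the listing is fetched from 021006 on the UNIBUS, so a trigger on 001006 would never fire.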

Trigger on a HALT breakpoint!

I used another trick: I overwrite the call to the error printout routine with a HALT opcode. In other words: I set a destructive breakpoint.
So instead of printing the error address, the 11/34 simply HALTs. This has some pros and cons:

  • A processor HALT is directly represented by the electrical signal HALT GRANT. This signal is accessible on every UNIBUS SPC (Small Peripheral Controller) slot. So I connect one probe of my LA to the SPC slot and use this as trigger condition.
  • After the HALT you can examine registers and memory over the programmer console.
  • You have to re-insert the breakpoint manually every time you run the diagnostic.
  • You cannot continue program execution after the breakpoint is hit, because the HALT overwrites code and so destroys the test program.

The HALT signal can be picked up from an SPC slot at the wire wrap pins on the rear of the backplane. But using a flip chip extender card is more convenient:

halt on SPC slot

The extender (the board with the green connector) is plugged into the SPC rows C and D.
On row D the usual GRANT CONTINUITY card is placed.
In row C a small adapter card is used to connect the logic analyzer to HALT GRNT on pin T1.
Also on row C the NPG chain on pins A1 and B1 is closed with a jumper.

Instead of a special adapter you can also use the slot fingers of a dead flip chip card.

Inserting a HALT by hand

pdp1134 console 1006

To insert a HALT at address 1006 in diagnostic FKAC use this procedure:

@DL
XXDP-SM SMALL MONITOR - XXDP V2.4
REVISION: D0
BOOTED FROM DL0
16KW OF MEMORY
UNIBUS SYSTEM

RESTART ADDRESS: 072010
TYPE "H" FOR HELP

.

BOOT into XXDP

.L FKAC??
Load FKAC into memory, but do not execute it.

[CTRL]-[HALT]

Use the PDP-11/34 programmer's console to HALT XXDP.
[CLR] 1006 [LAD]

Set address to 1006.

 

[EXAM] -> "004767"
Check: 1006 must contain the JSR instruction as shown in the listing.

[CLR] [DEP]
Overwrite 1006 with "000000", which is the HALT instruction.

[CTRL]-[CONT]
Continue running XXDP.
   
 

Connect the logic analyzer to the circuit.
Set trigger condition on the HALT GRANT probe.

Start the logic analyzer.

.S
Start the manipulated FKAC diagnostic.
"S" needs no arguments, the start address is still known from the "L" command above.

Now the machine stops on error. You can examine all memory and registers over the console.
For example, after the halt at 1006 you would EXAM the values of the variables TEMP4 and PSWORD.
In parallel, the LA should trigger ... the bug hunt in the circuit begins!
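The examine-then-deposit part of the procedure follows a simple pattern: verify the word first, and only then overwrite it. As a sketch of that logic only, with memory modeled as a Python dictionary and a made-up function name (nothing here talks to a real console):

```python
HALT = 0o000000   # PDP-11 HALT opcode


def set_halt_breakpoint(memory, addr, expected):
    """Replace the word at addr with HALT, but only if it holds the expected value."""
    found = memory[addr]
    if found != expected:
        raise ValueError(
            f"{addr:06o} holds {found:06o}, expected {expected:06o}: wrong program loaded?"
        )
    memory[addr] = HALT
    return found   # original word, needed to undo the breakpoint


# The two words of the JSR at 1006, as shown in the listing.
memory = {0o1006: 0o004767, 0o1010: 0o015426}
saved = set_halt_breakpoint(memory, 0o1006, 0o004767)
print(f"{0o1006:06o} now holds {memory[0o1006]:06o}, saved {saved:06o}")
```

The check corresponds to the [EXAM] step: if 1006 does not contain 004767, a different program or version is in memory and the deposit would corrupt it. Note that the index word at 1010 stays in place, which does no harm because the CPU halts before reaching it.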