Every bug hunt is a story of its own.
Perhaps it's like stories about fishing or hunting:
you love to tell them to everybody, but, well, listening to them might be boring.

So I'll restrain myself, and here is only one of those debugging stories
(and a rather long one!).


MACHINE ALMOST WORKING
The problem hunted here was on an M8266 DATA PATH card in an 11/34a.
When switched on, the machine did not boot into the M9312 console emulator.
But when manually started at the entry address of the emulator with
165020 LAD START, it showed its response on the serial port, though a bit scrambled.
<foto>

However, all the emulator commands (Load address, Deposit, Examine, Start) seemed to work.
I let PDP11GUI "type in" the PDP-11 diagnostic CXCPAG, which tests the basic
PDP-11 instruction set. After start, CXCPAG failed.


HALT AFTER POWER ON
First I tried to understand why the machine stopped after power-on.
This can have many causes. Since the self-test in the M9312 monitor ran without error,
I did not suspect a hard error on the data path card, but rather an intermittent problem:
perhaps a bad chip driver output that had the wrong signal only right after power-on.
Such errors are very difficult to track down.

But when I looked at the trace of micro steps, I found that
the stop was caused by the self-testing microprogram that is started after power-on.

The startup microcode tests whether the PC is working:
it is cleared, incremented, and compared to zero and overflow.
This tests most of the central data path loop.
So it was a "hard" error: good news!

STRANGE PATTERNS
So the power-on self-test found an error, while the M9312 self-test (which is
more thorough) did not.
You can manually EXAM and DEPOSIT values into the CPU registers over the KY11 programmer console.
This tests almost the same data path components as the built-in power-on self-test.
So I wrote some values into the PC and read them back:
<foto ky11.b>
777707 LAD      (777707 is the UNIBUS address of R7 = PC)
177777 DEP      write all ones
EXAM => 000010
Apparently something on the data path was damaged.
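
The comparison between the deposited and the examined value can be made explicit
with a small Python sketch. It is only an illustration: the helper name
differing_bits is made up for this example and is not part of any PDP-11 tool.

```python
def differing_bits(written: int, read: int, width: int = 16) -> list[int]:
    """Return the bit positions (0 = LSB) where written and read differ."""
    diff = (written ^ read) & ((1 << width) - 1)
    return [bit for bit in range(width) if diff & (1 << bit)]


written = 0o177777  # value DEPOSITed into R7
read = 0o000010     # value shown by EXAM
bits = differing_bits(written, read)
print(f"wrote {written:06o}, read {read:06o}, {len(bits)} bits differ: {bits}")
```

Every bit except bit 3 differs, which may hint that the value read back is not
a slightly corrupted copy of the written one, but comes from somewhere else.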

I concentrated on bit 0 in the data path and followed it:
from SPM through ALU to AMUX, SSMUX and BUS interface.
Debugging was hampered by the fact that the error was sometimes there and sometimes not.


Luckily it was not intermittent, but followed a strange rhythm:
777707 LAD      write to PC
177777 DEP      write all ones
EXAM => 000010  see wrong value #1 in PC
177777 DEP      write again
EXAM => 170340  see wrong value #2 in PC
177777 DEP
EXAM => 000010  see value #1 in PC again ??

When testing with different data values, I found a confusing relationship:
the error depended only on bit 15 of the data value.
When using 100000 (only bit 15 set) as the test value, I got the error behaviour:
777707 LAD      write to PC
100000 DEP      write 1 to bit 15
EXAM => 000010  see wrong value #1 in PC
100000 DEP      write again
EXAM => 100000  OK
100000 DEP
EXAM => 000010  see value #1 in PC again ??

When using 077777 (only bit 15 cleared), everything was fine:

777707 LAD      write to PC
077777 DEP      write 0 to bit 15
EXAM => 077777  OK
077777 DEP
EXAM => 077777  OK
....

Well this was interesting, but not helpful!

DATA FLOW ON DATA PATH
Another day I debugged further into the data path and finally found that the 4-way multiplexer
AMUX on sheet K1-4 got the wrong select signals:
instead of switching to the ALU output (S0:1 = 00), it switched to an interrupt vector.
(The microcode ROMs on the CONTROL card generate different interrupt vectors for different traps.
These are fed in through the AMUX too.)

AN ACTIVE TRAP
The INT VEC line was also active, so the machine was indeed trying to perform a TRAP.
The trap logic is on sheet K2-3. Checking the inputs of ROM E52 revealed an
active IR CODE 00. The instruction decoder E53 set IR CODE2:0 to 001, clearly indicating an
"ILLEGAL INSTRUCTION" trap.
This explained the value 000010 I read on EXAM: 000010 is just the trap vector for
ILL INSTR.
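
For reference, a few of the standard PDP-11 trap vectors can be put into a small
lookup table. This is a hypothetical helper for illustration only (the names
TRAP_VECTORS and describe_vector are made up); it just shows that 000010 is the
vector for illegal and reserved instructions.

```python
# A few standard PDP-11 trap vectors (octal addresses in low memory).
TRAP_VECTORS = {
    0o004: "CPU errors (bus timeout, odd address)",
    0o010: "illegal and reserved instructions",
    0o014: "BPT, breakpoint/trace trap",
    0o020: "IOT",
    0o024: "power fail",
    0o030: "EMT",
    0o034: "TRAP",
}


def describe_vector(value: int) -> str:
    """Name the trap that belongs to a vector address, if it is in the table."""
    return TRAP_VECTORS.get(value, "not a trap vector in this table")


print(f"{0o000010:06o}: {describe_vector(0o000010)}")
```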

ILLEGAL INSTRUCTION?
This is nonsense: in the halted state, the Instruction Register is cleared,
so it contains all 0's: the HALT opcode.
I checked the IR of the stopped machine: indeed there were all 0's in IR15..IR0.
This was as expected, so why the illegal instruction trap?

USER MODE AFTER POWER-ON
When looking at schematic K2-6 the next day, I noticed that ROM E53 has one input labeled
"user mode". A lucky insight came up:
the 11/34 can be operated in "kernel mode" and "user mode". I knew this
mode influences the address generation in the memory management unit (MMU).
But in "user mode" certain instructions are also forbidden and generate an illegal instruction trap.
And HALT is one of these: user programs may not stop the machine!

And indeed: I found USERMODE = H!
This is wrong, because after being HALTed the machine is always in kernel mode.

CHANGING PSW
So the next question: where does the USER MODE signal come from?
It is connected to bit 15 of the Processor Status Word register (PSW).
(Bits 15 and 14 contain the current mode.)

The PSW is located on sheet K1-4.
The PSW is not a full 16-bit register, because most bits are generated by condition codes and are not writable.
PSW15..PSW13 are implemented as a 4-bit flip-flop of type 74175, labeled E82.

When looking at the CLOCK of E82, I found that the PSW is indeed written
at the same moment the PC is written.
So when I do a
DEP 177777 into 777707, all ones are also written into PSW 15:13.
This causes USER MODE to be set, which switches the vector 000010 onto the data path.
On the next DEP, 000010 is written into the PSW. PSW15 then goes to 0, USER MODE is cleared,
the trap condition is cleared and the AMUX shows the content of the internal data path again.
If bit 15 of the value in PC is 1 (as in 177777), the next DEPOSIT sets USER MODE again, and so on.

So the strange rhythm was also explained.
777707 LAD      write to PC
177777 DEP      write all ones
EXAM => 000010  see wrong value #1 in PC
177777 DEP      write again
EXAM => 170340  see wrong value #2 in PC
177777 DEP
EXAM => 000010  see value #1 in PC again ??
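
The alternating pattern can be replayed with a toy model in Python. This is a
sketch under stated assumptions, not a model of the real microcode: the class
FaultyDataPath and the function rhythm are invented for this example, the model
tracks only bit 15 (PSW15 = USER MODE), and it assumes that R7 itself keeps the
deposited value. It reproduces the 100000 and 077777 experiments, but makes no
attempt to explain the 170340 reading.

```python
ILL_INSTR_VECTOR = 0o000010  # trap vector seen on EXAM
BIT15 = 0o100000             # PSW15 = USER MODE in this model


class FaultyDataPath:
    """Toy model: every DEPOSIT to R7 also clocks bit 15 into PSW15."""

    def __init__(self) -> None:
        self.pc = 0
        self.user_mode = False  # kernel mode after HALT

    def _amux(self, internal: int) -> int:
        # With USER MODE set, the halted CPU traps and the AMUX passes the vector.
        return ILL_INSTR_VECTOR if self.user_mode else internal

    def deposit(self, value: int) -> None:
        self.pc = value  # assumption: the register itself keeps the value
        # Faulty PSW load: PSW15 takes bit 15 of whatever the AMUX passes.
        self.user_mode = bool(self._amux(value) & BIT15)

    def examine(self) -> int:
        return self._amux(self.pc)


def rhythm(value: int, cycles: int = 3) -> list[str]:
    """DEPOSIT the same value several times and collect each EXAM result."""
    cpu = FaultyDataPath()
    results = []
    for _ in range(cycles):
        cpu.deposit(value)
        results.append(f"{cpu.examine():06o}")
    return results


for test_value in (0o100000, 0o077777):
    print(f"{test_value:06o} DEP x3 => EXAM {rhythm(test_value)}")
```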

PSW ALWAYS?
But why do I see 170340 instead of my 177777?
Hmmm, 170340 looks like a PSW content: apparently,
when addressing 777707, I actually work on the PSW = 777776 all the time?
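
To see why 170340 looks like a PSW value, it can be split into the standard
PDP-11 PSW fields (current mode in bits 15:14, previous mode in bits 13:12,
priority in bits 7:5, then T, N, Z, V, C). The function decode_psw below is a
hypothetical helper written for this illustration.

```python
MODES = {0b00: "kernel", 0b11: "user"}


def decode_psw(psw: int) -> dict:
    """Split a 16-bit PSW value into its standard PDP-11 fields."""
    return {
        "current_mode": MODES.get((psw >> 14) & 0b11, "other"),
        "previous_mode": MODES.get((psw >> 12) & 0b11, "other"),
        "priority": (psw >> 5) & 0b111,
        "T": (psw >> 4) & 1,
        "N": (psw >> 3) & 1,
        "Z": (psw >> 2) & 1,
        "V": (psw >> 1) & 1,
        "C": psw & 1,
    }


print(decode_psw(0o170340))
```

All bits set in 170340 fall into the mode and priority fields: current mode
user, previous mode user, priority 7, with T and the condition codes clear.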

PSW IS ALWAYS LOADED
The PSW register is loaded from the internal data path if
LOAD HPSW on sheet K1-4 goes L->H.
And indeed: when writing to 777707, the PSW also got a LOAD signal.
LOAD HPSW is generated by (too many) little gates:
E101 and E112 on sheet K1-3,
E122 on sheet K1-1, and an address decoder ROM on K1-10.

GOT IT!
I probed all the gates and was almost running out of logic analyzer probes
when I detected a logical malfunction in the 7432 OR gate E122:
although one input was HIGH, the output was LOW.
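
The check applied to each probed gate boils down to comparing the measured
output with the gate's truth table. A minimal Python sketch for a 2-input OR
gate follows; the function name or_gate_fault and the sample list are made up
for illustration and are not actual logic analyzer data.

```python
def or_gate_fault(a: int, b: int, y: int) -> bool:
    """Return True if the measured output y contradicts y = a OR b."""
    return y != (a | b)


# Hypothetical samples as (input a, input b, measured output y).
samples = [(0, 0, 0), (1, 1, 1), (1, 0, 0)]
for a, b, y in samples:
    verdict = "FAULT" if or_gate_fault(a, b, y) else "ok"
    print(f"a={a} b={b} y={y}: {verdict}")
```

The last sample corresponds to the situation described above: one input HIGH,
output LOW.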

 

<foto 7432>

Replacing E122 immediately cured the strange behaviour when
DEPOSITing into R7. YAHOO!