I have a MicroVAX in a BA23 case. It has an RD54 disk with 160 MB and a TK50 tape drive. A raster video card is also built in. This is the card cage:

 

CPU M7620 BA KA650 workstation license EK-KA650-UG
M7621 MS650-AA 8MB Mem
M7621 MS650-AA 8MB Mem
M7516 DELQA
M3106 DZQ11-M
M7169  VCB02 4-plane video controller module
M7168  VCB02 4-plane colour bitmap module
M7555 RQDX3 MFM+floppy
M7546 TMSCP TK50
M7513 RQDX extender

The case label says “MicroVAX II/GPX”, but since the CPU is KA650, it is a uVAX 3200/3400/3600. (reference). DTJ 7 states that KA650 is for MicroVAX 3500/3600.

Other stickers say: Model: 6300V-B3, SN: AY 81901335.

Bad documentation, but enough on other KA6xx CPUs

On the KA650 CPU is a CVAX 78034 VAX CPU chip ... the one Bob Supnik developed in his time at DEC.

Although I searched a lot, I did not find any technical description of the KA650 CPU. vt100.net lists “EK-180AB-MG KA650 CPU SYS MAINTENANCE GUIDE” and “EK-KA650-UG KA650 GUIDE”, but has neither of them. So I had to use documentation for similar CPUs, such as the KA640, KA655, KA660 and KA680. This puts my further conclusions on unstable ground.

Luckily I found all schematics for the KA650 in a document called “MP02538 650QS Pedestial BA213 Field Maintenance Print Set”.

And the KA650 CPU and its cache is described in “Digital Technical Journal Number 7” (DTJ 7)

Self test on boot:

Did I tell you? My MicroVAX has an error:

At boot, it displays:

KA650-B  V1.2/0123                               

Performing normal system tests.

23..22..21..20..19..

?05.50 2 0C FE 04 0000
10000000 10012000 00002000 00000000 00000000
00000000 00000000 00000000 1000B4F8 00000000
1000B500 55555555 55555555 AAAAAAAA AAAAAAAA
00000960 10000000 AAAAAAAA 00002000 80C00040

18..17..16..15..14..13..12..11..10..09..08..
07..06..05..04..03..


Normal operation not possible.
>>>

I decoded this error as follows, according to :

?05.50 2 0C FE 04 0000                                                   (1)
10000000 10012000 00002000 00000000 00000000                             (2)
00000000 00000000 00000000 1000B4F8 00000000                             (3)
1000B500 55555555 55555555 AAAAAAAA AAAAAAAA                             (4)
00000960 10000000 AAAAAAAA 00002000 80C00040                             (5)

The first line “?05.50 2 0C FE 04 0000” means this:

  • "05.50" is the number of the test that bombed.
    The list of tests is printed with “>>>test 9e”. It shows “05.50” as
    “05  50  6760  Cach2_integrty  start_addr end_addr addr_step *******”
    So the cache is the problem.
  • "2" is the severity factor.
    "2" causes the register dumps to be displayed and prohibits the autoboot.
    "1" just prints this error message line and doesn't disable the autoboot functionality.
  • "0C" "error" is a number that, in conjunction with the listing files, isolates the point where the diagnostic detected the error to within a few instructions. This field is also called subtestlog.
  • "FE" "de_error" is the code of the error found.
    FF: normal error exit from diag,
    FE: unanticipated interrupt,
    FD: interrupt in cleanup mode,
    FC: interrupt in interrupt handler,
    FB: test script requirements not met,
    FA: no such diagnostics,
    EF: unanticipated exception in executive.
  • "04" "vector" is the SCB vector (if non-zero) through which an unexpected exception or interrupt trapped, when the de_error field indicates an unexpected exception or interrupt (FE or EF).
  • "0000" "count" is the number of previous errors encountered.

Line (2): P1..P5 are the first five longwords of the diagnostic state.
This is internal information that is used by repair personnel.
Line (3): P6..P10 are the last five longwords of the diagnostic state.
Line (4): R0..R4 are the first five GPRs at the moment the error was detected.
Line (5): R5..R8 are additional GPRs, and ERF is a diagnostic summary longword.
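
The field layout described above can be sketched as a small parser. This is only an illustration of my reading of the dump, not a DEC tool: the names `parse_dump`, `DE_ERROR` and `DUMP` are made up for this example, and treating the "count" field as hexadecimal is an assumption.

```python
# Sketch: split the five-line self-test error dump into named fields.
# The layout follows the description above; all names here are hypothetical.

DE_ERROR = {
    0xFF: "normal error exit from diag",
    0xFE: "unanticipated interrupt",
    0xFD: "interrupt in cleanup mode",
    0xFC: "interrupt in interrupt handler",
    0xFB: "test script requirements not met",
    0xFA: "no such diagnostics",
    0xEF: "unanticipated exception in executive",
}

DUMP = """
?05.50 2 0C FE 04 0000
10000000 10012000 00002000 00000000 00000000
00000000 00000000 00000000 1000B4F8 00000000
1000B500 55555555 55555555 AAAAAAAA AAAAAAAA
00000960 10000000 AAAAAAAA 00002000 80C00040
"""


def parse_dump(text):
    """Return the fields of an error dump as a dictionary."""
    lines = [ln.split() for ln in text.strip().splitlines() if ln.strip()]
    head = lines[0]
    # Lines (2)..(5): twenty longwords, printed in hexadecimal.
    words = [int(w, 16) for ln in lines[1:5] for w in ln]
    return {
        "test": head[0].lstrip("?"),          # e.g. "05.50"
        "severity": int(head[1]),
        "subtestlog": int(head[2], 16),
        "de_error": DE_ERROR.get(int(head[3], 16), "unknown"),
        "vector": int(head[4], 16),
        "count": int(head[5], 16),            # assumption: hexadecimal
        "P": words[0:10],                     # P1..P10
        "R": words[10:19],                    # R0..R8
        "ERF": words[19],
    }


info = parse_dump(DUMP)
print(info["test"], info["de_error"], "ERF=%08X" % info["ERF"])
```

For the dump shown here this reports test 05.50, an unanticipated interrupt through vector 04, and the ERF longword from the end of line (5).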

The last 32-bit value is ERF, and it is very important. I use the KA655 documentation, “EK-306A-MG-001 KA655 CPU System Maintenance”, page 4-33. The KA655 has a “SOC” chip, which is a CVAX 78034 CPU, CFPA floating point processor, clock and 8KB second level cache combined. I hope its ROM-based diagnostics are close enough to those of my KA650.


Here ERF = 80C00040; I also saw 82000180 and 80C00000.

 

A quoted hex digit marks the highest bit of each 4-bit group of my values.

Bits/digits  Register   Info                                  80C00040   82000180

31..24                  machine check code                    80         82
23           MSER       CDAL parity error                     “C” = 1    “0” = 0
22           MSER       Machine check CDAL parity error       1          0
21           MSER       Machine check cache parity            0          0
20           MSER       cache data parity error               0          0
19           MSER       cache tag parity error                “0” = 0    “0” = 0
18           unused                                           0          0
17           MEMCSR16   Uncorrectable ECC error               0          0
16           MEMCSR16   Two or more uncorrectable errors      0          0
15           MEMCSR16   Correctable single bit error          “0” = 0    “0” = 0
14           MEMCSR16   Page address bits 25:22 of ...        0          0
13           MEMCSR16   ... location that caused error ...    0          0
12           MEMCSR16   ... These four bits point to the ...  0          0
11           MEMCSR16   ... failing 4-Mbyte bank of memory    “0” = 0    “1” = 0
10           MEMCSR16   DMA read/write error                  0          0
9            MEMCSR16   CDAL parity error on write            0          0
8            CBTR       CDAL bus time out                     0          1
7            CBTR       CPU read/write bus timeout            “4” = 0    “8” = 1
6            DSER       Q22-bus NXM                           1          0
5            unused                                           0          0
4            DSER       Q22-bus parity error                  0          0
3            DSER <4    Read main memory error                “0” = 0    “0” = 0
2            DSER       Lost error                            0          0
1            DSER       No grant timeout                      0          0
0            IPCRn <15  DMA Q22-bus memory error              0          0
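
The bit-by-bit decoding in the table can also be done mechanically. The following is a minimal sketch that assumes the KA655 bit layout from the table also applies to the KA650 (which is exactly the uncertainty mentioned above); `ERF_BITS` and `decode_erf` are names invented for this example.

```python
# Sketch: decode an ERF longword using the bit assignments from the table.
# Assumption: the KA655 layout is valid for the KA650 as well.

ERF_BITS = {
    23: ("MSER", "CDAL parity error"),
    22: ("MSER", "Machine check CDAL parity error"),
    21: ("MSER", "Machine check cache parity"),
    20: ("MSER", "cache data parity error"),
    19: ("MSER", "cache tag parity error"),
    17: ("MEMCSR16", "Uncorrectable ECC error"),
    16: ("MEMCSR16", "Two or more uncorrectable errors"),
    15: ("MEMCSR16", "Correctable single bit error"),
    10: ("MEMCSR16", "DMA read/write error"),
    9: ("MEMCSR16", "CDAL parity error on write"),
    8: ("CBTR", "CDAL bus time out"),
    7: ("CBTR", "CPU read/write bus timeout"),
    6: ("DSER", "Q22-bus NXM"),
    4: ("DSER", "Q22-bus parity error"),
    3: ("DSER", "Read main memory error"),
    2: ("DSER", "Lost error"),
    1: ("DSER", "No grant timeout"),
    0: ("IPCRn", "DMA Q22-bus memory error"),
}


def decode_erf(erf):
    """Return (machine check code, failing bank bits 14..11, set flags)."""
    code = (erf >> 24) & 0xFF
    bank = (erf >> 11) & 0xF
    flags = [
        (bit, reg, text)
        for bit, (reg, text) in sorted(ERF_BITS.items(), reverse=True)
        if erf & (1 << bit)
    ]
    return code, bank, flags


for value in (0x80C00040, 0x82000180, 0x80C00000):
    code, bank, flags = decode_erf(value)
    print("ERF=%08X  machine check code=%02X  bank=%X" % (value, code, bank))
    for bit, reg, text in flags:
        print("   bit %2d  %-8s  %s" % (bit, reg, text))
```

For 80C00040 this lists bits 23, 22 and 6, and for 82000180 bits 8 and 7, matching the two value columns of the table.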

This seems to indicate a CDAL parity error on the KA650 CPU. “CDAL” stands for “CVAX Data and Address Lines”; it is the multiplexed CPU front-end bus. The interface to QBUS 22 goes through the “QBIC” chip, and the interface to the memory boards goes through the “MEMCTL” chip. The second-level cache is built from discrete memory and 74Fxxx chips. The interface between CDAL and the second-level cache is a port of five bidirectional 74F544 latches. Also connected to CDAL are some small on-board peripherals, such as serial ports, LED registers etc.

Trying to repair

I had no clue what to do. I changed a few cache driver chips, but that did not affect the bug. I even made a comparator adapter for running a test memory chip in parallel with the built-in chips. My idea was: if a cache memory chip is defective, I will see differing signals between the outputs of the original and the reference chip. Let's call it the "Run-Reference-Chip-Parallel-Adapter" ("RRCPA")!

But in practice the signals were far too complex to compare, and I did not trust my RRCPA at these high operating frequencies of > 10 MHz.

 

DTJ 7

Later I read in DEC Technical Journal 7 that the uVAX2 CPU design is very compact for cost reasons. They explicitly state that the source of a local bus parity error cannot be traced to a specific component.

As usual, their repair strategy is "change part and throw it away".

The 2nd KA650 is good

Just when I needed it, I found a KA650 on eBay.com. It was just $50 + $20 for shipping. It arrived after four weeks, and it was completely working. So once more, a big problem was solved by a small deal.

Good for the VAX, but bad for my pride!