Some not-yet-successful stories

MicroVAX CPU parity error

Written by Administrator

I have a MicroVAX in a BA23 case. It has an RD54 disk with 160 MB and a TK50 tape drive. A raster video card is also built in. This is the card cage:

 

CPU M7620 BA KA650 workstation license EK-KA650-UG
M7621 MS650-AA 8MB Mem
M7621 MS650-AA 8MB Mem
M7516 DELQA
M3106 DZQ11-M
M7169  VCB02 4-plane video controller module
M7168  VCB02 4-plane colour bitmap module
M7555 RQDX3 MFM+floppy
M7546 TMSCP TK50
M7513 RQDX extender

The case label says “MicroVAX II/GPX”, but since the CPU is a KA650, it is really a MicroVAX 3200/3400/3600 (reference). DTJ 7 states that the KA650 is for the MicroVAX 3500/3600.

Other stickers say: Model: 6300V-B3, SN: AY 81901335.

Bad documentation, but enough other KA6xx CPUs

The KA650 CPU carries a CVAX 78034 VAX CPU chip ... the one Bob Supnik developed in his time at DEC.

Although I searched a lot, I did not find any technical description of the KA650 CPU. vt100.net lists “EK-180AB-MG KA650 CPU SYS MAINTENANCE GUIDE” and “EK-KA650-UG KA650 GUIDE”, but has neither of them. So I had to use lots of documentation for similar CPUs, such as the KA640, KA655, KA660 and KA680. This puts my further conclusions on unstable ground.

Luckily I found all schematics for the KA650 in a document called “MP02538 650QS Pedestal BA213 Field Maintenance Print Set”.

The KA650 CPU and its cache are also described in “Digital Technical Journal Number 7” (DTJ 7).

Self test on boot:

Did I tell you? My MicroVAX has an error:

At boot, it displays:

KA650-B  V1.2/0123                               

Performing normal system tests.

23..22..21..20..19..

?05.50 2 0C FE 04 0000
10000000 10012000 00002000 00000000 00000000
00000000 00000000 00000000 1000B4F8 00000000
1000B500 55555555 55555555 AAAAAAAA AAAAAAAA
00000960 10000000 AAAAAAAA 00002000 80C00040

18..17..16..15..14..13..12..11..10..09..08..
07..06..05..04..03..


Normal operation not possible.
>>>

I decoded this error as follows:

?05.50 2 0C FE 04 0000                                                   (1)
10000000 10012000 00002000 00000000 00000000                             (2)
00000000 00000000 00000000 1000B4F8 00000000                             (3)
1000B500 55555555 55555555 AAAAAAAA AAAAAAAA                             (4)
00000960 10000000 AAAAAAAA 00002000 80C00040                             (5)

The first line “?05.50 2 0C FE 04 0000” means this:

  • "05.50" is the number of the test that bombed.
    A list of tests is printed with “>>>test 9e”. This lists “05.50” as
    “05  50  6760  Cach2_integrty  start_addr end_addr addr_step *******”
    So the cache is the problem.
  • "2" is the severity factor.
    "2" causes the register dumps to be displayed and the autoboot prohibited.
    "1" just prints this error message line, and does not disable the autoboot functionality.
  • "0C" "error" is a number that, in conjunction with the listing files, isolates to within a few instructions where the diagnostic detected the error. This field is also called subtestlog.
  • "FE" "de_error" is the code of the error found.
    FF: normal error exit from diag,
    FE: unanticipated interrupt,
    FD: interrupt in cleanup mode,
    FC: interrupt in interrupt handler,
    FB: test script requirements not met,
    FA: no such diagnostics,
    EF: unanticipated exception in executive.
  • "04" "vector" is the SCB vector (if non-zero) through which an unexpected exception or interrupt trapped, when the de_error field indicates an unexpected exception or interrupt (FE or EF).
  • "0000" "count" is the number of previous errors encountered.

Line (2): P1..P5 are the first five longwords of the diagnostic state.
This is internal information that is used by repair personnel.
Line (3): P6..P10 are the last five longwords of the diagnostic state.
Line (4): R0..R4 are the first five GPRs at the moment the error was detected.
Line (5): R5..R8 are additional GPRs, and ERF is a diagnostic summary longword.
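
The field layout of line (1) can be written down as a small parser. This is only a sketch: the names `DiagError` and `parse_error_line` are my own, the field widths are taken from the single example above, and I assume that every field except the severity digit is hexadecimal.

```python
import re
from typing import NamedTuple

# de_error codes exactly as listed in the text above
DE_ERROR = {
    0xFF: "normal error exit from diag",
    0xFE: "unanticipated interrupt",
    0xFD: "interrupt in cleanup mode",
    0xFC: "interrupt in interrupt handler",
    0xFB: "test script requirements not met",
    0xFA: "no such diagnostics",
    0xEF: "unanticipated exception in executive",
}


class DiagError(NamedTuple):
    test: str        # e.g. "05.50"
    severity: int    # 1 or 2
    subtestlog: int  # the "error" field
    de_error: int    # key into DE_ERROR
    vector: int      # SCB vector, 0 if not applicable
    count: int       # number of previous errors (assumed to be hex)


# Layout assumed from the one example line: ?TT.TT S EE DD VV CCCC
_LINE = re.compile(
    r"\?([0-9A-F]{2}\.[0-9A-F]{2})\s+(\d)\s+([0-9A-F]{2})\s+"
    r"([0-9A-F]{2})\s+([0-9A-F]{2})\s+([0-9A-F]{4})",
    re.IGNORECASE,
)


def parse_error_line(line: str) -> DiagError:
    """Split the first line of the self-test error dump into its fields."""
    m = _LINE.fullmatch(line.strip())
    if m is None:
        raise ValueError(f"not a diagnostic error line: {line!r}")
    test, severity, subtestlog, de_error, vector, count = m.groups()
    return DiagError(
        test,
        int(severity),
        int(subtestlog, 16),
        int(de_error, 16),
        int(vector, 16),
        int(count, 16),
    )


if __name__ == "__main__":
    err = parse_error_line("?05.50 2 0C FE 04 0000")
    print(err)
    print(DE_ERROR.get(err.de_error, "unknown de_error code"))
```

Feeding it the line from my console gives test 05.50, severity 2 and de_error FE, which the table translates to "unanticipated interrupt".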

The last 32-bit value is ERF, and it is very important. I use the KA655 documentation, “EK-306A-MG-001 KA655 CPU System Maintenance”, page 4-33. The KA655 has a “SOC” chip, which combines a CVAX 78034 CPU, a CFPA floating point processor, a clock and an 8KB second-level cache. I hope that its ROM-based diagnostics are also close enough to those of my KA650.


Here ERF=80C00040; I also saw 82000180 and 80C00000.

 

Bits/digits  Register    Info                                   my values (80C00040, 82000180)
-----------  ----------  -------------------------------------  ---------  ---------
31..24                   machine check code                     80         82
23           MSER        CDAL parity error                      “C” = 1    “0” = 0
22           MSER        Machine check CDAL parity error        1          0
21           MSER        Machine check cache parity             0          0
20           MSER        cache data parity error                0          0
19           MSER        cache tag parity error                 “0” = 0    “0” = 0
18           unused                                             0          0
17           MEMCSR16    Uncorrectable ECC error                0          0
16           MEMCSR16    Two or more uncorrectable errors       0          0
15           MEMCSR16    Correctable single bit error           “0” = 0    “0” = 0
14           MEMCSR16    Page address bits 25:22 of ...         0          0
13           MEMCSR16    ... location that caused error ...     0          0
12           MEMCSR16    ... These four bits point to the ...   0          0
11           MEMCSR16    ... failing 4-Mbyte bank of memory     “0” = 0    “1” = 0
10           MEMCSR16    DMA read/write error                   0          0
9            MEMCSR16    CDAL parity error on write             0          0
8            CBTR        CDAL bus time out                      0          1
7            CBTR        CPU read/write bus timeout             “4” = 0    “8” = 1
6            DSER        Q22-bus NXM                            1          0
5            unused                                             0          0
4            DSER        Q22-bus parity error                   0          0
3            DSER <4>    Read main memory error                 “0” = 0    “0” = 0
2            DSER        Lost error                             0          0
1            DSER        No grant timeout                       0          0
0            IPCRn <15>  DMA Q22-bus memory error               0          0

This seems to indicate a CDAL parity error on the KA650 CPU. “CDAL” stands for “CVAX Data and Address Lines”; it is the multiplexed CPU front-end bus. The interface to the Q22-bus is through the “QBIC” chip, and the interface to the memory boards is through the “MEMCTL” chip. The second-level cache is built from discrete memory and 74Fxxx chips. The interface between CDAL and the second-level cache is a port of five bidirectional 74F544 latches. Some small on-board peripherals, such as the serial ports and LED registers, are also connected to CDAL.
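
Decoding ERF by hand is error-prone, so here is the same bit table as a small Python sketch. The bit assignments are copied from the table above, which comes from the KA655 manual and is only assumed to fit the KA650. The names `ERF_BITS` and `decode_erf` are my own.

```python
# Single-bit flags of ERF: bit number -> (source register, meaning).
# Copied from the KA655 table above; whether the KA650 ROM uses the
# same layout is an assumption.
ERF_BITS = {
    23: ("MSER", "CDAL parity error"),
    22: ("MSER", "machine check CDAL parity error"),
    21: ("MSER", "machine check cache parity"),
    20: ("MSER", "cache data parity error"),
    19: ("MSER", "cache tag parity error"),
    17: ("MEMCSR16", "uncorrectable ECC error"),
    16: ("MEMCSR16", "two or more uncorrectable errors"),
    15: ("MEMCSR16", "correctable single bit error"),
    10: ("MEMCSR16", "DMA read/write error"),
    9: ("MEMCSR16", "CDAL parity error on write"),
    8: ("CBTR", "CDAL bus time out"),
    7: ("CBTR", "CPU read/write bus timeout"),
    6: ("DSER", "Q22-bus NXM"),
    4: ("DSER", "Q22-bus parity error"),
    3: ("DSER", "read main memory error"),
    2: ("DSER", "lost error"),
    1: ("DSER", "no grant timeout"),
    0: ("IPCRn", "DMA Q22-bus memory error"),
}


def decode_erf(erf):
    """Return (machine check code, failing bank field, list of set flags)."""
    code = (erf >> 24) & 0xFF   # bits 31..24
    bank = (erf >> 11) & 0xF    # bits 14..11, page address bits 25:22
    flags = [
        (bit, reg, info)
        for bit, (reg, info) in ERF_BITS.items()
        if erf & (1 << bit)
    ]
    return code, bank, flags


if __name__ == "__main__":
    for value in (0x80C00040, 0x82000180, 0x80C00000):
        code, bank, flags = decode_erf(value)
        print(f"ERF={value:08X} code={code:02X} bank={bank:X}")
        for bit, reg, info in flags:
            print(f"  bit {bit:2d} {reg}: {info}")
```

For 80C00040 this reports bits 23, 22 and 6, and for 82000180 bits 8 and 7, matching the two value columns of the table.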

Trying to repair

I had no clue what to do. I changed a few cache driver chips, but the bug was not affected. I even made a comparator adapter for running a test memory chip in parallel with the built-in chips. My idea was: if a cache memory chip is defective, I will see differing signals between the outputs of the original and the reference chip. Let's call it the "Run-Reference-Chip-Parallel-Adapter" ("RRCPA")!

But in practice the signals were far too complex to compare, and I did not trust my RRCPA at these high operating frequencies of > 10MHz.

 

DTJ 7

Later I read in Digital Technical Journal 7 that the uVAX2 CPU design is very compact for cost reasons. They explicitly state that the source of a local bus parity error cannot be traced to a specific component.

As usual, their repair strategy is "change part and throw it away".

The 2nd KA650 is good

Just when I needed it, I found a KA650 on eBay.com. It was just $50 + $20 for shipping. It arrived after four weeks, and it was completely working. So once more, a big problem was solved by a small deal.

Good for the VAX, but bad for my pride!

 

 

 

Trying to setup an ALL07 PROMmer

Written by Administrator

I’d like to program all those old PROM chips.

I’ve noticed that those chips have a high failure rate:

  • on my PDP-11/44 CPU, an 82S101 was failing
  • for the 11/44, I had to program boot loader PROMs, little 512 x 4 bipolar PROMs
  • in an RLV12 QBUS RL02 controller, I have a dead 82S181 1K x 8 bipolar PROM
  • in the C64, a dead 82S100 is a main reason for system failure.

 But this project began as an endless story of disappointment and failures.

ALL07s

I bought an old ALL-07 Hilo programmer on eBay. I chose this model because

  • it's a universal programmer from 1995, so it has rich support for all those old chips (but it cannot deal with modern chips, of course)
  • my company also has an ALL-07, so in case of trouble I could test and verify by swapping parts between the two programmers.

Although it was made in 1995, there is still some support for the ALL-07.

But things went all wrong.

Reworking an ALL-07

The eBay deal was one of the worst I ever made. The programmer was advertised as “100% OK”. When it arrived, I noticed that it was not a standalone device with an LPT parallel port interface, as I had expected. It was another model: it had no power supply of its own, and it needed a special ISA interface card to provide the LPT signals and power. The dealer knew nothing of this card. He said he had taken the programmer from the hobby workshop of a guy who owed him money.

I opened the ALL-07, rewired the LPT interface so I could connect it to a normal LPT port, then I bought a little switching power supply and connected it to the ALL-07.

ALL-07 software on virtual PC

The ancient ALL-07 software is DOS only, so I had to set up a virtual machine with Microsoft Virtual PC, where I installed the old control software. I hoped that the parallel port emulation was good enough to let the software control the programmer. All this took two weeks. I switched everything on, and the software made contact with the programmer, but said there was "no PAK inserted". A PAK is an adapter for the programmer base device that lets it program a certain type of chip.

Comparing two ALL-07

I had so many home-built components in my setup now that it was difficult to debug. But I could use the ALL-07 from my company, and this device worked fine on the virtual machine. I took the PAK of my company's ALL-07, plugged it into my own ALL-07, and the software could make contact! So all my workarounds (virtual machine installation, power supply, own LPT adapter) seemed to be OK.

I opened the defective PAK and saw that somebody had made heavy repair attempts: a lot of chips had been desoldered and remounted in sockets. So this PAK definitely had a problem!

Fighting a liar

I was quite angry with the seller, because he was clearly lying to me: he insisted that he “had the programmer tested”, but how could he? He didn’t even know that the device needed a special interface card to be powered on. I got rather unfriendly with him, and he bitched back at me. I wanted my money back; he offered to give me just 50€. I should have taken that money, but I was too upset.

As a result, I got no money at all, and he left me bad eBay feedback “because I did not pay” ... what an asshole! I was clearly defeated by him.

HILO's still alive

Meanwhile I had made contact with HI-LO Netherlands. They still offer old PAKs as spare parts. My 40-pin universal PAK cost 25€ + 15€ shipping, so I bought it and tried to forget all the trouble.

Burning?

Well, at least I thought that after all that trouble I had a working programmer now.

I could program an 82S131 boot PROM for my PDP-11/44. Quite strangely, the bit patterns were also copied to the upper half of this PROM, which should have been left untouched.
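
Whether the upper half of a PROM is still blank or has become a copy of the lower half is easy to check on a dump read back from the chip. A minimal sketch, assuming the dump is available as a sequence of integers (one per PROM word) and that the blank value of the device is known. The function name and the `blank` parameter are my own and have nothing to do with the ALL-07 software.

```python
from typing import Sequence


def classify_upper_half(dump: Sequence[int], blank: int) -> str:
    """Compare the upper half of a PROM dump with the lower half.

    Returns "blank" if every word in the upper half equals the blank
    value, "mirror" if the upper half repeats the lower half word for
    word, and "other" in all remaining cases.
    """
    if len(dump) < 2 or len(dump) % 2 != 0:
        raise ValueError("dump must have an even, non-zero number of words")
    half = len(dump) // 2
    lower, upper = list(dump[:half]), list(dump[half:])
    if all(word == blank for word in upper):
        return "blank"
    if upper == lower:
        return "mirror"
    return "other"


if __name__ == "__main__":
    # Made-up 8-word example, not data from my PROM
    print(classify_upper_half([0x1, 0x2, 0x3, 0x4, 0x1, 0x2, 0x3, 0x4], blank=0x0))
```

A "mirror" result means the highest address line had no effect, either while programming or while reading back. The dump alone cannot tell which of the two it was.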

I could program standard 64 Kbyte EPROMs (27C512). I failed to program an old 27C16, and I could not program an older EEPROM. Hmm.

To exclude failures in my virtual machine setup, I also installed the programmer software on an old, slow notebook (500MHz CPU), but nothing changed.

To program the 82S100, I needed a special adapter: ALP.PLS100. I bought it from HILO Netherlands for 150€ (they had to produce one for me).

Buying old PROMs

The adapter arrived just before the Christmas holidays of 2009. I thought I had two boards in my inventory with a defective 82S100, so I tried to buy some 82S100s. This was not easy. Lots of dealers for obsolete or discontinued chips state that they can provide them. But after I sent out a bunch of price requests, the feedback was poor. Most companies did not answer, some answered “sorry, not available”, and one offered me 10 82S100s for 1000€ total (no joke!). Finally I got 5 chips for 140€ in total ... a lot of money. But my zodiac sign is Aries, and I hate to give up in the middle.

When the chips arrived, I began to play with my programmer. First surprise: one board did not need an 82S101 at all, it needed an 82S181! My documentation consisted of some bad scans, and I had misread the numbers. So it seemed I did not need so many of these pricey 82S100 chips ... a mispurchase.

Well, I did not have an 82S181, but I could try to read a good 82S181. Next surprise: the ALL-07 had it in its library, but reading always delivered a buffer of hex FFs, no real content.

I could not get an 82S181, but I bought two AMD 27S191s (which have double the capacity and one more address pin).

So after all this strangeness with my ALL-07, there was another chip it could not handle! Maybe not just the PAK, but also the main device is defective.

Trying the 3rd ALL-07

I threw everything into a dark closet and tried to forget the whole project ... and succeeded. A few months later I got an e-mail from a mikrocontroller.net forum member. He had read an old request of mine for HILO addresses ... and asked whether I could use his ALL-07.

Honestly, I was not amused. A bad joke? Start this agony again? But it was an offer I could not refuse.

And this ALL-07 was in pretty good shape: I could program 82S171 and 82S181 chips, and read an 82S100 with it!