Tuesday, November 7, 2023

A broken QL: How to find faulty RAM without a diagnostics ROM

I got a broken QL. The fault was described as a white screen on boot, which looks like a symptom of a failed RAM test, so it should be an easy repair.


Nice looking Sinclair QL, UK version.

Once it arrived, I powered it up, and I saw the described white screen.


QL starting sequence. It freezes after clearing the screen. The initial random pattern should be in color, but this is a PAL computer and I am using an NTSC monitor, so there is no color.


The QL's 128k of RAM is organized as 2 banks of 64k, each implemented as 8 64k x 1-bit chips (the 4164 that was also used in the Apple IIc, Atari 800XL, Commodore 64, IBM PCjr, and many other computers of the time). I suspect there is a problem in the top 64k of RAM. The screen memory lives in the lower 64k, so a problem there would be visible as a noise pattern on the screen.

The QL does a RAM check at startup that consists of writing random values to RAM, then writing 1's, then verifying the contents. The RAM test froze during the verification step, so we can assume that the RAM output will consist of mostly 1's.
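The effect of such a test on a bank with one flaky chip can be illustrated with a toy model. This is a minimal sketch, not the QL ROM routine: the memory size, the `FlakyRam` class, and the choice of a bit that occasionally reads back as 0 are all assumptions made for illustration.

```python
import random


class FlakyRam:
    """Toy byte-wide RAM where one data bit sometimes reads back as 0.

    Hypothetical model: each bit position stands for one 64k x 1 chip.
    """

    def __init__(self, size, bad_bit=None, fail_rate=0.0, seed=0):
        self.cells = [0] * size
        self.bad_bit = bad_bit
        self.fail_rate = fail_rate
        self.rng = random.Random(seed)

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if self.bad_bit is not None and self.rng.random() < self.fail_rate:
            # The faulty chip sporadically drops its bit to 0.
            value &= ~(1 << self.bad_bit) & 0xFF
        return value


def ram_test(ram, size, seed=1):
    """Write random values, then all 1s, then verify.

    Returns the list of (address, value) pairs that failed verification.
    """
    rng = random.Random(seed)
    for addr in range(size):
        ram.write(addr, rng.randrange(256))
    for addr in range(size):
        ram.write(addr, 0xFF)
    failures = []
    for addr in range(size):
        value = ram.read(addr)
        if value != 0xFF:
            failures.append((addr, value))
    return failures


good = ram_test(FlakyRam(1024), 1024)
bad = ram_test(FlakyRam(1024, bad_bit=3, fail_rate=0.05), 1024)
print(len(good), len(bad))
```

In this model the failing reads are still mostly 1s (0xF7 instead of 0xFF), which is consistent with the data lines sitting high most of the time.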

To try to figure out which chip is to blame, I started by measuring the voltage at the output of every DRAM chip. A voltage much lower than 5V may indicate the faulty chip (i.e. one that is outputting 0s). I measured close to 5V everywhere, so no chip is stuck outputting 0s.

I let the computer run for a while, then touched each chip looking for one particularly hot, but they all felt similar.


Sinclair QL issue 6. View of the DRAM.

In a similar situation, Noel from Noel's Retro Lab used the Minerva ROM, which contains a diagnostics routine, to identify the bad chip. But my Issue 6 board requires a non-trivial adapter to be able to use an EEPROM (check Noel's video for instructions on how to build it).

Too bad the faulty chip is not in the lower bank; then the screen would display some error pattern to help me guess the chip. If only there was a way to swap the banks...

I looked at the schematic and noticed something... the two banks seem to be connected to the same lines, except for the CAS0 and CAS1 signals from the ULA 8301. This ULA is also responsible for generating the video by reading from video RAM, so maybe it can be fooled into using the upper bank...

I switched the RAM banks by swapping pins 9 and 10 of the ULA 8301. These are the CAS0 and CAS1 lines used to select each bank. I built a simple adapter with two stacked 40-pin sockets.
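The effect of the swap can be modelled in a few lines. This is a sketch under assumptions: the address range ($20000 to $3FFFF for the 128k of RAM, with 32k of screen memory at the bottom) follows the usual QL memory map, while the assignment of CAS0 to the lower bank and the function names are hypothetical.

```python
RAM_BASE = 0x20000    # start of the 128k of RAM in the usual QL memory map
BANK_SIZE = 0x10000   # 64k per bank
SCREEN_SIZE = 0x8000  # 32k of video RAM at the bottom of RAM


def cas_line(addr):
    """CAS line the ULA asserts for a RAM address (assumed: CAS0 = lower 64k)."""
    if not RAM_BASE <= addr < RAM_BASE + 2 * BANK_SIZE:
        raise ValueError("address is outside the 128k of RAM")
    return (addr - RAM_BASE) // BANK_SIZE


def physical_bank(addr, swapped=False):
    """Physical chip bank that responds, with or without the pin 9/10 swap."""
    line = cas_line(addr)
    return 1 - line if swapped else line


screen = range(RAM_BASE, RAM_BASE + SCREEN_SIZE)
print({physical_bank(a) for a in screen})                # {0}
print({physical_bank(a, swapped=True) for a in screen})  # {1}
```

With the swap in place, every screen address lands in the other physical bank, so the video circuitry displays whatever the suspect chips return.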

Crude adapter for swapping pins 9 and 10 of the ULA. Building instructions: 1) Cut pins 9 and 10 of the top socket; 2) Solder 2 wires from the bottom to the top socket pins, swapping them; 3) Put some tape over the bottom pins 9 and 10 to prevent contact with the cut pins of the top socket; 4) Attach the 2 sockets. Before using, verify all connections with a multimeter and look for shorts.



ULA 8301 on an adapter to swap pins 9 and 10 (CAS line for each 64k RAM Bank).

With the adapter installed, I can now see some scattered noise on the screen. The noise is sporadic, which explains why I was unable to measure a significant voltage difference on the chips' output pins.

Once the top and bottom RAM banks are swapped, the bad RAM is visible as noise on the screen.

But now, how can I tell which RAM chip corresponds to the noise? The noise seems to be scattered along lines in columns that are probably separated by a multiple of 8 pixels, exactly what we would expect from a single broken chip. I could try to count the number of pixels between some columns and the border... but it is hard to get an accurate measurement.
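If the x coordinates of noisy pixels could be read off accurately (for example from a screen capture), the check would be simple arithmetic. The sketch below assumes that each chip supplies one bit of every byte, so a single bad chip only touches pixels whose x coordinates share the same remainder modulo 8. The coordinates and function names are made up for illustration.

```python
from collections import Counter


def column_residues(noisy_x, period=8):
    """Count how many noisy pixels fall on each x position modulo `period`."""
    return Counter(x % period for x in noisy_x)


def single_chip_suspected(noisy_x, period=8):
    """True if every noisy pixel shares one residue, as one bad bit would cause."""
    return len(column_residues(noisy_x, period)) == 1


# Made-up x coordinates of noisy pixels, measured from the left edge.
sample = [5, 13, 69, 133, 205, 501]
print(column_residues(sample))        # Counter({5: 6})
print(single_chip_suspected(sample))  # True
```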

I made a simple probe to pull DRAM output pins to ground using a pick and alligator clips.

So I decided to use a pull-down technique, which consists of briefly pulling each data line to ground at the output of the RAM chips (pin 14).

This should show a vertical line corresponding to the bit that is being turned to 0. We'll find the faulty chip once the line overlaps with the noise pattern.
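The matching step can be pictured with a small simulation. This is a toy model under assumptions: each byte of a row drives 8 pixels with the most significant bit leftmost, and the bit positions and names are hypothetical rather than taken from the QL schematic.

```python
def columns_for_bit(bit, width_bytes=4):
    """Pixel columns driven by one data bit (assumed MSB-first, 8 pixels per byte)."""
    return {byte * 8 + (7 - bit) for byte in range(width_bytes)}


def find_suspect_bits(noise_columns, width_bytes=4):
    """Data bits whose pulled-down stripes cover every noisy column."""
    return [bit for bit in range(8)
            if noise_columns <= columns_for_bit(bit, width_bytes)]


# Hypothetical noise seen in columns 2, 10 and 26 of a 32-pixel-wide strip.
noise = {2, 10, 26}
print(find_suspect_bits(noise))  # [5]
```

On the real machine the comparison is done by eye: the stripes produced by grounding one chip's output either line up with the noise or they do not.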

I did this, and found that IC14 is the faulty chip. 


The vertical stripes caused by pulling the output of IC14 to ground overlap with the noise pattern. This means this is the chip responsible for the noise. 

I desoldered IC14, and replaced it with a socketed 4164. This solved the problem, and the computer boots to the starting screen.

View of replaced IC14.

QL Boot Screen.

The pull-down technique is very general and can be used in other computers as long as the fault affects the screen RAM. Some errors may require a pull-up instead. Next time I try this trick I will connect the probe through a resistor to be safer. What resistor value should you use? Start with a high value, say 10k. If that is not enough to change the signal, gradually reduce the value, e.g. 1k, 100 ohm, etc.
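As a rough guide to what each resistor value means, Ohm's law gives the worst-case current through the probe. This sketch assumes the output sits at the full 5V supply, which overstates the current for a TTL-level output; the function name is illustrative.

```python
def probe_current_ma(resistance_ohms, voltage=5.0):
    """Worst-case current in mA through a pull-down resistor (I = V / R)."""
    return voltage / resistance_ohms * 1000.0


for r in (10_000, 1_000, 100):
    print(f"{r:>6} ohm -> {probe_current_ma(r):.1f} mA")
```

Under that assumption, 10k limits the probe to about 0.5 mA, 1k to about 5 mA, and 100 ohm to about 50 mA.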

