Harald Seeley on V-MAX!
=======================


>> >: The disappearing sync bytes
>
>Harald, can you briefly explain why the 'sync bytes' technique (and any
>other stuff that you used) could not a copied with any serial nibbler
>or Burst Nibbler/21 sec backup/other parallel cable? Why was it
>necessary to have the 8k ram expansion to successfully nibble V-MAX?
>

To read a sync byte (and confirm it's existence), you have to test for a bit
which is set by the shift register chip which clocks in each bit, and sets
that bit (and keeps it set) once 12 or more sequential on bits are detected.
This flag only stay up until the first zero bit is clocked in (4 CPU clock
cycles equal one data bit clock, as I recall).    Since my sync bytes were
exactly 12 bits long, that was too short to reliably see using the standard
Bit xxxx, bvc back:   loop, which had a 5 CPU cycle jitter.  Commodore used
extremely long sync bytes, 40 bits as I recall, which wasted a good deal of
space.

But they were necessary for the disk to be writable as well as readable,
since the first few bits could be clobbered by turning on the bias current
which preceded writing new data to the disk.  Every time the current was
turned on or off (at the end of the write sequence), some bits would be
clobbered and rendered indeterminate.   V-MAX! tracks were read-only, so no
extra bits were needed.  We didn't find the start of data by looking for the
sync bytes, they were ignored by our software (unlike commodore's DOS).
They were there purely for hardware reasons, to make sure that the following
data bytes were framed correctly.  We had a unique byte (or two?) for the
start of each sector, which had a bit pattern that real V-MAX! GCR data
would never produce, no matter how it was framed.   We could have gotten
away with putting only one sync pattern at the beginning of each track, but
we didn't want to have to wait for the start of the track every time we
stepped the heads, this would have added an unacceptable worst case delay of
an entire disk rotation, before the data was properly framed and readable.

Another reason for the long standard Commodore sync bytes, was that the
Commodore GCR-to-binary conversion routine was so slow, that the CPU needed
all the time that went by to convert the data, so it could determine if this
was indeed the sector it was looking for.  Ours was real-time (we converted
to binary while waiting for the next byte to clock in).

The only loader we ever saw, that could handle data no matter how it was
framed (i.e. had no sync bytes) was the last VORPAL loader, which gave us
migraines just looking at how it managed to accomplish that task.  It used
self modifying real-time GCR conversion code to do half the job, and the
serial transfer routine to do the remainder of the work, of putting things
back in proper order.  Twisted is the only term that does it justice.  I
never completely analyzed it, it was just too much trouble to figure all the
little details out.

With V-MAX!, we had no padding between sectors, and the sector header info
was combined with the actual data. The end of the data sector was also a
unique bit pattern (actually the start of the next sync).  Therefore, the
data had to be written to disk in one continuous stream.  If you didn't have
a parallel disk drive, or expansion RAM, you couldn't store an entire track
of data in the standard buffers, especially in expanded GCR form.
Therefore, if you tried to copy the track piecemeal, you would end up with
breaks in the bitstream before each sync byte.

On earlier versions of V-Max!, we would check for those breaks, by looking
at the last byte before each of the sync marks. On the last version, that
last byte was the crucial end of sector indicator, it wouldn't load properly
without it.  Our last version had variable sized sectors, so you needed that
marker to tell when you were done.

Also, if you didn't slow the drive down, as we did, you would end up
overwriting the beginning of the track (which we always filled up, what with
our variable sector sizes).


>You, ah, won't happen to have this disk just lying around, would you ?
>;-) For that matter what happened to all your source, tools, work disks
>etc from when you were creating V-MAX?
>

No more C64 stuff, gave it all away when I left Taito.

><grin> I remember reading the F15 manual and was absolutely sure V-MAX
>was named because of F15... ;-)

Yeah, we later met with "Wild Bill", and even tried to get him to license
our stuff.

>Although I never had a Warpspeed cart, there's something I always
>wanted to know... was it more compatible that Epyx's Fastload ? Did it
>have any sort of enhancements or detection in order to optimise (or at
>least stay out of the way) when loading V-MAX protected software?
>

Absolutely!  20/20 hindsight was of course, a very useful thing.

>
>On the back of the TAITO (Rastan, Arkanoid 2) manuals that I have
>there's a short copyright statement something like this: "V-MAX (c)
>Alien Technology Group"
>
>Could you talk a bit about the company? Who were the original founders
>(Marty/Joe/You?) You also mentioned in an earlier post that you worked
>for Taito... what was the relationship bet. Taito & Alien Tech Group?
>

Marty and I started ATG.  Joe was a friend who occassionally did work for
hire for us.
ATG started out as developers for Taito, Cinemaware, and others. Then Taito
hired me, and I Marty.

>And I recall correctly I believe the fast loader/custom disk format in
>"Graphics Transformer" (published by CDA (Complete Data
>Automation)/written by Joe Peter, Scott M Blum (of Di-Sector fame),
>Jeff Spangenberg & Daniel Wolfe) was even faster at loading than V-
>MAX... tracks used to click by faster than a V-MAX load... but I don't
>know if that meant that the load was faster or that tracks held less ;-)


Wouldn't doubt it could move the head faster, but unless they improved the
serial transfer routine, I doubt they could have actually loaded data any
faster (Can't get any quicker than real-time GCR conversion, and I couldn't
get real-time conversion of a GCR that was denser than 75% data, 25% clock).
However, it would have been easy to use the 50% data, 50% clock of alternate
byte real-time conversion (via xor), at the cost of disk space.

By comparison, Commodore was 80% data, 20% clock.  Which couldn't be decoded
in real-time (though I don't know what VORPAL's final throughput was, or
what density of data/clock they were using).  So if VORPAL was using
Commodore's GCR scheme, and Joe & Scott finished reverse engineering it,
they might well have beaten V-MAX!, now that I think of it.  If anyone could
have done it, it would have been JP.


Part II
=======

>How is the data encoded on the disk?  Could you also comment on how
>it compares to the standard DOS scheme which we are all familiar with:
>header, sync bits, track/sector layout, track/sector size, GCR encoding,
>and so on.
>

OK, I touched on that subject in my posting earlier today.  Here's the rest,
or at least as much as I can remember ;-)

Every block started with a unique byte after the 12 bit sync mark.  This
byte, obviously, had the highest bit cleared.  I could be wrong, but I'm
guessing $7E?

The remaining GCR table was designed to substitute 8 bits out for every 6
in. I don't recall whether we worked from nibbles (3 expansing to 4) or
bytes (6 to 8). To make the data read reliably, we were forced to follow
Commodore's own rules regarding how many off bits in a row we could support.
I'm guessing it was a maximum of 2.  You have to understand, that due to wow
and flutter problems, that the 1541 resyncronized itself to each of the
"1"'s (transitions) on the drive surface. You pretty much had to be a
digital hardware engineer to figure this out from the schematics on your
own, I never saw this explained in any books.  Thanks to an early interest
in Ham radio, I was able to work some of these things out.  What would
happen, if you left too many "zeros" (no transition) in sequence on the
surface of the floppy was, that the shift register would overflow, and
apparently clock in a "phantom" on bit.

Early in the V-Max series, we didn't pay close enough attention to this
rule, and this brought on reliability problems.   But, by following
Commodore's own internal rules, we were guaranteed to be as reliable on a
drive as non-protected disks.

The second rule was, that no possible combination of GCR bits could
accidentally generate a string of 12 "on" bits in a row, or we would end up
with an unintended sync byte, which would destroy the framing we had
established earlier.   This set of rules gave us just enough possible
combinations that it proved possible to encode every 6 bit entry with a
proper 8 bit value ( or maybe 3 with 4, as I mentioned earlier, a
disassembly of the loader should make it clear which we used).

With this scheme, it was also (barely) possible to convert each GCR value
read back into it's proper binary form in real time.  This involved bit
shifting and table lookups, then maybe more shifting and xoring (combining)
of the results.  A single byte took, I think, 32 CPU clocks to read in.
Therefore, on average, you had to be done with all your shifting and moving
of data 32 cycles after your first read in the value.  You had some
tolerance for error, however.

With Commodore, all they did was store the GCR value in a buffer to work on
later, then went back in a loop, waiting for the bit to be set that
indicated the next byte was ready.

What I did, was calculate how many cycles each operation took, so that I was
ready to read each subsequent byte about the middle of the time it was
present in the IO port.  I.E, if it took 32 cycles to read in a byte, the
second byte would become available 32 cycles after first byte had been read,
and that second byte would remain available to be read for another 32
clocks, before it would be replaced by the next, and so on.

Now, to that, I added a tolerance for drives that were running faster or
slower than Commodore's spec.  And to that, I added a further allowance for
wow and flutter.  This, then, gave me my timing windows, for when I could
safely and reliably read sequential bytes from the disk, without wasting
cycles (which I didn't have) sitting in a loop waiting for a "byte ready"
signal.  If I was in danger of running too fast, I might have added a
conditional branch to a follow-on instruction, after the "bit" test, to
conditionally slow things down a tad.  But pretty much all of the cycles
were put to use, sometimes I would have to break off what I was doing and
store a value temporarily, so that I could read the next byte.  It was a lot
like juggling, and I re-wrote the routines multiple times, moving operations
earlier or later, till I was satisfied. As a result, we could read in 4 GCR
bytes, convert them to binary, and store them in a buffer, then go back to
the top of the loop and wait for the byte ready signal, then do it all over
again.   This took several weeks to perfect, though we got the original
(flakey) version running up in a couple of days.

We would break out of the loop when the end of sector byte was read, as I
recall.   The start of our load buffer overlapped our read routine by one
byte.  That meant, the first binary value of each sector had to translate to
a $60 (rts).  This was a "future expansion" hook that would allow us to
embed additional copy protection (signature checks) any place/time on the
disk.  We never needed to use it, but it was there for expressly that
purpose.

The remaining header information would contain the sector number, possibly
the track number (not sure), and the number of the file to which the sector
belonged.  Other bytes (or maybe we just used a single bit) would indicate
the last sector of a file, which always got sent last.  The remaining
sectors would get sent in whatever order they were decoded (each sector had
embedded into it the load address where it belonged in C64 memory).  Our
"directory/VMAX DOS" track had embedded information, which told us how many
sectors a given file occupied on each track.  The drive would keep track of
which sectors had already been transfered, and not send them twice. When it
had sent all of the sectors belonging to that file which were to be found on
the current track, it would step to the next track, and start again.  When
it reached the last track, it would skip over the final sector as long as
there were other sectors to send.  The final sector was sent last, as it had
embedded in it, the execution address where control was to be given, (or a
value indicating a normal return via the stack).

With this scheme, files did not have to sit in contiguous memory, they could
be scattered in several places (which would have taken multiple loads to do
otherwise).  Plus any file could be made to autoexecute, a big convenience
since that meant games could safely load multiple modules over themselves
without worrying about maintaining a "core" control loading kernal.  It also
meant, that data could be sent over the serial bus as fast as the C64 could
receive it, that is, we didn't have to "skew" the data on the disk to
optimally intersperse reading with sending.

>The answers to the above might answer the next question: what aspects
>of the encoding make it difficult to copy?  What aspects make the encoding
>reliable on imperfect drives?  In practice, how susceptible was the
>system to errors (I'm sure we've all had various disks that have either
>worked unreliably or rapidly self-destructed, due to the copy protection)?
>

The copy protection question I answered in my earlier post today.  The
reliability of the GCR was such that it was no different than unprotected
disks.  We always did soak tests >24 hours of continuous, error free file
loads, with our oldest and most worn out drives, before we released our
later versions, to test our assumptions.  Because of the format (below), we
didn't find it necessary to add additional "signatures" that might make the
loader less reliable on some drives.

Keep in mind, that was for our last (and longest lived) version, we made our
share of mistakes along the way.

>Related to that, could you describe the fastloading system employed
>in V-max, both at the 1541 side and the 64 side?  Again, some comments
>relating it to standard fastloading schemes (multiple bit transfers,
>custom vs. standard DOS encoding scheme, handshaking vs. non-handshaking,
>interruptable, etc.) would be welcome.
>


Yes, that was our DOS' other big attraction, however I co-wrote that with
Joe, (whereas the stuff above I did myself), so it's not as fresh in my
mind.  Joe would always come up with new, faster ways to do things, then I
would tweak them again to be both faster and more reliable.  He would then
trump me by coming up with yet a better and faster way, then I would feel
compelled to find 3 or 4 wasted cycles in his code just to one-up him.  It
was a very productive collaboration/competition, which went on for as long
as we worked together.  I remember one version of his serial transfer
routine, that would mess up one byte in maybe 1000K.  It took me a (long)
while, but I finally found the timing error, and fixed it.  I never quit
without shaving at least 2 clock cycles from anything he gave me <grin>.

The only top level stuff I remember, was that we found a way to get rid of
the 5 cycle jitter that generally limited the speed at which the 2-bit
transfers could operate, between 2 cpu's running at different clock rates.
We got it down to 1 and a fraction cycles by using additional handshaking
signals at the beginning of the loop, which were then used to condiditonally
adjust the timing (using forward branches) by a smaller and smaller number
of cycles.  The benefit of this tighter synchronization was, we didn't need
to hold the signals for as long, plus we could transfer more than one byte,
before returning to the top of the loop to resynchronize.  I believe that we
only allowed interrupts in-between whole sectors of data, when and if we
allowed them at all.  Of course, all routines were self-modifying so that
they would run on PAL and NTSC machines (as if there weren't enought timing
headaches already!)

>Finally, you've mentioned a few aspects of the protection system, but
>could you give a detailed description of the method, at least a "global"
>picture of the scheme.  For example, the route a byte (or bit) of data
>takes from the disk surface into the C64's memory, how that route can
>vary, how the data might be used once in memory (checksums etc.),
>and comments on aspects that makes it quite tough to beat.  Once again,
>comparisons with other protection schemes (simple encryption, timer-based
>loaders, custom disk formats) would be helpful.
>

In summary, nearly invisibly short sync bytes, coupled with overlong,
continuous tracks, which wouldn't fit on a disk spinning at normal speed and
clock rates.


And that's about it.  The only other info I remember, is that the start of
each track (before the first sync) had a unique and repetitive pattern on
it, for the duplicator to find and use. This would be located immediately
after the index hole...

And that's the way it was...


Notes:
This text document first uploaded around Dec 30, 2002

I did not conduct the interview with Harald Seeley myself. The information
presented here comes from a newsgroup thread on comp.sys.cbm from circa 1999:

https://groups.google.com/d/msg/comp.sys.cbm/gATfQPhxMs4/EbiJX9aukxUJ

The following link from Google's News Archive contains some more information
on how the interview came to be:

https://groups.google.com/d/msg/comp.sys.cbm/QwTMZ_4AAhg/1uj1w-CPJ7oJ