Harald Seeley on V-MAX!
=======================

>> >: The disappearing sync bytes
>
>Harald, can you briefly explain why the 'sync bytes' technique (and any
>other stuff that you used) could not be copied with any serial nibbler
>or Burst Nibbler/21 sec backup/other parallel cable? Why was it
>necessary to have the 8k ram expansion to successfully nibble V-MAX?
>

To read a sync byte (and confirm its existence), you have to test for a bit which is set by the shift register chip which clocks in each bit, and sets that bit (and keeps it set) once 12 or more sequential on bits are detected. This flag only stays up until the first zero bit is clocked in (4 CPU clock cycles equal one data bit clock, as I recall). Since my sync bytes were exactly 12 bits long, that was too short to reliably see using the standard Bit xxxx, bvc back: loop, which had a 5 CPU cycle jitter.

Commodore used extremely long sync bytes, 40 bits as I recall, which wasted a good deal of space. But they were necessary for the disk to be writable as well as readable, since the first few bits could be clobbered by turning on the bias current which preceded writing new data to the disk. Every time the current was turned on or off (at the end of the write sequence), some bits would be clobbered and rendered indeterminate. V-MAX! tracks were read-only, so no extra bits were needed.

We didn't find the start of data by looking for the sync bytes; they were ignored by our software (unlike Commodore's DOS). They were there purely for hardware reasons, to make sure that the following data bytes were framed correctly. We had a unique byte (or two?) for the start of each sector, which had a bit pattern that real V-MAX! GCR data would never produce, no matter how it was framed.
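
To make the timing argument concrete, here is a minimal Python sketch (not V-MAX! or 1541 code) of how long the sync flag would stay visible for a given sync length. It takes the figures quoted above at face value (a 12-bit detection threshold and 4 CPU cycles per bit, both "as I recall" numbers), and the function names are made up for illustration.

```python
def sync_flag_trace(bits, threshold=12):
    """Return, for each bit clocked in, whether the sync flag is up.

    Model assumed from the description above: the flag goes up once
    `threshold` consecutive 1 bits have been seen, and drops again as
    soon as a 0 bit is clocked in.
    """
    run = 0
    trace = []
    for bit in bits:
        run = run + 1 if bit else 0
        trace.append(run >= threshold)
    return trace


def flag_visible_cycles(sync_len, threshold=12, cycles_per_bit=4):
    """CPU cycles during which the flag is up for a sync mark of sync_len bits."""
    bits = [0] + [1] * sync_len + [0]
    return sum(sync_flag_trace(bits, threshold)) * cycles_per_bit


if __name__ == "__main__":
    for length in (12, 40):
        print(length, "bit sync: flag up for",
              flag_visible_cycles(length), "cycles")
```

Under these assumptions a 12-bit mark holds the flag up for a single bit time (about 4 cycles), which is shorter than the 5-cycle jitter of a polling loop, while a 40-bit mark holds it up for more than a hundred cycles.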

We could have gotten away with putting only one sync pattern at the beginning of each track, but we didn't want to have to wait for the start of the track every time we stepped the heads. This would have added an unacceptable worst case delay of an entire disk rotation before the data was properly framed and readable.

Another reason for the long standard Commodore sync bytes was that the Commodore GCR-to-binary conversion routine was so slow that the CPU needed all the time that went by to convert the data, so it could determine if this was indeed the sector it was looking for. Ours was real-time (we converted to binary while waiting for the next byte to clock in).

The only loader we ever saw that could handle data no matter how it was framed (i.e. had no sync bytes) was the last VORPAL loader, which gave us migraines just looking at how it managed to accomplish that task. It used self-modifying real-time GCR conversion code to do half the job, and the serial transfer routine to do the remainder of the work of putting things back in proper order. Twisted is the only term that does it justice. I never completely analyzed it; it was just too much trouble to figure all the little details out.

With V-MAX!, we had no padding between sectors, and the sector header info was combined with the actual data. The end of the data sector was also a unique bit pattern (actually the start of the next sync). Therefore, the data had to be written to disk in one continuous stream. If you didn't have a parallel disk drive, or expansion RAM, you couldn't store an entire track of data in the standard buffers, especially in expanded GCR form. Therefore, if you tried to copy the track piecemeal, you would end up with breaks in the bitstream before each sync byte. On earlier versions of V-MAX!, we would check for those breaks by looking at the last byte before each of the sync marks.
On the last version, that last byte was the crucial end of sector indicator; it wouldn't load properly without it. Our last version had variable sized sectors, so you needed that marker to tell when you were done. Also, if you didn't slow the drive down, as we did, you would end up overwriting the beginning of the track (which we always filled up, what with our variable sector sizes).

>You, ah, wouldn't happen to have this disk just lying around, would you?
>;-) For that matter what happened to all your source, tools, work disks
>etc from when you were creating V-MAX?
>

No more C64 stuff, gave it all away when I left Taito.

> I remember reading the F15 manual and was absolutely sure V-MAX
>was named because of F15... ;-)

Yeah, we later met with "Wild Bill", and even tried to get him to license our stuff.

>Although I never had a Warpspeed cart, there's something I always
>wanted to know... was it more compatible than Epyx's Fastload? Did it
>have any sort of enhancements or detection in order to optimise (or at
>least stay out of the way) when loading V-MAX protected software?
>

Absolutely! 20/20 hindsight was, of course, a very useful thing.

>
>On the back of the TAITO (Rastan, Arkanoid 2) manuals that I have
>there's a short copyright statement something like this: "V-MAX (c)
>Alien Technology Group"
>
>Could you talk a bit about the company? Who were the original founders
>(Marty/Joe/You?) You also mentioned in an earlier post that you worked
>for Taito... what was the relationship between Taito & Alien Tech Group?
>

Marty and I started ATG. Joe was a friend who occasionally did work for hire for us. ATG started out as developers for Taito, Cinemaware, and others. Then Taito hired me, and I hired Marty.

>And if I recall correctly I believe the fast loader/custom disk format in
>"Graphics Transformer" (published by CDA (Complete Data
>Automation)/written by Joe Peter, Scott M Blum (of Di-Sector fame),
>Jeff Spangenberg & Daniel Wolfe) was even faster at loading than V-MAX...
>tracks used to click by faster than a V-MAX load... but I don't
>know if that meant that the load was faster or that tracks held less ;-)

Wouldn't doubt it could move the head faster, but unless they improved the serial transfer routine, I doubt they could have actually loaded data any faster (can't get any quicker than real-time GCR conversion, and I couldn't get real-time conversion of a GCR that was denser than 75% data, 25% clock). However, it would have been easy to use the 50% data, 50% clock of alternate byte real-time conversion (via xor), at the cost of disk space. By comparison, Commodore was 80% data, 20% clock, which couldn't be decoded in real-time (though I don't know what VORPAL's final throughput was, or what density of data/clock they were using). So if VORPAL was using Commodore's GCR scheme, and Joe & Scott finished reverse engineering it, they might well have beaten V-MAX!, now that I think of it. If anyone could have done it, it would have been JP.

Part II
=======

>How is the data encoded on the disk? Could you also comment on how
>it compares to the standard DOS scheme which we are all familiar with:
>header, sync bits, track/sector layout, track/sector size, GCR encoding,
>and so on.
>

OK, I touched on that subject in my posting earlier today. Here's the rest, or at least as much as I can remember ;-)

Every block started with a unique byte after the 12 bit sync mark. This byte, obviously, had the highest bit cleared. I could be wrong, but I'm guessing $7E? The remaining GCR table was designed to substitute 8 bits out for every 6 in. I don't recall whether we worked from nibbles (3 expanding to 4) or bytes (6 to 8).

To make the data read reliably, we were forced to follow Commodore's own rules regarding how many off bits in a row we could support. I'm guessing it was a maximum of 2. You have to understand that, due to wow and flutter problems, the 1541 resynchronized itself to each of the "1"s (transitions) on the drive surface.
You pretty much had to be a digital hardware engineer to figure this out from the schematics on your own; I never saw this explained in any books. Thanks to an early interest in Ham radio, I was able to work some of these things out. What would happen, if you left too many "zeros" (no transition) in sequence on the surface of the floppy, was that the shift register would overflow, and apparently clock in a "phantom" on bit. Early in the V-MAX! series, we didn't pay close enough attention to this rule, and this brought on reliability problems. But, by following Commodore's own internal rules, we were guaranteed to be as reliable on a drive as non-protected disks.

The second rule was that no possible combination of GCR bits could accidentally generate a string of 12 "on" bits in a row, or we would end up with an unintended sync byte, which would destroy the framing we had established earlier.

This set of rules gave us just enough possible combinations that it proved possible to encode every 6 bit entry with a proper 8 bit value (or maybe 3 with 4, as I mentioned earlier; a disassembly of the loader should make it clear which we used).

With this scheme, it was also (barely) possible to convert each GCR value read back into its proper binary form in real time. This involved bit shifting and table lookups, then maybe more shifting and xoring (combining) of the results. A single byte took, I think, 32 CPU clocks to read in. Therefore, on average, you had to be done with all your shifting and moving of data 32 cycles after you first read in the value. You had some tolerance for error, however.

With Commodore, all they did was store the GCR value in a buffer to work on later, then went back in a loop, waiting for the bit to be set that indicated the next byte was ready. What I did was calculate how many cycles each operation took, so that I was ready to read each subsequent byte about the middle of the time it was present in the IO port.
I.e., if it took 32 cycles to read in a byte, the second byte would become available 32 cycles after the first byte had been read, and that second byte would remain available to be read for another 32 clocks, before it would be replaced by the next, and so on. Now, to that, I added a tolerance for drives that were running faster or slower than Commodore's spec. And to that, I added a further allowance for wow and flutter. This, then, gave me my timing windows, for when I could safely and reliably read sequential bytes from the disk, without wasting cycles (which I didn't have) sitting in a loop waiting for a "byte ready" signal.

If I was in danger of running too fast, I might have added a conditional branch to a follow-on instruction, after the "bit" test, to conditionally slow things down a tad. But pretty much all of the cycles were put to use; sometimes I would have to break off what I was doing and store a value temporarily, so that I could read the next byte. It was a lot like juggling, and I re-wrote the routines multiple times, moving operations earlier or later, till I was satisfied.

As a result, we could read in 4 GCR bytes, convert them to binary, and store them in a buffer, then go back to the top of the loop and wait for the byte ready signal, then do it all over again. This took several weeks to perfect, though we got the original (flaky) version up and running in a couple of days. We would break out of the loop when the end of sector byte was read, as I recall.

The start of our load buffer overlapped our read routine by one byte. That meant the first binary value of each sector had to translate to a $60 (rts). This was a "future expansion" hook that would allow us to embed additional copy protection (signature checks) any place/time on the disk. We never needed to use it, but it was there for expressly that purpose.
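
The timing-window bookkeeping described above (a byte roughly every 32 cycles, with margins for drive speed and wow and flutter) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the original 6502 routine: the 32-cycle figure is the one recalled in the text, while the 3% speed tolerance, the folding of wow and flutter into that single tolerance, and the function name are all hypothetical.

```python
def read_windows(n_bytes, cycles_per_byte=32.0, speed_tol=0.03):
    """Safe read windows for the bytes following a synchronized first byte.

    Time 0 is the moment byte 0 became ready (assumed to be caught via
    the "byte ready" signal). Byte k is ready at k*T and is replaced at
    (k+1)*T, where T is the real per-byte time of the drive. A read is
    safe on every drive within +/- speed_tol only between
    k*T_slowest and (k+1)*T_fastest.

    Returns a list of (earliest, latest, midpoint) tuples in CPU cycles
    for bytes 1 .. n_bytes-1, stopping once no safe window remains.
    """
    slow = cycles_per_byte * (1.0 + speed_tol)  # slowest drive allowed
    fast = cycles_per_byte * (1.0 - speed_tol)  # fastest drive allowed
    windows = []
    for k in range(1, n_bytes):
        earliest = k * slow
        latest = (k + 1) * fast
        if earliest >= latest:
            break  # drift has used up the whole window: must resynchronize
        windows.append((earliest, latest, (earliest + latest) / 2.0))
    return windows


if __name__ == "__main__":
    for k, (lo, hi, mid) in enumerate(read_windows(4), start=1):
        print(f"byte {k}: read between {lo:.2f} and {hi:.2f}, aim for {mid:.2f}")
```

In this model each window is narrower than the one before it, because the speed uncertainty accumulates with every byte read without polling. That is one way to see why a loop would return to the byte ready signal after a small group of bytes, such as the 4 GCR bytes mentioned above.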

The remaining header information would contain the sector number, possibly the track number (not sure), and the number of the file to which the sector belonged. Other bytes (or maybe we just used a single bit) would indicate the last sector of a file, which always got sent last. The remaining sectors would get sent in whatever order they were decoded (each sector had embedded into it the load address where it belonged in C64 memory).

Our "directory/VMAX DOS" track had embedded information, which told us how many sectors a given file occupied on each track. The drive would keep track of which sectors had already been transferred, and not send them twice. When it had sent all of the sectors belonging to that file which were to be found on the current track, it would step to the next track, and start again. When it reached the last track, it would skip over the final sector as long as there were other sectors to send. The final sector was sent last, as it had embedded in it the execution address where control was to be given (or a value indicating a normal return via the stack).

With this scheme, files did not have to sit in contiguous memory; they could be scattered in several places (which would have taken multiple loads to do otherwise). Plus any file could be made to autoexecute, a big convenience since that meant games could safely load multiple modules over themselves without worrying about maintaining a "core" control loading kernal. It also meant that data could be sent over the serial bus as fast as the C64 could receive it; that is, we didn't have to "skew" the data on the disk to optimally intersperse reading with sending.

>The answers to the above might answer the next question: what aspects
>of the encoding make it difficult to copy? What aspects make the encoding
>reliable on imperfect drives?
>In practice, how susceptible was the
>system to errors (I'm sure we've all had various disks that have either
>worked unreliably or rapidly self-destructed, due to the copy protection)?
>

The copy protection question I answered in my earlier post today. The reliability of the GCR was such that it was no different than unprotected disks. We always did soak tests (more than 24 hours of continuous, error-free file loads) with our oldest and most worn out drives before we released our later versions, to test our assumptions. Because of the format (below), we didn't find it necessary to add additional "signatures" that might make the loader less reliable on some drives. Keep in mind, that was for our last (and longest lived) version; we made our share of mistakes along the way.

>Related to that, could you describe the fastloading system employed
>in V-max, both at the 1541 side and the 64 side? Again, some comments
>relating it to standard fastloading schemes (multiple bit transfers,
>custom vs. standard DOS encoding scheme, handshaking vs. non-handshaking,
>interruptable, etc.) would be welcome.
>

Yes, that was our DOS' other big attraction; however, I co-wrote that with Joe (whereas the stuff above I did myself), so it's not as fresh in my mind. Joe would always come up with new, faster ways to do things, then I would tweak them again to be both faster and more reliable. He would then trump me by coming up with yet a better and faster way, then I would feel compelled to find 3 or 4 wasted cycles in his code just to one-up him. It was a very productive collaboration/competition, which went on for as long as we worked together. I remember one version of his serial transfer routine that would mess up one byte in maybe 1000K. It took me a (long) while, but I finally found the timing error, and fixed it. I never quit without shaving at least 2 clock cycles from anything he gave me.

The only top level stuff I remember was that we found a way to get rid of the 5 cycle jitter that generally limited the speed at which the 2-bit transfers could operate between 2 CPUs running at different clock rates. We got it down to 1 and a fraction cycles by using additional handshaking signals at the beginning of the loop, which were then used to conditionally adjust the timing (using forward branches) by a smaller and smaller number of cycles. The benefit of this tighter synchronization was that we didn't need to hold the signals for as long, plus we could transfer more than one byte before returning to the top of the loop to resynchronize.

I believe that we only allowed interrupts in-between whole sectors of data, when and if we allowed them at all. Of course, all routines were self-modifying so that they would run on PAL and NTSC machines (as if there weren't enough timing headaches already!).

>Finally, you've mentioned a few aspects of the protection system, but
>could you give a detailed description of the method, at least a "global"
>picture of the scheme. For example, the route a byte (or bit) of data
>takes from the disk surface into the C64's memory, how that route can
>vary, how the data might be used once in memory (checksums etc.),
>and comments on aspects that makes it quite tough to beat. Once again,
>comparisons with other protection schemes (simple encryption, timer-based
>loaders, custom disk formats) would be helpful.
>

In summary, nearly invisibly short sync bytes, coupled with overlong, continuous tracks, which wouldn't fit on a disk spinning at normal speed and clock rates. And that's about it.

The only other info I remember is that the start of each track (before the first sync) had a unique and repetitive pattern on it, for the duplicator to find and use. This would be located immediately after the index hole...

And that's the way it was...

Notes:

This text document was first uploaded around Dec 30, 2002.

I did not conduct the interview with Harald Seeley myself. The information presented here comes from a newsgroup thread on comp.sys.cbm from circa 1999:
https://groups.google.com/d/msg/comp.sys.cbm/gATfQPhxMs4/EbiJX9aukxUJ

The following link from Google's News Archive contains some more information on how the interview came to be:
https://groups.google.com/d/msg/comp.sys.cbm/QwTMZ_4AAhg/1uj1w-CPJ7oJ