A SURVEY OF MICROPROCESSOR ARCHITECTURES FOR MEMORY MANAGEMENT

INFORMATICA 2/86
UDK: 681.3.325.6.08

B. Furht
Department of Electrical and Computer Engineering
University of Miami, Coral Gables, Florida 33124

V. Milutinovič
School of Electrical Engineering
Purdue University, West Lafayette, Indiana 47907

This paper presents an overview of current microprocessor architectures which support memory management. Basic requirements for a processor to support memory management are defined, and the hierarchically organized memory is introduced. Several address translation schemes, such as paging, segmentation, and combined paging/segmentation, are described, and their implementation in current microprocessors is discussed. Special emphasis is given to the application of the associative cache memory. Single-level and multi-level address mapping schemes are analyzed and compared. Furthermore, the paper discusses the capabilities of current microprocessors to support virtual memory, which include the abilities to recognize an address fault, to abort the execution of the current instruction and save necessary information, and to restore the saved state and resume normal processing. Two methods to restart the interrupted instruction, instruction restart and instruction continuation, are evaluated, and their implementation in current microprocessors is discussed. Protection and security requirements are defined, and two protection schemes, hierarchical and non-hierarchical, are evaluated.

1. INTRODUCTION

New generation 16-bit and 32-bit microprocessors are extensively used in multiuser and multitasking environments. Therefore, there is an increased demand for the support of memory management. Furthermore, as shown in Figure 1, the capacity of primary and secondary memories in advanced microprocessors is increasing, which in turn requires an increased virtual memory space, as well as more sophisticated virtual memory management mechanisms.
In the 16-bit microprocessor arena, the techniques applied to solve memory management problems are relatively inadequate and inefficient. At the 32-bit level, a more standardized approach can be found, and significantly more sophisticated architectures for memory management have been designed. The paper evaluates various architectures for memory management and virtual memory support, and their implementations in existing microprocessors. Several important issues are addressed, such as selection of a virtual memory organization, multi-level memory mapping schemes, associative cache memories applied to address translation, virtual memory support techniques, dynamic memory allocation algorithms, as well as protection and security techniques. The implementation of these techniques in current 16-bit and 32-bit microprocessors, such as Intel 286, 386, and 432, Motorola 68010 and 68020, National 32032, and Zilog Z80,000, is discussed.

The paper is organized in eight sections. Section 2 discusses the requirements for a processor to support memory management. Two main strategies applied in current microprocessors are presented: memory management unit (MMU) on the CPU chip versus off the CPU chip. Two memory addressing schemes, linear and segmented, are evaluated. Section 3 deals with the various address translation techniques, such as paging, segmentation and combined paging/segmentation, and their implementations in current microprocessors. Both single-level and multi-level address mapping schemes are discussed. Techniques to support the virtual address mechanism are presented in Section 4. The implementations of two methods, which resume operation after an address fault is detected and corrected, are discussed. Section 5 describes the security and protection techniques applied in current microprocessors.

Figure 1. Addressing range needs [54]

2. MEMORY MANAGEMENT REQUIREMENTS

An advanced microprocessor system architecture, which is able to support memory management, uses the hierarchically structured memory system, as shown in Figure 2. The memory system consists of three levels and involves the maintaining of a large address space based on a hierarchy of memory devices, which differ in memory capacity, speed, and cost. At the first level is the high-speed cache memory, which is the most expensive and, therefore, of the lowest capacity. At the second level is the real (primary) memory, which is slower, but less expensive than the cache memory. The third level consists of large capacity storage devices, such as disks, which hold the programs and data that cannot fit in the first two levels. When a process is to be run, its code and data are brought into primary or cache memory, where cache memory always holds the most recently used code or data.

[Figure 2 diagram: main processor with data and control lines, logical address, memory management unit, high-speed cache, physical address, main memory, backing store]

Figure 2. Microprocessor system architecture with three levels of hierarchically organized memory to support memory management [30]

In this hierarchical memory structure, the basic requirements of the memory management system can be specified as follows:

1. ability to translate addresses and support dynamic memory allocation,
2. ability to support virtual memory, and
3. ability to provide memory protection and security.

There are two basic strategies in creating the microprocessor system architecture for memory management:

1. memory management unit is on the CPU chip, and
2. memory management unit is off the CPU chip.

Both strategies, as well as the list of microprocessor systems which apply them, are indicated in Figure 3.

    MMU on the CPU chip      MMU off the CPU chip
                             CPU            MMU
    Intel 286, 386           MC68000/10     MC68451
    Intel 432                MC68020        MC68851
    Zilog Z80,000            Z8001          Z8010
    Zilog Z800               Z8003          Z8015
                             NS16000        NS16082
                             NCR/32         NCR/32101
                             WE32100        WE32101

Figure 3. On-chip versus off-chip memory management unit

The main advantages of having the memory management on the CPU chip are:

1. access time improvement, because there are no off-chip MMU-related delays,
2. maximum portability of operating system and application programs, and
3. parts-count reduction.

On the other hand, the memory management on the CPU chip requires additional transistor count, which could be invested into other more frequently used resources. For example, the Motorola 68020, which applies memory management off the CPU chip, uses the saved transistor count to implement the instruction cache on the chip.

Another important issue related to memory management is selection of the memory organization scheme. Basically, there are two types of memory organization schemes: linear and segmented. In the linear addressing schemes, addresses typically start from zero and proceed linearly. The memory may later be structured, by software, at the level of address translation. In the segmented addressing schemes, the programs are not written as a linear sequence of instructions and data, but rather as modules of code and data. The logical address space is broken into several linear address spaces, each of the specified length. An effective logical address is computed as a combination of the segment number, which is a pointer to a block in memory, and the segment offset, which defines the displacement within the segment.

Table 1 shows memory addressing schemes applied in various advanced microprocessors. Note that Intel and Zilog offer both segmented and linear addressing on their 32-bit processors i80386 and Z80,000, respectively, as software programmable options.
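The combination of a segment value and an offset can be illustrated with the real-mode rule of Intel's 8086, the processor this paper uses as its segmentation example. The sketch below assumes the commonly documented rule (the 16-bit segment value shifted left by four bits, plus the 16-bit offset, wrapped to 20 bits); the function name is hypothetical and chosen only for illustration.

```python
def physical_address_8086(segment: int, offset: int) -> int:
    """Combine a 16-bit segment and a 16-bit offset into a 20-bit address."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset must be 16-bit values")
    # The segment value is shifted left by four bits (multiplied by 16)
    # and added to the offset; the sum is wrapped to 20 bits.
    return ((segment << 4) + offset) & 0xFFFFF


print(hex(physical_address_8086(0x1234, 0x0010)))  # 0x12350
```

Because the segment only supplies a base, relocating a module amounts to changing one segment value while every offset inside the module stays the same.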
In general, a linear addressing scheme is better suited for the applications that manipulate large data structures, while the segmented addressing scheme facilitates programming, enabling the programmer to structure software into segments. In addition, the segmented addressing scheme simplifies protection and relocation of objects in memory.

As an example of the segmented addressing scheme, Intel's i8086 processor contains four 16-bit segment registers, which point to four objects in the memory: code, stack, data, and extra segment (alternate data), as shown in Figure 4a. The address calculation mechanism, which produces a 20-bit physical address for the i8086, is shown in Fig. 4b.

TABLE 1. Memory addressing schemes in advanced microprocessors

[Table 1 body: columns Processors and Addressing scheme (Linear, Segmented); rows Intel 8086, 80286, 432; Intel 80386; Motorola 68000, 68010, 68020; National 16032, 32032; Zilog Z8000 family; Zilog Z80,000; AT&T WE32100; NCR NCR/32. The per-row marks are not legible in this copy.]

[Figure 4 diagram: modules and processes with code, data and stack blocks; code, data, stack and extra segment bases; CPU segment registers CS, SS, DS, ES; segment address and offset address combined into a 20-bit memory address]

Figure 4. Segmentation and address calculation
a. Segmented addressing scheme of the i8086 [28]
b. Address calculation in the i8086 [28]

3. ADDRESS TRANSLATION TECHNIQUES

Regardless of the memory organization scheme, the processor must have an address translation mechanism to handle virtual memory. The address translation mechanism also provides a method of protecting memory objects.

The address translation is a process of mapping logical to physical memory addresses. The address translation mechanism divides the memory into blocks, and then performs mapping of a block of logical addresses into a block of physical memory addresses. It allows programs to be relocated in the primary memory. It also provides the base for virtual memory system design, where the logical address space can be larger than the physical address space. The virtual memory mechanism allows programs to execute even when only a few blocks of a program are in the primary memory, while the rest of the program is in the secondary memory (on the disk). The other important processor requirements for virtual memory support are discussed in Section 4.

Three basic address translation schemes are:

1. paging,
2. segmentation, and
3. combined paging/segmentation.

In the paging systems, the primary memory is divided into fixed-size blocks (pages), while in the segmentation systems, the blocks are of various size (segments), as shown in Figure 5. Generally, the segments can overlap, while pages cannot, so pages are usually of a relatively small size, compared to total memory. Typical page size is between 256 and 2048 bytes, while segments can be 64K bytes or more.

The paging/segmentation systems combine the features of both paging and segmentation addressing schemes. The segmentation part of the scheme manages virtual space by dividing the programs into segments, while the paging part manages physical memory, which is divided into pages. Each segment consists of a number of pages, as shown in Figure 6.

Selection of the address translation mechanism has a crucial impact on the memory management techniques, which have to be implemented by the operating system, to handle page or segment fetching, placement, and replacement. For example, the paging address translation system is well suited for page placement and replacement, because all pages are of uniform size, while the segmentation system needs more complicated placement and replacement algorithms to match incoming segments with available memory space. In the segmentation systems, a segment must reside entirely in physical (primary) memory in order to be executed, because the minimum unit that can be swapped is the segment itself.
The available memory space then becomes fragmented into many small pieces, and there is not enough contiguous memory for storing one large segment. Because of the fragmentation problem associated with the segmentation systems, the paging systems are more efficient with respect to memory utilization. In the paging systems, all pages are of equal size, thus pages can be swapped without leaving unusable fragmented spaces. Also, it is not necessary to swap in all pages of a program at once in order to execute it, but only the pages required ("demand paging"). This significantly reduces the swapping time.

For all these reasons, the demand paging address translation system seems to be the way to go. As a matter of fact, all advanced 32-bit processors, as well as several 16-bit processors, fully support the demand paging technique, which may become a standard address translation mechanism in future microprocessors.

[Figure 5 diagram: virtual (secondary) memory holding the pages of Program-1 and Program-2, a page or segment mapping mechanism, and primary memory divided into page frames 0 to 6]

Figure 5. Address translation schemes
a. paging
b. segmentation

[Figure 6 diagram: virtual (secondary) memory holding segments made of pages, a mapping mechanism, and primary memory divided into page frames 0 to 8]

Figure 6. Address translation by combined paging and segmentation (paging/segmentation)

Furthermore, when one selects the address translation scheme (paging, segmentation, or combined system), there are two additional issues which should be addressed:

1. implementation of the selected address translation mechanism, and
2. selection of the number of mapping levels.

3.1 Implementation of the address translation schemes

Regardless of the address translation organization, the implementation method is always based on translation tables located in primary memory: page map tables (PMT) in the paging systems, and segment map tables (SMT) in the segmentation systems [10,11,14,16]. The table entries contain information to translate the logical into the physical address, as well as additional data for protection purposes, and to support placement and replacement algorithms. A typical format of a translation table entry is shown in Figure 7.

[Figure 7 diagram: entry fields Residence, Access rights and protection, Support for replacement, Physical address]

Figure 7. Typical format of a page or segment table entry

As an example of the address translation implementation, the virtual address of the i286 processor consists of a pair: segment selector and displacement, v=(s,d). The segment selector points to the segment descriptor in the segment map table, as shown in Figure 8. The segment descriptor contains the primary memory address s', at which the segment begins. The displacement d is added to s', forming the real physical address, r=d+s', corresponding to the virtual address v.

Figure 8. Address translation mechanism of the i286 [28]

The described address translation implementation method is known as direct mapping. Translating a logical address to a physical address, using direct mapping, requires an additional memory access operation to obtain the segment (or page) base address, and therefore the use of direct mapping can cause the computer system to run programs at lower speed. There are several solutions applied in modern microprocessor architectures to overcome this problem. These solutions are discussed below.
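Before turning to those solutions, the direct-mapping step r = d + s' can be sketched in a few lines. This is a minimal illustration, not the i286 descriptor format: the `limit` and `present` fields, the `SegmentFault` exception, and all names are assumptions added here so that the residence and protection checks of a table entry have something to act on.

```python
from dataclasses import dataclass


class SegmentFault(Exception):
    """Raised when the segment is absent or the displacement is out of range."""


@dataclass
class SegmentDescriptor:
    base: int      # s': primary memory address at which the segment begins
    limit: int     # segment length in bytes (assumed field)
    present: bool  # residence bit (assumed field)


def translate(segment_map_table, s, d):
    """Map the virtual address v = (s, d) to the physical address r = d + s'."""
    # Reading the descriptor is the additional memory access of direct mapping.
    descriptor = segment_map_table[s]
    if not descriptor.present or d >= descriptor.limit:
        raise SegmentFault((s, d))
    return descriptor.base + d


smt = {3: SegmentDescriptor(base=0x4000, limit=0x1000, present=True)}
print(hex(translate(smt, 3, 0x20)))  # 0x4020
```

Every call reads the table once before the real access can be made, which is the speed penalty the text attributes to direct mapping.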
In Intel's i286 processor, the four standard segment registers are extended with the corresponding four 48-bit segment descriptor cache registers, as shown in Figure 9 [20,28]. Segment registers are loaded by the program, while the CPU loads the explicit cache registers, which are invisible to programs. The explicit cache speeds up the operation by eliminating the need to refer to a descriptor table for every memory reference instruction. Loading the explicit cache is performed in four steps:

1. Program places a selector in the corresponding segment register.
2. Processor adds the selector index to the base address of the descriptor table, to select a descriptor.
3. After the processor verifies segment access rights, it copies the descriptor to the data segment register in cache.
4. The processor uses the descriptor information to check segment types and limits, as well as to form the effective address.

[Figure 9 diagram: segment selectors in the segment registers CS, DS, SS, ES (loaded by program); segment descriptor cache registers holding access rights, segment base address and segment size (the CPU loads this explicit cache, which is invisible to programs)]

Figure 9. Descriptor data type in the i286 [28]

The described technique based on explicit cache registers speeds up the direct mapping, but still is not efficient enough, because it requires cache loading whenever control is transferred from one segment to another segment of the same type.

A much more sophisticated solution is based on a special associative cache (32 to 64 locations), which holds the most recently used set of translation values. Then, the translation process is performed in the following steps, as shown in Figure 10:

1. First, the virtual address received from the CPU is searched [...]

[...] the TLB method vary in complexity, they can be classified in two basic groups: address-accessible TLBs, and content-addressable TLBs.
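The idea of an associative cache that holds recently used translation values can be sketched as follows. This is an illustrative model under stated assumptions, not a description of any particular chip: the cache is modeled as a small least-recently-used map, the page size of 512 bytes is picked from the typical range given earlier, and the names `TLB`, `PageFault`, and `translate` are hypothetical.

```python
from collections import OrderedDict


class PageFault(Exception):
    """Raised when the page has no entry in the translation table."""


class TLB:
    def __init__(self, capacity=32):
        self.capacity = capacity
        self.entries = OrderedDict()  # page number -> frame number, oldest first
        self.hits = 0
        self.misses = 0

    def translate(self, page, offset, page_table, page_size=512):
        if page in self.entries:
            # Hit: the translation is taken from the associative cache.
            self.hits += 1
            self.entries.move_to_end(page)
        else:
            # Miss: fall back to the translation table in primary memory.
            self.misses += 1
            if page not in page_table:
                raise PageFault(page)
            if len(self.entries) >= self.capacity:
                self.entries.popitem(last=False)  # evict least recently used
            self.entries[page] = page_table[page]
        return self.entries[page] * page_size + offset


tlb = TLB(capacity=2)
table = {0: 5, 1: 9, 2: 7}
print(tlb.translate(0, 4, table))  # 2564
```

Only a miss pays for the table access, so programs that keep touching the same few pages translate most addresses from the cache alone.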
In the address-accessible TLB approach, a logical address field identifies the register in the TLB that holds the physical base address. As an example of this technique, the Z800 with on-chip MMU is shown in Figure 11 [33]. The virtual address consists of a 4-bit TLB pointer and a 12-bit offset. The TLB pointer selects one of the 16 translation registers of the TLB. Then, the 24-bit physical address is formed, as a selected 12-bit page base address from the TLB, concatenated with the 12-bit offset.

[Figure 11 diagram: 16 page base address registers; 16-bit logical address split into a 4-bit register index and a 12-bit page offset; each register holds a page frame address and a protection field; the page frame address and the offset form the 24-bit physical address]

Figure 11. The Z800 address translation based on the address-accessible TLB [33]

The address-accessible TLB technique is not practical for large systems, because accessing the TLB by addresses requires a segment register for each logical segment or page that can be relocated.

The content-addressable TLB is more suitable for large systems. This method has been applied in several microprocessors, such as the MC68451 MMU, Z8015 PMMU, Intel 80386, and NS16082 MMU. To illustrate this method, Figure 12 shows the content-addressable TLB applied in the MC68451 MMU.

[Figure 12 diagram: logical address, mask that determines the segment size, associative lookup over 32 content-addressable registers holding comparison bits and segment base addresses]

Figure 12. Content-addressable TLB in the MC68451 MMU [44]

[Figure 13 diagram: program requests access to a page; the OS instructs the CPU to read the page from the secondary memory; primary memory replacement algorithm; return to the faulted instruction]

Figure 13. The flow-chart of a paging virtual memory system with an associative cache memory

The MMU receives a logical address (23 bits), and the mask register masks the low order bits to determine the segment size.
Then, the MMU compares the rest of the most significant bits with the comparison field values of 32 content-addressable registers. If a match is found, the MMU performs address translation. If there is no match, the MMU generates a fault condition, and activates a trap routine. The trap routine will update the TLB from translation tables stored in primary memory.

The flow-chart in Figure 13 illustrates the necessary operations in a paging-based virtual memory system with an associative cache memory for recently used pages. The virtual memory is activated whenever a program requests an access to a page. The flow-chart in Figure 13 indicates three different control paths:

1. when the page descriptor is found in the associative cache,
2. when the page descriptor is not found in the cache ('cache miss'), but the page is in the primary memory, and
3. when the page descriptor is not found in the associative cache, and the page is not in the primary memory ('page miss'). Then, the address fault handling routine is activated.

In addition to the address translation mechanism, the MC68451 MMU supports dynamic memory allocation. The dynamic memory allocation mechanism is able to allocate memory to a process while it is running. The Binary Buddy system, an algorithm for dynamic memory allocation, is implemented in the MC68451. The algorithm divides the entire physical address space into buffers, the size of which varies from 256 bytes to 256K bytes (in the MC68451). The algorithm maintains these buffers by using the buffer lists for all sets of buffers of the same size, as well as buffer descriptors for each buffer independently [52].

When a memory request is received, the algorithm searches through the list of available buffers in order to find the best fitted buffer. If the best-fitted buffer is not available, the search process is continued for the next larger size buffer. The flow-chart of the Binary Buddy algorithm is shown in Figure 14.
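The search-and-split behaviour of the Binary Buddy system can be sketched as below. This is a textbook-style illustration under stated assumptions, not the MC68451 implementation: the class and method names are hypothetical, the address space is taken to be a single 256K-byte buffer, and the `release` method with buddy merging is the standard companion step of buddy systems that the text itself does not describe.

```python
MIN_SIZE = 256          # smallest buffer size, as in the text
MAX_SIZE = 256 * 1024   # largest buffer size, as in the text


class BuddyAllocator:
    def __init__(self, total=MAX_SIZE):
        # One free list per buffer size; initially one buffer of the largest size.
        self.total = total
        self.free = {}
        size = MIN_SIZE
        while size <= total:
            self.free[size] = []
            size *= 2
        self.free[total].append(0)

    def allocate(self, request):
        size = MIN_SIZE
        while size < request:
            size *= 2
        # Search the best-fitting list first, then each next larger size.
        candidate = size
        while candidate <= self.total and not self.free[candidate]:
            candidate *= 2
        if candidate > self.total:
            return None  # nothing fits; a caller would queue the request
        address = self.free[candidate].pop()
        # Split the larger buffer: keep one half, put the other on its free list.
        while candidate > size:
            candidate //= 2
            self.free[candidate].append(address + candidate)
        return address, size

    def release(self, address, size):
        # Merge with the buddy while the buddy is free (assumed, standard step).
        while size < self.total:
            buddy = address ^ size
            if buddy not in self.free[size]:
                break
            self.free[size].remove(buddy)
            address = min(address, buddy)
            size *= 2
        self.free[size].append(address)


pool = BuddyAllocator()
print(pool.allocate(300))  # (0, 512)
```

A request for 300 bytes is rounded up to the 512-byte size, so up to half of a buffer can be wasted internally; in exchange, every split and merge is a simple power-of-two address computation.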
A detailed description of the Binary Buddy algorithm, as well as additional issues related to it, are discussed in [45,48,52].

[Figure 14 flow-chart: memory request; search of the memory buffer available tables for an appropriately sized buffer; if none is found, search for the next larger buffer; split the large buffer into two pieces, allocate one piece to the requesting task and place the other on the memory available list; allocate the buffer to the requesting task; if no buffer is available, put the request in a queue]

Figure 14. Binary Buddy algorithm for dynamic memory allocation

3.2 Single-level versus multi-level address mapping

The second issue closely related to address translation architectures deals with the number of mapping levels in address translation schemes. The conventional address mapping scheme consists of just one mapping level, such as in most of the 16-bit processors (i286, Z8010, and Z8015 MMUs). On the other hand, almost all 32-bit processors use multi-level mapping schemes, which bring some new features to memory management. The basic advantages of multi-level mapping schemes versus single-level mapping schemes can be summarized as follows:

1. they provide a more sophisticated protection mechanism,
2. they are able to accommodate a larger address space, and
3. they provide page sharing.

Several multi-level mapping schemes are evaluated below.

Intel's i432 processor uses two-level mapping in order to provide a more sophisticated protection mechanism, as shown in Figure 15 [27,40,47]. The segment selector register points to an entry of the access segment, where the access rights are stored and are thus associated with program modules. Thus, the module A can write and read the selected segment, while the module B can only [...]

Figure 15. Two-level address mapping scheme in the i432 processor [27]

[Figure 16 diagram: linear address with directory, table and offset fields; root; translation lookaside buffer]

[...] scheme, along with a translation lookaside buffer, designed as a cache memory. The complete architecture is shown in Figure 16.
The linear virtual address consists of three fields (directory, table, offset), and address translation is performed in the following steps:

1. first, the address is searched through the TLB. If the address is found, the translation is performed in the TLB, and the primary memory is accessed directly.
2. if the address is not found in the TLB, the miss signal is generated, and the translation is performed through the two-level mapping built on the CPU chip, as shown in Figure 16.

The two-level on-chip mapping scheme enables fast address translation, and page tables can be shared and/or swapped.

A similar two-level mapping scheme has been implemented in the NS16082 MMU [6,25,38]. The total physical address space is divided into 32,768 fixed pages of 512 bytes each. The virtual address consists of 24 bits divided into three fields: index-1 and index-2 of the page selector, and the offset, as shown in Figure 17.

[Figure 17 diagram: 24-bit logical address split into an 8-bit first index, a 7-bit second index and a 9-bit byte address; a first-level table of 256 PTEs; second-level tables of 128 PTEs; a 32-bit PTE holding the page address divided by 512, protection and use bits, and reserved bits; page in memory]

Figure 17. Two-level mapping scheme of the NS16082 MMU [35]

The index-1 (8 bits) of the page selector is used to locate one of the 256 entries of the page table. The contents of the page table entry PTE-1 point to the beginning of one of 256 pointer tables, each of which contains 128 entries. Then, the pointer to the pointer table is combined with the index-2 (7 bits) of the page selector, to locate one of the entries within the pointer table. The selected entry contains the actual page number in primary memory. The offset field is then used to locate data within the page. The NS16082 MMU also contains an associative cache to hold 32 recently used page address entries.

The Z80,000 processor uses a three-level mapping scheme based on the set of three translation tables located in primary memory [2,33,53].
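The two-level NS16082 lookup described in this section can be sketched as follows. The field widths (8-bit index-1, 7-bit index-2, and the remaining 9 bits as offset into a 512-byte page) come from the text; the use of nested dictionaries for the tables and the function names are assumptions made only for illustration.

```python
PAGE_SIZE = 512  # bytes per page, from the text


def split_virtual_address(va):
    """Split a 24-bit virtual address into (index-1, index-2, offset)."""
    if not 0 <= va < 1 << 24:
        raise ValueError("virtual address must fit in 24 bits")
    index1 = va >> 16           # upper 8 bits
    index2 = (va >> 9) & 0x7F   # next 7 bits
    offset = va & 0x1FF         # low 9 bits
    return index1, index2, offset


def translate(page_table, va):
    """Walk both table levels and return the physical address."""
    index1, index2, offset = split_virtual_address(va)
    pointer_table = page_table[index1]   # first level selects a pointer table
    page_number = pointer_table[index2]  # second level holds the page number
    return page_number * PAGE_SIZE + offset


tables = {2: {5: 100}}
va = (2 << 16) | (5 << 9) | 0x12
print(split_virtual_address(va))  # (2, 5, 18)
print(translate(tables, va))      # 51218
```

Only the pointer tables that are actually in use need to exist, which is one way a multi-level scheme accommodates a large address space without one large flat table.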
The Z80,000 also contains an associative memory for the TLB, where the 16 most recently referenced pages are stored. The CPU automatically loads the TLB from translation tables when a logical address is missing.

The NCR/32 processor uses an address translation chip (ATC) for address translation based on a paging system with one-level mapping [22]. The chip contains 16 associative memories for recently used pages.

The Z8010 MMU, which is used with the Z8001 processor, applies a one-level segmentation system, based on 64 content-addressable segment descriptor registers. For more details see [33,50].

The Z8015 MMU differs from the Z8010 MMU in that the logical address is translated into page frames rather than segments. It applies a one-level mapping scheme and uses 64 page descriptor registers, which are also content-addressable [33,56].

The WE32100 32-bit processor uses the off-chip WE32101 MMU, which supports both demand paging and segmentation systems, which are user selectable [15,17,18]. The MMU contains an on-chip cache memory: a 32-entry segment descriptor cache, and a 64-entry page descriptor cache, to hold recently used segment and page descriptors, respectively.

Table 2 summarizes address translation features of some 16- and 32-bit microprocessors.

[Table 2: address translation features of some 16- and 32-bit microprocessors; the table body is not legible in this copy]

4. VIRTUAL ADDRESS SUPPORT TECHNIQUES

A virtual memory system allows the user to execute programs on a very large memory of virtual address space, much larger than the actual physical memory. This is accomplished by the capability of a microprocessor to detect access to memory pages (or segments) which are not present in the physical memory. When the virtual memory system detects such a reference, it will fetch the required page from the secondary memory into the primary memory.

In order to support virtual memory capabilities, besides the address translation, a microprocessor must provide the following attributes:

1. to recognize a page or segment fault, if the page or segment is not present in the primary memory. The memory manager must then inform the processor so that the missing page or segment can be fetched from the secondary memory, and eventually one of the current pages or segments can be replaced,
2. to abort execution of the current instruction (instruction abort capability),
3. to save necessary information needed later to recover from the fault,
4. to call and execute the fault service routine in the operating system, which will swap the required page(s) or segment(s) from secondary memory to primary memory,
5. to provide necessary information for the operating system, in order to support page (or segment) placement and replacement algorithms (indication of access activities), and
6. to restore the saved state and resume the normal processing (instruction restart capabilities).

Although very different in complexity, all advanced microprocessors provide instruction abort and restart capabilities. Some solutions are presented below.

Recognizing the access fault can be performed internally only, if the MMU is on the CPU chip, or both internally and externally, if the MMU is off the CPU chip. When an access is made to an instruction or data which is not present in primary memory, an address error is internally detected, and it initiates the address error fault handling routine (internally detected fault). If the off-chip MMU detects a fault situation, it will send a signal to the CPU, which will in turn activate the fault handling routine (externally detected fault).

When the CPU recognizes an access fault, it saves the state information needed to recover from the fault. The information is usually saved on the stack. The typical information which must be saved is the program counter (starting address of the instruction), the status register, the fault address, the trap-specific parameters, the access type, the internal temporary registers, various internal statuses, etc. For illustration, the MC68010 processor, which supports virtual memory, saves 26 words, versus the MC68000 processor, which saves only seven words, which is not enough to provide the user with the state of the machine after the fault has occurred. Figure 18 shows the information saved on the stack for these two processors. The MC68010 address stack is divided into two parts: a user visible section, and a non-user visible section in which the internal status and the temporary data are saved.
Tbe memory managemcnt unit also has to provide tbe information related to access activities needed by tbe operating svstem (placement and replacement a!go rithms). This information is usually stored in the transitioa table entries. Tberc are three information bits \vbicb are present in typical systems: 1. the valid bit - which is controllsd by tbe opcraticr; system, and specifies vvhetber or not a block (page or segment) is in the pr.mary memory. 2. tht rcfcrenees bit - where the MMU typically sets tbis bit (o indicate if access to the corresponding block in primarv n)?ny>ry is on. The operating svstem mav reset this bit to keep track of the acct-ss bistory. 3. the modified bil • wbich is sct by any write operžtion to tbe corresponding STATUS REOISTER fP.0G!UM COUKTER HIGK PnOGRAM C0UK7ER IW FCRMATJvECTORC^fSET SP£C:AL SIATU3 W0S0 FAULT ADDRESS HIGH FAUU ADDSESS ICW RESERVEO DMAOUTPUT BUfFER DATA ISPUT 8U=f£R RESESVEO INSIR IhPU! BUFfER MO.V USEfl ViSlSLE ' IMERNAl IVFORU.AriON • SP 0? 04 06 03 0A OC CE 10 i: 14 16 18 IA 33 S^ECUi STATUS W0RD FAULT A33SESS HIGH • FAUlT ADDRESS IOW I.SSTF.UCTION REGISIER STATfS M5'SIER PR?:c.tU CC-MER H.GK • $!> 02 04 OS 03 0A OC b. 4- Figure 18. Address error stack [36] a. XfC68010 b. MCG8000 block. Tbis bit indieates whether tbe block must be written back to the secondarv memory, before being replaced from tbe primary memory. For illustraiion, thc i432 proc»ssor contains four access activity bits JD ils segment descriptor, as shown in Figure 19. The valid bit (V) indicates uhctber or Dot the segment is in the nicmory. The storage allooated bil (S) indicates ubether acv memorv has been associatccj ^ith this descriptor. The accesscd bit (AC) indicates wbctbcr the setment has becn acccssed, whi!e tbe altcred bit (.AL) indicates wbcthcr tbe iDformation ctJDtained in the bas bcen modified. co AC AL Figure 10. 
Access activity bits of the i432 processor contained in the segment descriptor [27]

The operating system uses the V and S bits to detect when a physical segment is not present in memory, while the AC and AL bits are used by the replacement algorithm to decide which of the currently present segments should be swapped out to make room for the new segment.

In addition, several fields in the segment descriptor can be used by the operating system to record other useful information about the segment (frequency of use, etc). The other advanced processors contain similar information on access activities used by the operating system.

Commonly used page replacement techniques are Least Recently Used (LRU), Least Frequently Used (LFU), and First-In-First-Out (FIFO) [1,5,8,9,55]. The described information maintained by the CPU (referenced and modified bits), as well as some additional user-defined fields, can be used to design the page replacement algorithm in the operating system.

One of the popular schemes for the LRU algorithm classifies the pages into four groups:
Group 1: unreferenced (R=0) and unmodified (M=0)
Group 2: unreferenced (R=0) and modified (M=1)
Group 3: referenced (R=1) and unmodified (M=0)
Group 4: referenced (R=1) and modified (M=1)
The pages from the lowest groups are replaced first, and the pages from the highest groups are replaced last. The referenced bit is set by the CPU whenever the page is referenced. The operating system (OS) periodically clears the referenced bit. A sophisticated LRU algorithm, "software caching," has been implemented in the VAX/VMS operating system [31].

The LFU algorithm can also be incorporated into this scheme. Whenever the referenced bit is cleared, the OS can count the frequency with which the pages were used.

The modified bit is set by the CPU whenever the page is written. When the page is swapped, the OS checks this bit to see if there is a need to update the copy of the page in the secondary memory.
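The four-group scheme above can be sketched as follows. This is a minimal illustration, assuming a simple in-memory list of page descriptors; the names Page, group, choose_victim, needs_write_back, and clear_referenced are hypothetical and do not correspond to any particular operating system.

```python
from dataclasses import dataclass


@dataclass
class Page:
    number: int
    referenced: bool = False   # R bit, set by the CPU on any access
    modified: bool = False     # M bit, set by the CPU on any write


def group(page):
    """Map the (R, M) bits to group 1..4 as listed in the text."""
    return 1 + 2 * int(page.referenced) + int(page.modified)


def choose_victim(pages):
    """Return a page from the lowest non-empty group (first one on ties)."""
    return min(pages, key=group)


def needs_write_back(page):
    """A modified page must be copied to secondary memory before replacement."""
    return page.modified


def clear_referenced(pages):
    """Periodic OS sweep that clears every referenced bit."""
    for page in pages:
        page.referenced = False
```

For example, with pages in groups 4, 3, and 2 the victim is the group 2 page, and because its modified bit is set it must be written back first. After clear_referenced runs, pages that were in groups 3 and 4 fall into groups 1 and 2, so pages not touched since the last sweep become the preferred victims.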
The last attribute of a processor to support virtual memory is the most complex, and refers to reloading the state of the program and resuming the operation after the address fault routine is completed. Two methods of implementing the resume operation on a processor are:
1. instruction restart method, and
2. instruction continuation method.
Advantages and drawbacks of these two methods are discussed in the following two subsections.

4.1 Instruction restart method

In this method, after the address fault error handling routine has completed all activities, the instruction in which the fault occurred is restarted from the beginning. Figure 20 illustrates the execution of the microcode in the case when no address fault is present (Fig. 20a), and in the case when the restart method is applied after an address fault has occurred (Fig. 20b).

In Figure 20 it is assumed that a machine instruction consists of several microinstructions [m1, m2, m3, m4]. If there is no address fault, these microinstructions will execute sequentially, as shown in Fig. 20a. If the MMU detects an address fault in the microinstruction m2, the control will be transferred to the address error routine. The address error routine will first save the information state, and then the routine will handle the address error (the required page or segment will be fetched from the secondary memory). Finally, the saved information state will be restored and the faulted instruction will be restarted from the beginning - at the machine instruction level. Therefore, the sequence [m1, m2, m3, m4] will be executed again.

The main problem in the instruction restart method is that the processor must reconstruct the state of the machine, as it was at the beginning of the machine instruction, while the faulted instruction was interrupted in the middle of its execution. There are some situations when this is very complex, such as when a resource is used both as input and output parameter in the same instruction.
For example, in extended precision arithmetic operations, a carry (or borrow) bit from the previous operation is used in the instruction as an input parameter, but the instruction itself also sets the same bit as the result of the current operation. If the address fault is detected after this bit is updated, the original value must be restored before the instruction is restarted. A similar case is with autoincrement and autodecrement addressing modes.

Figure 20. Microinstruction sequence [36]
a. No address fault
b. Instruction restart method

Several techniques have been proposed to solve this problem, and are discussed below:
1. The processor may postpone the modification of user-visible resources (such as the carry bit) until the end of the instruction. Then, if the address fault has not occurred, the resources will be updated.
2. All modifications of the user-visible resources are recorded by the processor. If the address fault occurs, the processor uses this record to restore the original values of the modified resources.
3. The processor maintains copies of all user-visible resources that are modified. Because the copy always contains the original value, if the address fault occurs, it will be easy to restore the original state.

4.2 Instruction continuation method

In the instruction continuation method, when the address error routine has been completed, the machine instruction will not be resumed from the beginning, but from the same location within the instruction at which the execution was suspended.

The execution of the same sequence of microinstructions [m1, m2, m3, m4], in the case of the continuation method, is shown in Figure 21. The address fault was detected in the microinstruction m2, and the control was transferred to the address error handling routine.
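The difference between the two resume methods can be sketched by replaying the microinstruction sequence [m1, m2, m3, m4] with a single fault in m2. This is a toy model, not a description of any particular processor: the function name run_instruction, the "m2:fault" trace marker, and the assumption that the faulted microinstruction itself is re-issued under the continuation method are all illustrative choices.

```python
MICROCODE = ["m1", "m2", "m3", "m4"]


def run_instruction(method, fault_at="m2"):
    """Return the trace of microinstructions for one machine instruction.

    method is "restart" or "continuation". The access made by `fault_at`
    faults once and succeeds after the simulated fault routine has run.
    """
    if method not in ("restart", "continuation"):
        raise ValueError(f"unknown method: {method}")
    trace = []
    page_resident = False
    index = 0
    while index < len(MICROCODE):
        step = MICROCODE[index]
        if step == fault_at and not page_resident:
            trace.append(f"{step}:fault")
            page_resident = True      # the fault routine fetches the page
            if method == "restart":
                index = 0             # start again from the first microinstruction
            # continuation: index is unchanged, so execution resumes at the
            # microinstruction in which the fault was detected
            continue
        trace.append(step)
        index += 1
    return trace
```

Under the restart method the trace is m1, m2:fault, m1, m2, m3, m4, so m1 executes twice; this is why any user-visible side effect of m1 (such as an updated carry bit) has to be undone before the restart. Under the continuation method the trace is m1, m2:fault, m2, m3, m4, and no such reconstruction is needed, at the price of saving enough internal state to resume in the middle of the instruction.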
After the routine has been completed, the processor will resume operation, by exe[...]

[...] NS16000 Architecture," IEEE Micro, April 1984, pp. 26-47.
26. "iAPX 286 Operating Systems Writer's Guide," Intel Corporation, Santa Clara, 1983.
27. "Introduction to the iAPX 432 Architecture," Intel Corporation, Santa Clara, 1981.
28. "Introduction to the iAPX 286," Intel Corporation, Santa Clara, 1982.
29. Kaminker, A., et al, "A 32-bit Microprocessor with Virtual Memory Support," IEEE Journal of Solid-State Circuits, October 1981, pp. 230-231.
30. Knowlton, K.C., "A Fast Storage Allocator," Communications of the ACM, Vol. 8, No. 10, October 1965, pp. 623-625.
31. Levy, H.M., and Lipman, P.H., "Virtual Memory Management in the VAX/VMS Operating System," IEEE Computer, March 1982, pp. 35-41.
32. Linden, T.A., "Operating System Structures to Support Security and Reliable Software," ACM Computing Surveys, Vol. 8, No. 4, Dec. 1976, pp. 410.
33. Look, H., "Virtual Memory for Zilog's 8-, 16-, and 32-bit Microprocessors," Proceedings of the IEEE Mini/Micro Southeast, Orlando, Florida, January 1984, paper 3.3.
34. MacGregor, D., "Hardware and Software Strategies for the MC68020," EDN, June 20, 1985, pp. 89-98.
35. MacGregor, D., Mothersole, D., and Moyer, B., "The Motorola MC68020," IEEE Micro, August 1984, pp. 101-118.
36. MacGregor, D., and Mothersole, D.S., "Virtual Memory and the MC68010," IEEE Micro, June 1983, pp. 24-38.
37. Martin, G., "Virtual Memory Management Expands Microprocessors," Computer Design, June 1983, pp. 169-178.
38. Mateosian, R., "Elegance is Everything in NS 16000 Memory Management," Proceedings of the IEEE Mini/Micro Southeast, Orlando, Florida, January 1984, paper 3.2.
39. Mateosian, R., "Operating System Support - the Z8000 Way," Computer Design, May 1982, pp. 255-261.
40. Mazor, S., Wharton, S., "Compact Code iAPX 432 Addressing Techniques," Computer Design, May 1982, pp. 249-253.
41. Mazor, S., Wharton, S., "Promote User Privacy Through Secure Memory Areas," Computer Design, October 1982, pp. 89-92.
42. "MC68020 32-bit Microprocessor User's Manual," Prentice-Hall, 1984.
43. Myers, G.J., "Advances in Computer Architecture," John Wiley & Sons, 1978.
44. Philips, D., "Memory-Management Strategies Suit Different Application Areas," EDN, September 1984, pp. 135-143.
45. Peterson, J.L., and Norman, T.A., "Buddy Systems," Communications of the ACM, June 1977, Vol. 20, No. 6, pp. 421-431.
46. Pohm, A.V., and Smay, T.A., "Computer Memory Systems," IEEE Computer, October 1981, pp. 93-110.
47. Pollack, F.J., et al, "Supporting ADA Memory Management in the iAPX-432," ACM 1982, pp. 117-130.
48. Purdom, P.W., and Stigler, S.M., "Statistical Properties of the Buddy System," Journal of the ACM, Vol. 17, No. 4, October 1970, pp. 683-697.
49. Saltzer, J.H., and Schroeder, M., "The Protection of Information in Computer Systems," Proceedings of the IEEE, Vol. 63, No. 9, September 1975, pp. 1278.
50. Skoog, S.K., "Memory Management with the NCR/32 Chipset," Proceedings of the IEEE Mini/Micro Southeast, Orlando, Florida, January 1984, paper 3.1.
51. Stockton, J.F., "A Virtual Breakthrough for Micros," Computer Design, pp. 153-162.
52. Stockton, J.F., "The M68451 Memory Management Unit," Electronic Engineering, Vol. 54, May 1982, pp. 54-73.
53. Timms, B., "Z80,000 Mainframe Resources Optimize the Software Environment," Proceedings of the IEEE Mini/Micro Southeast, Orlando, Florida, January 1984, pp. 4.4.1-4.4.13.
54. "Touch the Future," Intel Design Seminar, Miami, Florida, 1985.
55. Turner, R., and Levy, H., "Segmented FIFO Page Replacement," Proceedings of the ACM Conference on Measurement and Modeling of Computer Systems, Las Vegas, Nevada, September 1981, pp. 48-51.
56. Wallers, S., "Memory Management Made Easy with the Z8000," Proceedings of the Wescon, 1981, pp. 9.3.1-9.3.9.

9. ABOUT THE AUTHORS

Dr. B. P.
Furht is on the faculty of the Department of Electrical and Computer Engineering, University of Miami, Coral Gables, Florida. He has published over 60 technical papers, and 2 books. He is the author of Microprocessor Interfacing and Communication (Reston 1985), and coeditor of the Tutorial on Advanced Topics in Computer Architecture (IEEE Press, 1985). His current research activities include high-level language computer architectures, multiprocessor systems, and architectures for virtual memory management. He presented over 30 invited lectures in Europe, North, and Latin America on various topics related to computer architecture. He has been involved in consulting activities for a number of companies such as NASA, RCA, Cordis, Honeywell, and others. He is a member of the IEEE, and a chief editor of the International Journal of Mini and Microcomputers.

Dr. V. M. Milutinović is on the faculty of the School of Electrical Engineering, Purdue University. He has published over 60 technical papers, 2 original books, and 4 edited books. His research papers have been published in IEEE Transactions, IEE Proceedings, IEEE Computer, and other refereed journals. One of his books has been republished (in various forms) in several languages. He is the editor of the IEEE Press Tutorial on Advanced Microprocessors and High-Level Language Computer Architecture, and the coeditor of the IEEE Press Tutorial on Advanced Topics in Computer Architecture. He is the editor and the contributing author for two multiauthor books on computer architecture. His pioneering paper on GaAs computer architecture for VLSI has been scheduled to appear in the September issue of IEEE Computer. He presented over 40 invited lectures in Europe, North, and Latin America. His current interests include VLSI computer architecture for GaAs, high-level language computer architecture, and microprocessor systems for AI.
His current research support is equal to about $250k per year, predominantly in the area of VLSI computer architecture for GaAs. He has consulted for a number of high-tech companies, including Intel, Honeywell, NASA, RCA, and others. He is currently involved in the industrial implementation of a 32-bit VLSI microprocessor in the GaAs technology, with responsibilities in the microarchitecture domain. He is a member of the IEEE, and is on the EUROMICRO Board of Directors.