New Heap Management algorithm (Coursera version)

classic Classic list List threaded Threaded
4 messages Options
Reply | Threaded
Open this post in threaded view

New Heap Management algorithm (Coursera version)

Coursera part 2 introduces a slight variation of the Heap Management algorithm presented in the book.

The significant changes are

  • The size field in the segment headers has changed to be the size of the segment's data block instead of the size of the segment itself.
  • Segments are always split into two segments when an allocated segment is returned from Memory.alloc().
  • The returned block contains a complete header. (In the book's algorithm, allocated segments have size as a single-word header.)
Also note the trivial change that the order of the link and size fields has been swapped.

Before alloc(5)
freeList next next null
2 8 3
After alloc(5)
freeList next next null
2 1 3
returned block    
* See Allocated block header
  section, below.
Figure 1: New Heap Management algorithm

Memory.alloc() changes

The search loop in Memory.alloc() remains the same—using either first-fit or best-fit, find a segment on the free list whose size is greater than or equal to the requested block size.

In the Book algorithm, once a large enough segment is found, the segment is either used in its entirety, or it is split into two segments. If the entire segment is used, the segment must be removed from the free list. This is a non-trivial operation because the search loop must keep track of two pointers during the linked-list search so that the segment that links to the segment to be removed can have its next pointer updated.

The Coursera algorithm vastly simplifies this by always spliting the found segment and using the newly created segment for the allocated block. The found segment is reduced in size but is never removed from the free list. Another useful effect of always splitting segments for allocation is that the free list will never be empty; the freeList pointer will never be null.

Memory.deAlloc() changes

Memory.deAlloc() is simplified because the size field is already in the correct location and has the correct value. The deallocated segment just needs to be returned to the free list.

Allocated block header

The Coursera algorithm sets the next pointer in the block header to null. An improvement would be to set it to a signature, a known value that is neither null nor a legal pointer value. This will allow dealloc() to check if the block being deallocated is an allocated block. (Double deallocation is a common programming error.)

The simplest signature scheme is to set the next field to an easily recognizable value like -31416.

An alternative is to set the next field to a value derived from the block size. This would help prevent allocated blocks that had their header overwritten from being placed on the free list and corrupting the heap. Setting the block size to not(block size) will form a suitable value that is an illegal address.


The Coursera algorithm is more susceptible to fragmentation than the Book algorithm because it always splits the heap segment that is used for a block allocation. This means that a block that is deallocated will be too small to satisfy a request for another block of the same size. A program that is repeatedly creating and disposing the same object will inevitably run out of memory.


Defragmentation is not required by the course, but may be necessary if you want to run your game using your OS. Here is a fairly easy way to do defragmentation in Memory.deAlloc().
  • Insert the deallocated segment into the free list in ascending segment address order.
  • After insertion, check if either or both the preceding segment or next segment is contiguous with the newly inserted segment; if so, combine the segments.
Another useful property of the Coursera Memory.alloc() algorithm is that because segments are always split and the unallocated portion remains on the free list, the initial segment—the segment at 2048—will always be present on the free list. By maintaining the free list in address order, the first free segment will always be the 2048 segment and there will be no need for a special case to handle insertion at the head of the list.

To find the highest addressed segment that is less than the segment being deallocated, search the list for the first segment that has a next pointer that null or is greater that the deallocated segment's address.

deAlloc(Array o)
    segment = o-2   // Get heap segment from data block address
    prev_seg = free_list
    next_seg =
    while next_seg != null and next_seg < segment
        prev_seg = next_seg
        next_seg =
At this point, prev_seg is the segment before the insertion point and next_seg is the segment after the insertion point. WARNING: next_seg may be null.

Before deAlloc(block)
2048 3000 4000
freeList next next null
2 1 3
* See Allocated block header
  section, above.
Insert into free list
2048 3000 3003 4000
freeList next next next null
2 1 5 3
After defragmentation
2048 3000 4000
freeList next next null
2 8 3
Figure 2: Deallocation with defragmentation

It is simple to insert the deallocated segment into the free list between prev_seg and next_seg = segment = next_seg

There are now three defragmentation cases that need to be handled.

  1. The deallocated segment immediately follows prev_seg.
  2. next_seg immediately follows the deallocated segment.
  3. All three segments are contiguous.
These three cases can be handled with two sequential operations
    // Combine segment with next_seg if contiguous.
    if (segment + segment.blockSize + 2) == next_seg
        segment.blockSize = segment.blockSize + next_seg.blockSize + 2; =;

    // Combine segment with prev_seg if contiguous.
    if (prev_seg + prev_seg.blockSize + 2) == segment
        prev_seg.blockSize = prev_seg.blockSize + segment.blockSize + 2; =;


I wrote a test program that allocates 20 blocks randomly sized 200..600 words. It then runs 10,000 passes randomly deallocating 5 of the blocks then reallocating 5 new blocks. Increasing the block size by 25% (to 250..750) causes an allocation failure.[1]

Block size 400 ± 50%
Heap after 10,000 passes.
   Block size 500 ± 50%
Heap after allocation failure in pass 703.

 2048: free    (17) ->  2524
 2067: alloc   455
 2524: free    (32) ->  3147
 2558: alloc   222
 2782: alloc   363
 3147: free    (71) ->  3465
 3220: alloc   243
 3465: free   (257) ->  4130
 3724: alloc   404
 4130: free   (205) ->  4555
 4337: alloc   216
 4555: free    (60) ->  4870
 4617: alloc   251
 4870: free   (217) ->  5308
 5089: alloc   217
 5308: free    (24) ->  5858
 5334: alloc   522
 5858: free    (81) ->  7122
 5941: alloc   317
 6260: alloc   445
 6707: alloc   413
 7122: free   (316) ->  8062
 7440: alloc   307
 7749: alloc   311
 8062: free    (64) ->  8896
 8128: alloc   381
 8511: alloc   383
 8896: free   (457) ->  9790
 9355: alloc   433
 9790: free   (475) -> 10860
10267: alloc   591
10860: free   (577) -> 11939
11439: alloc   498
11939: free   (981) -> 13394
12922: alloc   470
13394: free  (2962) ->     0
16358: alloc    20
16380: alloc     2

Allocation failure: 742 requested, largest available block 704.
 2048: free    (43) ->  2960
 2093: alloc   515
 2610: alloc   348
 2960: free   (175) ->  4100
 3137: alloc   574
 3713: alloc   385
 4100: free    (74) ->  5243
 4176: alloc   334
 4512: alloc   729
 5243: free   (539) ->  6396
 5784: alloc   610
 6396: free   (161) ->  9199
 6559: alloc   635
 7196: alloc   680
 7878: alloc   731
 8611: alloc   586
 9199: free   (287) -> 10723
 9488: alloc   525
10015: alloc   706
10723: free    (65) -> 11408
10790: alloc   616
11408: free   (452) -> 13714
11862: alloc   699
12563: alloc   607
13172: alloc   540
13714: free   (706) -> 15760
14422: alloc   692
15116: alloc   642
15760: free   (596) ->     0
16358: alloc    20
16380: alloc     2

[1] If I did my math right, the mean and standard deviation for the sum of 20 uniform random integers is:
RangeMeanStd. Dev.+3 S.D.
[200, 600] 8000 517.687 9553.06
[250, 750] 10000 646.787 11940.36
  (Note that the failed test is trying to allocate a total of 11896 words which is about +2.9 sigma.)
Reply | Threaded
Open this post in threaded view

Re: New Heap Management algorithm (Coursera version)

Thank you SO VERY MUCH for posting this. Ever since I heard the dreadful news that we would not be implementing a defragmentation algorithm in the course, I've been banging my head against the wall, trying to figure out how to do it myself! By the way, I must admit that your solution is quite neat. Well done.
Reply | Threaded
Open this post in threaded view

Re: New Heap Management algorithm (Coursera version)

My few cents worth, cadet1620 is a smart person ! (There are a few on this forum )
Reply | Threaded
Open this post in threaded view

Re: New Heap Management algorithm (Coursera version)

In reply to this post by cadet1620
Thank you for the detailed explanation!

I have implemented this for visualization purposes inside the screen memory segment (by using only those 8k in my Memory.jack). Obviously I cannot corrupt my memory now by writing/drawing to the 'screen', so there is no Screen.jack in this.

What you see in the following asm/hack files (run them in the CPU Emulator) is this:
I make an Array with 15 pointer entries and allocate a memory block to each with a 'random' size of up to 500 words.
After this (and looping forever) I deallocate a random one of those 15 blocks and immediately allocate a new one with random size up to 500.

I fill those blocks with the number representing their size (there is also one filling them with random numbers, which looks more aethetically pleasing).

The allocation is done in first-fit fashion (see ...firstfit... asm file) or in best-fit fashion, leading to visually different results.
The deallocation will combine adjacent blocks. To distinguish this visually, a combination with previous will fill the empty space black and with next will fill white.
Side note: Donald Knuth recommends first-fit in TAOCP Vol1 Chapter 2.5, since no method is clearly superior and first-fit is faster.

Note that the random number generator I am using here is a quick and dirty hack of a LCG (x_i+1= (75*x_i)+74). Should be modulo 2^16+1 (which I cannot do quickly), but I use modulo 2^16, which is free on 16bit, and it seems to work reasonably well for the purpose.

I thought the effect was cool enough to post it here and I am happy I actually made it run on CPU Emulator. Enjoy.