Mary Montgomery - Optimized Programming

Memory Management

Wrote my own free and malloc functions as part of a custom memory management system. This system ended up being over 6.5 times faster than the default (Microsoft) memory system. The classes were laid out by my professor, but all code for the malloc and free functions is my own.

Below is a code snippet of my malloc function:

// ----------------------------------------------------
// Malloc
// Find a free block that fits
// Change it to used (may require subdivision)
// Correct the heap Links (used,free) headers
// Update stats
// Return pointer to block
//-----------------------------------------------------
void* Mem::malloc(const uint32_t _size)
{
   // find a free block
   assert(this->poHeap);
   // returnUsed is the Used pointer we will return (points to the start of the block, past the header)
   Used* returnUsed = nullptr;
   // we start the search for a free block to malloc at the start of pNextFit list, NOT pFreeHead
   Free* freeTraverser = this->poHeap->pNextFit;
   // the looping constraint is not nullptr, it's if we start at pNextFit and loop back to pNextFit
   // without finding any free block big enough
   bool fullLoop = false;
   if (freeTraverser)
   {
       while (fullLoop != true)
       {   // we have not fully looped through the pNextFit list
           if (freeTraverser->mAllocSize == _size)
           {   // we just move the whole Free block to be a Used block
               // check if there are other Free blocks that we need to re-order
               removeFreeBlock(freeTraverser);
               // took care of next and prev pointers for the free block, free to remove it
               Used* newUsedBlock = (Used*)freeTraverser;
               // update the data in the newUsedBlock
               updateNewUsed(newUsedBlock);
               // add the Used block to the pUsedHead list
               addUsedBlockToHead(newUsedBlock);
               // we want to return below the header, so we need to add 1 to the pointer
               returnUsed = (this->poHeap->pUsedHead + 1);
               // increment the current number of used blocks and the used memory size
               increaseUsed(_size);
               // decrement the number of free blocks and free memory size
               decreaseFree(_size);
               // get to the bottom of newUsed to make sure bAboveFree is false
               setFalseBAboveFree(newUsedBlock);
               // break out of the loop since we successfully malloc'ed
               break;
           }

           else if (freeTraverser->mAllocSize > _size)
           {
               // store a pointer to the end of the full free block
               Free* endFreeBlock = (Free*)((uint32_t)freeTraverser + (uint32_t)freeTraverser->mAllocSize);
               // store the next and prev pointers of the free block
               Free* ogFreepNext = freeTraverser->pNext;
               Free* ogFreepPrev = freeTraverser->pPrev;
               // check if the free block is pointed to by pFreeHead
               bool isFreeHead = false;
               // set isFreeHead by checking if the original free block was pFreeHead
               checkIfFreeHead(freeTraverser, isFreeHead);
               // create the Used portion of the free block
               Used* newUsed = (Used*)freeTraverser;
               newUsed = new(newUsed) Used(_size);
               // add the used block to the start of the heaps pUsedHead
               addUsedBlockToHead(newUsed);
               // add a Used Block to the count, increase memUsed size by size of the block, not block+header
               increaseUsed(_size);
               // we want to return below the header, so we need to add 1 to the pointer
               returnUsed = (this->poHeap->pUsedHead + 1);

               // find where the Used block ends by going past the header, store in freeStart pointer
               Free* freeStart = nullptr;
               findEndOfUsedBlock(newUsed, freeStart);
               // find the difference in sizes of Free pre-subdivision and Free post-subdivision
               // by subtracting the initial end of the Free block pointer minus the start of
               // the subdivided Free block pointer (casting both pointers to char *s so that we
               // find the difference in bytes)
               uint32_t freeSizeDifference = (uint32_t)((char*)endFreeBlock - (char*)(freeStart));
               // find the full size of the Used block (since with the header it will be
               // larger than _size
               uint32_t usedSizeDifference = (uint32_t)((char*)freeStart - (char*)(newUsed));
               // subract the size of the Used Block + the size of the Used Block's header
               this->poHeap->currFreeMem -= usedSizeDifference;
               // create the subdivided Free block
               Free* newerFree = new(freeStart) Free(freeSizeDifference);
               // update the newerFree data
               updateNewFree(newerFree, ogFreepPrev, ogFreepNext, isFreeHead);
               // set pNextFit to be the newerFree block, since we just inserted into it
               // it should be the next place we look to insert again
               this->poHeap->pNextFit = newerFree;
               // add a secret pointer to the end of this free block
               adjustSecretPointer(newerFree);
               // break out of the loop since we successfull malloc'ed
               break;
           }
           // this free block (freeTraverser) could not fit the desired malloc, so go to the next one
           freeTraverser = freeTraverser->pNext;
           // check if we have made a fullLoop through the pNextFitList
           checkpNextFitList(freeTraverser, fullLoop);
       }
   }
   return returnUsed;
}

Look at the full memory management code on my github: https://github.com/marymonty/optimized_cpp/tree/main/Homework/Memory_Management

Check out the rest of my optimized c++ repo to see my most recent C++ code style and skill: https://github.com/marymonty/optimized_cpp

SIMD

Utilizing Single Instruction / Multiple Data operations as well as proxies, I optimized a project containing arithmetic such as 4x4 matrix multiplication, vector multiplications, and LERPs. Re-implementing with SIMD yielded speed improvements such as 2.4 times faster matrix multiplication, 3.7 times faster LERP, and 4.3 times faster column major multiple matrix multiplication. All basic arithmetic code was written by a professor, I received the code base and performed all SIMD and proxy optimizations to significantly improve the code base.

Below is a snippet of the Matrix_Col_SIMD.h file where I started to create structs for proxies for the multiple matrix multiplication. The proxies are just a way to delay the matrix multiplication by creating these structs, so it gives us a big time boost:

    // MM
   //       Matrix_Col_SIMD * Matrix_Col_SIMD
   //
   struct MM
   {
       const Matrix_Col_SIMD& M1;
       const Matrix_Col_SIMD& M2;

MM(const Matrix_Col_SIMD& t1, const Matrix_Col_SIMD& t2)
: M1(t1), M2(t2) {};

       //conversion operator implemented in cpp
       operator Matrix_Col_SIMD();
   };

    friend inline MM operator * (const Matrix_Col_SIMD& m1, const Matrix_Col_SIMD& m2)
   {
       return MM(m1, m2);
   };

   // MMM
   //       MM * Matrix_Col_SIMD
   //
   struct MMM
   {
       const Matrix_Col_SIMD& M1;
       const Matrix_Col_SIMD& M2;
       const Matrix_Col_SIMD& M3;

MMM(const MM& t1, const Matrix_Col_SIMD& t2)
: M1(t1.M1), M2(t1.M2), M3(t2) {};

       //conversion operator implemented in cpp
       operator Matrix_Col_SIMD();
   };

    friend inline MMM operator * (const MM& m1, const Matrix_Col_SIMD& m2)
   {
       return MMM(m1, m2);
   };

Look at the full SIMD code on my github: https://github.com/marymonty/optimized_cpp/tree/main/Homework/SIMD