Memory Management
Wrote my own free and malloc functions as part of a custom memory management system. This system ended up being over 6.5 times faster than the default (Microsoft) memory system. The classes were laid out by my professor, but all code for the malloc and free functions is my own.
Below is a code snippet of my malloc function:
// ----------------------------------------------------
// Malloc
// Find a free block that fits
// Change it to used (may require subdivision)
// Correct the heap Links (used,free) headers
// Update stats
// Return pointer to block
//-----------------------------------------------------
void* Mem::malloc(const uint32_t _size)
{
// find a free block
assert(this->poHeap);
// returnUsed is the Used pointer we will return (points to the start of the block, past the header)
Used* returnUsed = nullptr;
// we start the search for a free block to malloc at the start of pNextFit list, NOT pFreeHead
Free* freeTraverser = this->poHeap->pNextFit;
// the looping constraint is not nullptr, it's if we start at pNextFit and loop back to pNextFit
// without finding any free block big enough
bool fullLoop = false;
if (freeTraverser)
{
while (fullLoop != true)
{ // we have not fully looped through the pNextFit list
if (freeTraverser->mAllocSize == _size)
{ // we just move the whole Free block to be a Used block
// check if there are other Free blocks that we need to re-order
removeFreeBlock(freeTraverser);
// took care of next and prev pointers for the free block, free to remove it
Used* newUsedBlock = (Used*)freeTraverser;
// update the data in the newUsedBlock
updateNewUsed(newUsedBlock);
// add the Used block to the pUsedHead list
addUsedBlockToHead(newUsedBlock);
// we want to return below the header, so we need to add 1 to the pointer
returnUsed = (this->poHeap->pUsedHead + 1);
// increment the current number of used blocks and the used memory size
increaseUsed(_size);
// decrement the number of free blocks and free memory size
decreaseFree(_size);
// get to the bottom of newUsed to make sure bAboveFree is false
setFalseBAboveFree(newUsedBlock);
// break out of the loop since we successfully malloc'ed
break;
}
// Malloc
// Find a free block that fits
// Change it to used (may require subdivision)
// Correct the heap Links (used,free) headers
// Update stats
// Return pointer to block
//-----------------------------------------------------
void* Mem::malloc(const uint32_t _size)
{
// find a free block
assert(this->poHeap);
// returnUsed is the Used pointer we will return (points to the start of the block, past the header)
Used* returnUsed = nullptr;
// we start the search for a free block to malloc at the start of pNextFit list, NOT pFreeHead
Free* freeTraverser = this->poHeap->pNextFit;
// the looping constraint is not nullptr, it's if we start at pNextFit and loop back to pNextFit
// without finding any free block big enough
bool fullLoop = false;
if (freeTraverser)
{
while (fullLoop != true)
{ // we have not fully looped through the pNextFit list
if (freeTraverser->mAllocSize == _size)
{ // we just move the whole Free block to be a Used block
// check if there are other Free blocks that we need to re-order
removeFreeBlock(freeTraverser);
// took care of next and prev pointers for the free block, free to remove it
Used* newUsedBlock = (Used*)freeTraverser;
// update the data in the newUsedBlock
updateNewUsed(newUsedBlock);
// add the Used block to the pUsedHead list
addUsedBlockToHead(newUsedBlock);
// we want to return below the header, so we need to add 1 to the pointer
returnUsed = (this->poHeap->pUsedHead + 1);
// increment the current number of used blocks and the used memory size
increaseUsed(_size);
// decrement the number of free blocks and free memory size
decreaseFree(_size);
// get to the bottom of newUsed to make sure bAboveFree is false
setFalseBAboveFree(newUsedBlock);
// break out of the loop since we successfully malloc'ed
break;
}
else if (freeTraverser->mAllocSize > _size)
{
// store a pointer to the end of the full free block
Free* endFreeBlock = (Free*)((uint32_t)freeTraverser + (uint32_t)freeTraverser->mAllocSize);
// store the next and prev pointers of the free block
Free* ogFreepNext = freeTraverser->pNext;
Free* ogFreepPrev = freeTraverser->pPrev;
// check if the free block is pointed to by pFreeHead
bool isFreeHead = false;
// set isFreeHead by checking if the original free block was pFreeHead
checkIfFreeHead(freeTraverser, isFreeHead);
// create the Used portion of the free block
Used* newUsed = (Used*)freeTraverser;
newUsed = new(newUsed) Used(_size);
// add the used block to the start of the heaps pUsedHead
addUsedBlockToHead(newUsed);
// add a Used Block to the count, increase memUsed size by size of the block, not block+header
increaseUsed(_size);
// we want to return below the header, so we need to add 1 to the pointer
returnUsed = (this->poHeap->pUsedHead + 1);
{
// store a pointer to the end of the full free block
Free* endFreeBlock = (Free*)((uint32_t)freeTraverser + (uint32_t)freeTraverser->mAllocSize);
// store the next and prev pointers of the free block
Free* ogFreepNext = freeTraverser->pNext;
Free* ogFreepPrev = freeTraverser->pPrev;
// check if the free block is pointed to by pFreeHead
bool isFreeHead = false;
// set isFreeHead by checking if the original free block was pFreeHead
checkIfFreeHead(freeTraverser, isFreeHead);
// create the Used portion of the free block
Used* newUsed = (Used*)freeTraverser;
newUsed = new(newUsed) Used(_size);
// add the used block to the start of the heaps pUsedHead
addUsedBlockToHead(newUsed);
// add a Used Block to the count, increase memUsed size by size of the block, not block+header
increaseUsed(_size);
// we want to return below the header, so we need to add 1 to the pointer
returnUsed = (this->poHeap->pUsedHead + 1);
// find where the Used block ends by going past the header, store in freeStart pointer
Free* freeStart = nullptr;
findEndOfUsedBlock(newUsed, freeStart);
// find the difference in sizes of Free pre-subdivision and Free post-subdivision
// by subtracting the initial end of the Free block pointer minus the start of
// the subdivided Free block pointer (casting both pointers to char *s so that we
// find the difference in bytes)
uint32_t freeSizeDifference = (uint32_t)((char*)endFreeBlock - (char*)(freeStart));
// find the full size of the Used block (since with the header it will be
// larger than _size
uint32_t usedSizeDifference = (uint32_t)((char*)freeStart - (char*)(newUsed));
// subract the size of the Used Block + the size of the Used Block's header
this->poHeap->currFreeMem -= usedSizeDifference;
// create the subdivided Free block
Free* newerFree = new(freeStart) Free(freeSizeDifference);
// update the newerFree data
updateNewFree(newerFree, ogFreepPrev, ogFreepNext, isFreeHead);
// set pNextFit to be the newerFree block, since we just inserted into it
// it should be the next place we look to insert again
this->poHeap->pNextFit = newerFree;
// add a secret pointer to the end of this free block
adjustSecretPointer(newerFree);
// break out of the loop since we successfull malloc'ed
break;
}
// this free block (freeTraverser) could not fit the desired malloc, so go to the next one
freeTraverser = freeTraverser->pNext;
// check if we have made a fullLoop through the pNextFitList
checkpNextFitList(freeTraverser, fullLoop);
}
}
return returnUsed;
}
Free* freeStart = nullptr;
findEndOfUsedBlock(newUsed, freeStart);
// find the difference in sizes of Free pre-subdivision and Free post-subdivision
// by subtracting the initial end of the Free block pointer minus the start of
// the subdivided Free block pointer (casting both pointers to char *s so that we
// find the difference in bytes)
uint32_t freeSizeDifference = (uint32_t)((char*)endFreeBlock - (char*)(freeStart));
// find the full size of the Used block (since with the header it will be
// larger than _size
uint32_t usedSizeDifference = (uint32_t)((char*)freeStart - (char*)(newUsed));
// subract the size of the Used Block + the size of the Used Block's header
this->poHeap->currFreeMem -= usedSizeDifference;
// create the subdivided Free block
Free* newerFree = new(freeStart) Free(freeSizeDifference);
// update the newerFree data
updateNewFree(newerFree, ogFreepPrev, ogFreepNext, isFreeHead);
// set pNextFit to be the newerFree block, since we just inserted into it
// it should be the next place we look to insert again
this->poHeap->pNextFit = newerFree;
// add a secret pointer to the end of this free block
adjustSecretPointer(newerFree);
// break out of the loop since we successfull malloc'ed
break;
}
// this free block (freeTraverser) could not fit the desired malloc, so go to the next one
freeTraverser = freeTraverser->pNext;
// check if we have made a fullLoop through the pNextFitList
checkpNextFitList(freeTraverser, fullLoop);
}
}
return returnUsed;
}
Look at the full memory management code on my github: https://github.com/marymonty/optimized_cpp/tree/main/Homework/Memory_Management
Check out the rest of my optimized c++ repo to see my most recent C++ code style and skill: https://github.com/marymonty/optimized_cpp
SIMD
Utilizing Single Instruction / Multiple Data operations as well as proxies, I optimized a project containing arithmetic such as 4x4 matrix multiplication, vector multiplications, and LERPs. Re-implementing with SIMD yielded speed improvements such as 2.4 times faster matrix multiplication, 3.7 times faster LERP, and 4.3 times faster column major multiple matrix multiplication. All basic arithmetic code was written by a professor, I received the code base and performed all SIMD and proxy optimizations to significantly improve the code base.
Below is a snippet of the Matrix_Col_SIMD.h file where I started to create structs for proxies for the multiple matrix multiplication. The proxies are just a way to delay the matrix multiplication by creating these structs, so it gives us a big time boost:
// MM
// Matrix_Col_SIMD * Matrix_Col_SIMD
//
struct MM
{
const Matrix_Col_SIMD& M1;
const Matrix_Col_SIMD& M2;
// Matrix_Col_SIMD * Matrix_Col_SIMD
//
struct MM
{
const Matrix_Col_SIMD& M1;
const Matrix_Col_SIMD& M2;
MM(const Matrix_Col_SIMD& t1, const Matrix_Col_SIMD& t2)
: M1(t1), M2(t2) {};
: M1(t1), M2(t2) {};
//conversion operator implemented in cpp
operator Matrix_Col_SIMD();
};
operator Matrix_Col_SIMD();
};
friend inline MM operator * (const Matrix_Col_SIMD& m1, const Matrix_Col_SIMD& m2)
{
return MM(m1, m2);
};
{
return MM(m1, m2);
};
// MMM
// MM * Matrix_Col_SIMD
//
struct MMM
{
const Matrix_Col_SIMD& M1;
const Matrix_Col_SIMD& M2;
const Matrix_Col_SIMD& M3;
// MM * Matrix_Col_SIMD
//
struct MMM
{
const Matrix_Col_SIMD& M1;
const Matrix_Col_SIMD& M2;
const Matrix_Col_SIMD& M3;
MMM(const MM& t1, const Matrix_Col_SIMD& t2)
: M1(t1.M1), M2(t1.M2), M3(t2) {};
: M1(t1.M1), M2(t1.M2), M3(t2) {};
//conversion operator implemented in cpp
operator Matrix_Col_SIMD();
};
operator Matrix_Col_SIMD();
};
friend inline MMM operator * (const MM& m1, const Matrix_Col_SIMD& m2)
{
return MMM(m1, m2);
};
{
return MMM(m1, m2);
};
Look at the full SIMD code on my github: https://github.com/marymonty/optimized_cpp/tree/main/Homework/SIMD