Memory Management
Wrote my own free and malloc functions as part of a custom memory management system. This system ended up being over 6.5 times faster than the default (Microsoft) memory system. The classes were laid out by my professor, but all code for the malloc and free functions is my own.
Below is a code snippet of my malloc function:
// ----------------------------------------------------
// Malloc
//  Find a free block that fits
//  Change it to used (may require subdivision)
//  Correct the heap Links (used,free) headers
//  Update stats
//  Return pointer to block
//-----------------------------------------------------
void* Mem::malloc(const uint32_t _size)
{
    // find a free block
    assert(this->poHeap);
    // returnUsed is the Used pointer we will return (points to the start of the block, past the header)
    Used* returnUsed = nullptr;
    // we start the search for a free block to malloc at the start of pNextFit list, NOT pFreeHead
    Free* freeTraverser = this->poHeap->pNextFit;
    // the looping constraint is not nullptr, it's if we start at pNextFit and loop back to pNextFit
    // without finding any free block big enough
    bool fullLoop = false;
    if (freeTraverser)
    {
        while (fullLoop != true)
        {    // we have not fully looped through the pNextFit list
            if (freeTraverser->mAllocSize == _size)
            {    // we just move the whole Free block to be a Used block
                // check if there are other Free blocks that we need to re-order
                removeFreeBlock(freeTraverser);
                // took care of next and prev pointers for the free block, free to remove it
                Used* newUsedBlock = (Used*)freeTraverser;
                // update the data in the newUsedBlock
                updateNewUsed(newUsedBlock);
                // add the Used block to the pUsedHead list
                addUsedBlockToHead(newUsedBlock);
                // we want to return below the header, so we need to add 1 to the pointer
                returnUsed = (this->poHeap->pUsedHead + 1);
                // increment the current number of used blocks and the used memory size
                increaseUsed(_size);
                // decrement the number of free blocks and free memory size
                decreaseFree(_size);
                // get to the bottom of newUsed to make sure bAboveFree is false
                setFalseBAboveFree(newUsedBlock);
                // break out of the loop since we successfully malloc'ed
                break;
            }
            else if (freeTraverser->mAllocSize > _size)
            {
                // store a pointer to the end of the full free block
                Free* endFreeBlock = (Free*)((uint32_t)freeTraverser + (uint32_t)freeTraverser->mAllocSize);
                // store the next and prev pointers of the free block
                Free* ogFreepNext = freeTraverser->pNext;
                Free* ogFreepPrev = freeTraverser->pPrev;
                // check if the free block is pointed to by pFreeHead
                bool isFreeHead = false;
                // set isFreeHead by checking if the original free block was pFreeHead
                checkIfFreeHead(freeTraverser, isFreeHead);
                // create the Used portion of the free block
                Used* newUsed = (Used*)freeTraverser;
                newUsed = new(newUsed) Used(_size);
                // add the used block to the start of the heaps pUsedHead
                addUsedBlockToHead(newUsed);
                // add a Used Block to the count, increase memUsed size by size of the block, not block+header
                increaseUsed(_size);
                // we want to return below the header, so we need to add 1 to the pointer
                returnUsed = (this->poHeap->pUsedHead + 1);
                // find where the Used block ends by going past the header, store in freeStart pointer
                Free* freeStart = nullptr;
                findEndOfUsedBlock(newUsed, freeStart);
                // find the difference in sizes of Free pre-subdivision and Free post-subdivision
                // by subtracting the initial end of the Free block pointer minus the start of
                // the subdivided Free block pointer (casting both pointers to char *s so that we
                // find the difference in bytes)
                uint32_t freeSizeDifference = (uint32_t)((char*)endFreeBlock - (char*)(freeStart));
                // find the full size of the Used block (since with the header it will be
                // larger than _size
                uint32_t usedSizeDifference = (uint32_t)((char*)freeStart - (char*)(newUsed));
                // subract the size of the Used Block + the size of the Used Block's header
                this->poHeap->currFreeMem -= usedSizeDifference;
                // create the subdivided Free block
                Free* newerFree = new(freeStart) Free(freeSizeDifference);
                // update the newerFree data
                updateNewFree(newerFree, ogFreepPrev, ogFreepNext, isFreeHead);
                // set pNextFit to be the newerFree block, since we just inserted into it
                // it should be the next place we look to insert again
                this->poHeap->pNextFit = newerFree;
                // add a secret pointer to the end of this free block
                adjustSecretPointer(newerFree);
                // break out of the loop since we successfull malloc'ed
                break;
            }
            // this free block (freeTraverser) could not fit the desired malloc, so go to the next one
            freeTraverser = freeTraverser->pNext;
            // check if we have made a fullLoop through the pNextFitList
            checkpNextFitList(freeTraverser, fullLoop);
        }
    }
    return returnUsed;
}

Look at the full memory management code on my github: https://github.com/marymonty/optimized_cpp/tree/main/Homework/Memory_Management
Check out the rest of my optimized c++ repo to see my most recent C++ code style and skill: https://github.com/marymonty/optimized_cpp
SIMD
Utilizing Single Instruction / Multiple Data operations as well as proxies, I optimized a project containing arithmetic such as 4x4 matrix multiplication, vector multiplications, and LERPs. Re-implementing with SIMD yielded speed improvements such as 2.4 times faster matrix multiplication, 3.7 times faster LERP, and 4.3 times faster column major multiple matrix multiplication. All basic arithmetic code was written by a professor, I received the code base and performed all SIMD and proxy optimizations to significantly improve the code base.

Below is a snippet of the Matrix_Col_SIMD.h file where I started to create structs for proxies for the multiple matrix multiplication. The proxies are just a way to delay the matrix multiplication by creating these structs, so it gives us a big time boost:
    // MM
    //        Matrix_Col_SIMD * Matrix_Col_SIMD
    //
    struct MM
    {
        const Matrix_Col_SIMD& M1;
        const Matrix_Col_SIMD& M2;
        MM(const Matrix_Col_SIMD& t1, const Matrix_Col_SIMD& t2)
            : M1(t1), M2(t2) {};

        //conversion operator implemented in cpp
        operator Matrix_Col_SIMD();
    };

    friend inline MM operator * (const Matrix_Col_SIMD& m1, const Matrix_Col_SIMD& m2)
    {
        return MM(m1, m2);
    };

    // MMM
    //        MM * Matrix_Col_SIMD
    //
    struct MMM
    {
        const Matrix_Col_SIMD& M1;
        const Matrix_Col_SIMD& M2;
        const Matrix_Col_SIMD& M3;
        MMM(const MM& t1, const Matrix_Col_SIMD& t2)
            : M1(t1.M1), M2(t1.M2), M3(t2) {};
        //conversion operator implemented in cpp
        operator Matrix_Col_SIMD();
    };
    friend inline MMM operator * (const MM& m1, const Matrix_Col_SIMD& m2)
    {
        return MMM(m1, m2);
    };
Back to Top