Code Whisperer

2025-08-10


1. Canary Values in C/C++: Catching the Memory Scribblers

In low-level C/C++ programming, memory corruption is a silent killer. You don’t see it, you don’t hear it, but one day your app crashes in production and you waste a whole weekend debugging. One way to detect such bugs is to use canary values. A canary is a special value placed in memory, usually between a buffer and sensitive data like the return address or heap metadata. The idea is simple: if someone overwrites the buffer and modifies the canary, we know something went wrong. Just like coal miners used real canary birds to detect toxic gas, we can use a memory canary to detect toxic writes.

char buffer[64];
uint32_t canary = 0x12344321;  // guard value placed next to the buffer

Before the function returns, we check whether canary still equals 0x12344321. If not, someone scribbled over it. Boom. We caught the bug.

Where it’s Used

  • Stack protection: GCC and Clang have -fstack-protector, which inserts a canary between local buffers and the saved frame pointer / return address.
  • Heap debugging: Tools like Valgrind or AddressSanitizer use redzones and canaries around allocations.
  • Custom allocators: You can implement your own malloc/free with canary bytes before and after the user block.
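As a sketch of the custom-allocator idea: the wrapper below puts a canary word on both sides of the user block and verifies them on free. The names guarded_alloc / guarded_free are made up for this example, not a real API.

```cpp
#include <cstdint>
#include <cstdlib>
#include <cstring>

static const uint32_t kCanary = 0xDEADC0DE;  // arbitrary guard pattern

// layout: [front canary][user block][back canary]
void* guarded_alloc(size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(
        malloc(sizeof(uint32_t) + size + sizeof(uint32_t)));
    if (!raw) return nullptr;
    memcpy(raw, &kCanary, sizeof kCanary);                       // front canary
    memcpy(raw + sizeof(uint32_t) + size, &kCanary, sizeof kCanary);  // back canary
    return raw + sizeof(uint32_t);  // pointer to the user block
}

// returns false if either canary was clobbered
bool guarded_free(void* p, size_t size) {
    unsigned char* raw = static_cast<unsigned char*>(p) - sizeof(uint32_t);
    uint32_t front, back;
    memcpy(&front, raw, sizeof front);
    memcpy(&back, raw + sizeof(uint32_t) + size, sizeof back);
    bool ok = (front == kCanary && back == kCanary);
    free(raw);
    return ok;
}
```

A real allocator would store the block size in a header instead of making the caller pass it back, and would keep the user pointer properly aligned; both are skipped here for brevity.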

Limitations

Canary values are designed to detect write overflows, but they do not catch every type of memory issue. For example, if a program reads out-of-bounds memory, the canary remains unchanged and the problem may go unnoticed. Also, if the canary value is static and known, accidental overwrites that happen to match the original value won’t trigger detection. This can lead to false negatives during debugging. To improve reliability, some tools and compilers use randomized or varying canary values to reduce the chance of silent corruption.
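As a sketch of the randomized approach: pick the canary once at startup instead of hard-coding it. The names below (initCanary, canaryIntact) are made up, and rand() is far too weak for real security use, but it shows the idea.

```cpp
#include <cstdint>
#include <cstdlib>
#include <ctime>

// one canary value per run; glibc's stack protector does something similar
// with a random guard value in __stack_chk_guard
static uint32_t g_canary;

auto initCanary() -> void {
    // rand() is only for illustration; a real implementation reads the OS RNG
    srand(static_cast<unsigned>(time(nullptr)));
    g_canary = (static_cast<uint32_t>(rand()) << 16) ^ static_cast<uint32_t>(rand());
}

auto canaryIntact(uint32_t stored) -> bool {
    return stored == g_canary;
}
```

Because the value changes every run, an accidental overwrite is very unlikely to match it by chance.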

Example

#include <cstdint>
#include <cstdio>
#include <cstdlib>

auto vulnerable() -> void {
    char buf[16];
    volatile uint32_t canary = 0x12344321;  // volatile, so the check below isn't optimized away

    scanf("%s", buf); // unsafe: no bounds check (gets() was removed from the standard)

    if (canary != 0x12344321) {
        printf("Memory corruption detected!\n");
        exit(1);
    }
}

This is a naive example (the compiler is free to reorder locals on the stack, so the canary may not even sit next to the buffer), but it shows the idea.

Final Thoughts

Canary values are not a complete solution, but they provide a simple and effective method for detecting memory corruption. They are especially useful in low-level C/C++ code where buffer overflows and stack violations can lead to unpredictable behavior. While canaries cannot prevent all types of bugs, they help identify overwrite issues early in execution. For developers concerned with application stability and reliability, implementing canary checks is a practical and low-cost measure worth considering.


2. Optimizing with Fast Integer Types in C/C++

When writing C or C++ code, sometimes ;) we want to make it faster. One way is to use the right types for variables. For integers especially, choosing the best type can help performance, particularly on embedded systems or in low-level code.

What are fast integer types?

C++ (and C since C99) gives us some special types in the stdint.h / cstdint headers. These types are designed to be the fastest available for the platform.

Examples:

#include <stdint.h>

int_fast8_t  a;  // fastest type that can hold at least 8 bits
int_fast16_t b;  // fastest type for 16 bits
int_fast32_t c;  // fastest type for 32 bits
int_fast64_t d;  // fastest type for 64 bits

These types are not always exactly 8, 16, 32 or 64 bits. They are at least that size, but the implementation chooses the fastest type that is big enough. On ARM64, this usually maps like:

  Fast Type       Likely Actual Type
  int_fast8_t     int (32-bit)
  int_fast16_t    int (32-bit)
  int_fast32_t    int (32-bit)
  int_fast64_t    long or long long (64-bit)

Even if you ask for a fast 8-bit type, the implementation may pick 32 bits, because ARM64 prefers aligned 32-bit access.
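You can check what your own toolchain actually picked with a quick sizeof printout (a throwaway snippet, nothing platform-specific assumed):

```cpp
#include <cstdint>
#include <cstdio>

// print the real size of each fast type on this toolchain
auto printFastSizes() -> void {
    printf("int_fast8_t : %zu bytes\n", sizeof(int_fast8_t));
    printf("int_fast16_t: %zu bytes\n", sizeof(int_fast16_t));
    printf("int_fast32_t: %zu bytes\n", sizeof(int_fast32_t));
    printf("int_fast64_t: %zu bytes\n", sizeof(int_fast64_t));
}
```

The standard only guarantees the minimum widths, so the printed sizes will differ between x86, ARM64 and embedded targets.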

Why use them?

Using int_fastXX_t types can help:

  • make code run faster on some platforms
  • avoid slow memory access
  • improve performance in tight loops / embedded code

But it’s not always needed. For most desktop apps, a normal int or int32_t is fine. Fast types are more useful when you care about both performance and portability.

Example: loop optimization

Let’s say we have a loop that runs many times:

#include <stdint.h>

auto process() -> void {
    int_fast16_t i;
    for (i = 0; i < 10000; i++) {
        // do something
    }
}

Here, int_fast16_t might be faster than int16_t because the compiler can choose a better type for the CPU.

Compare with fixed-width types

Fixed-width types like int8_t, int16_t, int32_t have an exact size, which is good for binary formats or hardware registers.

Fast types are better when you don’t care about the exact size but want speed.

What about unsigned types?

Same idea works for unsigned types:

uint_fast8_t  ua;
uint_fast32_t ub;

Use unsigned when you don’t need negative numbers. Sometimes the compiler can optimize better with unsigned.

Conclusions and … Pitfalls

Fast integer types are nice, but they are not always perfect. There are some traps you should know about:

Not always same behavior as small types

If you use uint8_t, operations are done modulo 256. So 255 + 1 becomes 0.

But if you use uint_fast8_t, and the compiler chooses unsigned int (32-bit), then operations are modulo 2^32. So 255 + 1 becomes 256.

Example:

#include <stdint.h>
#include <stdio.h>

auto foo() -> void {
    uint_fast8_t a = 0xff;  // 255
    ++a;
    // cast for printf, because the real width of uint_fast8_t varies by platform
    printf("%x\n", (unsigned)a);  // prints 100 (hex for 256) if the type is wider than 8 bits
}

If you expect wraparound at 255, this can break your logic.

Manual masking needed

To simulate 8-bit behavior with a fast type, you must mask the value manually:

a = (a + 1) & 0xff;

But this adds an extra instruction, which may slow things down. So now you must choose:

  • Use real 8 bit type (uint8_t) — maybe slower on some platforms
  • Use fast type (uint_fast8_t) — but mask bits manually

It’s hard to say which is faster. It depends on the CPU, the compiler (type, version…), and how often you do the masking.
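A tiny check (just a sketch, maskMatches is a made-up name) showing that the mask really restores the uint8_t wraparound:

```cpp
#include <cstdint>
#include <cstdio>

// returns true if the masked fast type ends up with the same value as uint8_t
auto maskMatches() -> bool {
    uint8_t small = 0xff;      // 255
    uint_fast8_t fast = 0xff;  // 255, possibly stored in a wider type

    small = small + 1;          // the type itself wraps to 0
    fast  = (fast + 1) & 0xff;  // wraparound simulated with a mask

    printf("%u %u\n", (unsigned)small, (unsigned)fast);  // prints: 0 0
    return small == fast;
}
```

Both values come out as 0, so the masked fast type is a drop-in replacement, at the cost of one extra AND per operation.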

Optimizing with types is not always portable

If you optimize your code using fast types, it may behave differently on other platforms. For example:

  • On x86, uint_fast8_t might be 8-bit
  • On ARM64, uint_fast8_t is usually 32-bit

So logic based on overflow or wraparound may break.

Debugging becomes tricky

When you expect 8-bit overflow and it doesn’t happen, bugs are damn hard to find. Especially if you mix fast types with fixed-width ones.

Final thoughts

Fast integer types are a small trick, but they can help in some cases. Especially in embedded programming, game engines, or performance-critical code.

If your code depends on exact overflow behavior (like wrap-around at 255), use fixed-width types like uint8_t. If you want speed and don’t care about exact size or overflow, fast types are okay.

Just remember:

  • Use fast types when you want speed and don’t care about exact size
  • Use fixed width types when you need predictable size and overflow
  • Don’t mix them blindly - behavior may change across platforms
  • Always test and benchmark your code
  • Check results on real hardware, not only in simulator or desktop build

3. Constexpr and Function Templates with Value Parameters

In modern C++ we can use constexpr and templates to make code faster. Especially when we know some values at compile time, we can use them to remove branches and make the code smaller and more optimized.

Compile-Time Value Selection

When we write code like this:

template<int n>
auto foo() -> void {
    if constexpr (n == 1) {
        // code for case 1
    } else if constexpr (n == 2) {
        // code for case 2
    } else {
        // default code
    }
}

This is not normal if-ology. It is a compile-time check. The compiler knows the value of n and will keep only the matching branch. The other code is removed during compilation.

This is good for performance because:

  • No runtime branching
  • No if / switch at runtime
  • Code is smaller and faster

Example

extern uint32_t dataBankA, dataBankB;  // defined elsewhere (e.g. I/O state)

template<int bank>
void setBank(int data) {
    if constexpr (bank == 0) {
        dataBankA |= (1 << data);
    } else if constexpr (bank == 1) {
        dataBankB |= (1 << data);
    } else {
        static_assert(bank < 8, "Invalid bank");  // reject out-of-range banks at compile time
    }
}

When you call setBank<1>(3), the compiler will generate only the code for dataBankB |= (1 << data);. No other branches are included.

Compare with Runtime Switch

void setBank(int bank, int data) {
    switch (bank) {
        case 0: dataBankA |= (1 << data); break;
        case 1: dataBankB |= (1 << data); break;
        default: break;
    }
}

This version is slower because it needs a runtime check. The compiler cannot remove the unused cases unless bank is a compile-time constant.
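If the bank is only known at runtime, you can still reuse the template versions by resolving the value once in a small dispatcher. A sketch, assuming C++17 and hypothetical dataBankA/dataBankB globals:

```cpp
#include <cstdint>

uint32_t dataBankA = 0, dataBankB = 0;  // placeholder globals for this sketch

template<int bank>
void setBank(int data) {
    if constexpr (bank == 0) {
        dataBankA |= (1u << data);
    } else {
        dataBankB |= (1u << data);
    }
}

// hypothetical helper: resolve the runtime value once,
// then jump into the branchless template version
void setBankRuntime(int bank, int data) {
    switch (bank) {
        case 0: setBank<0>(data); break;
        case 1: setBank<1>(data); break;
        default: break;
    }
}
```

The switch still exists, but it runs once at the boundary; everything behind it stays branch-free and fully optimizable.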

Notes and Cautions

  • if constexpr requires C++17 or newer
  • the template parameter must be known at compile time
  • each value creates a new function, which can increase code size if there are too many variants
  • works best when the number of cases is small

Mixed Logic Example

template<int N>
void process() {
    static_assert(N >= 0 && N <= 3, "Unsupported value");

    if constexpr (N == 0) {
        // the fastest path
    } else if constexpr (N == 1 || N == 2) {
        // shared logic
    } else {
        // fallback
    }
}

This way we can group cases and still keep compile-time optimization.

Code Size Consideration

When you use function templates with a value parameter like template<int n>, each call with a different n creates a separate version of the function. This is good for speed, because the compiler can optimize each version very well. But it also means there’s more code in the final binary.

For example:

foo<1>();
foo<2>();
foo<3>();

This will generate three different function bodies. Each one is fast and branchless, but the total code size will grow. In embedded systems with limited memory, this can be a problem.

So you must balance speed against size. If you have many values, it may be better to use a runtime switch or shared logic.
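One common compromise is shared logic behind a thin template: the big body is compiled once, and each instantiation collapses to a single call. A minimal sketch (fooImpl is a made-up stand-in for the heavy code):

```cpp
// shared body, compiled only once (stand-in for a large function)
int fooImpl(int n) {
    return n * 10;  // placeholder work
}

// thin template wrapper: keeps the compile-time checks, but each
// instantiation is just one call, so foo<1>(), foo<2>(), foo<3>()
// add almost no extra code
template<int N>
int foo() {
    static_assert(N >= 1 && N <= 3, "Unsupported value");
    return fooImpl(N);
}
```

You keep the template interface and the static_assert validation, but the binary grows by a few bytes per instantiation instead of a whole function body.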

Final thoughts

Using constexpr with template value parameters is a good way to make code faster and cleaner. It helps remove branches and generate only the required code. For embedded systems this is very useful, because we want small and predictable code.

But be careful with too many template instantiations: they can make the binary bigger. Also, always test on real hardware to see if the performance is really better.

© 2025 Tomasz Slanina. All rights reserved.