The problem with pointers

There are two major problems with pointers in C: you don't know if they point to anything, and you don't know how many elements they point to. That's a bit of a bold claim, but what I mean by "you don't know if they point to anything" is that NULL is a valid value for any pointer - the so-called "billion-dollar mistake". There is no syntactic way to enforce that a pointer does not contain NULL, so any function that uses pointers must either manually check for it or rely on higher-level guarantees that NULL pointers will not be passed in.

"You don't know how many elements they point to" means that, syntactically, a pointer to one thing looks exactly the same as a pointer to many things - they're both {% c-line %}T*{% c-line-end %} for a given type {% c-line %}T{% c-line-end %}. Knowing how many elements are pointed to must be handled manually, either via social convention (such as sentinel values at the end of an array or predefined maximum sizes à la {% c-line %}FILENAME_MAX{% c-line-end %}), or by manually passing sizes around with pointers. And if the past 50 years of software engineering have taught us anything, it's that relying on developers to manually check things does not work at scale.

So how does Zig improve this situation?

First, let's talk about {% c-line %}NULL{% c-line-end %} pointers. Zig has a concept of "optional types", which means a type that *optionally* holds a value. The syntax for this is {% c-line %}?T{% c-line-end %} for a given type {% c-line %}T{% c-line-end %}. For example:

{% c-block language="zig" %}
var foo: u32 = 0;  // `foo` can be any 32-bit unsigned integer
var bar: ?u32 = 0; // `bar` can be any 32-bit unsigned integer, *or* it can contain "no value"
{% c-block-end %}


Conveniently, Zig uses the keyword {% c-line %}null{% c-line-end %} to represent the "no value" case. So in the example above, {% c-line %}bar = null;{% c-line-end %} would be a valid assignment, even though {% c-line %}bar{% c-line-end %} is not a pointer. If the underlying type is a pointer, the declaration looks like this:

{% c-block language="zig" %}
var maybe_pointer: ?*u32 = null;
{% c-block-end %}
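
As a minimal sketch of what these declarations allow (the variable {% c-line %}x{% c-line-end %} and the printed comparisons are illustrative additions, not part of the examples above), both kinds of optional can be switched between "a value" and {% c-line %}null{% c-line-end %}, and both can be compared against {% c-line %}null{% c-line-end %}:

{% c-block language="zig" %}
const std = @import("std");

pub fn main() void {
    var x: u32 = 7;
    var bar: ?u32 = 0; // starts out holding a value
    var maybe_pointer: ?*u32 = null; // starts out holding "no value"

    bar = null; // valid: `?u32` can hold "no value"
    maybe_pointer = &x; // valid: a `*u32` coerces to `?*u32`

    // Comparing an optional against `null` yields a bool.
    std.debug.print("{} {}\n", .{ bar == null, maybe_pointer == null });
}
{% c-block-end %}

Assuming a reasonably recent Zig toolchain, this should print {% c-line %}true false{% c-line-end %}.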

So how does this help us? The Zig type system doesn't let us use {% c-line %}maybe_pointer{% c-line-end %} as a pointer unless we first check that it is not null. Consider the following program:

{% c-block language="zig" %}
fn foo(arg: *u32) u32 {
   return arg.*;
}
pub fn main() void {
   var x: u32 = 42;
   var y: ?*u32 = &x;
   var z: u32 = foo(y);
}
{% c-block-end %}

If we try to run it, we get a compile error:

{% c-block language="console" %}
➜  zig run pointer.zig
./pointer.zig:7:22: error: expected type '*u32', found '?*u32'
   var z: u32 = foo(y);
                    ^
./pointer.zig:7:22: note: '?*u32' could have null values which are illegal in type '*u32'
   var z: u32 = foo(y);
{% c-block-end %}

There are two ways to turn optional pointers back into regular ones: {% c-line %}if{% c-line-end %} expressions and the {% c-line %}.?{% c-line-end %} (unwrap) operator. Here is the {% c-line %}if{% c-line-end %} version:

{% c-block language="zig" %}
fn foo(arg: *u32) u32 {
   return arg.*;
}
pub fn main() void {
   var x: u32 = 42;
   var y: ?*u32 = &x;
   var z: ?u32 = if (y) |ptr| foo(ptr) else null;
}
{% c-block-end %}

When an optional value is used as the condition of an if statement ({% c-line %}if (y){% c-line-end %} above), then the unwrapped, non-optional value can be captured using the {% c-line %}|variable|{% c-line-end %} syntax, where {% c-line %}variable{% c-line-end %} is the name of the variable that will hold the unwrapped value. The {% c-line %}else{% c-line-end %} branch will be executed if {% c-line %}y{% c-line-end %} is in fact {% c-line %}null{% c-line-end %}.
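
The same capture works with a block body. The sketch below is illustrative only: {% c-line %}valueOr{% c-line-end %} is a hypothetical helper, and it takes {% c-line %}?*const u32{% c-line-end %} rather than {% c-line %}?*u32{% c-line-end %} so that it can point at constants.

{% c-block language="zig" %}
const std = @import("std");

// Hypothetical helper: returns the pointed-to value, or `fallback` if the pointer is null.
fn valueOr(maybe_ptr: ?*const u32, fallback: u32) u32 {
    if (maybe_ptr) |ptr| {
        // Inside this branch `ptr` has type `*const u32`, so it cannot be null.
        return ptr.*;
    } else {
        // Reached only when `maybe_ptr` is null.
        return fallback;
    }
}

pub fn main() void {
    const x: u32 = 42;
    const some: ?*const u32 = &x;
    const none: ?*const u32 = null;
    std.debug.print("{d} {d}\n", .{ valueOr(some, 0), valueOr(none, 0) });
}
{% c-block-end %}

The null check and the unwrapping happen in one step, so there is no way to reach {% c-line %}ptr.*{% c-line-end %} without having handled the {% c-line %}null{% c-line-end %} case.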

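Alternatively, there is the {% c-line %}.?{% c-line-end %} operator. This is syntactic sugar for {% c-line %}orelse unreachable{% c-line-end %}, where {% c-line %}a orelse b{% c-line-end %} evaluates to the unwrapped value of {% c-line %}a{% c-line-end %}, or to {% c-line %}b{% c-line-end %} if {% c-line %}a{% c-line-end %} is null. In other words, {% c-line %}.?{% c-line-end %} unwraps the optional pointer and invokes safety-checked undefined behavior if it is actually null (a panic in safety-checked build modes such as Debug):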

{% c-block language="zig" %}
fn foo(arg: *u32) u32 {
   return arg.*;
}
pub fn main() void {
   var x: u32 = 42;
   var y: ?*u32 = null;
   var z: u32 = foo(y.?);
}
{% c-block-end %}

{% c-block language="console" %}
➜  ~ zig run ptr.zig
thread 1297377 panic: attempt to use null value
/Users/ehaas/ptr.zig:7:23: 0x10ed7e4fd in main (ptr)
   var z: u32 = foo(y.?);
                     ^
/Users/ehaas/source/zig/lib/std/start.zig:410:22: 0x10ed806ec in std.start.callMain (ptr)
           root.main();
                    ^
/Users/ehaas/source/zig/lib/std/start.zig:362:12: 0x10ed7e6d7 in std.start.callMainWithArgs (ptr)
   return @call(.{ .modifier = .always_inline }, callMain, .{});
          ^
/Users/ehaas/source/zig/lib/std/start.zig:332:12: 0x10ed7e615 in std.start.main (ptr)
   return @call(.{ .modifier = .always_inline }, callMainWithArgs, .{ @intCast(usize, c_argc), c_argv, envp });
          ^
???:?:?: 0x7fff20329f3c in ??? (???)
???:?:?: 0x0 in ??? (???)
[1]    55260 abort      zig run ptr.zig
{% c-block-end %}

As you can see, we get a nice panic trace which shows us exactly where the illegal behavior occurred. This is better than a random segfault, but ideally we don't want our program to crash at all. Thus, {% c-line %}.?{% c-line-end %} should only be used when you are absolutely sure that an optional pointer cannot contain {% c-line %}null{% c-line-end %}.
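
When a sensible fallback exists, {% c-line %}orelse{% c-line-end %} can be given a value instead of {% c-line %}unreachable{% c-line-end %}, which avoids the panic entirely. The following is a sketch rather than a recommendation: {% c-line %}default_value{% c-line-end %} is a hypothetical name, and {% c-line %}foo{% c-line-end %} is changed to take {% c-line %}*const u32{% c-line-end %} so that it can point at a constant.

{% c-block language="zig" %}
const std = @import("std");

fn foo(arg: *const u32) u32 {
    return arg.*;
}

pub fn main() void {
    const default_value: u32 = 0; // hypothetical fallback value
    const y: ?*const u32 = null;

    // `orelse` yields the unwrapped pointer, or the right-hand side when `y` is null.
    const ptr: *const u32 = y orelse &default_value;
    std.debug.print("{d}\n", .{foo(ptr)});
}
{% c-block-end %}

Here {% c-line %}y{% c-line-end %} is null, so {% c-line %}ptr{% c-line-end %} points at {% c-line %}default_value{% c-line-end %} and the program prints {% c-line %}0{% c-line-end %} instead of panicking.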

That about covers {% c-line %}null{% c-line-end %} pointers in Zig. In our next post we'll talk about single-item vs many-item pointers.
