SoFunction
Updated on 2025-04-07

Writing sscanf in C# with an interpolated string handler

Preface

What? Using a C# interpolated string handler to write sscanf, which handles input? Are you sure you don't mean sprintf, which handles output?

I suspect many readers had exactly that thought on seeing the title. But what we are building here really is sscanf, not sprintf.

Interpolated string handlers

C# has a feature called interpolated strings. With an interpolated string you can insert the value of a variable directly into the string, for example $"abc{x}def". This changed the older way of formatting strings: you no longer need to pass a string template first and then pass the arguments one by one, which is very convenient.

Building on interpolated strings, C# also supports interpolated string handlers, which let you customize how a string is interpolated. A simple example:

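using System;
using System.Runtime.CompilerServices;

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount)
{
    // Called once for each literal segment of the interpolated string
    public void AppendLiteral(string s)
    {
        Console.WriteLine($"Literal: '{s}'");
    }

    // Called once for each interpolated value
    public void AppendFormatted<T>(T v)
    {
        Console.WriteLine($"Value: '{v}'");
    }
}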

To use it, change the string parameter of your method to the Handler type, and interpolated strings passed to it are processed the way you defined. The C# compiler automatically transforms the interpolated string into the construction of a Handler plus a series of calls on it, and then passes the result in:

void Foo(Handler handler) { }
var x = 42;
Foo($"abc{x}def");

Running the example above prints:

Literal: 'abc'
Value: '42'
Literal: 'def'
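
To make that transformation concrete, the sketch below writes out by hand roughly what the call Foo($"abc{x}def") turns into. It is a simplified illustration rather than the exact code the compiler generates; the constructor arguments follow the documented convention that literalLength is the total length of the literal parts ("abc" plus "def" is 6) and formattedCount is the number of interpolation holes (1).

```csharp
using System;
using System.Runtime.CompilerServices;

var x = 42;

// Hand-written approximation of what Foo($"abc{x}def") is lowered to
var handler = new Handler(literalLength: 6, formattedCount: 1);
handler.AppendLiteral("abc");
handler.AppendFormatted(x);
handler.AppendLiteral("def");
Foo(handler);

static void Foo(Handler handler) { }

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount)
{
    public void AppendLiteral(string s) => Console.WriteLine($"Literal: '{s}'");

    public void AppendFormatted<T>(T v) => Console.WriteLine($"Value: '{v}'");
}
```

The calls happen in template order, which is why the output lines appear as literal, value, literal.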

This is a great help for structured logging frameworks. You simply pass the interpolated string in, and the framework can capture the structure from the way you interpolated it, so you never have to format strings by hand.

Interpolated string handlers with arguments

In fact, C# interpolated string handlers also support additional arguments:

[InterpolatedStringHandler]
struct Handler(int literalLength, int formattedCount, int value)
{
    public void AppendLiteral(string s)
    {
        Console.WriteLine($"Literal: '{s}'");
    }

    public void AppendFormatted<T>(T v)
    {
        Console.WriteLine($"Value: '{v}'");
    }
}

void Foo(int value, [InterpolatedStringHandlerArgument("value")] Handler handler) { }
Foo(42, $"abc{x}def");

Here, 42 is passed to the value parameter of the handler's constructor, which lets the handler capture context from the caller. After all, in logging scenarios it is common to choose a different format depending on other arguments.

sscanf?

As we all know, C/C++ has a very commonly used function, sscanf. It accepts an input string and a format template, followed by pointers to variables, one for each placeholder in the template, and it parses the values into those variables:

const char* input = "test 123 test";
const char* template = "test %d test";
int v = 0;
sscanf(input, template, &v);
printf("%d\n", v); // 123

So can we build the same thing in C#? Of course! It only takes a little black magic.

Implementing sscanf in C#

First we create an interpolated string handler that takes an argument:

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(T v) where T : ISpanParsable<T>
    {
    }
}

Here we changed every string to ReadOnlySpan<char> to reduce allocations.

If we followed the way sscanf is used, in theory we would write something like this:

void sscanf(ReadOnlySpan<char> input, ReadOnlySpan<char> template, params object[] args);

But obviously, what we actually need here is something like (ref object)[], because we have to pass references in order to update the caller's variables, rather than boxing their values as object. So what should we do?

Notice that a C# interpolated string handler already receives each interpolated variable, so we do not need %d-style placeholders as in C/C++ to mark where the variables go. Instead of "test %d test" we can write $"test {v} test" directly, and then pass v by reference.

A very natural idea is to simply change AppendFormatted<T>(T v) to AppendFormatted<T>(ref T v). Wouldn't that be enough?

However, after actually doing this, you will find that this does not work:

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(ref T v) where T : ISpanParsable<T>
    {
    }
}

void sscanf(ReadOnlySpan<char> input, [InterpolatedStringHandlerArgument("input")] TemplatedStringHandler template);

When we try to call sscanf:

int v = 0;
sscanf("test 123 test", $"test {ref v} test"); // error CS1525: Invalid expression term 'ref'

We get an error! The ref keyword is not allowed in the value part of an interpolated string.

Note that this error comes from the parser of the C# compiler. That means that as long as we can get rid of the ref syntactically, the code will compile.

At this point inspiration strikes: doesn't C# have in for passing read-only references? When a variable is passed to an in parameter, C# creates the reference and passes it for us automatically, with no ref required at the call site. So let's use this feature:

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
    }

    public void AppendFormatted<T>(in T v) where T : ISpanParsable<T>
    {
    }
}

Then you will find that the following code can be successfully compiled:

int v = 0;
sscanf("test 123 test", $"test {v} test");

Now only one step remains. The reference we receive is read-only, but to extract a value into the variable we need to write through it. What should we do?

Fortunately, we have Unsafe.AsRef, which converts a read-only reference into a mutable one. With that last problem solved, we can start on the implementation.

using System;
using System.Runtime.CompilerServices;

[InterpolatedStringHandler]
ref struct TemplatedStringHandler(int literalLength, int formattedCount, ReadOnlySpan<char> input)
{
    private int _index = 0;
    private ReadOnlySpan<char> _input = input;

    public void AppendLiteral(ReadOnlySpan<char> s)
    {
        // Skip consecutive whitespace characters first.
        // Note: because of this, a literal that itself starts with whitespace will not match.
        var offset = Advance(0);
        _input = _input[offset..];
        _index += offset;

        if (_input.StartsWith(s)) // Remove the literal part of the template from the input string
        {
            _input = _input[s.Length..];
        }
        else throw new FormatException($"Cannot find '{s}' in the input string (at index: {_index}).");

        _index += s.Length;
        literalLength -= s.Length;
    }

    public void AppendFormatted<T>(in T v) where T : ISpanParsable<T>
    {
        var offset = Advance(0); // Skip consecutive whitespace characters first
        _input = _input[offset..];
        _index += offset;

        var length = Scan(); // Length of the token up to the next whitespace character
        if (T.TryParse(_input[..length], null, out var result)) // Parse!
        {
            // Turn the read-only reference into a mutable one and update the referenced variable
            Unsafe.AsRef(in v) = result;
            _input = _input[length..];
            _index += length;
            formattedCount--;
        }
        else
        {
            throw new FormatException($"Cannot parse '{_input[..length]}' to '{typeof(T)}' (at index: {_index}).");
        }
    }

    // Scan forward and stop at the first whitespace character
    private int Scan()
    {
        var length = 0;
        for (var i = 0; i < _input.Length; i++)
        {
            if (_input[i] is ' ' or '\t' or '\r' or '\n') break;
            length++;
        }
        return length;
    }

    // Skip all whitespace characters starting at the given position
    private int Advance(int start)
    {
        var length = start;
        while (length < _input.Length && _input[length] is ' ' or '\t' or '\r' or '\n')
        {
            length++;
        }
        return length;
    }
}
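
The write-through trick used in AppendFormatted can also be seen in isolation. The following is a minimal sketch, separate from the handler above; the helper name Overwrite is hypothetical and exists only for this illustration. It shows that a variable passed to an in parameter arrives by reference without any keyword at the call site, and that Unsafe.AsRef lets the callee assign through that reference.

```csharp
using System;
using System.Runtime.CompilerServices;

int v = 0;
Overwrite(v, 123);    // no 'in' or 'ref' at the call site; v is still passed by reference
Console.WriteLine(v); // prints 123

// Hypothetical helper for illustration: writes 'value' through a read-only reference
static void Overwrite<T>(in T target, T value)
{
    // Reinterpret the read-only reference as a mutable one, then assign through it
    Unsafe.AsRef(in target) = value;
}
```

This only works when the argument is a variable of exactly type T. If the argument is a literal, or a value that needs a conversion, the compiler passes a reference to a temporary copy and the write is silently lost. Writing through an in parameter also breaks the read-only promise that callers normally rely on, so it is best kept to narrow cases like this one.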

Then we provide a sscanf method that exposes our interpolated string handler:

static void sscanf(ReadOnlySpan<char> input, [InterpolatedStringHandlerArgument("input")] TemplatedStringHandler template) { }

Usage

int x = 0;
string y = "";
bool z = false;
DateTime d = default;
sscanf("test 123 hello false 2025/01/01T00:00:00 end", $"test{x}{y}{z}{d}end");
Console.WriteLine(x);
Console.WriteLine(y);
Console.WriteLine(z);
Console.WriteLine(d);

Get the output:

123
hello
False
January 1, 2025 0:00:00

And scanf is just shorthand for sscanf(Console.ReadLine(), template), so having sscanf is entirely sufficient.

Conclusion

C# interpolated string handlers are very powerful. Using this feature, we have built a string parsing function that is more convenient than sscanf in C/C++. It needs no format placeholders, it infers the types automatically, and it even removes the need to pass variable references one by one as trailing arguments. On top of that, we have achieved zero allocation.
