Description
EDITED by @stephentoub 1/13/2023:
Latest API proposal at #50330 (comment)
Background and Motivation
The strides in .NET around string formatting over the last few years have been significant, and .NET 6 with C# 10 promise further substantial improvements to string interpolation performance. However, string interpolation has a serious constraint: it only works when the format strings are known at compilation time.
For any scenario where the text to format is not known at compilation time, such as localized content pulled from a resource file, classic String.Format and StringBuilder.AppendFormat remain the workhorses.
This proposal is designed to deliver substantial performance improvements to classic composite string formatting by hoisting the expensive process of parsing the composite format string to initialization time, enabling the hot path of performing string formatting to execute up to 5x faster than using classic functions.
The model is logically similar to how you can compile a regex a priori and use it repeatedly to seek matches. Here, you parse and process the format string once, and then use it to repeatedly format strings.
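To make the parse-once, format-many idea concrete, here is a minimal sketch. The PreparsedFormat type is hypothetical and exists only for illustration; it is not the proposed CompositeFormat implementation. It assumes plain {N} holes only, with no alignment, no format specifiers, and no {{ or }} escapes.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Text;

// Parse once (this would normally live in a static readonly field)...
var hello = new PreparsedFormat("Hello, {0}! You have {1} new messages.");

// ...then format repeatedly without re-scanning the format string.
Console.WriteLine(hello.Format("Ada", 3)); // Hello, Ada! You have 3 new messages.
Console.WriteLine(hello.Format("Bob", 7)); // Hello, Bob! You have 7 new messages.

// Hypothetical type, for illustration only.
public sealed class PreparsedFormat
{
    // Each segment is either literal text (ArgIndex == -1) or a hole that refers to an argument.
    private readonly (string Literal, int ArgIndex)[] _segments;

    public PreparsedFormat(string format)
    {
        var segments = new List<(string, int)>();
        int pos = 0;
        while (pos < format.Length)
        {
            int open = format.IndexOf('{', pos);
            if (open < 0)
            {
                // No more holes: the rest of the string is literal text.
                segments.Add((format.Substring(pos), -1));
                break;
            }

            if (open > pos)
            {
                segments.Add((format.Substring(pos, open - pos), -1));
            }

            int close = format.IndexOf('}', open);
            if (close < 0)
            {
                throw new FormatException("Unterminated hole in format string.");
            }

            string indexText = format.Substring(open + 1, close - open - 1);
            segments.Add((string.Empty, int.Parse(indexText, CultureInfo.InvariantCulture)));
            pos = close + 1;
        }

        _segments = segments.ToArray();
    }

    // The hot path only replays the pre-parsed segments.
    public string Format(params object[] args)
    {
        var sb = new StringBuilder();
        foreach (var (literal, argIndex) in _segments)
        {
            if (argIndex < 0)
            {
                sb.Append(literal);
            }
            else
            {
                sb.Append(args[argIndex]);
            }
        }

        return sb.ToString();
    }
}
```

The constructor does all the scanning, so each call to Format is a straight walk over an array. That is the same split you get from constructing a Regex once and calling it many times.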
Please see here for a fully functional implementation, with complete test coverage and a benchmark.
Proposed API
The primary API surface proposed here is fairly minimal. The CompositeFormat type has a primary constructor, then various overloads for Format and TryFormat.
/// <summary>
/// Provides highly efficient string formatting functionality.
/// </summary>
/// <remarks>
/// This type lets you optimize string formatting operations common with the <see cref="string.Format(string,object?)" />
/// method. This is useful for any situation where you need to repeatedly format the same string with
/// different arguments.
///
/// This type works faster than <c>string.Format</c> because it parses the composite format string only once when
/// the instance is created, rather than doing it for every formatting operation.
///
/// You first create an instance of this type, passing the composite format string that you intend to use.
/// Once the instance is created, you call the <see cref="Format{T}(IFormatProvider?,T)"/> method with arguments to use in the
/// format operation.
/// </remarks>
public readonly struct CompositeFormat
{
/// <summary>
/// Initializes a new instance of the <see cref="CompositeFormat"/> struct.
/// </summary>
/// <param name="format">A classic .NET format string as used with <see cref="string.Format(string,object?)" />.</param>
/// <remarks>
/// Parses a composite format string into an efficient form for later use.
/// </remarks>
public CompositeFormat(ReadOnlySpan<char> format);
/// <summary>
/// Initializes a new instance of the <see cref="CompositeFormat"/> struct.
/// </summary>
/// <param name="format">A template-based .NET format string as used with <c>LoggerMessage.Define</c>.</param>
/// <param name="templates">Holds the named templates discovered in the format string.</param>
/// <remarks>
/// Parses a composite format string into an efficient form for later use.
/// </remarks>
public CompositeFormat(ReadOnlySpan<char> format, out IList<string> templates);
/// <summary>
/// Formats a string with a single argument.
/// </summary>
/// <typeparam name="T">Type of the single argument.</typeparam>
/// <param name="arg">An argument to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T>(T arg);
/// <summary>
/// Formats a string with a single argument.
/// </summary>
/// <typeparam name="T">Type of the single argument.</typeparam>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="arg">An argument to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T>(IFormatProvider? provider, T arg);
/// <summary>
/// Formats a string with two arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T0, T1>(T0 arg0, T1 arg1);
/// <summary>
/// Formats a string with two arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T0, T1>(IFormatProvider? provider, T0 arg0, T1 arg1);
/// <summary>
/// Formats a string with three arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <typeparam name="T2">Type of the third argument.</typeparam>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <param name="arg2">Third argument to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T0, T1, T2>(T0 arg0, T1 arg1, T2 arg2);
/// <summary>
/// Formats a string with three arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <typeparam name="T2">Type of the third argument.</typeparam>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <param name="arg2">Third argument to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T0, T1, T2>(IFormatProvider? provider, T0 arg0, T1 arg1, T2 arg2);
/// <summary>
/// Formats a string with arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <typeparam name="T2">Type of the third argument.</typeparam>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <param name="arg2">Third argument to use in the formatting operation.</param>
/// <param name="args">Additional arguments to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T0, T1, T2>(T0 arg0, T1 arg1, T2 arg2, params object?[]? args);
/// <summary>
/// Formats a string with arguments.
/// </summary>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <typeparam name="T2">Type of the third argument.</typeparam>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <param name="arg2">Third argument to use in the formatting operation.</param>
/// <param name="args">Additional arguments to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format<T0, T1, T2>(IFormatProvider? provider, T0 arg0, T1 arg1, T2 arg2, params object?[]? args);
/// <summary>
/// Formats a string with arguments.
/// </summary>
/// <param name="args">Arguments to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format(params object?[]? args);
/// <summary>
/// Formats a string with arguments.
/// </summary>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="args">Arguments to use in the formatting operation.</param>
/// <returns>The formatted string.</returns>
public string Format(IFormatProvider? provider, params object?[]? args);
/// <summary>
/// Formats a string with one argument.
/// </summary>
/// <typeparam name="T">Type of the single argument.</typeparam>
/// <param name="destination">Where to write the resulting string.</param>
/// <param name="charsWritten">The number of characters actually written to the destination span.</param>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="arg">An argument to use in the formatting operation.</param>
/// <returns>True if there was enough room in the destination span for the resulting string.</returns>
public bool TryFormat<T>(Span<char> destination, out int charsWritten, IFormatProvider? provider, T arg);
/// <summary>
/// Formats a string with two arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <param name="destination">Where to write the resulting string.</param>
/// <param name="charsWritten">The number of characters actually written to the destination span.</param>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <returns>True if there was enough room in the destination span for the resulting string.</returns>
public bool TryFormat<T0, T1>(Span<char> destination, out int charsWritten, IFormatProvider? provider, T0 arg0, T1 arg1);
/// <summary>
/// Formats a string with three arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <typeparam name="T2">Type of the third argument.</typeparam>
/// <param name="destination">Where to write the resulting string.</param>
/// <param name="charsWritten">The number of characters actually written to the destination span.</param>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <param name="arg2">Third argument to use in the formatting operation.</param>
/// <returns>True if there was enough room in the destination span for the resulting string.</returns>
public bool TryFormat<T0, T1, T2>(Span<char> destination, out int charsWritten, IFormatProvider? provider, T0 arg0, T1 arg1, T2 arg2);
/// <summary>
/// Formats a string with arguments.
/// </summary>
/// <typeparam name="T0">Type of the first argument.</typeparam>
/// <typeparam name="T1">Type of the second argument.</typeparam>
/// <typeparam name="T2">Type of the third argument.</typeparam>
/// <param name="destination">Where to write the resulting string.</param>
/// <param name="charsWritten">The number of characters actually written to the destination span.</param>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="arg0">First argument to use in the formatting operation.</param>
/// <param name="arg1">Second argument to use in the formatting operation.</param>
/// <param name="arg2">Third argument to use in the formatting operation.</param>
/// <param name="args">Additional arguments to use in the formatting operation.</param>
/// <returns>True if there was enough room in the destination span for the resulting string.</returns>
public bool TryFormat<T0, T1, T2>(Span<char> destination, out int charsWritten, IFormatProvider? provider, T0 arg0, T1 arg1, T2 arg2, params object?[]? args);
/// <summary>
/// Formats a string with arguments.
/// </summary>
/// <param name="destination">Where to write the resulting string.</param>
/// <param name="charsWritten">The number of characters actually written to the destination span.</param>
/// <param name="provider">An optional format provider that provides formatting functionality for individual arguments.</param>
/// <param name="args">Arguments to use in the formatting operation.</param>
/// <returns>True if there was enough room in the destination span for the resulting string.</returns>
public bool TryFormat(Span<char> destination, out int charsWritten, IFormatProvider? provider, params object?[]? args);
/// <summary>
/// Gets the number of arguments required in order to produce a string with this instance.
/// </summary>
public int NumArgumentsNeeded { get; }
}
In addition to the feature set shown above, it's also desirable to add extension methods to StringBuilder that leverage the same model for enhanced formatting performance in that context. See here for the prototype API for these.
Usage Examples
// preprocess the format string statically at startup
private static readonly CompositeFormat _cf = new CompositeFormat(MyResources.MyHelloFormatString);

public void Foo(string name)
{
    // format the string
    var str = _cf.Format(name); // logically equivalent to String.Format(MyResources.MyHelloFormatString, name);
    Console.WriteLine(str);

    // same format, appended through the proposed StringBuilder extension method
    str = new StringBuilder().AppendFormat(_cf, name).ToString();
    Console.WriteLine(str);
}
Alternative Designs
Rather than implementing the Format/TryFormat methods as instance methods on the CompositeFormat type, they could be integrated directly into the String type as first-class methods. This would provide a clean model. Similarly, the extension methods for StringBuilder could be baked directly into the StringBuilder type.
Using such an approach doesn't yield additional performance; it's really just a matter of how the API is surfaced.
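For a sense of what this alternative looks like at the call site, here is a sketch that assumes .NET 8 or later, where a System.Text.CompositeFormat type shipped with formatting surfaced on String, StringBuilder, and Span<char> rather than on the parsed format itself. That shipped type is created through a static Parse method, so it differs from the constructor-based struct proposed above. The format text and argument values are made up for the example.

```csharp
using System;
using System.Globalization;
using System.Text;

// Parse once; in real code this would typically be a static readonly field.
CompositeFormat greeting = CompositeFormat.Parse("Hello, {0}! Today is {1:yyyy-MM-dd}.");

var date = new DateTime(2023, 1, 13);

// Formatting goes through String, not through the parsed format instance.
string text = string.Format(CultureInfo.InvariantCulture, greeting, "Ada", date);
Console.WriteLine(text);

// StringBuilder has a matching AppendFormat overload.
var sb = new StringBuilder();
sb.AppendFormat(CultureInfo.InvariantCulture, greeting, "Ada", date);
Console.WriteLine(sb.ToString());

// Span-based formatting: returns false if the buffer is too small.
Span<char> buffer = stackalloc char[64];
if (buffer.TryWrite(CultureInfo.InvariantCulture, greeting, out int charsWritten, "Ada", date))
{
    Console.WriteLine(buffer.Slice(0, charsWritten).ToString());
}
```

All three calls produce the same text. The parsing cost is paid once in Parse regardless of which type the formatting methods hang off, which is why this choice is about API shape rather than speed.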
Benchmarks
Here are benchmarks for the current prototype implementation:
| Method | Mean | Error | StdDev | Gen 0 | Gen 1 | Gen 2 | Allocated |
|------------------------ |----------:|----------:|----------:|----------:|----------:|---------:|-----------:|
| ClassicStringFormat | 45.036 ms | 26.859 ms | 1.4723 ms | 3083.3333 | 1333.3333 | 416.6667 | 16800613 B |
| Interpolation | 43.340 ms | 3.324 ms | 0.1822 ms | 3000.0000 | 1187.5000 | 312.5000 | 16799929 B |
| StringBuilder | 51.144 ms | 3.827 ms | 0.2098 ms | 2900.0000 | 1100.0000 | 200.0000 | 16799922 B |
| CompositeFormat | 26.178 ms | 16.042 ms | 0.8793 ms | 2062.5000 | 781.2500 | 156.2500 | 11999929 B |
| CompositeFormatWithSpan | 10.382 ms | 1.172 ms | 0.0642 ms | - | - | - | 2 B |
| StringMaker | 9.573 ms | 1.488 ms | 0.0816 ms | 5546.8750 | - | - | 23199601 B |
| StringMakerWithSpan | 8.550 ms | 6.963 ms | 0.3817 ms | 2671.8750 | - | - | 11199680 B |
Risks
None that I'm aware of.
Implementation Notes
The prototype implementation I linked to above requires .NET Standard 2.1 as a baseline.
The implementation as presented depends on an internal StringMaker type, which is a highly efficient replacement for StringBuilder packaged as a ref readonly struct. It would be easy to reimplement this code on top of equivalent data types being created to support the C# compiler's enhanced interpolation features in C# 10.