.NET (OK, C#) finally gets union types

Original link: https://andrewlock.net/exploring-the-dotnet-11-preview-2-dotnet-gets-union-types/

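C# 15 makes its debut in the .NET 11 previews and brings long-awaited support for **union types**. As in functional languages such as F# or Rust, a union type lets a single type represent several possible, potentially unrelated states (for example `Result`, or a choice between several different record types).

**Key highlights:**

* **Concise syntax:** Define a union with `public union TypeName(TypeA, TypeB);`. The compiler generates the necessary structure, including constructors and the `IUnion` implementation.
* **Exhaustive switching:** Pattern matching via switch expressions is fully supported. The compiler requires every possible union case to be handled, which improves type safety and removes the need for a default discard case.
* **Performance:** The default implementation boxes values as `object`, but developers can implement the `TryGetValue` pattern to create "non-boxing" unions that avoid heap allocations in performance-sensitive scenarios.
* **Backward compatibility:** By manually defining the `[Union]` attribute and the `IUnion` interface, developers can use union types even when targeting older .NET runtimes.

As the feature evolves, the planned closed enums and closed hierarchies are expected to further improve the exhaustiveness of pattern matching in C#.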

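News that C# is finally getting union types sparked a lively discussion on Hacker News. Supporters see it as a long-awaited feature that will noticeably improve ergonomics, allow complex domains to be modeled more concisely, and reduce reliance on verbose class hierarchies. Many contributors pointed out that unions are a basic building block in languages such as F#, Rust, and TypeScript, and that they help developers eliminate invalid states.

The discussion also reflected broader dissatisfaction with the .NET ecosystem. Critics argue that C# is becoming too complex, and that its ever-growing feature surface makes it hard to keep different codebases consistent. Opponents also raised concerns about the implementation itself, in particular the potential for boxing of value types, and the feature's lack of a fully "native feel" compared with newer functional-first languages.

Beyond this specific feature, the thread highlighted ongoing disagreement about Microsoft's developer tooling strategy, including recognition of Java's mature ecosystem for large-scale data processing and the difficulty of finding a "good enough" UI framework in .NET. Ultimately, although many developers welcome the evolution of C#, others feel that the language is getting harder to work in as it accumulates ever more disparate features.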
Original article

Unions are one of those features that have been requested for years, and in .NET 11 (or rather, C# 15) they're finally here. In this post I describe what that support looks like, how you can use them, how they're implemented, and how you can implement your own custom types.

This post was written using the features available in .NET 11 preview 4. Many things may change between now and the final release of .NET 11.

Unions are one of those basic data structures which are used all the time in the functional programming world; they're available in F#, TypeScript, Rust…pretty much any functional-first language. There are many different types of union, but at their core they allow having a single type that can represent one of several different things.

Some of the simplest union types are the Option<T> and Result<TSuccess, TError> types. There's no "standard" version of these, but it's super common to see custom implementations. Result<> is one of the easiest to explain as it can be in one of two states:

  • Success—in this case the Result<> object contains a TSuccess value representing the "success" result for an operation that succeeded.
  • Error—in this case the Result<> object contains a TError value representing the "error" for an operation that failed.

You return a Result<> object from your method, and then the caller has to explicitly handle both cases instead of assuming success.
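To make the two-state idea concrete, here is a minimal hand-rolled sketch in C# as it exists before unions. The `Result<TSuccess, TError>` shape, its `Match` method, and the `ParseAge` and `Describe` helpers are all hypothetical names chosen for illustration; they are not a standard API and not necessarily how any particular library does it.

```csharp
#nullable enable
using System;

// The caller has to supply a handler for both states.
Console.WriteLine(Describe(ParseAge("42")));  // Age is 42
Console.WriteLine(Describe(ParseAge("abc"))); // Failed: 'abc' is not a number

// Hypothetical helper: returns a success or an error instead of throwing.
static Result<int, string> ParseAge(string input) =>
    int.TryParse(input, out int age)
        ? Result<int, string>.Success(age)
        : Result<int, string>.Error($"'{input}' is not a number");

static string Describe(Result<int, string> result) =>
    result.Match(
        onSuccess: age => $"Age is {age}",
        onError: message => $"Failed: {message}");

// Hypothetical hand-rolled result type holding either a value or an error.
public readonly struct Result<TSuccess, TError>
{
    private readonly TSuccess? _value;
    private readonly TError? _error;

    public bool IsSuccess { get; }

    private Result(bool isSuccess, TSuccess? value, TError? error)
    {
        IsSuccess = isSuccess;
        _value = value;
        _error = error;
    }

    public static Result<TSuccess, TError> Success(TSuccess value) => new(true, value, default);
    public static Result<TSuccess, TError> Error(TError error) => new(false, default, error);

    // Forces the caller to provide a branch for each state.
    public TResult Match<TResult>(Func<TSuccess, TResult> onSuccess, Func<TError, TResult> onError) =>
        IsSuccess ? onSuccess(_value!) : onError(_error!);
}
```

The `Match` method is one common way to force callers to deal with both outcomes without language support; other designs (such as exposing `TryGet` style methods) are equally plausible.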

This pattern is often called the result pattern and it has both pros and cons in C#. I wrote a series about using this pattern, as well as considering whether it's worth it here.

Union types don't have to be the super generic form like this though. They can be used to represent any arbitrary combined set of types.

In the previous section I used the classic Result<> type as an example of a union, but unions are far more versatile than that. They're ideal whenever you want to deal with data that could be one of several potentially unrelated types.

For example, imagine we have three different record types, containing different properties, representing Operating Systems:

public record Windows(string Version);
public record Linux(string Distro, string Version);
public record MacOS(string Name, int Version);

Note that these types don't have any values in common. Prior to C# 15, the main options for handling something which could be a Windows or Linux or MacOS object would be:

  • Try to create a base class from which all the types derive. That might work, but what if you don't control these types because they come from a library?
  • Store the type in an object instance. This works, but you lose all the safety of working with types in this case.
  • Use some "tag" value for keeping track of which type your object contains, e.g. using an enum to track this.
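As a rough sketch of the last option, the "tag" approach might look something like the following in current C#. The `OSKind` enum, the `TaggedOS` wrapper, and the `Describe` helper are hypothetical names invented for this illustration.

```csharp
using System;

Console.WriteLine(Describe(TaggedOS.From(new Linux("Ubuntu", "24.04")))); // Ubuntu 24.04

static string Describe(TaggedOS os) => os.Kind switch
{
    OSKind.Windows => $"Windows {((Windows)os.Value).Version}",
    OSKind.Linux => $"{((Linux)os.Value).Distro} {((Linux)os.Value).Version}",
    OSKind.MacOS => $"MacOS {((MacOS)os.Value).Name} ({((MacOS)os.Value).Version})",
    // The compiler can't prove that every tag is handled, or that the tag
    // matches the stored value, so a catch-all arm is still needed.
    _ => throw new InvalidOperationException($"Unknown kind {os.Kind}"),
};

public record Windows(string Version);
public record Linux(string Distro, string Version);
public record MacOS(string Name, int Version);

// Hypothetical tag that tracks which type the wrapper currently holds.
public enum OSKind { Windows, Linux, MacOS }

// Hypothetical wrapper pairing the tag with the untyped value.
public sealed class TaggedOS
{
    public OSKind Kind { get; }
    public object Value { get; }

    private TaggedOS(OSKind kind, object value)
    {
        Kind = kind;
        Value = value;
    }

    // One factory per case keeps the tag and the value in sync.
    public static TaggedOS From(Windows os) => new(OSKind.Windows, os);
    public static TaggedOS From(Linux os) => new(OSKind.Linux, os);
    public static TaggedOS From(MacOS os) => new(OSKind.MacOS, os);
}
```

This works, but the casts and the catch-all arm are exactly the kind of boilerplate that language-level support can remove.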

In C# 15, we get direct support for this scenario with the union keyword, as shown below:


public union SupportedOS(Windows, Linux, MacOS);

You can create an instance of the SupportedOS type in a couple of ways:


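// Call the generated constructor explicitly
SupportedOS os = new SupportedOS(new MacOS("Tahoe", 25));

// Or assign a case value directly and let the compiler convert it
SupportedOS os = new MacOS("Tahoe", 25);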

The generated union type implements the IUnion interface:

public interface IUnion
{
    object? Value { get; }
}

so you can always get the "inner" case value back out as an object? if you need to:


Console.WriteLine(os.Value); // MacOS { Name = Tahoe, Version = 25 }

However, the canonical way to work with unions is to use a switch expression:

string GetDescription(SupportedOS os) => os switch
{
    Windows windows => $"Windows {windows.Version}",
    Linux linux => $"{linux.Distro} {linux.Version}",
    MacOS macOS => $"MacOS {macOS.Name} ({macOS.Version})",
}; 

The switch expression automatically extracts the inner case type, and a very neat thing is that you don't need to include the _ => "discard" case either: the compiler enforces that you check for each of the allowed values, but you only need to check these values. And if you forget one, you'll get a warning:

warning CS8509: The switch expression does not handle all possible values of its input type
(it is not exhaustive). For example, the pattern 'MacOS' is not covered.

Note that if one of your case types is nullable, e.g. MacOS? then you'll need to handle null in your switch expressions too.

To come full circle, we could perhaps implement the Result<> type as follows (just an example, there are lots of different implementations we could choose!):

public union Result<T>(T, Exception);

or to show another classic, the Option<T> type:

public record class None;
public union Option<T>(None, T);

That's the basics of the union types in C# 15, so next we'll look at how you can use them today, before we look behind the scenes at how they're implemented.

To use union types you need to do two things:

  • Install .NET 11 preview 2+ SDK. The initial union support was added in preview 2, but you'll have a smoother experience if you install preview 4+.
  • Enable preview language support in your .csproj files, by adding <LangVersion>preview</LangVersion>:

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>

    
    <LangVersion>preview</LangVersion>

    <TargetFrameworks>net11.0;net8.0;net48</TargetFrameworks>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

</Project>

Note that although you need to use the .NET 11 SDK, you can target earlier versions of the runtime, as I'm doing in the above .csproj file. The union support is implemented as a compiler feature, so it's available on earlier runtimes (even if it's not technically supported on them).

However, if you're targeting earlier runtimes (or you're using .NET 11 preview 2 or preview 3), then you'll also need to add some helper types to your project:

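#if !NET11_0_OR_GREATER
namespace System.Runtime.CompilerServices;

// Marks a type as a union so the compiler enables conversions and pattern matching
[AttributeUsage(AttributeTargets.Class | AttributeTargets.Struct, AllowMultiple = false, Inherited = false)]
public sealed class UnionAttribute : Attribute;

// Exposes the stored case value as an object
public interface IUnion
{
    object? Value { get; }
}
#endif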

These were added to .NET 11 in preview 4, so they'll be available automatically if you're using a newer SDK, but you'll need to include them if you're targeting earlier runtimes, regardless.

As you might have guessed, when the compiler creates the union types, it uses this attribute and implements this interface. In the next section we'll take a look at what the generated code looks like, to understand how the union types are implemented.

In terms of IDE support, if you're using either Visual Studio Preview, or VS Code's C# DevKit Insiders, then you should have initial support. Support for JetBrains Rider is still pending.

You can see the full spec for union types here, but the standard generated code is really pretty simple:

using System.Runtime.CompilerServices;

[Union]
public struct SupportedOS : IUnion
{
    public object? Value { get; }

    
    public SupportedOS(Windows value) => this.Value = (object) value;
    public SupportedOS(Linux value) => this.Value = (object) value;
    public SupportedOS(MacOS value) => this.Value = (object) value;
}

As you can see, the generated SupportedOS type:

  • Is a struct, decorated with the [Union] attribute.
  • Has a single, readonly, object? Value property, implementing the IUnion interface.
  • Has a constructor for each of the case types it supports.

I was somewhat surprised to find there was no implicit conversion from the case types to the SupportedOS type, given that we can write code like this:

SupportedOS os = new MacOS("Tahoe", 25);

However it looks like the compiler simply rewrites this to use the [Union] constructor:




SupportedOS os = new SupportedOS(new MacOS("Tahoe", 25));

This implicit conversion is all driven by the [Union] attribute. You can see this in action if we rewrite our example to not use the union keyword, and instead use the implementation code shown previously but we "forget" to include the [Union] attribute:

using System.Runtime.CompilerServices;

SupportedOS os = new MacOS("Tahoe", 25); 

var description = os switch
{
    Windows windows => $"Windows {windows.Version}",        
    Linux linux => $"{linux.Distro} {linux.Version}",       
    MacOS macOS => $"MacOS {macOS.Name} ({macOS.Version})", 
};

public record Windows(string Version);
public record Linux(string Distro, string Version);
public record MacOS(string Name, int Version);




public struct SupportedOS : IUnion
{
    public object? Value { get; }

    public SupportedOS(Windows value) => this.Value = (object) value;
    public SupportedOS(Linux value) => this.Value = (object) value;
    public SupportedOS(MacOS value) => this.Value = (object) value;
}

The code above fails to compile with the following, demonstrating how the [Union] attribute drives the implicit conversions and switch expressions:

error CS0029: Cannot implicitly convert type 'MacOS' to 'SupportedOS'
error CS8121: An expression of type 'SupportedOS' cannot be handled by a pattern of type 'Windows'.
error CS8121: An expression of type 'SupportedOS' cannot be handled by a pattern of type 'Linux'.
error CS8121: An expression of type 'SupportedOS' cannot be handled by a pattern of type 'MacOS'.

If you re-instate the [Union] attribute, everything compiles and runs just fine, which shows how you can create your own custom union types.

Given we're just getting support for union types, why might you want to create custom union types? One reason is that you might already be using custom union types, such as those provided by OneOf or Sasa (two packages I've used in the past). In these cases, the libraries could benefit from built-in language support (e.g. switch expression support) by simply implementing the IUnion interface and adding the [Union] attribute.

Another case is when the "store the case type in an object instance" just isn't good enough for you. The generated union type is always a struct with a single object field. That means that if you're creating a union of multiple struct types, those types are going to be boxed onto the heap.
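If boxing is unfamiliar, the following small sketch shows what it means in practice, using plain C# with no union types involved. The `BoxTwice` helper is a hypothetical name used only for this illustration.

```csharp
using System;

var result = BoxTwice(23);
Console.WriteLine($"Same reference: {result.SameReference}"); // Same reference: False
Console.WriteLine($"Equal value: {result.EqualValue}");       // Equal value: True
Console.WriteLine($"Round-tripped: {result.RoundTripped}");   // Round-tripped: 23

// Hypothetical helper that boxes the same int twice and compares the results.
static (bool SameReference, bool EqualValue, int RoundTripped) BoxTwice(int number)
{
    // Assigning a value type to object copies it into a new heap object.
    object first = number;
    object second = number;

    return (
        object.ReferenceEquals(first, second), // two boxes, two distinct objects
        first.Equals(second),                  // but they still hold equal values
        (int)first);                           // unboxing copies the value back out
}
```

Each assignment to `object` creates a separate heap object, which is the allocation that storing a struct case in an `object` field incurs.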

For example, imagine you need this union, which can represent either an int or a bool:

public union IntOrBool(int, bool);

The problem is that the int or bool passed into the constructor of IntOrBool is immediately boxed to an object and stored in the Value property:

[Union]
public struct IntOrBool : IUnion
{
    public object? Value { get; }

    
    public IntOrBool(int value) => this.Value = (object) value;
    public IntOrBool(bool value) => this.Value = (object) value;
}

This allocates on the heap, which is generally undesirable, as union types are intended to be largely transparent performance-wise. Any switch expressions using this implementation will similarly use the Value property. For example, with the basic built-in union implementation, the following expression:

IntOrBool intOrBool = new IntOrBool(23);
var description = intOrBool switch
{
    int i => "integer",
    bool b => "bool",
};

would lower to code similar to this:

IntOrBool unmatchedValue = new IntOrBool(23);
object obj = unmatchedValue.Value; 
string str;
if (obj is int _)
{
    str = "integer";
}
else if (obj is bool _)
{
    str = "bool";
}
else
{
    ThrowSwitchExpressionException((object) unmatchedValue); 
}

In many cases, the boxing allocation won't really matter, but in other places, such as in hot paths, the boxing is undesirable. To account for this, the union feature allows for a "non-boxing" implementation, using a TryGetValue pattern. This requires that you implement:

  • bool HasValue { get; } which returns true if the stored value is non-null
  • bool TryGetValue(out T value) for each case type, T

For example, the following is a potential implementation of the IntOrBool type above that avoids boxing

[Union]
public struct IntOrBool : IUnion
{
    private readonly bool _isBool;
    private readonly int _value;

    public IntOrBool(int value)
    {
        _isBool = false;
        _value = value;
    }

    public IntOrBool(bool value)
    {
        _isBool = true;
        _value = value ? 1 : 0;
    }

    public bool HasValue => true; 
    public bool TryGetValue(out int value) 
    {
        value = _value;
        return !_isBool;
    }
    public bool TryGetValue(out bool value) 
    {
        value = _isBool && _value is 1;
        return _isBool;
    }
    
    
    
    public object Value => _isBool ? _value is 1 : _value;
}

When you implement the TryGetValue() methods, the compiler automatically uses them in switch expressions instead of the Value property, so the switch expression above becomes the following:

IntOrBool unmatchedValue = new IntOrBool(23);
string str;

if (unmatchedValue.TryGetValue(out int _)) 
{
    str = "integer";
}
else if (unmatchedValue.TryGetValue(out bool _))
{
    str = "bool";
}
else
{
    ThrowSwitchExpressionException((object) unmatchedValue); 
}
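The same dispatch can be written by hand, which is a handy way to check the TryGetValue logic without relying on compiler support. The sketch below uses a trimmed copy of the struct above, deliberately omitting the [Union] attribute, the IUnion interface, and the HasValue and Value members so that it compiles without the preview features; the `Describe` helper is a hypothetical name.

```csharp
using System;

Console.WriteLine(Describe(new IntOrBool(23)));   // integer
Console.WriteLine(Describe(new IntOrBool(true))); // bool

// Hand-written equivalent of the lowered switch expression.
static string Describe(IntOrBool value)
{
    if (value.TryGetValue(out int _)) return "integer";
    if (value.TryGetValue(out bool _)) return "bool";
    throw new InvalidOperationException("No case matched");
}

// Trimmed copy of the non-boxing struct: no [Union], no IUnion, no Value.
public readonly struct IntOrBool
{
    private readonly bool _isBool;
    private readonly int _value;

    public IntOrBool(int value)
    {
        _isBool = false;
        _value = value;
    }

    public IntOrBool(bool value)
    {
        _isBool = true;
        _value = value ? 1 : 0;
    }

    // Succeeds only when an int was stored.
    public bool TryGetValue(out int value)
    {
        value = _value;
        return !_isBool;
    }

    // Succeeds only when a bool was stored.
    public bool TryGetValue(out bool value)
    {
        value = _isBool && _value is 1;
        return _isBool;
    }
}
```

Nothing here touches `object`, so neither constructing the value nor matching on it needs a heap allocation.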

Depending on your code paths and use-cases, it may or may not be worth creating custom non-boxing implementations like this; it depends on what you're using the union types for in your code base.

The union implementation is usable as currently shipped, but there's even more to the language proposal than I've covered. Here are some of the related features that are yet to come:

  • Union member providers. These provide a way to define the members that are part of the union type on a different type to the union itself.
  • Closed enums. These are enums in which you don't need to include a "catch-all" expression (_ =>) in the switch expression for the enum.
  • Closed hierarchies. This allows adding the closed modifier on a class to prevent derived classes from being declared outside the defining assembly, which then similarly allows exhaustive switch expressions without a catch-all expression.
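For context on why closed enums would help, here is a small sketch of how a switch over an ordinary enum behaves in current C#. The `Direction` enum and `Describe` helper are hypothetical names for this illustration.

```csharp
using System;

Console.WriteLine(Describe(Direction.North)); // north
Console.WriteLine(Describe((Direction)42));   // not a named direction

static string Describe(Direction direction) => direction switch
{
    Direction.North => "north",
    Direction.South => "south",
    // Still needed today: any integer can be cast to the enum, so without
    // this arm the compiler warns that the switch is not exhaustive.
    _ => "not a named direction",
};

public enum Direction { North, South }
```

Because a value like `(Direction)42` is legal for an ordinary enum, the compiler can't treat the two named members as the complete set, which is the gap the closed enum proposal is meant to address.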

These features may or may not make it into .NET 11, but I'll be sure to cover them if they do!

In this post I described the support for union types introduced in .NET 11 preview 2. I described the steps you need to take to use them, as well as how to deconstruct union types using switch expressions. I showed the union declaration syntax, how they're implemented behind the scenes, as well as how to implement a non-boxing version of a union type. Finally I discussed some of the plans and roadmap for union types and for exhaustiveness improvements in C# that are yet to be released.
