Apple Exclaves

Original link: https://randomaugustine.medium.com/on-apple-exclaves-d683a2c37194

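XNU uses a two-level kernel table to manage secure-world (exclave) resources. Domains categorise resources. The "com.apple.kernel" domain contains services such as ConclaveLauncherControl, ExclaveIndicatorController, LogServer_XNUProxy and FrameMint, along with the storage backend and the Conclave Managers. Each conclave has its own domain ("com.apple.conclave.name"). A conclave lets its resources share access to one another, and it is controlled by a Conclave Manager in the kernel domain. launchd or an entitled task can attach a task to a Conclave Manager. The attached task then launches the conclave through the `_exclaves_ctl_trap()` Mach trap, which handles a range of exclave operations (launch, service lookup, buffer management and more). Secure-world code executes through "downcalls" to exclave service endpoints, and XNU manages these calls. A thread can make an "upcall" to XNU for a limited set of services, such as memory allocation, file storage or DriverKit functions, and then returns to the secure world. XNUProxy handles communication with specific exclaves. Booting exclaves takes several stages. First, SK (the Secure Kernel) is loaded and verified. Then the upcall server, the exclave scheduler and XNUProxy are initialised, and resources are discovered. SPTM (the Secure Page Table Monitor) defines memory types that control secure-world isolation and shared access.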

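This Hacker News thread discusses Apple's "exclaves": secure, isolated software components built into its operating systems. They handle sensitive operations to strengthen security, and they are meant to protect user data and system integrity even if the main kernel (XNU) is compromised. The discussion explores the potential roles of ARM TrustZone, the Secure Page Table Monitor (SPTM) and Guarded Execution (GXF). Commenters debated the trade-off between security and user freedom. Some praised Apple's focus on user experience, convenience and privacy, while others criticised the "closed" nature of its ecosystem. Steve Jobs' legacy and Tim Cook's continued push for user privacy also came up. Some commenters said better alternatives exist but require more manual configuration and maintenance than they are willing to take on. Others framed it as a trade-off between security that comes at the price of Apple's profits and greater freedom.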

Original article

XNU initialises a two-level kernel table structure to hold information on exclave resources discovered during boot. Each resource has exactly one resource type and describes something that exists either in the secure world or in both worlds.

The root_table identifies domains by name. Each domain references a second-level table that holds all the resources for that domain. Domains and their resources discovered so far include:

com.apple.kernel — this domain contains many resources used by the kernel, including:

  • com.apple.service.ConclaveLauncherControl — conclave launcher service
  • com.apple.service.ConclaveLauncher_Debug — debug service
  • com.apple.service.ExclaveIndicatorController — service for secure indicator lights
  • com.apple.service.LogServer_XNUProxy — service for logging
  • com.apple.service.FrameMint — service used to boot ExclaveKit
  • com.apple.storage.backend — Shared memory buffer used by exclave services to do file IO from XNU space via upcalls (more details below)
  • Conclave Manager x — One per conclave, used to control a conclave
  • Conclave Manager y …

com.apple.darwin — No open-source components use this domain

com.apple.conclave.name — There is one domain per conclave.

  • service_x
  • service_y
  • audio buffer
  • shared memory buffer
  • etc

com.apple.driver.name — One domain per device driver. The existence of these domains is inferred from comments; they do not actually appear in the open-source code. I suspect these are just per-driver conclaves.
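A lookup in the two-level structure can be sketched in C. It first matches the domain name in a root table, then matches the resource name in that domain's second-level table. This is a hypothetical model. The type names, the `res_type_t` values, `resource_lookup()` and the linear search are invented for illustration and are not XNU's actual definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the two-level exclave resource table: the root
 * table maps a domain name to a second-level table of that domain's
 * resources. Types and names are illustrative, not XNU's definitions. */
typedef enum {
    RES_SERVICE,
    RES_CONCLAVE_MANAGER,
    RES_SHARED_MEMORY,
    RES_AUDIO_BUFFER,
} res_type_t;

typedef struct {
    const char *name;
    res_type_t  type;            /* each resource has exactly one type */
} resource_t;

typedef struct {
    const char       *domain;    /* e.g. "com.apple.kernel" */
    const resource_t *resources; /* second-level table */
    size_t            count;
} domain_entry_t;

#define COUNT(a) (sizeof(a) / sizeof((a)[0]))

static const resource_t kernel_resources[] = {
    { "com.apple.service.ConclaveLauncherControl", RES_SERVICE },
    { "com.apple.service.FrameMint",               RES_SERVICE },
    { "com.apple.storage.backend",                 RES_SHARED_MEMORY },
};

static const resource_t conclave_resources[] = {
    { "service_x",    RES_SERVICE },
    { "audio buffer", RES_AUDIO_BUFFER },
};

static const domain_entry_t root_table[] = {
    { "com.apple.kernel",        kernel_resources,   COUNT(kernel_resources) },
    { "com.apple.conclave.name", conclave_resources, COUNT(conclave_resources) },
};

/* Level one: find the domain. Level two: find the resource by name. */
static const resource_t *resource_lookup(const char *domain, const char *name)
{
    for (size_t i = 0; i < COUNT(root_table); i++) {
        if (strcmp(root_table[i].domain, domain) != 0)
            continue;
        for (size_t j = 0; j < root_table[i].count; j++)
            if (strcmp(root_table[i].resources[j].name, name) == 0)
                return &root_table[i].resources[j];
        return NULL;             /* domain exists, resource does not */
    }
    return NULL;                 /* unknown domain */
}
```

A miss at either level returns NULL. The per-task domain and entitlement rules described later in the article sit on top of a lookup like this.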

A conclave is a type of resource that can itself contain multiple resources. However, it is much more than just a container. A conclave gives a group of services and other resources shared access to each other. Mach tasks are also limited in which conclaves (if any) they can call upon.

Each conclave has a Conclave Manager (another type of exclave resource), located in the kernel domain.

Conclaves have a lifecycle. Their Conclave Manager is first attached to a Mach task, and the conclave is then launched. Conclaves can also be stopped and detached. Transitional states such as launching and stopping exist while the conclave moves between these stages.
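The lifecycle can be modelled as a small state machine. The sketch below is hypothetical. The state and event names are invented, and it assumes a conclave may be detached either after it stops or before it is ever launched. The source does not confirm that rule.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical conclave lifecycle. State and event names are invented
 * for illustration; they are not XNU identifiers. */
typedef enum {
    CONCLAVE_DETACHED,
    CONCLAVE_ATTACHED,
    CONCLAVE_LAUNCHING,  /* transitional */
    CONCLAVE_LAUNCHED,
    CONCLAVE_STOPPING,   /* transitional */
    CONCLAVE_STOPPED,
} conclave_state_t;

typedef enum { EV_ATTACH, EV_LAUNCH, EV_STOP, EV_DETACH } conclave_event_t;

/* Applies an event and returns true, or returns false and leaves the state
 * untouched if the event is not valid in the current state. The
 * transitional states are entered and left immediately here; in the kernel
 * they would last while the secure world does the work. */
static bool conclave_transition(conclave_state_t *s, conclave_event_t ev)
{
    switch (ev) {
    case EV_ATTACH:
        if (*s != CONCLAVE_DETACHED)
            return false;
        *s = CONCLAVE_ATTACHED;
        return true;
    case EV_LAUNCH:
        if (*s != CONCLAVE_ATTACHED)
            return false;
        *s = CONCLAVE_LAUNCHING;
        *s = CONCLAVE_LAUNCHED;   /* launch completed */
        return true;
    case EV_STOP:
        if (*s != CONCLAVE_LAUNCHED)
            return false;
        *s = CONCLAVE_STOPPING;
        *s = CONCLAVE_STOPPED;    /* stop completed */
        return true;
    case EV_DETACH:
        /* Assumption: detach is allowed after a stop, or before any launch. */
        if (*s != CONCLAVE_STOPPED && *s != CONCLAVE_ATTACHED)
            return false;
        *s = CONCLAVE_DETACHED;
        return true;
    }
    return false;
}
```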

The XNU posix_spawn() function can call task_add_conclave() to attach a task and a conclave manager resource together. This is a 1:1 relationship — only one task can be attached to a conclave manager and vice versa. Only launchd and tasks with the com.apple.private.exclaves.conclave-spawn entitlement may spawn a conclave. The com.apple.private.exclaves.conclave-host entitlement is largely similar, but I believe only entitles a task to attach itself, rather than being able to spawn a new task for this purpose.

The kernel looks up the conclave manager resource for the targeted conclave in the com.apple.kernel domain. It then saves a tightbeam endpoint for the conclave manager in the conclave’s resource struct. All future control of the conclave goes through this endpoint. Tightbeam appears to be an RPC framework for communication between exclave components.

Note this attachment is to a task — not a thread. Execution of services will be covered later.

Conclave manager tasks are not allowed to have kernel domain privileges.
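Here is a hypothetical C sketch of the attach rules above. Only launchd or a task holding com.apple.private.exclaves.conclave-spawn may spawn into a conclave. The task/manager pairing is strictly 1:1. The attached task may not hold the kernel-domain entitlement. `task_t`, `conclave_mgr_t`, `has_entitlement()` and `attach_conclave()` are stand-ins invented for this example and do not reproduce XNU's task_add_conclave(). The conclave-host variant is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for XNU's task and conclave manager resource. */
typedef struct conclave_mgr conclave_mgr_t;

typedef struct {
    bool                is_launchd;
    const char *const  *entitlements;  /* NULL-terminated list */
    conclave_mgr_t     *conclave;      /* at most one attached manager */
} task_t;

struct conclave_mgr {
    task_t *task;                      /* at most one attached task */
};

static bool has_entitlement(const task_t *t, const char *ent)
{
    for (const char *const *e = t->entitlements; e != NULL && *e != NULL; e++)
        if (strcmp(*e, ent) == 0)
            return true;
    return false;
}

/* spawner: the task doing the spawn; child: the task being attached. */
static bool attach_conclave(const task_t *spawner, task_t *child,
                            conclave_mgr_t *mgr)
{
    if (!spawner->is_launchd &&
        !has_entitlement(spawner, "com.apple.private.exclaves.conclave-spawn"))
        return false;
    /* Conclave manager tasks may not hold kernel-domain privileges. */
    if (has_entitlement(child, "com.apple.private.exclaves.kernel-domain"))
        return false;
    /* Strict 1:1 pairing: neither side may already be attached. */
    if (child->conclave != NULL || mgr->task != NULL)
        return false;
    child->conclave = mgr;
    mgr->task = child;
    return true;
}
```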

Once attached, a conclave may be launched. The launch attempt must be performed from the conclave manager task attached to the conclave. Attempts to launch conclaves also wait until exclaves have fully booted (into state EXCLAVES_BS_BOOTED_EXCLAVEKIT — more on this later).
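The two launch preconditions can be written as a simple check. This sketch rests on several assumptions. Tasks are reduced to integer ids, and the enum ordering is invented. The kernel blocks until exclaves have booted, but `check_launch()` only reports that the caller would have to wait.

```c
#include <assert.h>

/* Boot status names are from the article; their ordering and the result
 * codes below are assumptions made for this sketch. */
typedef enum {
    EXCLAVES_BS_NOT_STARTED,
    EXCLAVES_BS_BOOTED_STAGE_2,
    EXCLAVES_BS_BOOTED_EXCLAVEKIT,
} boot_status_t;

typedef enum {
    LAUNCH_OK,
    LAUNCH_NOT_ATTACHED_TASK,   /* caller is not the attached manager task */
    LAUNCH_WAIT_FOR_BOOT,       /* exclaves not fully booted yet */
} launch_result_t;

static launch_result_t check_launch(int caller_task_id, int attached_task_id,
                                    boot_status_t status)
{
    if (caller_task_id != attached_task_id)
        return LAUNCH_NOT_ATTACHED_TASK;
    if (status != EXCLAVES_BS_BOOTED_EXCLAVEKIT)
        return LAUNCH_WAIT_FOR_BOOT;   /* the real code waits instead */
    return LAUNCH_OK;
}
```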

A new Mach trap (i.e. a system call) for exclave functionality has been added to XNU, and it ends up in the _exclaves_ctl_trap() function. This call is overloaded and can perform different operations passed in as parameters. The operation that launches a conclave is EXCLAVES_CTL_OP_LAUNCH_CONCLAVE.

The launch operation calls a redacted function, conclave_launcher_conclavecontrol_launch() and passes it the tightbeam connection to the conclave manager to perform the launch. I suspect this requests the initialisation of executable code and resources for the conclave within the secure world.

In production, conclave hosts can be tainted when launched, and an exit() may then cause a kernel panic.

As mentioned, the _exclaves_ctl_trap() function handles a new Mach trap for exclave functionality. The call is overloaded, so its action depends on an operation parameter, and it generally verifies entitlements for the operations called. The operations are:

  • EXCLAVES_CTL_OP_BOOT — Called twice during the system boot process: first to start exclaves boot stage 2, and then to boot the ExclaveKit stage. The caller must be launchd or have the com.apple.private.exclaves.boot entitlement.

All operations below require, at minimum, that the current task either has the com.apple.private.exclaves.kernel-domain entitlement or is the relevant conclave manager task.

  • EXCLAVES_CTL_OP_LAUNCH_CONCLAVE — launch a conclave, discussed earlier
  • EXCLAVES_CTL_OP_LOOKUP_SERVICES — look up an exclave service and copy its struct to a userspace buffer. It first looks in the exclave domain of the current task. If that fails, it checks the Darwin domain and then the kernel domain, provided the task is entitled to do so
  • EXCLAVES_CTL_OP_ENDPOINT_CALL — calls the endpoint for an exclave service in the current task’s domain — this will result in the current thread switching from kernel mode to the secure world and executing specific code there
  • EXCLAVES_CTL_OP_NAMED_BUFFER_CREATE — create a named buffer resource
  • EXCLAVES_CTL_OP_NAMED_BUFFER_COPYIN — copy data from a userspace buffer to a kernel buffer (that is shared with exclaves)
  • EXCLAVES_CTL_OP_NAMED_BUFFER_COPYOUT — copy data from a kernel buffer (that is shared with exclaves) to a userspace buffer
  • EXCLAVES_CTL_OP_AUDIO_BUFFER_CREATE — create an audio buffer
  • EXCLAVES_CTL_OP_AUDIO_BUFFER_COPYOUT — copy data from audio buffer to userspace buffer
  • EXCLAVES_CTL_OP_SENSOR_CREATE — create a sensor resource (eg. camera, microphone)
  • EXCLAVES_CTL_OP_SENSOR_START
  • EXCLAVES_CTL_OP_SENSOR_STOP
  • EXCLAVES_CTL_OP_SENSOR_STATUS
  • EXCLAVES_CTL_OP_NOTIFICATION_RESOURCE_LOOKUP — create a notification resource — TBD, but likely for coordination/scheduling
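The gating described above can be sketched as a permission check, together with the fallback order used by EXCLAVES_CTL_OP_LOOKUP_SERVICES. The op names come from the list. `caller_t`, `ctl_op_permitted()` and `lookup_domain_order()` are hypothetical. The sketch also assumes a single entitlement flag gates both the Darwin and the kernel fallbacks, which the source leaves unclear.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch of operation gating in an overloaded trap. The op
 * names are from the article; everything else is illustrative. */
typedef enum {
    EXCLAVES_CTL_OP_BOOT,
    EXCLAVES_CTL_OP_LAUNCH_CONCLAVE,
    EXCLAVES_CTL_OP_LOOKUP_SERVICES,
    EXCLAVES_CTL_OP_ENDPOINT_CALL,
    /* ... buffer, sensor and notification ops omitted ... */
} exclaves_ctl_op_t;

typedef struct {
    bool is_launchd;
    bool boot_entitled;           /* com.apple.private.exclaves.boot */
    bool kernel_domain_entitled;  /* com.apple.private.exclaves.kernel-domain */
    bool is_conclave_manager;     /* the relevant attached manager task */
} caller_t;

static bool ctl_op_permitted(const caller_t *c, exclaves_ctl_op_t op)
{
    if (op == EXCLAVES_CTL_OP_BOOT)
        return c->is_launchd || c->boot_entitled;
    /* Minimum requirement for every other op; individual ops may add more. */
    return c->kernel_domain_entitled || c->is_conclave_manager;
}

/* Domains EXCLAVES_CTL_OP_LOOKUP_SERVICES would try, in order, stopping at
 * the first hit. Assumption: one entitlement flag gates both fallbacks. */
static size_t lookup_domain_order(bool fallback_entitled, const char *own_domain,
                                  const char *out[3])
{
    size_t n = 0;
    out[n++] = own_domain;            /* the current task's exclave domain */
    if (fallback_entitled) {
        out[n++] = "com.apple.darwin";
        out[n++] = "com.apple.kernel";
    }
    return n;
}
```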

Downcalls are calls to exclave services’ endpoints in the secure world. This is where secure world code execution happens.

There is a great deal of complexity in these calls, primarily around managing thread/IPC contexts and scheduling the current thread to execute code in the secure world.

  1. Downcalls switch the current thread into the secure world and start executing at an entry point in secure code, rather than asking some other thread to perform work on behalf of the current thread.
  2. Calling tasks must have kernel domain entitlements or be the conclave manager task attached to the service’s conclave.
  3. Conclaves have a maximum of 128 services that can be called
  4. It appears that threads are scheduled into the secure kernel (via the sk_enter() function) by XNU. XNU appears to handle the scheduling of all threads in the secure world, with SK potentially not having any independent threads of its own.
  5. A thread executing in the secure world can perform a temporary upcall to XNU, which returns the thread to kernel mode for the upcall, before a mandatory return back to the secure world context. More detail on upcalls will be provided further below.
  6. Threads executing in the secure world can do normal scheduler type things like yield, wait, be suspended, or be interrupted. When this happens, the thread leaves the secure world and returns to the XNU kernel context. From there it must be rescheduled back into the secure world by exclave scheduling code in XNU. The thread will continue to be rescheduled into the secure world as necessary until the downcall is completed.
  7. If a secure world thread is panic()ing on a CPU core (which will call on XNU to panic via SPTM), fresh tasks are no longer scheduled into the secure world on other cores and they wait for a timeout period. If everything goes correctly, the waiting threads will never finish their wait. However if the timeout expires, the waiting threads will then … panic() :)
  8. XNU appears to handle all interrupt processing, rather than SK. When XNU is finished handling an interrupt, the interrupted thread is returned to the secure world if it was executing there. Directing interrupts to either the insecure or secure kernel is an ARM TrustZone feature.
  9. IPC structures for the downcall are set up with request and response buffers before entering the secure world through the redacted sk_enter() call.
  10. Interrupts and pre-emption are disabled while finalising the IPC request structure and calling sk_enter(), because there is only one of these structures per core. I suspect that the redacted code path that runs after sk_enter() enters the secure world copies the request from the per-cpu structure into secure world memory, and then re-enables interrupts and pre-emption on the core. The alternative would be ugly. A similar process happens in reverse to protect the per-cpu response structure.
  11. Disconcertingly, the downcall response can come back via a different CPU’s per-core response buffer, as the downcall may have been interrupted, upcalled, or yielded and needed rescheduling.
  12. Coordination of a thread’s exclave status (to avoid SK re-entry etc) occurs via th_exclaves_state — a bitfield in the thread structure.
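Points 2 and 3, together with step 14 of the stage-two boot sequence below (which populates it), suggest a per-conclave bitmap of valid service ids from 0 to 127. Here is a minimal C sketch of such a bitmap combined with the caller check. It assumes two 64-bit words per conclave and integer task ids. `conclave_t` and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define CONCLAVE_MAX_SERVICES 128

/* Hypothetical per-conclave state: valid-service bitmap plus the id of the
 * attached conclave manager task. */
typedef struct {
    uint64_t valid[CONCLAVE_MAX_SERVICES / 64];  /* 2 x 64 bits */
    int      attached_task_id;
} conclave_t;

static void service_mark_valid(conclave_t *c, unsigned id)
{
    if (id < CONCLAVE_MAX_SERVICES)              /* out-of-range ids ignored */
        c->valid[id / 64] |= UINT64_C(1) << (id % 64);
}

static bool service_is_valid(const conclave_t *c, unsigned id)
{
    return id < CONCLAVE_MAX_SERVICES &&
           ((c->valid[id / 64] >> (id % 64)) & 1u);
}

/* Point 2: kernel-domain entitlement, or the attached conclave manager task;
 * point 3: the service id must be one of the conclave's (at most 128). */
static bool downcall_allowed(const conclave_t *c, int caller_task_id,
                             bool kernel_domain_entitled, unsigned service_id)
{
    if (!kernel_domain_entitled && caller_task_id != c->attached_task_id)
        return false;
    return service_is_valid(c, service_id);
}
```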

A thread running in the secure world due to a downcall may need assistance from XNU and this can be achieved through an upcall to the exclaves upcall handler via the Tightbeam framework. Upcalls are limited to specific functions within XNU. A thread desiring an upcall returns to the insecure world where the specific upcall handler is called. While in this state, the thread cannot return to user mode (for obvious reasons) nor perform another downcall to the secure world, ie it is not allowed to “re-enter” exclaves. Instead the thread will be returned to the secure world at the point where it performed the upcall.
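Downcalls, upcalls and rescheduling fit together as a loop around sk_enter(). Because sk_enter() is redacted, the hypothetical `fake_sk_enter()` below replays a scripted list of exit reasons. The state bit names, `thread_t` and its counters are all invented. Only the idea of tracking this state in a bitfield (th_exclaves_state) comes from the source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bits for a th_exclaves_state-style bitfield. */
#define TH_EXCLAVES_CALL   0x1u  /* thread is inside a downcall */
#define TH_EXCLAVES_UPCALL 0x2u  /* thread is back in XNU serving an upcall */

typedef enum { SK_EXIT_DONE, SK_EXIT_UPCALL, SK_EXIT_YIELD, SK_EXIT_INTERRUPT } sk_exit_t;

typedef struct {
    uint32_t         th_exclaves_state;
    const sk_exit_t *script;          /* simulated secure-world behaviour */
    int              pos;
    int              upcalls_handled;
    int              reschedules;
} thread_t;

/* Stand-in for the redacted sk_enter(): report why the thread came back. */
static sk_exit_t fake_sk_enter(thread_t *th) { return th->script[th->pos++]; }

static void handle_upcall(thread_t *th)
{
    th->th_exclaves_state |= TH_EXCLAVES_UPCALL;
    th->upcalls_handled++;            /* e.g. a memory or storage upcall */
    th->th_exclaves_state &= ~TH_EXCLAVES_UPCALL;
}

/* Returns false if the thread is already in a downcall or an upcall. */
static bool downcall(thread_t *th)
{
    if (th->th_exclaves_state & (TH_EXCLAVES_CALL | TH_EXCLAVES_UPCALL))
        return false;                 /* no re-entry into exclaves */
    th->th_exclaves_state |= TH_EXCLAVES_CALL;
    for (;;) {
        sk_exit_t why = fake_sk_enter(th);
        if (why == SK_EXIT_DONE)
            break;
        if (why == SK_EXIT_UPCALL)
            handle_upcall(th);        /* then resume at the upcall point */
        else
            th->reschedules++;        /* yield/interrupt: schedule back in */
    }
    th->th_exclaves_state &= ~TH_EXCLAVES_CALL;
    return true;
}
```

The refusal at the top models the rule that a thread already in a downcall or upcall may not re-enter exclaves.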

Allowed upcalls discovered in the source end up inside the following functions:

Memory
exclaves_memory_upcall_alloc(npages, kind, completion);
exclaves_memory_upcall_free(pages, npages, kind, completion);

File storage
exclaves_storage_upcall_root(exclaveid, completion);
exclaves_storage_upcall_open(fstag, rootid, name, completion);
exclaves_storage_upcall_close(fstag, fileid, completion);
exclaves_storage_upcall_create(fstag, rootid, name, completion);
exclaves_storage_upcall_read(fstag, fileid, descriptor, completion);
exclaves_storage_upcall_write(fstag, fileid, descriptor, completion);
exclaves_storage_upcall_remove(fstag, rootid, name, completion);
exclaves_storage_upcall_sync(fstag, op, fileid, completion);
exclaves_storage_upcall_readdir(fstag, fileid, buf, length, completion);
exclaves_storage_upcall_getsize(fstag, fileid, completion);
exclaves_storage_upcall_sealstate(fstag, completion);

DriverKit
exclaves_driverkit_upcall_irq_register(id, index, completion);
exclaves_driverkit_upcall_irq_remove(id, index, completion);
exclaves_driverkit_upcall_irq_enable(id, index, completion);
exclaves_driverkit_upcall_irq_disable(id, index, completion);
exclaves_driverkit_upcall_timer_register(id, completion);
exclaves_driverkit_upcall_timer_remove(id, timer_id, completion);
exclaves_driverkit_upcall_timer_enable(id, timer_id, completion);
exclaves_driverkit_upcall_timer_disable(id, timer_id, completion);
exclaves_driverkit_upcall_timer_set_timeout(id, timer_id, duration, completion);
exclaves_driverkit_upcall_timer_cancel_timeout(id, timer_id, completion);
exclaves_driverkit_upcall_lock_wl(id, completion);
exclaves_driverkit_upcall_unlock_wl(id, completion);
exclaves_driverkit_upcall_async_notification_signal(id, notificationID, completion);
exclaves_driverkit_upcall_mapper_activate(id, mapperIndex, completion);
exclaves_driverkit_upcall_mapper_deactivate(id, mapperIndex, completion);
exclaves_driverkit_upcall_notification_signal(id, mask, completion);

DriverKit Apple Neural Engine
exclaves_driverkit_upcall_ane_setpowerstate(id, desiredState, completion);
exclaves_driverkit_upcall_ane_worksubmit(id, requestID, taskDescriptorCount, submitTimestamp, completion);
exclaves_driverkit_upcall_ane_workbegin(id, requestID, beginTimestamp, completion);
exclaves_driverkit_upcall_ane_workend(id, requestID, completion);

Conclaves
exclaves_conclave_upcall_suspend(flags, completion);
exclaves_conclave_upcall_stop(flags, completion);
exclaves_conclave_upcall_crash_info(shared_buf, length, completion);
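Every upcall above takes a completion argument. One plausible reading is a callback-style interface: the XNU handler does its work, then reports the result through the completion before the thread returns to the secure world. The sketch below only illustrates that pattern. `sketch_memory_upcall_alloc()`, `alloc_result_t`, `record_result()` and the 16 KiB page size are assumptions, not the generated Tightbeam interface.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch of the completion-callback shape of the upcalls. */
#define SKETCH_PAGE_SIZE 16384u        /* assumption: 16 KiB pages */

typedef struct {
    int      error;                    /* 0 on success */
    void    *pages;
    uint32_t npages;
} alloc_result_t;

typedef void (*alloc_completion_t)(const alloc_result_t *result, void *ctx);

/* Stand-in for exclaves_memory_upcall_alloc(npages, kind, completion): do
 * the work on the XNU side, then report back through the completion. */
static void sketch_memory_upcall_alloc(uint32_t npages, int kind,
                                       alloc_completion_t completion, void *ctx)
{
    alloc_result_t r = { 1, NULL, npages };  /* assume failure */
    (void)kind;                              /* memory kind ignored here */
    if (npages != 0) {
        r.pages = calloc(npages, SKETCH_PAGE_SIZE);
        if (r.pages != NULL)
            r.error = 0;
    }
    completion(&r, ctx);
}

/* Example completion that copies the result into the caller's context. */
static void record_result(const alloc_result_t *result, void *ctx)
{
    *(alloc_result_t *)ctx = *result;
}
```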

References to XNUProxy abound, yet I haven’t been able to pin down definitively what and where it is. Options I have considered include:

  • It’s an exclave domain of its own, something like com.apple.xnuproxy
  • It’s an exclave service or bunch of services that runs in the com.apple.kernel domain, serving particular types of downcalls.
  • It’s a subsystem in SPTM for making downcalls to the secure world…

Comments in Exclaves_L4.h state that the XNU Proxy makes the following exclaves reachable (aside from testing ones, usually featuring the word “HELLO” in them):

  • EXCLAVES_XNUPROXY_EXCLAVE_USERAPP/2/3 (templated user app…)
  • EXCLAVES_XNUPROXY_EXCLAVE_AUDIODRIVER
  • EXCLAVES_XNUPROXY_EXCLAVE_EXCLAVEDRIVERKIT
  • EXCLAVES_XNUPROXY_EXCLAVE_SECURERTBUDDY_AOP (RT Buddy for Always On Processor)
  • EXCLAVES_XNUPROXY_EXCLAVE_SECURERTBUDDY_DCP (for Display Coprocessor)
  • EXCLAVES_XNUPROXY_EXCLAVE_CONCLAVECONTROL (conclave launcher control)
  • EXCLAVES_XNUPROXY_EXCLAVE_CONCLAVEDEBUG
  • EXCLAVES_XNUPROXY_EXCLAVE_SECURERTBUDDY_AOP_EDK (ExclaveDriverKit connection for Always On Processor)
  • EXCLAVES_XNUPROXY_EXCLAVE_SECURERTBUDDY_DCP_EDK (ExclaveDriverKit connection for Display CoProcessor)

Note that RTBuddies are for communicating with RTKit, yet another Apple operating system. RTKit runs on the Display Coprocessor, Apple Neural Engine, NVMe controller, SMC Controller, Smart Keyboards, Siri Remote, Apple Pencil, AirPods and AirTags, and I assume also on the AOP.

Booting exclaves when the system is starting requires a delicately coordinated dance between the insecure and secure worlds. Anything going wrong usually ends up in a panic().

Booting occurs in three stages. Stage one is not visible in the open-source code, but it is likely a secure boot process in which SK is loaded into memory and its code signatures are verified before it is made executable. At the end of a successful stage one boot, the boot status is EXCLAVES_BS_NOT_STARTED. Stage two is started by the first EXCLAVES_CTL_OP_BOOT call and proceeds as follows:

  1. Initialises upcall server by creating a tightbeam endpoint for upcalls
  2. Enters secure world with a special call to collect boot information from secure kernel
  3. Enters secure world again with normal endpoint call but not sure why… possibly to trigger the kernel domain to start
  4. Initialises the exclave scheduler
  5. Initialises the XrtHostedXNU kext
  6. Initialises callbacks (I think into the above kext)
  7. Boots the scheduler: sets up per-cpu request and response memory for the boot CPU core only, and binds to the boot core
  8. Loops, calling into the secure world to see if it needs memory allocations, until it responds that all exclaves are booted
  9. Initialises multicore by setting up per-cpu request and response memory for all cores
  10. Initialises XNU Proxy — creates a cache of buffers for IPC calls, creates some thread contexts, sets up a tightbeam endpoint for downcalls to the xnuproxy
  11. Initialises an exclaves panic kernel thread
  12. Discovers all static exclave resources and builds the root_table of domains and resources.
  13. Creates tightbeam endpoints for all Conclave Manager resources and calls an initialisation process for each one.
  14. Populates a bitmap of valid conclave service ids (from 0 to 127) for each conclave.
  15. At kernel build time, a list of boot tasks was stored in the __DATA_CONST segment. These are now sorted by priority and each boot task function is called. I likely only have a very partial picture here, but these tasks include creating an endpoint for each of the exclave indicator controller service, the storage backend service, the logserver, and for stackshots.
  16. Boot status is now EXCLAVES_BS_BOOTED_STAGE_2
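Step 15 describes a build-time table of boot tasks that is sorted by priority and then executed. Here is a minimal sketch. It assumes lower priority values run first and each task returns 0 on success. The task names and functions are placeholders, and the table is an ordinary array rather than a __DATA_CONST section.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical boot task table; names, priorities and the convention that
 * lower values run first are assumptions. */
typedef struct {
    unsigned    priority;
    const char *name;
    int       (*fn)(void);
} boot_task_t;

static int order_log[8];   /* records the order tasks actually ran in */
static int order_len;

static int init_indicator(void) { order_log[order_len++] = 1; return 0; }
static int init_storage(void)   { order_log[order_len++] = 2; return 0; }
static int init_logserver(void) { order_log[order_len++] = 3; return 0; }

static boot_task_t boot_tasks[] = {
    { 30, "logserver",            init_logserver },
    { 10, "indicator controller", init_indicator },
    { 20, "storage backend",      init_storage },
};

static int by_priority(const void *a, const void *b)
{
    unsigned pa = ((const boot_task_t *)a)->priority;
    unsigned pb = ((const boot_task_t *)b)->priority;
    return (pa > pb) - (pa < pb);
}

/* Sorts the tasks by priority, then calls each; stops at the first failure. */
static int run_boot_tasks(boot_task_t *tasks, size_t n)
{
    qsort(tasks, n, sizeof tasks[0], by_priority);
    for (size_t i = 0; i < n; i++) {
        int err = tasks[i].fn();
        if (err != 0)
            return err;    /* the kernel would more likely panic() here */
    }
    return 0;
}
```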

The ExclaveKit stage, started by the second EXCLAVES_CTL_OP_BOOT call, makes multiple calls regarding “framemint”. This suggests that SK is based on seL4.

  1. The “com.apple.service.FrameMint” service is looked up and a tightbeam endpoint is created for it
  2. A redacted function, framemint_framemint__init() is called
  3. A redacted framemint_framemint_populate() function is called but I guess this will be triggering all sorts of exciting activity to happen in the secure world
  4. Boot status is now EXCLAVES_BS_BOOTED_EXCLAVEKIT
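Because EXCLAVES_CTL_OP_BOOT is called twice, the boot status moves through three values in a fixed order. The sketch below is hypothetical. `boot_request_t`, `exclaves_boot()` and the rule that each stage runs only once, after the previous one, are assumptions made for illustration. The real stages do far more than update a status.

```c
#include <assert.h>
#include <stdbool.h>

/* Status names are from the article; the enum ordering is assumed. */
typedef enum {
    EXCLAVES_BS_NOT_STARTED,        /* after stage one (secure boot of SK) */
    EXCLAVES_BS_BOOTED_STAGE_2,
    EXCLAVES_BS_BOOTED_EXCLAVEKIT,
} exclaves_boot_status_t;

typedef enum { BOOT_REQ_STAGE_2, BOOT_REQ_EXCLAVEKIT } boot_request_t;

static exclaves_boot_status_t boot_status = EXCLAVES_BS_NOT_STARTED;

/* Each stage may run only once, and only after the stage before it. */
static bool exclaves_boot(boot_request_t req)
{
    if (req == BOOT_REQ_STAGE_2 && boot_status == EXCLAVES_BS_NOT_STARTED) {
        /* upcall server, scheduler, XNU Proxy, resource discovery ... */
        boot_status = EXCLAVES_BS_BOOTED_STAGE_2;
        return true;
    }
    if (req == BOOT_REQ_EXCLAVEKIT && boot_status == EXCLAVES_BS_BOOTED_STAGE_2) {
        /* FrameMint lookup, framemint init and populate ... */
        boot_status = EXCLAVES_BS_BOOTED_EXCLAVEKIT;
        return true;
    }
    return false;
}
```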

SPTM “types” memory pages to control access to them from its different subsystems. Existing types include:

  • XNU_USER_EXEC
  • XNU_USER_DEBUG
  • XNU_USER_JIT
  • XNU_ROZONE
  • XNU_KERNEL_RESTRICTED
  • Types for TXM, DART, etc.

Exclaves have added:

  • SK_DEFAULT (exclusive to SK — inaccessible to XNU)
  • SK_IO (also exclusive to SK — inaccessible to XNU)
  • SK_SHARED_RO (memory shared between SK and XNU; read-only for XNU)
  • SK_SHARED_RW (memory shared between SK and XNU; read+write for XNU)
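From XNU's side, the new types reduce to an access table. In this hypothetical sketch the flags and functions (`xnu_allowed_access()`, `xnu_may_access()`) are invented. SPTM's real interface for typing and checking pages is not shown in the source.

```c
#include <assert.h>
#include <stdbool.h>

/* Page type names are from the article; the rest is illustrative. */
typedef enum { SK_DEFAULT, SK_IO, SK_SHARED_RO, SK_SHARED_RW } sk_page_type_t;

#define ACCESS_READ  0x1u
#define ACCESS_WRITE 0x2u

/* What XNU may do with a page of the given SK type. */
static unsigned xnu_allowed_access(sk_page_type_t type)
{
    switch (type) {
    case SK_DEFAULT:
    case SK_IO:        return 0;                           /* SK only */
    case SK_SHARED_RO: return ACCESS_READ;                 /* XNU may read */
    case SK_SHARED_RW: return ACCESS_READ | ACCESS_WRITE;  /* XNU may read and write */
    }
    return 0;
}

/* True only if every requested access bit is allowed for XNU. */
static bool xnu_may_access(sk_page_type_t type, unsigned requested)
{
    return requested != 0 && (xnu_allowed_access(type) & requested) == requested;
}
```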