visionOS · 18 min read · May 30, 2026

Getting Started with visionOS: Your First Spatial Computing App

visionOS marks a new era in spatial computing, offering groundbreaking ways to interact with digital content. This guide provides a clear path for developers to begin building immersive experiences for Apple Vision Pro. You'll learn the essentials, from environment setup to deploying your first app.

Introduction to visionOS and Spatial Computing

Welcome to the exciting world of visionOS, Apple's operating system for spatial computing. Designed from the ground up for Apple Vision Pro, visionOS seamlessly blends digital content with your physical space, opening up entirely new paradigms for app development. Unlike traditional platforms, visionOS applications can exist as 'windows' within a shared space, 'volumes' with 3D content, or fully 'immersive spaces' that transport users to new environments.

Developing for visionOS means thinking spatially. You'll work with familiar tools like SwiftUI and Xcode, but you'll also adopt frameworks built for 3D and spatial experiences, chiefly RealityKit and ARKit. This platform empowers you to create apps that naturally coexist with the user's environment, allowing for intuitive interactions that feel more like touching and manipulating real objects. The potential applications are vast, ranging from entertainment and productivity to education and healthcare.

Before you dive into code, it's crucial to understand the core components that make visionOS unique. RealityKit handles the rendering of 3D content and spatial interactions, while SwiftUI extends its powerful declarative syntax to arrange 2D and 3D elements within shared spaces. You'll also encounter new metaphors like 'personas,' which are spatial representations of users during FaceTime calls, emphasizing the social aspect of spatial computing. Understanding these foundational ideas will set you up for success as you embark on your visionOS development journey.

Setting Up Your Development Environment

To start building visionOS apps, you'll need a Mac with Apple silicon running macOS Sonoma (14.0 or later) and a recent version of Xcode. Xcode 15.2 was the first release to ship the visionOS SDK and the visionOS simulator, which is essential for testing your apps without a physical Apple Vision Pro device. Keep Xcode up to date through the Mac App Store or the Apple Developer downloads page.

  1. Download and Install Xcode: If you don't have Xcode 15.2 or later already, download it from the Mac App Store. It's a large download, so ensure you have a stable internet connection and sufficient disk space.
  2. Launch Xcode: After installation, launch Xcode. On first launch it asks which platforms you want to develop for and installs any required components.
  3. Install the visionOS Simulator: The visionOS platform, which contains the simulator runtime, is an optional download. Select it in the first-launch dialog, or add it later under Xcode > Settings > Platforms (named Components in newer Xcode versions). If 'visionOS' is not shown as installed, click the 'Get' button next to it.
  4. Create a New Project: From the Xcode welcome screen, select 'Create a new project.' Choose the 'visionOS' tab, and then select the 'App' template. This sets up a basic project structure with the necessary configurations for a visionOS application.

The project wizard then asks for your product name, organization identifier, and a few visionOS-specific options. The most important is 'Initial Scene,' which can be 'Window' for a 2D experience or 'Volume' for bounded 3D content. A separate 'Immersive Space' option controls whether the template also adds an immersive space; leave it at 'None' for now. The exact option labels can vary between Xcode versions. For our first app, we'll stick with 'Window,' as it's the simplest entry point.

Xcode will generate a new project with a ContentView and an App file. This familiar SwiftUI structure is where you'll begin crafting your spatial experiences. The simulator allows you to interact with your app using your Mac's mouse and keyboard, mimicking head and hand movements, and providing a powerful way to test your UI and spatial layouts.

Understanding Windows, Volumes, and Immersive Spaces

visionOS offers three primary ways to present content: Windows, Volumes, and Immersive Spaces. Understanding their differences is crucial for designing effective spatial experiences.

1. Windows: These are familiar 2D SwiftUI views, much like windows on macOS or iPadOS, but they exist within the user's shared space. They can be resized, repositioned, and stacked. Windows are ideal for traditional app interfaces, displaying text, images, and standard UI controls. They integrate seamlessly with the user's physical environment, appearing to float in space. You'll primarily use SwiftUI to construct content within windows, just as you would for other Apple platforms. They respect the user's surroundings and can fade into the background when not actively focused upon.

2. Volumes: A volume is a 3D container that can host 3D digital objects. Unlike windows, which are flat, volumes have depth and are defined by a specific width, height, and depth. They're perfect for showcasing 3D models, interactive diagrams, or small-scale 3D experiences that don't require taking over the entire environment. You'll often use RealityKit within volumes to render and animate 3D assets. You create a volume by applying the volumetric window style to a WindowGroup, and you set its size with the defaultSize(width:height:depth:in:) scene modifier. Inside the volume, view modifiers such as .frame(depth:) control how much depth an individual view occupies.

3. Immersive Spaces: These scenes let your app place content anywhere around the user. An immersive space can range from a subtle augmentation of the user's physical room (the 'mixed' immersion style) to a completely different digital environment (the 'full' immersion style), with 'progressive' immersion in between. This is where you create truly transformative experiences, like games, virtual tours, or detailed 3D simulations. When an immersive space is open, other apps are hidden, and with full immersion the user's view of their physical surroundings is replaced entirely. RealityKit is central to creating and managing content within immersive spaces, allowing for complex 3D scenes, physics, and advanced visual effects.

Choosing the right type of presentation for your app is a fundamental design decision. Most apps will likely start with windows and gradually introduce volumes or immersive spaces as needed. For example, a productivity app might use windows for task lists, volumes for interactive 3D charts, and an immersive space for a focused, distraction-free work environment.
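Each of the three presentation types corresponds to a SwiftUI scene declaration. The following is a minimal sketch, not a complete app: the type name SceneTypesApp, the scene identifiers, the placeholder content, and the sizes are illustrative assumptions, and the volume assumes a cup.usdz file in the app bundle.

```swift
import SwiftUI
import RealityKit

@main
struct SceneTypesApp: App { // hypothetical app type for illustration
    var body: some Scene {
        // 1. Window: a 2D SwiftUI view shown in the Shared Space.
        WindowGroup(id: "main") {
            Text("A regular window")
                .padding()
        }

        // 2. Volume: a window group with the volumetric style.
        //    defaultSize gives it physical dimensions (here 0.5 m per side).
        WindowGroup(id: "volume") {
            Model3D(named: "cup") // assumes cup.usdz is in the app bundle
        }
        .windowStyle(.volumetric)
        .defaultSize(width: 0.5, height: 0.5, depth: 0.5, in: .meters)

        // 3. Immersive space: content can be placed anywhere around the user.
        ImmersiveSpace(id: "immersive") {
            RealityView { content in
                let sphere = ModelEntity(
                    mesh: .generateSphere(radius: 0.2),
                    materials: [SimpleMaterial(color: .white, isMetallic: false)]
                )
                sphere.position = [0, 1.5, -2] // meters, relative to the space's origin
                content.add(sphere)
            }
        }
        // Mixed immersion keeps the user's surroundings visible.
        .immersionStyle(selection: .constant(.mixed), in: .mixed)
    }
}
```

Only the first window group opens at launch. The volume and the immersive space stay closed until the app opens them by identifier, which fits the idea of letting users opt into more immersion.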

Building Your First Windowed App

Let's create a simple visionOS app that displays text and a 3D model within a standard window. This will introduce you to basic SwiftUI components for visionOS and how to incorporate 3D assets.

Create a new visionOS app project in Xcode and ensure 'Initial Scene' is set to 'Window'. Xcode will generate YourApp.swift and ContentView.swift.

First, we'll modify ContentView.swift to display some text and a Model3D view. Model3D is a SwiftUI view on visionOS that asynchronously loads and displays a 3D model from your app's bundle or from a URL. For this example, we'll assume you have a 3D model file named cup.usdz in your project. Model3D looks the file up in a bundle rather than in an asset catalog, so drag the .usdz file into the Xcode project navigator and make sure it is a member of your app target.

When working with SwiftUI for visionOS, you'll find many familiar modifiers available. However, some modifiers, like .padding() or .frame(), might behave slightly differently in a spatial context, influenced by the surrounding environment. Experimentation in the simulator is key to understanding these nuances. The ZStack is particularly useful here for layering content within a window, and VStack and HStack continue to provide effective layout primitives.

Compatibility Note: This code requires visionOS 1.0 or later.

```swift
import SwiftUI
import RealityKit

struct ContentView: View {
    var body: some View {
        VStack {
            Text("Welcome to visionOS!")
                .font(.largeTitle)
                .padding(.bottom)

            // Load 'cup.usdz' from the app bundle. The file must be a member
            // of the app target so that it is copied into the bundle.
            Model3D(named: "cup", bundle: realityKitBundle) { model in
                model
                    .resizable()
                    .aspectRatio(contentMode: .fit)
                    .frame(depth: 100)
                    .rotation3DEffect(.degrees(20), axis: (x: 1, y: 0, z: 0))
            } placeholder: {
                // Shown while the model loads asynchronously.
                ProgressView()
            }
            // Width and height, then depth, all in points.
            .frame(width: 300, height: 200)
            .frame(depth: 100)
            .padding()

            Text("This is your first spatial app.")
                .font(.title2)
        }
        .padding()
        .glassBackgroundEffect()
    }
}

// Bundle that contains the model file. In a simple project this is the main
// app bundle; passing `nil` to Model3D has the same effect. If the model lives
// in a Swift package, use the bundle that package exposes instead.
private var realityKitBundle: Bundle? {
    Bundle.main
}
```

Working with Input and Gestures

Interaction on visionOS primarily relies on your eyes, hands, and voice. Apple Vision Pro doesn't have physical controllers; instead, it uses sophisticated eye-tracking and hand-tracking to interpret user intent. This means you'll design UIs that are easily targetable with gaze and activatable with simple hand gestures.

Gaze: The user's gaze is crucial. Elements that are 'looked at' can highlight, providing visual feedback that they are targetable. You don't directly program 'gaze events' in the same way you do 'tap events,' but you design your UI so that it naturally responds to indirect selection via gaze.

Indirect Hand Gestures: The primary way users activate UI elements is through indirect hand gestures, such as a 'tap' (pinching thumb and index finger) or 'long press' while looking at an element. These gestures are automatically mapped to standard SwiftUI controls like Button and Toggle.

Direct Hand Gestures: When content is within arm's reach, users can also interact with it directly, for example by grabbing or swiping at virtual objects. This matters most in volumes and immersive spaces. RealityKit provides tools for handling these more complex, physics-based interactions; an entity can receive gestures once it has both an InputTargetComponent and a CollisionComponent.

Let's enhance our ContentView with a button and a toggle to demonstrate basic interaction:

Notice how the Button and Toggle naturally respond to gaze and the 'tap' gesture. The onChange() modifier is used here to detect changes in the showModel state, which would then trigger a UI update. This reactive programming model is a cornerstone of SwiftUI development across all Apple platforms. You don't need to write explicit code to handle the gaze-tap interaction for standard SwiftUI controls; the system handles it for you, giving you more time to focus on your app's core logic and spatial arrangement.

Compatibility Note: This code requires visionOS 1.0 or later.

```swift
import SwiftUI
import RealityKit

struct ContentView: View {
    @State private var showModel = true
    @State private var message = "Interact with me!"

    var body: some View {
        VStack {
            Text("Welcome to visionOS!")
                .font(.largeTitle)
                .padding(.bottom)

            if showModel {
                Model3D(named: "cup", bundle: realityKitBundle) { model in
                    model
                        .resizable()
                        .aspectRatio(contentMode: .fit)
                        .frame(depth: 100)
                        .rotation3DEffect(.degrees(20), axis: (x: 1, y: 0, z: 0))
                } placeholder: {
                    ProgressView()
                }
                .frame(width: 300, height: 200)
                .frame(depth: 100)
                .padding()
            }

            // Activated by looking at the button and pinching.
            Button("Toggle Model") {
                showModel.toggle()
            }
            .padding()
            .buttonBorderShape(.capsule)

            // Bound to the same state as the button.
            Toggle(isOn: $showModel) {
                Text("Show 3D Cup")
            }
            .fixedSize()
            .padding(.horizontal, 50)

            Text(message)
                .font(.title2)
                .padding(.top)
        }
        .padding()
        .glassBackgroundEffect()
        // Keep the message in sync no matter which control changed the state.
        .onChange(of: showModel) { _, isVisible in
            message = isVisible ? "Model is visible!" : "Model is hidden!"
        }
    }
}

// The model file is expected in the main app bundle.
private var realityKitBundle: Bundle? {
    Bundle.main
}
```
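Standard controls receive gaze and pinch input automatically, but RealityKit entities have to opt in. As a rough sketch of how that can look, the view below makes a generated sphere tappable. The view name TappableSphereView, the sphere, and the scale values are illustrative assumptions, not part of the project built above.

```swift
import SwiftUI
import RealityKit

struct TappableSphereView: View { // hypothetical view for illustration
    var body: some View {
        RealityView { content in
            let sphere = ModelEntity(
                mesh: .generateSphere(radius: 0.1),
                materials: [SimpleMaterial(color: .blue, isMetallic: false)]
            )
            // An entity receives gestures only if it has both an input
            // target component and a collision shape.
            sphere.components.set(InputTargetComponent())
            sphere.generateCollisionShapes(recursive: false)
            // Let the system highlight the sphere while the user looks at it.
            sphere.components.set(HoverEffectComponent())
            content.add(sphere)
        }
        .gesture(
            TapGesture()
                .targetedToAnyEntity()
                .onEnded { value in
                    // Toggle between normal size and 1.5x on each tap.
                    let isEnlarged = value.entity.scale.x > 1
                    value.entity.scale = isEnlarged
                        ? SIMD3<Float>(repeating: 1.0)
                        : SIMD3<Float>(repeating: 1.5)
                }
        )
    }
}
```

The gesture is attached to the RealityView rather than to the entity, and targetedToAnyEntity() reports which entity was hit. The app never sees where the user is looking; the system draws the hover highlight itself and delivers only the completed tap.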

Integrating RealityKit for Immersive Experiences

While SwiftUI is excellent for 2D UI and arranging 3D content within defined bounds (like Model3D), RealityKit is your go-to framework for creating truly interactive 3D scenes and fully immersive experiences. RealityKit provides powerful capabilities for rendering, physics, animations, and spatial audio in real-time.

To create an immersive experience, you'll define an ImmersiveSpace within your App file and then use a RealityView to host your 3D content. RealityView acts as a bridge between SwiftUI and RealityKit, allowing you to compose a RealityKit scene declaratively.

First, modify your app's entry point (YourApp.swift) to include an ImmersiveSpace:

```swift
import SwiftUI

@main
struct YourApp: App {
    var body: some Scene {
        // The 2D window that hosts ContentView.
        WindowGroup {
            ContentView()
        }

        // The immersive space. It stays closed until the app opens it
        // by its identifier.
        ImmersiveSpace(id: "ImmersiveSpace") {
            ImmersiveView()
        }
    }
}
```
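Declaring the immersive space does not open it. One way to do that is with the openImmersiveSpace and dismissImmersiveSpace environment actions, as in the sketch below. The view name ImmersiveSpaceToggle and its isSpaceOpen state are hypothetical; the identifier matches the one used in the app declaration above.

```swift
import SwiftUI

// Hypothetical control; place it in ContentView or any other view in the window.
struct ImmersiveSpaceToggle: View {
    @Environment(\.openImmersiveSpace) private var openImmersiveSpace
    @Environment(\.dismissImmersiveSpace) private var dismissImmersiveSpace
    @State private var isSpaceOpen = false

    var body: some View {
        Button(isSpaceOpen ? "Exit Immersive Space" : "Enter Immersive Space") {
            Task {
                if isSpaceOpen {
                    await dismissImmersiveSpace()
                    isSpaceOpen = false
                } else {
                    // The id must match the one passed to ImmersiveSpace(id:).
                    switch await openImmersiveSpace(id: "ImmersiveSpace") {
                    case .opened:
                        isSpaceOpen = true
                    case .userCancelled, .error:
                        // Opening failed or was declined; stay in the window.
                        isSpaceOpen = false
                    @unknown default:
                        isSpaceOpen = false
                    }
                }
            }
        }
    }
}
```

Both actions are asynchronous, so the button wraps them in a Task. Checking the result keeps the button label accurate if the system could not open the space.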

Next, create ImmersiveView.swift. This is where we'll use RealityView to load and interact with 3D models. RealityView takes a make closure, which receives a content parameter and runs once to build the scene, and an optional update closure, which runs when SwiftUI state the view depends on changes. The make closure is where you'll add entities and configure your RealityKit scene. An optional attachments closure lets you integrate 2D SwiftUI views within your 3D scene (e.g., showing a SwiftUI label attached to a 3D object); when you supply it, the make and update closures receive a second attachments parameter.

In this example, we'll load a scene from the Reality Composer Pro package and position it in front of the user. We also add an image-based light (ImageBasedLightComponent) so the scene is well lit; this assumes your app bundle contains an environment resource named studio_small_03. RealityKit automatically handles PBR (Physically Based Rendering) materials, which means your 3D models will often look realistic with minimal setup, given good quality assets.

Compatibility Note: This code requires visionOS 1.0 or later.

```swift
import SwiftUI
import RealityKit
import RealityKitContent // Swift package that holds your Reality Composer Pro content

struct ImmersiveView: View {
    var body: some View {
        RealityView { content in
            // Load the scene from the Reality Composer Pro package.
            // `realityKitContentBundle` is exported by the RealityKitContent package.
            guard let scene = try? await Entity(named: "ImmersiveScene", in: realityKitContentBundle) else {
                return
            }

            // Place the scene 2 meters in front of the space's origin.
            scene.position = [0, 0, -2]
            content.add(scene)

            // Light the scene with an image-based light. This assumes an
            // environment resource named "studio_small_03" is in the app bundle.
            if let environment = try? await EnvironmentResource(named: "studio_small_03") {
                scene.components.set(ImageBasedLightComponent(source: .single(environment)))
                scene.components.set(ImageBasedLightReceiverComponent(imageBasedLight: scene))
            }
        } update: { content in
            // Called when SwiftUI state that this view reads changes.
            // Apply state-driven changes to your entities here.
            print("RealityView updated")
        }
    }
}
```
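The attachments closure described above can be sketched as follows. The view name LabeledSphereView, the generated sphere, the identifier "label", and the positions are all illustrative assumptions rather than part of the project.

```swift
import SwiftUI
import RealityKit

struct LabeledSphereView: View { // hypothetical view for illustration
    var body: some View {
        RealityView { content, attachments in
            let sphere = ModelEntity(
                mesh: .generateSphere(radius: 0.1),
                materials: [SimpleMaterial(color: .orange, isMetallic: false)]
            )
            sphere.position = [0, 1.2, -1.5]
            content.add(sphere)

            // Look up the entity that wraps the SwiftUI attachment and
            // parent it to the sphere so the two move together.
            if let label = attachments.entity(for: "label") {
                label.position = [0, 0.2, 0] // 20 cm above the sphere's center
                sphere.addChild(label)
            }
        } update: { _, _ in
            // Nothing to update in this sketch.
        } attachments: {
            Attachment(id: "label") {
                Text("Hello from SwiftUI")
                    .padding()
                    .glassBackgroundEffect()
            }
        }
    }
}
```

Because the label is a child of the sphere, its position is expressed in the sphere's coordinate space, so moving the sphere carries the label along.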

Understanding Scene Management and Best Practices

Developing for visionOS requires a mindful approach to scene management, performance, and user experience. Here are some best practices:

  1. Performance is paramount: Spatial computing is computationally intensive. Optimize your 3D models (polygon count, texture size), minimize draw calls, and use appropriate rendering techniques. Profile your app frequently using Xcode's Instruments.
  2. Comfort and accessibility: Design experiences that are comfortable for users. Avoid excessive head movement, sudden changes in velocity within immersive spaces, or content that's too close or too far away. Provide options for users to customize their experience, such as scaling models or adjusting environment brightness.
  3. Spatial Audio: Incorporate spatial audio to enhance immersion. RealityKit allows you to attach audio sources to 3D entities, so sounds emanate from their perceived location in space. This significantly improves the realism and presence of your app.
  4. Haptics (Limited): While Apple Vision Pro doesn't feature built-in haptics, you can provide visual and audio feedback for interactions. In scenarios involving external input devices, haptics might become relevant.
  5. Respect the user's space: When using shared spaces, ensure your app's content doesn't aggressively obstruct the user's view of their physical environment. Provide mechanisms to move, resize, or dismiss windows and volumes easily.
  6. Progressive Immersion: Start with less immersive experiences (windows) and allow users to opt into more immersive ones (volumes, full immersive spaces). Don't force users into full immersion unless it's critical to your app's core functionality.
  7. Error Handling: Be robust in handling the loading of 3D assets. Provide ProgressView placeholders and informative error messages if models fail to load.
  8. Asset Management with Reality Composer Pro: For complex 3D scenes and interactions, use Reality Composer Pro (a developer tool included with Xcode, available under Xcode > Open Developer Tool) to build your scenes, apply materials, set up animations, and preview your content. It integrates with Xcode through a RealityKitContent Swift package that you can import into your Swift code.
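For the error-handling point in particular, Model3D has an initializer that hands you the loading phase instead of a ready model, so a failed load can be reported instead of leaving a spinner on screen. A minimal sketch, assuming a hypothetical ModelLoadingView that takes the model's file name:

```swift
import SwiftUI
import RealityKit

struct ModelLoadingView: View { // hypothetical view for illustration
    let modelName: String

    var body: some View {
        Model3D(named: modelName) { phase in
            switch phase {
            case .empty:
                // Still loading.
                ProgressView("Loading model…")
            case .success(let model):
                model
                    .resizable()
                    .aspectRatio(contentMode: .fit)
            case .failure(let error):
                // Tell the user what went wrong.
                Label(
                    "Could not load \(modelName): \(error.localizedDescription)",
                    systemImage: "exclamationmark.triangle"
                )
            @unknown default:
                EmptyView()
            }
        }
    }
}
```

For example, ModelLoadingView(modelName: "cup") shows the progress indicator, then either the model or a readable error message.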

By adhering to these guidelines, you'll create polished, performant, and delightful spatial experiences that truly leverage the unique capabilities of visionOS.

Frequently Asked Questions

What hardware do I need to develop for visionOS?
You need a Mac with Apple silicon running macOS Sonoma (14.0 or later) and Xcode 15.2 or later. While an Apple Vision Pro device is ideal for testing, the visionOS simulator that you install through Xcode allows you to develop and test most app functionality effectively without physical hardware.
Can I use Swift/SwiftUI for visionOS development?
Yes, Swift and SwiftUI are the primary languages and frameworks for visionOS development. SwiftUI is used for constructing your app's 2D and 3D UI, while RealityKit is used for managing and rendering advanced 3D scenes and immersive experiences. Familiar SwiftUI concepts translate directly.
What's the difference between a Window, Volume, and Immersive Space?
A 'Window' is a 2D SwiftUI view that floats in the user's physical space. A 'Volume' is a 3D container for displaying interactive 3D content within a defined cuboid region. An 'Immersive Space' fully takes over the user's field of view, transitioning them into a digital environment, which can range from subtle augmentation to complete replacement of their surroundings.
How do users interact with visionOS apps without controllers?
Users interact primarily through their eyes ('gaze') and indirect hand gestures (e.g., pinching thumb and index finger to 'tap'). The system intelligently interprets gaze as selection intent, and the pinch gesture as activation. Direct hand gestures, like grabbing, are used in certain immersive contexts.
How do I add 3D models to my visionOS app?
You can add 3D models (typically in USDZ format) by dragging them into your Xcode project as members of your app target, or by adding them to a Reality Composer Pro project. Once added, you can display them using `Model3D` in SwiftUI or by loading `Entity` objects directly into a RealityKit `RealityView`.
Is performance a major concern for visionOS apps?
Yes, performance is critical. Spatial computing applications are demanding. You should prioritize optimized 3D assets, efficient rendering techniques, and frequent profiling with Xcode's Instruments to ensure a smooth and comfortable user experience on Apple Vision Pro.
What is Reality Composer Pro's role in visionOS development?
Reality Composer Pro is a companion app that helps you create, prepare, and preview 3D content and scenes for RealityKit. It allows you to compose complex scenes with lights, audio, animations, and behaviors graphically, then integrate them seamlessly into your Xcode project via a generated `RealityKitContent` module.
Can I use SwiftData or Core Data in visionOS?
Yes, you can use SwiftData and Core Data for persistent storage in visionOS applications, just as you would on other Apple platforms. These frameworks are fully compatible and can be integrated into your spatial computing apps for managing local data.