Dex Scanner
The Dex scanner is ShipThatApp's flagship AI feature: point the camera at anything — a plant, a fish, a rock, even a Pokémon card — and get back a rich, structured "Dex entry" identifying it. Every discovery is collected into a persistent, browseable Pokédex.
Under the hood it is a hybrid, two-pass identifier. Apple's on-device Vision framework gives an instant, offline, private coarse guess; the secure cloud Vision endpoint then returns a fine-grained, structured result. The two passes complement each other: the local guess seeds the UI immediately and acts as a graceful offline fallback, while the cloud pass produces the detailed entry the user saves.
Heads Up!
The Dex scanner reuses the same secure vision endpoint as the rest of the app. Every cloud call is an HMAC-signed request through your backend proxy, so your OpenAI keys never ship inside the app binary. See In-App Purchases and the API client for the shared request machinery.
Architecture at a Glance
The feature lives in three places:
- Views: Views/AI/Pokedex/ (the grid, the scan flow, the entry views, and the DexCategory / ScanResult / DexEntry models that back them).
- Services: Services/Vision/ImageClassifier.swift (on-device pass), Services/OpenAI/Vision/DexVisionService.swift (cloud pass), and Services/Pokedex/DexStore.swift (persistence).
- Backend: the existing signed Endpoints.vision endpoint, reused unchanged.
ScannerViewModel is the orchestrator. It runs the on-device pass, then the cloud pass, then builds (but does not insert) a DexEntry. Persistence is left to the view, which keeps the view model fully testable.
The Two-Pass Hybrid Flow
When the user captures or picks an image, ScannerViewModel.analyze(_:) drives the whole sequence. It moves through a small state machine — Phase.capturing → analyzing → result (or failed) — that the ScannerView renders.
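As a rough sketch of that state machine, the phases can be modeled as a plain enum with an explicit transition check. The case names come from the text above; the canTransition(to:) helper and the two "back to capturing" transitions are assumptions made for illustration, not the app's actual code.

```swift
/// Hypothetical stand-in for the scan flow's phases (case names follow the docs).
enum Phase: Equatable {
    case capturing, analyzing, result, failed

    /// Whether the flow may move from this phase to `next`.
    func canTransition(to next: Phase) -> Bool {
        switch (self, next) {
        case (.capturing, .analyzing),   // an image was captured or picked
             (.analyzing, .result),      // cloud result, or offline fallback
             (.analyzing, .failed),      // no local guess and the cloud pass failed
             (.result, .capturing),      // scan again (assumed)
             (.failed, .capturing):      // retry (assumed)
            return true
        default:
            return false
        }
    }
}
```

Keeping the legal transitions in one place makes it easy to assert in tests that the view model never jumps straight from capturing to result.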
Pass 0: Encode once, off the main actor
Before any classification happens, the captured UIImage is downscaled and JPEG-encoded once, off the main actor, and the resulting bytes are reused for both passes and for persistence. This avoids three separate main-thread encodes.
// Resize + JPEG-encode ONCE, off the main actor, then reuse the bytes for
// both passes and for persistence — avoids three main-thread encodes.
let maxDimension = Config.imageMaxDimension
let jpeg = await Task.detached(priority: .userInitiated) {
ScannerViewModel.downscaledJPEG(from: image, maxDimension: maxDimension, quality: 0.8)
}.value
Downscaling uses ImageIO thumbnailing (CGImageSourceCreateThumbnailAtIndex), capped at Config.imageMaxDimension. The resulting Data is held on imageData for the rest of the scan.
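For readers unfamiliar with ImageIO thumbnailing, here is a minimal sketch of the technique. It is not the app's downscaledJPEG(from:maxDimension:quality:), which takes a UIImage; this hypothetical variant starts from already-encoded image bytes so it can stay free of UIKit, and its name and parameter types are assumptions.

```swift
import Foundation
import ImageIO
import UniformTypeIdentifiers

/// Hypothetical helper: downscale encoded image bytes with ImageIO and
/// re-encode them as JPEG. Returns nil if the data cannot be decoded.
func downscaledJPEGSketch(from data: Data, maxDimension: Int, quality: Double) -> Data? {
    guard let source = CGImageSourceCreateWithData(data as CFData, nil) else { return nil }

    // Ask ImageIO for a thumbnail whose longest side is at most `maxDimension`.
    let thumbnailOptions: [CFString: Any] = [
        kCGImageSourceCreateThumbnailFromImageAlways: true,
        kCGImageSourceCreateThumbnailWithTransform: true,   // bake in EXIF orientation
        kCGImageSourceThumbnailMaxPixelSize: maxDimension
    ]
    guard let thumbnail = CGImageSourceCreateThumbnailAtIndex(
        source, 0, thumbnailOptions as CFDictionary
    ) else { return nil }

    // Encode the thumbnail as JPEG into an in-memory buffer.
    let output = NSMutableData()
    guard let destination = CGImageDestinationCreateWithData(
        output as CFMutableData, UTType.jpeg.identifier as CFString, 1, nil
    ) else { return nil }
    let encodeOptions: [CFString: Any] = [kCGImageDestinationLossyCompressionQuality: quality]
    CGImageDestinationAddImage(destination, thumbnail, encodeOptions as CFDictionary)
    guard CGImageDestinationFinalize(destination) else { return nil }
    return output as Data
}
```

Thumbnailing through an image source avoids decoding the full-resolution bitmap just to shrink it, which is the usual reason to prefer it over drawing into a smaller graphics context.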
Pass 1: Instant on-device classification (offline, private, free)
The first pass is Apple's Vision framework via VNClassifyImageRequest, wrapped in ImageClassifier. It runs entirely on-device — no network, no cost, instant — and produces a coarse category and a confidence:
// 1. Instant, offline on-device pass.
let guess = await classifier.classify(imageData: jpeg)
withAnimation { localGuess = guess }
ImageClassifier takes pre-encoded JPEG Data (not a UIImage) precisely so it can reuse the bytes encoded in Pass 0. It runs off the main thread and never throws — a failed classification simply yields nil:
nonisolated private static func classify(cgImage: CGImage) -> LocalGuess? {
let request = VNClassifyImageRequest()
let handler = VNImageRequestHandler(cgImage: cgImage, options: [:])
do {
try handler.perform([request])
} catch {
// Best-effort, offline hint — a failure just means no local guess.
return nil
}
guard let observations = request.results, !observations.isEmpty else { return nil }
let ranked = observations.sorted { $0.confidence > $1.confidence }
guard let best = ranked.first else { return nil }
let topLabels = ranked.prefix(8).map { $0.identifier }
let category = DexCategory.matching(fromLabels: topLabels)
let label = best.identifier.replacingOccurrences(of: "_", with: " ")
return LocalGuess(category: category, label: label, confidence: Double(best.confidence))
}
The result is a LocalGuess:
nonisolated struct LocalGuess: Equatable, Sendable {
let category: DexCategory
/// The top raw label, lightly cleaned for display (e.g. "tree frog").
let label: String
/// Confidence in `[0, 1]`.
let confidence: Double
}
As soon as the local guess lands, the analyzing screen upgrades its headline from a generic "Analyzing…" to "Looks like a Plant…" — so the user sees a meaningful response before the cloud has even replied.
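The headline switch can be pictured as a small pure function. This is only an illustration: the function name is hypothetical, and categoryTitle stands in for the title of the local guess's category.

```swift
/// Hypothetical helper. `categoryTitle` stands in for `LocalGuess.category.title`.
func analyzingHeadline(categoryTitle: String?) -> String {
    guard let title = categoryTitle, !title.isEmpty else {
        // No local guess yet, or the classifier returned nil.
        return "Analyzing…"
    }
    return "Looks like a \(title)…"
}
```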
Pass 2: Rich cloud identification
The second pass calls DexVisionService.identify(imageData:categoryHint:). The on-device guess is passed through as a weak categoryHint, nudging the model without overriding it:
// 2. Rich cloud pass.
do {
let scan = try await visionService.identify(imageData: jpeg, categoryHint: guess?.category.title)
result = scan
phase = .result
} catch {
// graceful fallback — see below
}
DexVisionService reuses the existing VisionRequest / VisionResponse types and the signed Endpoints.vision endpoint — no backend change. It sends the base64 image plus a prompt that instructs the model to return strict JSON, then parses a ScanResult out of the response:
struct DexVisionService: DexVisionServicing {
func identify(imageData: Data, categoryHint: String?) async throws -> ScanResult {
let request = VisionRequest(
image: imageData.base64EncodedString(),
prompt: Self.prompt(categoryHint: categoryHint)
)
let body = try JSONEncoder().encode(request)
let result = await ApiClient.shared.sendRequest(
endpoint: Endpoints.vision,
body: body,
responseModel: VisionResponse.self
)
switch result {
case .success(let response):
return try ScanResult(jsonString: response.content)
case .failure(let error):
Logger.dex.error("Cloud identification request failed: \(error.localizedDescription)")
throw error
}
}
}
Because the request goes through ApiClient.shared.sendRequest, it is HMAC-signed with the per-session token before it leaves the device. Your OpenAI keys stay on the backend proxy and never ship in the app.
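The signing itself happens inside ApiClient and is not shown on this page. To give a feel for what HMAC-signing a request body involves, here is a sketch using CryptoKit. The function name, the choice of SHA-256, the hex encoding, and keying directly on the token's UTF-8 bytes are all assumptions; the app's real scheme may differ.

```swift
import Foundation
import CryptoKit

/// Hypothetical helper: hex-encoded HMAC-SHA256 of a request body,
/// keyed on the bytes of a session token.
func hmacSignatureHex(body: Data, sessionToken: String) -> String {
    let key = SymmetricKey(data: Data(sessionToken.utf8))
    let code = HMAC<SHA256>.authenticationCode(for: body, using: key)
    // The authentication code is a sequence of bytes; render it as lowercase hex.
    return code.map { String(format: "%02x", $0) }.joined()
}
```

A backend holding the same token can recompute the code over the received body and reject the request if the two values differ, which is what lets the proxy trust a call without the app ever holding a provider key.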
The Identification Prompt
The model is told it is "the identification engine for a Pokédex-style app" and asked to identify the single main subject as specifically as possible — including special handling for Pokémon (cards, figures, plush, or artwork), where it uses the Pokémon's type(s) as the "scientific name". The on-device categoryHint is injected only as a weak hint.
Crucially, the model is asked to respond with only a JSON object matching the ScanResult shape:
Respond with ONLY a JSON object — no markdown, no commentary — exactly matching:
{
"commonName": "string, the everyday name, e.g. Monstera Deliciosa / Clownfish / Pikachu",
"category": "one of: plant, animal, fish, bird, insect, reptile, mineral, fungus, food, pokemon, object, unknown",
"scientificName": "latin/binomial name, or Pokémon type(s), or empty string if unknown",
"summary": "one or two engaging sentences describing it",
"facts": ["3 to 5 short, fun, Pokédex-style facts"],
"rarity": "one of: Common, Uncommon, Rare, Legendary",
"confidence": 0.0
}
To customize the entries the Dex produces — different tone, extra fields, stricter rarity rules — edit DexVisionService.prompt(categoryHint:). It is the single source of truth for what the cloud model is asked to return.
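If you do edit the prompt, one way to keep the on-device hint weak is to add it as an optional, clearly subordinate line. The sketch below is illustrative only: the function name and the exact wording are assumptions, not the text DexVisionService actually sends.

```swift
import Foundation

/// Hypothetical prompt builder showing how a weak category hint could be injected.
func promptSketch(categoryHint: String?) -> String {
    var lines = [
        "You are the identification engine for a Pokédex-style app.",
        "Identify the single main subject of the image as specifically as possible."
    ]
    // Only mention the hint when there is one, and tell the model it may ignore it.
    if let hint = categoryHint?.trimmingCharacters(in: .whitespacesAndNewlines), !hint.isEmpty {
        lines.append("An on-device classifier guessed \"\(hint)\". Treat this as a weak hint and override it if the image disagrees.")
    }
    lines.append("Respond with ONLY a JSON object, no markdown, no commentary.")
    return lines.joined(separator: "\n")
}
```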
Parsing the Result Tolerantly
Models don't always cooperate: they wrap JSON in markdown fences, add prose, or omit fields. ScanResult is built to survive all of that.
nonisolated struct ScanResult: Codable, Equatable {
var commonName: String
var category: String
var scientificName: String
var summary: String
var facts: [String]
var rarity: String
var confidence: Double
/// The matched, stable category for this result.
var dexCategory: DexCategory { DexCategory.matching(category) }
}
Three layers of tolerance keep a single bad response from blowing up the scan:
- Fence + brace extraction. ScanResult.extractJSONObject(from:) strips markdown code fences and slices out the first balanced { … } object, tracking string literals so a brace inside a value (or stray trailing prose) doesn't throw off the extraction.
- Per-field tolerant decoding. The custom init(from:) decodes every field with try? and a sensible default, so a missing field never fails the whole scan.
- Refusal rejection. init(jsonString:) rejects all-default or refusal payloads (valid JSON that contains no real identification), so the flow surfaces a failure or offline fallback instead of letting the user save a bogus "Unknown" entry.
// Reject all-default / refusal payloads (valid JSON, but no real ID) so the
// scan flow surfaces a failure or offline fallback instead of a bogus
// "Unknown" success that the user could save.
let name = decoded.commonName.trimmingCharacters(in: .whitespacesAndNewlines)
let isEmptyOrRefusal = name.isEmpty
|| (name.caseInsensitiveCompare("Unknown") == .orderedSame && decoded.summary.isEmpty)
if isEmptyOrRefusal {
throw ParseError.invalidJSON
}
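The extraction and tolerant-decoding layers are described but not shown, so here is a minimal sketch of both techniques. firstJSONObject(in:) and TolerantScan are hypothetical names with a reduced field set; they illustrate the approach rather than reproduce ScanResult.

```swift
import Foundation

/// Hypothetical sketch: returns the first balanced `{ ... }` object in `text`.
/// Starting at the first `{` skips any leading fence or prose; tracking string
/// literals keeps braces inside values from affecting the depth count.
func firstJSONObject(in text: String) -> String? {
    guard let start = text.firstIndex(of: "{") else { return nil }
    var depth = 0
    var inString = false
    var escaped = false
    var index = start
    while index < text.endIndex {
        let character = text[index]
        if inString {
            if escaped { escaped = false }
            else if character == "\\" { escaped = true }
            else if character == "\"" { inString = false }
        } else if character == "\"" {
            inString = true
        } else if character == "{" {
            depth += 1
        } else if character == "}" {
            depth -= 1
            if depth == 0 { return String(text[start...index]) }
        }
        index = text.index(after: index)
    }
    return nil   // never balanced, e.g. a truncated response
}

/// Hypothetical reduced model: every field falls back to a default
/// when it is missing or has the wrong type.
struct TolerantScan: Decodable {
    var commonName: String
    var facts: [String]
    var confidence: Double

    enum CodingKeys: String, CodingKey { case commonName, facts, confidence }

    init(from decoder: Decoder) throws {
        let container = try decoder.container(keyedBy: CodingKeys.self)
        commonName = (try? container.decode(String.self, forKey: .commonName)) ?? "Unknown"
        facts = (try? container.decode([String].self, forKey: .facts)) ?? []
        confidence = (try? container.decode(Double.self, forKey: .confidence)) ?? 0
    }
}
```

With defaults this forgiving, a refusal decodes successfully as an "Unknown" entry, which is why the explicit rejection step above is still needed.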
Graceful Offline Fallback
If the cloud pass throws — no network, a parse failure, a refusal — the scan does not simply fail when an on-device guess exists. Instead it degrades to an offline-only entry built purely from the local guess, and tells the user via a Toast:
} catch {
Logger.dex.error("Identification failed: \(error.localizedDescription)")
if let guess {
// Degrade gracefully: an offline-only entry beats nothing.
result = ScanResult.offlineFallback(for: guess)
phase = .result
Toast.shared.present(
title: "Identified offline — reconnect for full details",
symbol: "wifi.slash",
tint: .orange,
timing: .short
)
} else {
errorMessage = error.localizedDescription
phase = .failed
// ...
}
}
The fallback entry carries the on-device category and confidence plus a clear note that the rich details require reconnecting:
static func offlineFallback(for guess: LocalGuess) -> ScanResult {
ScanResult(
commonName: guess.label.capitalized,
category: guess.category.rawValue,
scientificName: "",
summary: "Identified on-device while offline. Reconnect and scan again for the full Dex entry.",
facts: [],
rarity: "Common",
confidence: guess.confidence
)
}
Only when there is no local guess and the cloud fails does the flow move to Phase.failed. Errors are always logged with Logger.dex and surfaced via Toast — never silent.
Categories
DexCategory is the stable vocabulary the whole feature speaks. It's a nonisolated, Codable, Sendable enum (so it can cross actor boundaries from the off-main classifier and @Model inits), and each case carries its display title, SF Symbol, and tint color used for badges throughout the UI.
nonisolated enum DexCategory: String, CaseIterable, Codable, Sendable {
case plant, animal, fish, bird, insect, reptile
case mineral, fungus, food, pokemon, object, unknown
}
Both the cloud category string and the on-device Vision labels are free-form, so DexCategory maps them onto the stable set. Exact rawValue matches win; otherwise ordered keyword heuristics decide, scanning the highest-confidence labels first:
/// Maps a ranked list of labels (e.g. on-device Vision identifiers) to the
/// best-fitting category. Exact `rawValue` matches win; otherwise keyword
/// heuristics decide, scanning highest-confidence labels first.
static func matching(fromLabels labels: [String]) -> DexCategory {
let normalized = labels.map { $0.lowercased() }
for label in normalized {
if let exact = DexCategory(rawValue: label) { return exact }
}
for label in normalized {
if let matched = keywordMatch(label) { return matched }
}
return .unknown
}
The keyword matcher is deliberately careful: short needles (like "ant") match whole words only — tokenizing the label first — so "plant" and "elephant" don't resolve to .insect, while 4+ character needles still match as substrings so "goldfish" and "houseplant" resolve via "fish" and "plant".
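The matching rule can be sketched in a few lines. The function below is a hypothetical helper, not keywordMatch(_:) itself: it checks one needle against one label, and the tokenization rule (split on anything that is not a letter or digit) is an assumption.

```swift
import Foundation

/// Hypothetical helper: does `needle` match `label` under the rule described above?
func labelMatches(_ label: String, needle: String) -> Bool {
    let lowered = label.lowercased()
    let target = needle.lowercased()

    // Needles of 4+ characters match anywhere inside the label.
    if target.count >= 4 {
        return lowered.contains(target)
    }

    // Shorter needles must equal a whole word, so tokenize first.
    // Vision identifiers often use underscores, which count as separators here.
    let tokens = lowered.split(whereSeparator: { !$0.isLetter && !$0.isNumber })
    return tokens.contains { String($0) == target }
}
```

Under this rule "fire_ant" matches "ant" because "ant" is one of its tokens, while "plant" does not because its only token is "plant".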
Persistence: A Dedicated SwiftData Store
Every saved discovery is a SwiftData @Model:
@Model
final class DexEntry {
@Attribute(.unique) var id: String
var commonName: String
var categoryRaw: String
var scientificName: String
var summary: String
var facts: [String]
var rarity: String
var confidence: Double
/// What the on-device classifier guessed, kept for transparency in the UI.
var localGuessLabel: String
@Attribute(.externalStorage) var imageData: Data
var dateDiscovered: Date
/// The stable category, derived from the stored raw value.
var category: DexCategory { DexCategory(rawValue: categoryRaw) ?? .unknown }
}
A few details worth noting: the captured photo is stored with @Attribute(.externalStorage) so large image blobs live outside the main store; localGuessLabel is persisted for transparency (the detail page can show what the on-device pass thought); and a convenience initializer builds an entry straight from a ScanResult.
Its own isolated store
The Dex does not share the app's default SwiftData store. DexStore vends a ModelContainer backed by its own file so the Dex schema can never collide with other models (such as ChatMessage from the chat feature):
@MainActor
final class DexStore {
static let shared = DexStore()
let container: ModelContainer
private init() {
do {
let supportURL = URL.applicationSupportDirectory
try FileManager.default.createDirectory(at: supportURL, withIntermediateDirectories: true)
let storeURL = supportURL.appending(path: "Dex.store")
let configuration = ModelConfiguration(url: storeURL)
container = try ModelContainer(for: DexEntry.self, configurations: configuration)
} catch {
Logger.dex.error("Failed to initialize Dex ModelContainer: \(error.localizedDescription)")
fatalError("Failed to initialize Dex ModelContainer: \(error.localizedDescription)")
}
}
}
This container is attached once at the Dex root, so the grid's @Query and the scanner's inserts share a single context:
struct PokedexHomeView: View {
var body: some View {
// Attach the dedicated, isolated Dex store so the grid's @Query and the
// scanner's inserts share one context.
PokedexGridScreen()
.modelContainer(DexStore.shared.container)
}
}
The Grid and the Entry Views
PokedexGridScreen is the home of the Dex. It uses @Query to fetch entries newest-first and renders them in an adaptive LazyVGrid, with a live discovery count and a persistent Scan button pinned to the bottom safe area:
@Query(sort: \DexEntry.dateDiscovered, order: .reverse) private var entries: [DexEntry]
When the collection is empty, an EmptyDexView invites the user to "Scan a plant, a fish, a rock — or even a Pokémon — to start your collection." Tapping Scan presents ScannerView as a fullScreenCover. Each grid cell shows a downsampled thumbnail with a category badge; tapping one navigates to DexEntryDetailView.
The entry presentation is shared. DexEntryContentView renders the body of an entry — hero image, category badge, rarity pill, confidence meter, summary, and a "Field Notes" list — and the hero image is injected so the same view powers both:
- ScanResultView: the live reveal card after a scan (hero is the captured UIImage).
- DexEntryDetailView: a stored entry (hero is a downsampled DownsampledImageView decoded off the main thread from the persisted Data).
Images are always downsampled off-main for the grid and detail views via DownsampledImageView, which keys its load task on the entry id (cheaper than hashing the full Data, and correct when SwiftUI recycles a cell for a different entry).
Saving a Discovery
The view model deliberately builds an entry without inserting it — persistence belongs to the view:
/// Builds — but does not insert — a `DexEntry` for the current result.
func makeEntry() -> DexEntry? {
guard let result, let imageData else { return nil }
return DexEntry(scan: result, localGuess: localGuess, imageData: imageData, discoveredAt: Date())
}
ScannerView.save() inserts and saves into the Dex context, then confirms with a Toast and dismisses — or logs and surfaces an error if the save fails.
Accessibility & Motion
The scan flow is built to be inclusive:
- The scanning sweep animation (ScanningOverlay) is only shown when Reduce Motion is off.
- Because the analysis is silent and asynchronous, the view posts a VoiceOver announcement when a result or failure appears (AccessibilityNotification.Announcement).
- The analyzing headline carries the .updatesFrequently trait, and grid cells, badges, the confidence meter, and the scan buttons all have explicit accessibility labels and hints.
- All text respects Dynamic Type.
Extending the Dex
The scanner is designed to be customized. The most common changes:
- Add or change a category. Add a case to DexCategory, give it a title, symbol, and tint, and (optionally) add keyword groups in keywordMatch(_:) so on-device and cloud labels resolve to it. The badge styling, grid, and filters pick it up automatically.
- Change what the model returns. Edit the prompt in DexVisionService.prompt(categoryHint:). If you add fields, mirror them on ScanResult (with tolerant decoding) and on DexEntry for persistence.
- Swap the classification or cloud engine. Both passes sit behind protocols, ImageClassifying and DexVisionServicing, and are injected into ScannerViewModel. Provide your own conforming type (for example, a custom Core ML model on-device) without touching the view model or the UI.
init(
classifier: ImageClassifying = ImageClassifier(),
visionService: DexVisionServicing = DexVisionService()
) {
self.classifier = classifier
self.visionService = visionService
}
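The same injection point is what makes the view model testable. As a sketch, a test double for the cloud pass might look like the following. The ScanResult and DexVisionServicing declarations here are reduced stand-ins so the example compiles on its own (the protocol requirement is inferred from the identify(imageData:categoryHint:) signature shown earlier), and StubVisionService is a hypothetical name.

```swift
import Foundation

// Stand-ins so the sketch is self-contained; the real types live in the app.
struct ScanResult: Equatable {
    var commonName: String
    var confidence: Double
}

protocol DexVisionServicing {
    func identify(imageData: Data, categoryHint: String?) async throws -> ScanResult
}

/// Hypothetical test double: returns a canned result or throws a canned error,
/// without touching the network.
struct StubVisionService: DexVisionServicing {
    var outcome: Result<ScanResult, Error>

    func identify(imageData: Data, categoryHint: String?) async throws -> ScanResult {
        try outcome.get()
    }
}
```

In a test you would pass the stub through the initializer above, for example ScannerViewModel(visionService: StubVisionService(outcome: .failure(URLError(.notConnectedToInternet)))), to exercise the offline fallback path deterministically.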
Don't share the default store
The Dex keeps its own Dex.store on purpose. If you point DexStore at the app's default SwiftData container — or attach a second container for DexEntry elsewhere — you risk schema collisions and duplicated contexts. Keep all Dex reads and writes flowing through DexStore.shared.container.