Power BI / Power Query
Step-by-step Power Query setup for pulling GET /2020-01/roleplay/reports/sessions into Power BI:
paged retrieval with throttle-aware retry, a session/goal star schema, and native incremental
refresh keyed on publishedAt — the publication date — so refreshes can never silently miss a
session.
Prerequisites
Section titled “Prerequisites”- An API key with Roleplay API access, sent as the
X-API-KEYheader. - Base URL:
https://api.rehearsal.com(production).
Response envelope (what the queries below rely on)
Section titled “Response envelope (what the queries below rely on)”| Field | Meaning |
| --- | --- |
| items | The session records for this page. |
| hasMore | true — more records exist beyond this page. |
| nextPageToken | Opaque cursor for the next page — pass it back verbatim. |
| totalCount | Approximate total, present only on the first call (no pageToken). Capped at 10000. |
| totalCountCapped | true means the total met/exceeded the 10000 cap — display as “10000+”. |
| pageSize | Echo of the effective page size. |
With orderBy=publishedAt the final page also returns a nextPageToken — a resume watermark
(see the API reference). This guide’s recipe windows by publishedAfter/publishedBefore instead,
so the queries below simply stop when hasMore is false and can ignore it.
Other filters: environmentIds / learnerIds accept up to 100 IDs; to filter by learner email, resolve
emails first via POST /2020-01/roleplay/lookup/profile-ids. pageSize range is 1–100 (default 10) —
use 100 for bulk pulls. Date filters are detailed below.
The two timestamps: completedAt and publishedAt
Section titled “The two timestamps: completedAt and publishedAt”Every record carries two UTC timestamps with different jobs:
| Timestamp | Meaning | Filters | Use for |
| --- | --- | --- | --- |
| completedAt | When the learner finished the session | completedAfter / completedBefore | Business questions (“May’s sessions”), slicing, one-time pulls |
| publishedAt | When the session’s results became visible in this report — its AI analysis succeeded, typically under a minute after completion, occasionally up to ~15 minutes on retries | publishedAfter / publishedBefore (require orderBy=publishedAt) | Replication: incremental refresh, warehouse sync |
One boundary rule for all four date filters: *After is inclusive (>=), *Before is
exclusive (<), and a date-only value (YYYY-MM-DD) means midnight UTC at the start of that
day. Adjacent half-open windows never overlap and never gap — “all of May” is
completedAfter=2026-05-01&completedBefore=2026-06-01. Datetimes must carry a timezone (Z or an
offset like -04:00); zoneless datetimes are rejected with a 400.
(publishedAfter/publishedBefore accept only the datetime form.)
Working in a local timezone: all timestamps are UTC, and the recommended approach is to keep the API
window in UTC too — replicate by publishedAt (the incremental-refresh recipe below) and group by
local day in your data model, using a date dimension table that maps UTC dates to your local calendar.
DST then lives in one maintained table instead of in filter logic. Power Query itself has no IANA
timezone support (DateTimeZone.SwitchZone applies a fixed offset, which goes wrong across DST
transitions), so local-day logic belongs in the model, not in M.
If you genuinely must limit a business-window pull to one exact local day at the API, send local midnight-to-midnight as offset datetimes — computed by an IANA-aware timezone library, never hardcoded — e.g. May 29 in New York:
completedAfter=2026-05-29T00:00:00-04:00&completedBefore=2026-05-30T00:00:00-04:00Setting up in Power BI
Section titled “Setting up in Power BI”-
In Power Query, create the two incremental-refresh parameters: Manage Parameters → New —
RangeStartandRangeEnd(exactly those names), type Date/Time, any initial values a few days apart (e.g.2026-05-01and2026-06-01; Desktop uses them only for preview — the Power BI service replaces them per partition). -
Create a blank query named
SessionsRawand paste Query 1. ReplaceYOUR_API_KEY— or better, promoteBaseUrl/ApiKeyto Power BI parameters. -
Right-click
SessionsRaw→ Enable load: off (it is a base query the others build on). -
When prompted for credentials, connect as Anonymous (the API key travels in the header).
-
In the model view, relate
SessionGoals[sessionId]→Sessions[sessionId](many-to-one) for a star schema. -
Configure the incremental-refresh policy on both tables — see Incremental refresh: configuration.
Query 1 — SessionsRaw (base query; disable load)
Section titled “Query 1 — SessionsRaw (base query; disable load)”Fetches all pages with throttle-aware retry and flattens to one row per session, keeping the goals
list column for Query 3. In Power BI: right-click the query → Enable load: off.
let // ── Configuration (promote BaseUrl/ApiKey to Power BI parameters) ───────── BaseUrl = "https://api.rehearsal.com", ApiKey = "YOUR_API_KEY", PageSize = "100", // max; fewer requests = fewer throttles
// RangeStart/RangeEnd are Power BI's incremental-refresh parameters (type // Date/Time, exactly those names — created in step 1). Values are treated // as UTC. The API window is half-open exactly like Power BI's partition // convention: >= RangeStart, < RangeEnd — every record lands in exactly // one partition. ToApiUtc = (t as datetime) as text => DateTime.ToText(t, [Format = "yyyy-MM-dd'T'HH:mm:ss"]) & "Z",
// ── One page, with 429-aware retry ───────────────────────────────────────── // Retries ONLY on HTTP 429 (throttle). Other statuses fail immediately — a 400 // never succeeds on retry. Max 5 attempts per page (1 initial + 4 retries). // Delay: Retry-After header when present, else exponential 0.5s, 1s, 2s, 4s. FetchPage = (pageToken as nullable text, attempt as number) as record => let resp = Web.Contents(BaseUrl, [ RelativePath = "2020-01/roleplay/reports/sessions", Query = [ pageSize = PageSize, orderBy = "publishedAt", publishedAfter = ToApiUtc(RangeStart), publishedBefore = ToApiUtc(RangeEnd) ] & (if pageToken <> null then [pageToken = pageToken] else []), Headers = [#"X-API-KEY" = ApiKey], ManualStatusHandling = {429}, // inspect 429 instead of erroring IsRetry = attempt > 1 // bypass Power Query's response cache on retry ]), meta = Value.Metadata(resp), status = meta[Response.Status], // Honor Retry-After when the service sends one; fall back to exponential. headers = Record.FieldOrDefault(meta, "Headers", []), retryAfterText = Record.FieldOrDefault(headers, "Retry-After", null), retryAfterSec = if retryAfterText <> null then (try Number.From(retryAfterText) otherwise null) else null, delaySec = if retryAfterSec <> null and retryAfterSec > 0 then retryAfterSec else Number.Power(2, attempt - 1) * 0.5, result = if status = 429 and attempt < 5 then Function.InvokeAfter(() => @FetchPage(pageToken, attempt + 1), #duration(0, 0, 0, delaySec)) else if status = 429 then error "Rehearsal API: still throttled (HTTP 429) after 5 attempts" else Json.Document(resp) in result,
// ── Walk all pages via nextPageToken ─────────────────────────────────────── // 1s self-throttle between pages keeps a bulk pull comfortably under the // gateway rate limit, so the retry path above is rarely exercised. AllPages = List.Generate( () => FetchPage(null, 1), each _ <> null, each if _[hasMore] then Function.InvokeAfter(() => FetchPage(_[nextPageToken], 1), #duration(0, 0, 0, 1)) else null ),
// ── Flatten to one row per session ──────────────────────────────────────── AllItems = List.Combine(List.Transform(AllPages, each _[items])), AsTable = Table.FromList(AllItems, Splitter.SplitByNothing(), {"session"}), Expanded = Table.ExpandRecordColumn(AsTable, "session", {"sessionId", "startedAt", "completedAt", "publishedAt", "status", "score", "feedback", "learner", "environment", "goals"}), ExpandLearner = Table.ExpandRecordColumn(Expanded, "learner", {"id", "name", "email"}, {"learner.id", "learner.name", "learner.email"}), ExpandEnvironment = Table.ExpandRecordColumn(ExpandLearner, "environment", {"id", "name"}, {"environment.id", "environment.name"}), Typed = Table.TransformColumnTypes(ExpandEnvironment, { {"sessionId", type text}, {"startedAt", type datetimezone}, {"completedAt", type datetimezone}, {"publishedAt", type datetimezone}, {"score", type number} }),
// Redundant with the server-side window, but makes each partition provably // pure — Power BI's incremental refresh expects rows filtered by // RangeStart/RangeEnd, and this guarantees it locally too. Partitioned = Table.SelectRows(Typed, each [publishedAt] >= DateTime.AddZone(RangeStart, 0) and [publishedAt] < DateTime.AddZone(RangeEnd, 0))in PartitionedQuery 2 — Sessions (session grain)
Section titled “Query 2 — Sessions (session grain)”let Source = SessionsRaw, NoGoals = Table.RemoveColumns(Source, {"goals"})in NoGoalsQuery 3 — SessionGoals (goal grain)
Section titled “Query 3 — SessionGoals (goal grain)”One row per goal per session. Relate to Sessions on sessionId (many-to-one) for a star schema.
let Source = Table.SelectColumns(SessionsRaw, {"sessionId", "goals"}), PerGoal = Table.ExpandListColumn(Source, "goals"), Expanded = Table.ExpandRecordColumn(PerGoal, "goals", {"name", "status", "explanation"}, {"goal.name", "goal.status", "goal.explanation"})in Expandedgoal.status carries the canonical tri-state outcome:
met / partiallyMet / notMet.
Incremental refresh: configuration
Section titled “Incremental refresh: configuration”With RangeStart/RangeEnd wired into Query 1, configure the policy:
-
In the report view, right-click the
Sessionstable → Incremental refresh. -
Archive data starting: e.g. 2 years before refresh date (size to how much history you want in the model).
-
Incrementally refresh data starting: e.g. 14 days before refresh date. Leave “Get the latest data in real time (DirectQuery)” and “Only refresh complete days” off.
-
Repeat for
SessionGoals(it inherits the window throughSessionsRaw). -
Publish to the service. The first scheduled refresh builds all historical partitions — that is your backfill; no separate full pull is needed.
Why this recipe cannot lose a session:
- The lag case is covered by construction. A session that completes just before a refresh but
is analyzed just after it gets
publishedAt= analysis time — inside the next refresh’s 14-day window. It simply arrives one refresh later, regardless of when it completed. - Both bounds, exactly one partition. Each partition queries
publishedAfter = RangeStartandpublishedBefore = RangeEnd— the API’s half-open window matches Power BI’s own>= RangeStart,< RangeEndpartition convention, so a record can never be double-counted across partitions. (A lower bound alone would double-count every overlap.) - Re-deliveries dedupe themselves. Refreshing a partition replaces its rows wholesale, so the at-least-once delivery semantics cost nothing inside the refresh window.
Modeling notes
Section titled “Modeling notes”- Group by
environment.id/learner.id(stable UUIDs) and label withenvironment.name/learner.name. Names can change; grouping by name splits history on rename. - One-time business pulls (an ad-hoc export, a fiscal-quarter extract) can use
completedAfter/completedBeforeon a separate query — business slicing is what the completion-time filters are for. Keep scheduled refresh on thepublishedAtrecipe above. - The queries use
RelativePath/Queryagainst a fixedBaseUrldeliberately — that keeps the data source static, so scheduled refresh in the Power BI service works (string-concatenated URLs get flagged as a dynamic data source and refuse to refresh).
Throttling: what to expect
Section titled “Throttling: what to expect”The API key is subject to an API-gateway usage plan (token bucket: steady requests/sec + burst). The cheapest strategy is not needing retries at all:
pageSize=100— a few hundred sessions is a handful of requests.- Pace page requests ~1s apart (built into Query 1) — negligible cost in a scheduled refresh.
- Let incremental refresh do its job — each scheduled refresh pulls only the refresh-window partitions, not full history.
The retry in Query 1 exists as a safety net and follows these rules: trigger on HTTP 429 only; max 5
attempts per page request (budget resets each page); honor Retry-After when present, else exponential
0.5s → 4s. There is intentionally no jitter — fine for one sequential walker. If you run several
queries against the same key in parallel, add random jitter (±50%) to delaySec so retries don’t
synchronize.