Skip to content

Power BI / Power Query

Step-by-step Power Query setup for pulling GET /2020-01/roleplay/reports/sessions into Power BI: paged retrieval with throttle-aware retry, a session/goal star schema, and native incremental refresh keyed on publishedAt — the publication date — so refreshes can never silently miss a session.

  • An API key with Roleplay API access, sent as the X-API-KEY header.
  • Base URL: https://api.rehearsal.com (production).

Response envelope (what the queries below rely on)

Section titled “Response envelope (what the queries below rely on)”

| Field | Meaning | | --- | --- | | items | The session records for this page. | | hasMore | true — more records exist beyond this page. | | nextPageToken | Opaque cursor for the next page — pass it back verbatim. | | totalCount | Approximate total, present only on the first call (no pageToken). Capped at 10000. | | totalCountCapped | true means the total met/exceeded the 10000 cap — display as “10000+”. | | pageSize | Echo of the effective page size. |

With orderBy=publishedAt the final page also returns a nextPageToken — a resume watermark (see the API reference). This guide’s recipe windows by publishedAfter/publishedBefore instead, so the queries below simply stop when hasMore is false and can ignore it.

Other filters: environmentIds / learnerIds accept up to 100 IDs; to filter by learner email, resolve emails first via POST /2020-01/roleplay/lookup/profile-ids. pageSize range is 1–100 (default 10) — use 100 for bulk pulls. Date filters are detailed below.

The two timestamps: completedAt and publishedAt

Section titled “The two timestamps: completedAt and publishedAt”

Every record carries two UTC timestamps with different jobs:

| Timestamp | Meaning | Filters | Use for | | --- | --- | --- | --- | | completedAt | When the learner finished the session | completedAfter / completedBefore | Business questions (“May’s sessions”), slicing, one-time pulls | | publishedAt | When the session’s results became visible in this report — its AI analysis succeeded, typically under a minute after completion, occasionally up to ~15 minutes on retries | publishedAfter / publishedBefore (require orderBy=publishedAt) | Replication: incremental refresh, warehouse sync |

One boundary rule for all four date filters: *After is inclusive (>=), *Before is exclusive (<), and a date-only value (YYYY-MM-DD) means midnight UTC at the start of that day. Adjacent half-open windows never overlap and never gap — “all of May” is completedAfter=2026-05-01&completedBefore=2026-06-01. Datetimes must carry a timezone (Z or an offset like -04:00); zoneless datetimes are rejected with a 400. (publishedAfter/publishedBefore accept only the datetime form.)

Working in a local timezone: all timestamps are UTC, and the recommended approach is to keep the API window in UTC too — replicate by publishedAt (the incremental-refresh recipe below) and group by local day in your data model, using a date dimension table that maps UTC dates to your local calendar. DST then lives in one maintained table instead of in filter logic. Power Query itself has no IANA timezone support (DateTimeZone.SwitchZone applies a fixed offset, which goes wrong across DST transitions), so local-day logic belongs in the model, not in M.

If you genuinely must limit a business-window pull to one exact local day at the API, send local midnight-to-midnight as offset datetimes — computed by an IANA-aware timezone library, never hardcoded — e.g. May 29 in New York:

completedAfter=2026-05-29T00:00:00-04:00&completedBefore=2026-05-30T00:00:00-04:00
  1. In Power Query, create the two incremental-refresh parameters: Manage Parameters → NewRangeStart and RangeEnd (exactly those names), type Date/Time, any initial values a few days apart (e.g. 2026-05-01 and 2026-06-01; Desktop uses them only for preview — the Power BI service replaces them per partition).

  2. Create a blank query named SessionsRaw and paste Query 1. Replace YOUR_API_KEY — or better, promote BaseUrl / ApiKey to Power BI parameters.

  3. Right-click SessionsRawEnable load: off (it is a base query the others build on).

  4. Create Sessions from Query 2 and SessionGoals from Query 3.

  5. When prompted for credentials, connect as Anonymous (the API key travels in the header).

  6. In the model view, relate SessionGoals[sessionId]Sessions[sessionId] (many-to-one) for a star schema.

  7. Configure the incremental-refresh policy on both tables — see Incremental refresh: configuration.

Query 1 — SessionsRaw (base query; disable load)

Section titled “Query 1 — SessionsRaw (base query; disable load)”

Fetches all pages with throttle-aware retry and flattens to one row per session, keeping the goals list column for Query 3. In Power BI: right-click the query → Enable load: off.

SessionsRaw
let
// ── Configuration (promote BaseUrl/ApiKey to Power BI parameters) ─────────
BaseUrl = "https://api.rehearsal.com",
ApiKey = "YOUR_API_KEY",
PageSize = "100", // max; fewer requests = fewer throttles
// RangeStart/RangeEnd are Power BI's incremental-refresh parameters (type
// Date/Time, exactly those names — created in step 1). Values are treated
// as UTC. The API window is half-open exactly like Power BI's partition
// convention: >= RangeStart, < RangeEnd — every record lands in exactly
// one partition.
ToApiUtc = (t as datetime) as text =>
DateTime.ToText(t, [Format = "yyyy-MM-dd'T'HH:mm:ss"]) & "Z",
// ── One page, with 429-aware retry ─────────────────────────────────────────
// Retries ONLY on HTTP 429 (throttle). Other statuses fail immediately — a 400
// never succeeds on retry. Max 5 attempts per page (1 initial + 4 retries).
// Delay: Retry-After header when present, else exponential 0.5s, 1s, 2s, 4s.
FetchPage = (pageToken as nullable text, attempt as number) as record =>
let
resp = Web.Contents(BaseUrl, [
RelativePath = "2020-01/roleplay/reports/sessions",
Query = [
pageSize = PageSize,
orderBy = "publishedAt",
publishedAfter = ToApiUtc(RangeStart),
publishedBefore = ToApiUtc(RangeEnd)
]
& (if pageToken <> null then [pageToken = pageToken] else []),
Headers = [#"X-API-KEY" = ApiKey],
ManualStatusHandling = {429}, // inspect 429 instead of erroring
IsRetry = attempt > 1 // bypass Power Query's response cache on retry
]),
meta = Value.Metadata(resp),
status = meta[Response.Status],
// Honor Retry-After when the service sends one; fall back to exponential.
headers = Record.FieldOrDefault(meta, "Headers", []),
retryAfterText = Record.FieldOrDefault(headers, "Retry-After", null),
retryAfterSec = if retryAfterText <> null then (try Number.From(retryAfterText) otherwise null) else null,
delaySec = if retryAfterSec <> null and retryAfterSec > 0
then retryAfterSec
else Number.Power(2, attempt - 1) * 0.5,
result =
if status = 429 and attempt < 5 then
Function.InvokeAfter(() => @FetchPage(pageToken, attempt + 1), #duration(0, 0, 0, delaySec))
else if status = 429 then
error "Rehearsal API: still throttled (HTTP 429) after 5 attempts"
else
Json.Document(resp)
in
result,
// ── Walk all pages via nextPageToken ───────────────────────────────────────
// 1s self-throttle between pages keeps a bulk pull comfortably under the
// gateway rate limit, so the retry path above is rarely exercised.
AllPages = List.Generate(
() => FetchPage(null, 1),
each _ <> null,
each if _[hasMore]
then Function.InvokeAfter(() => FetchPage(_[nextPageToken], 1), #duration(0, 0, 0, 1))
else null
),
// ── Flatten to one row per session ────────────────────────────────────────
AllItems = List.Combine(List.Transform(AllPages, each _[items])),
AsTable = Table.FromList(AllItems, Splitter.SplitByNothing(), {"session"}),
Expanded = Table.ExpandRecordColumn(AsTable, "session",
{"sessionId", "startedAt", "completedAt", "publishedAt", "status", "score", "feedback", "learner", "environment", "goals"}),
ExpandLearner = Table.ExpandRecordColumn(Expanded, "learner",
{"id", "name", "email"}, {"learner.id", "learner.name", "learner.email"}),
ExpandEnvironment = Table.ExpandRecordColumn(ExpandLearner, "environment",
{"id", "name"}, {"environment.id", "environment.name"}),
Typed = Table.TransformColumnTypes(ExpandEnvironment, {
{"sessionId", type text},
{"startedAt", type datetimezone},
{"completedAt", type datetimezone},
{"publishedAt", type datetimezone},
{"score", type number}
}),
// Redundant with the server-side window, but makes each partition provably
// pure — Power BI's incremental refresh expects rows filtered by
// RangeStart/RangeEnd, and this guarantees it locally too.
Partitioned = Table.SelectRows(Typed, each
[publishedAt] >= DateTime.AddZone(RangeStart, 0) and
[publishedAt] < DateTime.AddZone(RangeEnd, 0))
in
Partitioned
Sessions
let
Source = SessionsRaw,
NoGoals = Table.RemoveColumns(Source, {"goals"})
in
NoGoals

One row per goal per session. Relate to Sessions on sessionId (many-to-one) for a star schema.

SessionGoals
let
Source = Table.SelectColumns(SessionsRaw, {"sessionId", "goals"}),
PerGoal = Table.ExpandListColumn(Source, "goals"),
Expanded = Table.ExpandRecordColumn(PerGoal, "goals",
{"name", "status", "explanation"}, {"goal.name", "goal.status", "goal.explanation"})
in
Expanded

goal.status carries the canonical tri-state outcome: met / partiallyMet / notMet.

With RangeStart/RangeEnd wired into Query 1, configure the policy:

  1. In the report view, right-click the Sessions table → Incremental refresh.

  2. Archive data starting: e.g. 2 years before refresh date (size to how much history you want in the model).

  3. Incrementally refresh data starting: e.g. 14 days before refresh date. Leave “Get the latest data in real time (DirectQuery)” and “Only refresh complete days” off.

  4. Repeat for SessionGoals (it inherits the window through SessionsRaw).

  5. Publish to the service. The first scheduled refresh builds all historical partitions — that is your backfill; no separate full pull is needed.

Why this recipe cannot lose a session:

  • The lag case is covered by construction. A session that completes just before a refresh but is analyzed just after it gets publishedAt = analysis time — inside the next refresh’s 14-day window. It simply arrives one refresh later, regardless of when it completed.
  • Both bounds, exactly one partition. Each partition queries publishedAfter = RangeStart and publishedBefore = RangeEnd — the API’s half-open window matches Power BI’s own >= RangeStart, < RangeEnd partition convention, so a record can never be double-counted across partitions. (A lower bound alone would double-count every overlap.)
  • Re-deliveries dedupe themselves. Refreshing a partition replaces its rows wholesale, so the at-least-once delivery semantics cost nothing inside the refresh window.
  • Group by environment.id / learner.id (stable UUIDs) and label with environment.name / learner.name. Names can change; grouping by name splits history on rename.
  • One-time business pulls (an ad-hoc export, a fiscal-quarter extract) can use completedAfter/completedBefore on a separate query — business slicing is what the completion-time filters are for. Keep scheduled refresh on the publishedAt recipe above.
  • The queries use RelativePath/Query against a fixed BaseUrl deliberately — that keeps the data source static, so scheduled refresh in the Power BI service works (string-concatenated URLs get flagged as a dynamic data source and refuse to refresh).

The API key is subject to an API-gateway usage plan (token bucket: steady requests/sec + burst). The cheapest strategy is not needing retries at all:

  1. pageSize=100 — a few hundred sessions is a handful of requests.
  2. Pace page requests ~1s apart (built into Query 1) — negligible cost in a scheduled refresh.
  3. Let incremental refresh do its job — each scheduled refresh pulls only the refresh-window partitions, not full history.

The retry in Query 1 exists as a safety net and follows these rules: trigger on HTTP 429 only; max 5 attempts per page request (budget resets each page); honor Retry-After when present, else exponential 0.5s → 4s. There is intentionally no jitter — fine for one sequential walker. If you run several queries against the same key in parallel, add random jitter (±50%) to delaySec so retries don’t synchronize.