Zimeng Xiong's Weblog

Excalidash, a whiteboard organizer

The robot was done, the mechanisms were functioning smoothly, and our next match wasn't for another hour. I sat in the pit, laptop open, refining our engineering notebook and staring at yet another Excalidraw tab with a drawing I'd probably never see again.

Excalidraw is, in many ways, the perfect diagramming tool. It's fast. It's beautiful with that hand-drawn aesthetic. It has an incredible community library system with premade drawing sets. But there's one thing that's always driven me insane: you can only work on one drawing at a time. Excalidraw stores everything in browser localStorage.

Sure, you can export to a .excalidraw file. Sure, you can import it back later. But that workflow is painful. Every time I want to start a new diagram, I have to manually save the current one, clear the canvas, and hope I remember where I put that file three weeks from now. There's no gallery. No collections. No way to see all my work at a glance.

I'd been meaning to solve this problem for months. Excalidraw is also open source and available as an NPM package. How hard could it be?


Part I: Version 1 - Cut and Reroute

November 18, 2025 — 2:00 PM

My first instinct was the most obvious one: fork Excalidraw and modify it.

The plan was elegant in theory. Excalidraw is open source. It saves to localStorage via a few specific functions in App.tsx. I would simply replace those localStorage calls with API calls to my own backend.

The architecture looked like this:

┌─────────────────┐         ┌─────────────────┐
│    Dashboard    │◄────────│   Excalidraw    │
│   (Container A) │  HTTP   │  (Container B)  │
│   Port: 8081    │────────►│   Port: 3001    │
│                 │         │                 │
│  - File Gallery │         │ - Drawing Tool  │
│  - REST API     │         │ - Modified Save │
└────────┬────────┘         └─────────────────┘
         │
         ▼
   ┌──────────┐
   │  Volume  │
   │  (Data)  │
   └──────────┘

The dashboard would be a simple Express server serving a gallery UI:

// dashboard/server.js
app.get('/', (req, res) => {
    fs.readdir(DATA_DIR, (err, files) => {
        if (err) return res.status(500).send('Unable to read data directory');
        const sceneFiles = files.filter(f => f.endsWith('.excalidraw'));
        // Render HTML gallery with file cards
        // Each card links to excalidraw?id=<filename>
    });
});

app.put('/api/v1/scenes/:id', (req, res) => {
    const filePath = path.join(DATA_DIR, `${req.params.id}.excalidraw`);
    fs.writeFileSync(filePath, JSON.stringify(req.body, null, 2));
    res.json({ success: true });
});

Simple. Elegant.

The Fork Problem

The Excalidraw repository is massive. Over 100,000 lines of code across a monorepo structure with multiple packages:

excalidraw/
├── packages/
│   ├── excalidraw/       # Core library
│   ├── element/          # Element handling
│   ├── math/             # Mathematical utilities
│   └── utils/            # Shared utilities
└── excalidraw-app/       # The web application

I started digging into excalidraw-app/App.tsx to find the localStorage calls. What I found was a labyrinth of state management, collaboration hooks, Firebase integration, and PWA service workers.

By 5 PM, I had made exactly zero commits:

$ cd Excalidraw-Dashboard && git log --oneline
fatal: your current branch 'main' does not have any commits yet

Lesson Learned

Forking a 100k+ LOC application and maintaining it against upstream updates is a maintenance nightmare. Every Excalidraw update would require careful merging. The approach was fundamentally unsustainable.


Part II

November 19, 2025 — Morning

Back home from the tournament. Fresh perspective. New approach.

What if instead of modifying Excalidraw, I wrapped it? Excalidraw is available as an NPM package (@excalidraw/excalidraw). I could embed it inside my own React application, handle all the persistence externally, and pass data in and out through props.

I created Excalidraw-Dashboardv2 with a completely new architecture:

Excalidraw-Dashboardv2/
├── docker-compose.yml
├── backend/
│   ├── server.js        # Express + SQLite
│   └── database.sqlite
├── frontend/
│   └── src/
│       ├── pages/
│       │   ├── Dashboard.tsx
│       │   ├── Editor.tsx
│       │   └── Settings.tsx
│       └── components/
└── excalidraw/          # Still forking, but less invasively

Wait—I was still forking Excalidraw? Yes, but the approach was different this time. Instead of modifying the persistence layer, I was modifying the deployment to work alongside my dashboard. The key insight, from the AGENTS.md file I wrote for Gemini:

"Removed: The collab (real-time collaboration) and share (link sharing) directories/modules have been removed from excalidraw-app/. This aligns with the project's goal of being a local dashboard without external dependencies."

By stripping out the collaboration features (which required external Firebase services), I could make Excalidraw work purely as a local drawing tool.

0f32491 initial working!!
2dbbad3 working save to server
46000e4 working excalidraw and backend
fa1f388 fully working dashboard with integrated excalidraw editor
b5f1db3 working autosave
bc67c4b dropdown to select category
383a288 WORKING PREVIEWS YAY!
eee413f add import json/sqlite and export, style ui

You could feel the excitement building.

The Backend: SQLite Over Files

Version 1 used flat files (.excalidraw JSON files in a folder). Version 2 used SQLite:

const initializeDatabase = async () => {
    await dbRun(`
        CREATE TABLE IF NOT EXISTS drawings (
            id TEXT PRIMARY KEY,
            name TEXT,
            elements TEXT,      -- JSON string
            appState TEXT,      -- JSON string
            collectionId TEXT,
            updatedAt INTEGER,
            createdAt INTEGER,
            preview TEXT,       -- SVG string
            files TEXT          -- JSON string (embedded images)
        )
    `);
    
    await dbRun(`
        CREATE TABLE IF NOT EXISTS collections (
            id TEXT PRIMARY KEY,
            name TEXT,
            createdAt INTEGER
        )
    `);
};

SQLite provided atomic operations, easy backup/restore, and the ability to query drawings efficiently. Export became straightforward: just copy the .sqlite file.


Part III: Collaboration Catastrophe

November 20-21, 2025

Version 2 was working beautifully for single-user scenarios. But I wanted more. Collaboration was one of Excalidraw's killer features.

The approach was two-pronged:

  1. Local presence tracking using the BroadcastChannel API (for same-browser, different-tabs)
  2. Real collaboration using Socket.io (for different devices)

BroadcastChannel

The first problem was simpler: what happens when someone opens the same drawing in two browser tabs? The BroadcastChannel API lets same-origin tabs communicate:

const channel = new BroadcastChannel(`presence-room-${id}`);

const broadcastHello = () => {
    channel.postMessage({
        type: 'HELLO',
        id: myId,
        name: myName.current
    });
};

channel.addEventListener('message', handleMessage);
const heartbeatInterval = setInterval(broadcastHello, 3000);

This worked, but revealed a deeper problem: reconciliation. When two tabs both make changes, whose version wins?

Docker Collaboration: The Breaking Point

Emboldened by local success, I tried to add real multi-device collaboration. The official Excalidraw project has a collaboration server (excalidraw/excalidraw-room). I integrated it:

# docker-compose.yml (commit 3e9c665)
services:
  collaboration-server:
    image: excalidraw/excalidraw-room:latest
    networks:
      - excalidash-network
  
  excalidraw:
    build:
      context: ./excalidraw
      args:
        VITE_APP_WS_SERVER_URL: http://collaboration-server:80

The commit message: "docker collaboration support"

Eight hours later:

8dbe8b8 Revert "docker collaboration support"
This reverts commit 3e9c66514080d3f87eccb3cd0cc868751faf6e20.

It didn't work.

The problem was architectural. I had removed Excalidraw's collaboration modules to make it work as a local tool. But the collaboration server expected those modules to exist. I was trying to have it both ways—a stripped-down local app that could magically speak collaboration protocols.

Even worse, the README (that would never be published) documented an unacceptable limitation:

"Does not support multiple canvases open at the same time. (You can have multiple tabs open with different drawings, but you can only edit one at a time, and the next time you click the other tab it bugs out and you have to refresh)"

This was the reconciliation problem. When you have multiple sources of truth (different tabs, different devices), you need a strategy for merging conflicts. Version 2 didn't have one.

The State of Version 2

Version 2 was usable. It had:

  • Dashboard with drawing cards and previews
  • Collections for organization
  • Import/export (JSON and SQLite)
  • Dark mode
  • Dockerized deployment

But it was fundamentally broken for multi-tab usage, and collaboration was a dead end. I needed to rethink the architecture completely.


Part IV: ExcaliDash

November 22, 2025

I created a new repository: ExcaliDash.

The key architectural decisions:

1. Excalidraw as an NPM Package Only

No more forking. The Excalidraw component is imported directly:

import { Excalidraw, exportToSvg } from '@excalidraw/excalidraw';

All persistence, collaboration, and state management happens in my code and not in a modified Excalidraw.

2. Prisma + SQLite Instead of Raw SQL

The v2 backend used raw sqlite3 with callback-based APIs. ExcaliDash uses Prisma ORM:

// schema.prisma
model Drawing {
  id           String      @id @default(uuid())
  name         String
  elements     String      // JSON string
  appState     String      // JSON string
  files        String      @default("{}")
  preview      String?
  version      Int         @default(1)
  collectionId String?
  collection   Collection? @relation(fields: [collectionId], references: [id])
  createdAt    DateTime    @default(now())
  updatedAt    DateTime    @updatedAt
  trashedAt    DateTime?
}

Migrations, type safety, and a clean query API.

3. Socket.io for Real Collaboration (Done Right)

Instead of trying to use Excalidraw's collaboration infrastructure, I built my own:

io.on("connection", (socket) => {
  let roomId = ""; // set once this socket joins a drawing room

  socket.on("join-room", ({ drawingId, user }) => {
    roomId = `drawing_${drawingId}`;
    socket.join(roomId);

    // Track presence
    const newUser = { ...user, socketId: socket.id, isActive: true };
    const currentUsers = roomUsers.get(roomId) || [];
    const filteredUsers = currentUsers.filter((u) => u.id !== user.id);
    filteredUsers.push(newUser);
    roomUsers.set(roomId, filteredUsers);

    io.to(roomId).emit("presence-update", filteredUsers);
  });

  socket.on("cursor-move", (data) => {
    // Volatile emission - ok to lose packets for cursor positions
    socket.volatile.to(roomId).emit("cursor-move", data);
  });

  socket.on("element-update", ({ elements }) => {
    // Broadcast element changes to other users
    socket.to(roomId).emit("element-update", { elements });
  });
});

Each drawing becomes its own "room," and users joining that room receive presence updates about who else is there. For cursor movements, I used volatile emissions—meaning if a packet is lost, we just move on. Cursor position isn't worth guaranteed delivery when you're getting 60 updates per second.

4. The Transformer Identity System

A fun detail: for the collaboration presence system (showing who else was viewing a drawing), I needed random user identities. Instead of boring User123 labels, I populated an array with Transformer character names:

const TRANSFORMERS = [
  { name: "Optimus Prime", initials: "OP" },
  { name: "Megatron", initials: "ME" },
  { name: "Starscream", initials: "ST" },
  { name: "Bumblebee", initials: "BB" },
  { name: "Grimlock", initials: "GL" },
  // ... 50+ more Transformers
];

const COLORS = [
  "#ef4444", // red-500
  "#3b82f6", // blue-500
  "#22c55e", // green-500
  // ... 16 distinct colors
];

When you first visit ExcaliDash, you get assigned a random Transformer name and color, stored in localStorage. Now when you're collaborating, you see "Megatron" drawing a rectangle in purple, while "Bumblebee" is editing text in cyan.
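The assignment itself is simple enough to sketch. This is not the actual ExcaliDash code; `pickIdentity` and the `storage` map (standing in for localStorage) are illustrative:

```typescript
// Illustrative sketch: assign a stable random identity on first visit.
type Identity = { name: string; initials: string; color: string };

const TRANSFORMERS = [
  { name: "Optimus Prime", initials: "OP" },
  { name: "Megatron", initials: "ME" },
  { name: "Bumblebee", initials: "BB" },
];

const COLORS = ["#ef4444", "#3b82f6", "#22c55e"];

// `storage` stands in for window.localStorage so the sketch runs anywhere.
const pickIdentity = (storage: Map<string, string>): Identity => {
  const cached = storage.get("excalidash-identity");
  if (cached) return JSON.parse(cached) as Identity; // same identity on every visit

  const t = TRANSFORMERS[Math.floor(Math.random() * TRANSFORMERS.length)];
  const identity: Identity = {
    ...t,
    color: COLORS[Math.floor(Math.random() * COLORS.length)],
  };
  storage.set("excalidash-identity", JSON.stringify(identity));
  return identity;
};
```

Because the result is cached on first call, reloading the page keeps your Transformer.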

Each browser is fingerprinted with a randomly generated UUID, with a fallback chain for older browsers:

const generateClientId = (): string => {
  const cryptoObj = globalThis.crypto || globalThis.msCrypto;
  
  if (cryptoObj?.randomUUID) {
    return cryptoObj.randomUUID();
  }
  
  if (cryptoObj?.getRandomValues) {
    // Manual UUID v4 generation
    const bytes = new Uint8Array(16);
    cryptoObj.getRandomValues(bytes);
    bytes[6] = (bytes[6] & 0x0f) | 0x40; // Version 4
    bytes[8] = (bytes[8] & 0x3f) | 0x80; // Variant
    // ... convert to UUID string
  }
  
  // Final fallback for ancient browsers
  return `id-${Date.now().toString(16)}-${Math.random().toString(16).slice(2)}`;
};

Part V: Building Conflict-Free Collaborative Drawing

What Broke in Version 2

Version 2 had a "working sharing" commit where I thought I'd solved collaboration. Two users could open the same drawing. Their browsers would communicate via BroadcastChannel or Socket.io.

It worked 90% of the time.

Suppose User A and User B are editing the same rectangle. User A is moving it to (100, 100). At the exact same moment, User B is resizing it to be 200x200. On User A's screen, the rectangle gets resized but moves back to its original position. On User B's screen, the rectangle moves but keeps its original size.

What happened:

  1. User A emits: { id: "rect-1", x: 100, y: 100, version: 5 }
  2. User B emits: { id: "rect-1", width: 200, height: 200, version: 5 }
  3. Both arrive at different times
  4. Each client runs the reconciliation algorithm and must choose... which one?

This is the core challenge of real-time collaborative editing. Not all conflicts are equal. Some can be safely merged (resize + move). Some cannot (both resizing to different dimensions).

The industry calls solutions to this problem "CRDTs"—Conflict-free Replicated Data Types. But ExcaliDash doesn't use a CRDT in the formal sense. It uses something more pragmatic: a Last-Write-Wins strategy built on per-element version counters.

Understanding CRDTs

A CRDT is a data structure that can be replicated across multiple computers, allows concurrent updates without coordination, and always converges to the same final state.

The key word: "always." Not "usually." Not "when the network is healthy." Always.

There are two flavors:

Operation-based CRDTs (CmRDTs) track operations rather than state. Instead of sending "the rectangle is now at position (100, 100)", you send "move rectangle 50 pixels right". The operations are designed so they commute—order doesn't matter.

// Operation-based approach
const operations = [
  { type: "moveRight", element: "rect-1", amount: 50 },
  { type: "resizeWidth", element: "rect-1", amount: 100 },
  { type: "moveLeft", element: "rect-1", amount: 25 }
];

// These converge regardless of order (the resulting states are deeply equal):
applyOps(operations) === applyOps(shuffle(operations))

State-based CRDTs (CvRDTs) track the final state and merge by finding the least upper bound of all versions.

// State-based approach
const merge = (a: Rectangle, b: Rectangle) => ({
  x: a.timestamp > b.timestamp ? a.x : b.x,
  y: a.timestamp > b.timestamp ? a.y : b.y,
  width: a.timestamp > b.timestamp ? a.width : b.width,
  height: a.timestamp > b.timestamp ? a.height : b.height,
  timestamp: Math.max(a.timestamp, b.timestamp),
});

What ExcaliDash Actually Does

ExcaliDash uses a Last-Write-Wins (LWW) strategy, which is not a true CRDT. It's pragmatic, but limited.

The algorithm:

// frontend/src/utils/sync.ts
export const reconcileElements = (
  localElements: readonly any[],
  remoteElements: readonly any[]
): any[] => {
  const localMap = new Map<string, any>();
  localElements.forEach((el) => localMap.set(el.id, el));

  remoteElements.forEach((remoteEl) => {
    const localEl = localMap.get(remoteEl.id);
    
    if (!localEl) {
      // New element - add it
      localMap.set(remoteEl.id, remoteEl);
      return;
    }

    const remoteVersion = remoteEl.version ?? 0;
    const localVersion = localEl.version ?? 0;

    // PRIMARY: Version number (higher wins)
    if (remoteVersion > localVersion) {
      localMap.set(remoteEl.id, remoteEl);
      return;
    }

    if (remoteVersion < localVersion) {
      return; // Keep local
    }

    // SECONDARY: Timestamp (newer wins)
    const remoteUpdated = remoteEl.updated ?? 0;
    const localUpdated = localEl.updated ?? 0;

    if (remoteUpdated > localUpdated) {
      localMap.set(remoteEl.id, remoteEl);
      return;
    }

    // TERTIARY: Version nonce (break ties)
    if (
      remoteUpdated === localUpdated &&
      remoteEl.versionNonce !== localEl.versionNonce
    ) {
      localMap.set(remoteEl.id, remoteEl);
    }
  });

  return Array.from(localMap.values());
};

The Three-Tier Tiebreaker

Each Excalidraw element has three metadata fields:

  1. version: Integer counter. Increments each time the element changes.
  2. updated: Timestamp (milliseconds since epoch).
  3. versionNonce: Random value generated at creation time.

The merge logic:

  • Tier 1: Higher version wins
  • Tier 2: If versions tie, newer timestamp wins
  • Tier 3: If timestamps tie, different nonce = remote wins (an arbitrary tiebreak: each side simply prefers the remote copy)
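Condensed into a standalone predicate, the three tiers look roughly like this (`remoteWins` is a hypothetical name; the logic mirrors `reconcileElements` above):

```typescript
// Returns true when the remote copy of an element should replace the local one.
type Versioned = { version: number; updated: number; versionNonce: number };

const remoteWins = (local: Versioned, remote: Versioned): boolean => {
  // Tier 1: higher version wins
  if (remote.version !== local.version) return remote.version > local.version;
  // Tier 2: if versions tie, newer timestamp wins
  if (remote.updated !== local.updated) return remote.updated > local.updated;
  // Tier 3: if timestamps tie, a differing nonce means the remote copy wins
  return remote.versionNonce !== local.versionNonce;
};
```

In the full tie case (same version, same timestamp, different nonce) the remote copy always wins, which is exactly the arbitrariness the next section digs into.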

Why This Isn't a True CRDT

A true CRDT would guarantee: if the same operations are applied in any order to any initial state, the final state is identical.

ExcaliDash doesn't guarantee that. Here's why:

Scenario 1: Version Increment Race

Time 0: rect-1 has version=1
Time 1: User A changes x → version becomes 2, timestamp=1000
Time 1: User B changes y → version becomes 2, timestamp=1010
Time 2: Merge happens

Result: version tie, so we fall back to the timestamp.
User A's change is discarded because User B's timestamp is newer,
even though the two edits touched different fields and could have coexisted.

This happens because the version counter isn't synchronized across clients. Both A and B locally increment to version=2, but they made different changes.

Scenario 2: Clock Skew

Time 0: User A (fast clock) changes rect at system time 2000
Time 1: User B (slow clock) changes rect at system time 1995
Time 2: Changes propagate to both clients

Merge sees B's timestamp (1995) is earlier.
B's change is lost, even though B made it "after" A did.

The updated field is based on wall-clock time, which can skew between devices.

Scenario 3: The Nonce Illusion

User A and User B both edit the same element at the same timestamp.
The algorithm picks the one with the "different nonce".
This works in a pair, but with 3+ users, it becomes arbitrary.

The Field Merging Problem

What if we wanted to support true field-level merging? Like "User A moved the rectangle, User B resized it, and both changes should apply"?

We can't.

The algorithm works at the element level:

const localEl = {
  id: "rect-1",
  x: 50,
  y: 50,
  width: 100,
  height: 100,
  version: 5,
  updated: 1000,
};

const remoteEl = {
  id: "rect-1",
  x: 50,     // unchanged from 50
  y: 50,
  width: 200, // changed to 200
  height: 100,
  version: 5,  // same version!
  updated: 1000, // same timestamp!
};

// Merge result: We either keep local or remote.
// We CAN'T merge to { x: 50, width: 200 }
// because the algorithm doesn't work field-by-field.

If version and timestamp are identical, the algorithm falls back to the nonce, a random value, to pick one element wholesale. Whichever copy wins, it wins completely: there is no way to keep User A's position and User B's width at the same time.

So Why Does It Work At All?

Because in practice:

  1. Version increments are usually unique. Each client increments its local counter, and Excalidraw's architecture means most edits result in version increments.

  2. Clock skew is usually small. Clocks are mostly synchronized these days.

  3. The element is the whole state. We're not trying to merge individual fields—we're keeping the entire element or replacing it.


Part VI: The Bugs We're Hiding From

Bug #1: The Deselection Exclusion

Look at this line in the reconciliation code:

socket.on('element-update', ({ elements }: { elements: any[] }) => {
  if (!excalidrawAPI.current) return;

  isSyncing.current = true;

  const currentAppState = excalidrawAPI.current.getAppState();
  const mySelectedIds = currentAppState.selectedElementIds || {};

  // THIS IS THE KEY LINE
  const validRemoteElements = elements.filter((el: any) => !mySelectedIds[el.id]);

  const localElements = excalidrawAPI.current.getSceneElementsIncludingDeleted();
  const mergedElements = reconcileElements(localElements, validRemoteElements);
  
  // ...
});

We filter out elements that the user is currently selecting.

Why? Because if you're in the middle of editing a rectangle, and your peer sends a change to that same rectangle, we don't want to interrupt you mid-edit.

But this creates a bug:

User A: Selects rect-1, starts dragging it
User B: Changes rect-1's color
User A: Still dragging rect-1

Result: User B's color change is ignored because rect-1 is selected on A's screen.

The merge never happens. User B's change is lost.

We mitigate this by deferring merges until the user deselects, but that's just a hack. The real issue: you can't safely merge while someone is editing.
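A sketch of what that deferral might look like, with hypothetical names (`DeferredMerger` is not in the actual codebase): remote updates touching selected elements are parked, then replayed once the selection changes.

```typescript
// Hypothetical sketch of the defer-until-deselect mitigation.
type El = { id: string; [k: string]: unknown };

class DeferredMerger {
  private pending = new Map<string, El>();

  // Called on every incoming remote update: split into "apply now"
  // and "park until the user stops editing this element".
  incoming(elements: El[], selectedIds: Set<string>): El[] {
    const applyNow: El[] = [];
    for (const el of elements) {
      if (selectedIds.has(el.id)) this.pending.set(el.id, el);
      else applyNow.push(el);
    }
    return applyNow;
  }

  // Called whenever the local selection changes: release parked
  // updates whose elements are no longer selected.
  flush(selectedIds: Set<string>): El[] {
    const released: El[] = [];
    for (const [id, el] of this.pending) {
      if (!selectedIds.has(id)) {
        released.push(el);
        this.pending.delete(id);
      }
    }
    return released;
  }
}
```

Even this only narrows the window: a parked update can still be overwritten by the local edit that caused it to be parked.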

Bug #2: The Version Increment Plateau

Every time a drawing is updated via API, the version increments:

const updatedDrawing = await prisma.drawing.update({
  where: { id },
  data: {
    version: { increment: 1 },  // <- Always increments
    elements: JSON.stringify(payload.elements),
    // ...
  },
});

But here's the scenario:

Time 0: Drawing has version=10
Time 1: User A loads drawing (gets version=10)
Time 2: User B loads drawing (gets version=10)
Time 3: User A makes 5 rapid changes locally (version seen locally: 15)
Time 4: User B makes 1 change, saves (API updates version to 11)
Time 5: User A's changes are sent over Socket.io (never hitting the API, so their version stays 10)
Time 6: Merge happens

Local state (from API): version 11
Remote change (from A): version 10
Result: B's version is newer, so A's changes are discarded.

This is because the version counter is only incremented on API updates, not on local changes. The Socket.io messages contain stale version numbers.

How ExcaliDash Actually Avoids These Bugs

It's not the algorithm. It's the architecture.

Pattern 1: Debouncing Saves

const debouncedSave = useCallback(
  debounce((elements, appState) => {
    if (saveDataRef.current) {
      saveDataRef.current(elements, appState);
    }
  }, 1000),  // <- 1 second debounce
  []
);

By waiting 1 second between saves, we ensure that rapid, local changes are batched together.
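Under the hood, a trailing-edge debounce is only a few lines. This is a minimal sketch of what lodash's `debounce(fn, 1000)` does here; the injectable `schedule`/`cancel` parameters are my addition, for testability:

```typescript
// Minimal trailing-edge debounce: only the last call in a burst fires.
type Schedule = (fn: () => void, ms: number) => unknown;
type Cancel = (handle: unknown) => void;

const makeDebounce = <A extends unknown[]>(
  fn: (...args: A) => void,
  wait: number,
  schedule: Schedule = (f, ms) => setTimeout(f, ms),
  cancel: Cancel = (h) => clearTimeout(h as ReturnType<typeof setTimeout>)
) => {
  let handle: unknown = null;
  return (...args: A) => {
    if (handle !== null) cancel(handle); // restart the countdown on every call
    handle = schedule(() => {
      handle = null;
      fn(...args); // fires once, with the most recent arguments
    }, wait);
  };
};
```

Every keystroke resets the one-second timer, so a burst of rapid edits collapses into a single save.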

Pattern 2: Broadcast Only Deltas

const broadcastChanges = useCallback(
  throttle((elements: readonly any[]) => {
    if (!socketRef.current || !id) return;
    
    const changes: any[] = [];

    elements.forEach((el) => {
      if (hasElementChanged(el)) {  // <- Check if version/nonce changed
        changes.push(el);
        recordElementVersion(el);
      }
    });
    
    if (changes.length > 0) {
      socketRef.current.emit('element-update', {
        drawingId: id,
        elements: changes,  // <- Only changed elements
        userId: me.id
      });
    }
  }, 100),
  [id, hasElementChanged, recordElementVersion]
);

We only broadcast elements that actually changed.

Pattern 3: Track Element Versions Locally

const elementVersionMap = useRef<Map<string, ElementVersionInfo>>(new Map());

const recordElementVersion = useCallback((element: any) => {
  elementVersionMap.current.set(element.id, {
    version: element.version ?? 0,
    versionNonce: element.versionNonce ?? 0,
  });
}, []);

const hasElementChanged = useCallback((element: any) => {
  const previous = elementVersionMap.current.get(element.id);
  if (!previous) return true;

  const nextVersion = element.version ?? 0;
  const nextNonce = element.versionNonce ?? 0;

  return previous.version !== nextVersion || previous.versionNonce !== nextNonce;
}, []);

By maintaining a local version map, we know which elements have changed since the last broadcast.


Part VII: Security

After pushing the initial MVP to Docker Hub, I started getting paranoid about security. A web app that accepts arbitrary JSON drawings and stores them in a database? That's an XSS playground waiting to happen.

XSS

Excalidraw elements can contain text, URLs in the link field, and arbitrary customData. Any of these could be vectors for script injection. My solution was a comprehensive sanitization layer using DOMPurify:

import DOMPurify from "dompurify";
import { JSDOM } from "jsdom";

// Create a DOM environment for DOMPurify (Node.js compatibility)
const window = new JSDOM("").window;
const purify = DOMPurify(window);

export const sanitizeHtml = (input: string): string => {
  return purify.sanitize(input, {
    ALLOWED_TAGS: ["b", "i", "u", "em", "strong", "p", "br", "span", "div"],
    ALLOWED_ATTR: [],
    FORBID_TAGS: [
      "script", "iframe", "object", "embed", "link", "style",
      "form", "input", "button", "select", "textarea",
      "svg", "foreignObject",
    ],
    FORBID_ATTR: [
      "onload", "onclick", "onerror", "onmouseover", "onfocus",
      "onblur", "onchange", "onsubmit", "onreset",
      "href", "src", "action", "formaction",
    ],
  });
};

But drawings also have SVG previews for thumbnails. SVG is notoriously dangerous—you can embed JavaScript via <foreignObject>, <script> tags, or event handlers. So SVG gets its own sanitizer:

export const sanitizeSvg = (svgContent: string): string => {
  return purify.sanitize(svgContent, {
    ALLOWED_TAGS: [
      "svg", "g", "rect", "circle", "ellipse", "line",
      "polyline", "polygon", "path", "text", "tspan",
    ],
    ALLOWED_ATTR: [
      "x", "y", "width", "height", "cx", "cy", "r",
      "fill", "stroke", "stroke-width", "opacity", "transform",
    ],
    FORBID_TAGS: [
      "script", "foreignObject", "iframe", "object",
      "use", "image", "style", "link", "defs", "symbol",
    ],
  });
};

URL Sanitization

The link field in elements posed a special challenge. You want to allow legitimate URLs but block javascript: pseudo-protocol attacks:

export const sanitizeUrl = (url: unknown): string => {
  if (typeof url !== "string") return "";
  const trimmed = url.trim();

  // Block dangerous protocols
  if (/^(javascript|data|vbscript):/i.test(trimmed)) {
    return "";
  }

  // Only allow safe protocols
  if (/^(https?:\/\/|mailto:|\/|\.\/|\.\.\/)/i.test(trimmed)) {
    return trimmed;
  }
  return "";
};

The Image Truncation Bug (Issue #17)

One of the trickier bugs involved image persistence. Users were reporting that images would disappear after saving and reopening a drawing. After investigation, I discovered my string sanitization was truncating all strings to 1000 characters—including base64 image data URLs that can easily be millions of characters.

The fix required granular handling:

// Maximum size for dataURL (configurable, default 10MB)
const MAX_DATAURL_SIZE = activeConfig.maxDataUrlSize;

for (const key in file) {
  const value = file[key];
  if (typeof value === "string") {
    if (key === "dataURL") {
      // Special handling for image data URLs
      const isSafeImageType = safeImageTypes.some((type) =>
        value.toLowerCase().startsWith(type)
      );

      if (isSafeImageType) {
        const hasSuspiciousContent = suspiciousPatterns.some((p) => p.test(value));
        const isTooLarge = value.length > MAX_DATAURL_SIZE;

        if (hasSuspiciousContent || isTooLarge) {
          file[key] = ""; // Clear suspicious/oversized
        }
        // Otherwise keep the full base64 data
      }
    } else {
      // Other string fields: strict sanitization
      file[key] = sanitizeText(value, 1000);
    }
  }
}

Zod Schema Validation

All incoming requests pass through Zod schemas for type-safe validation:

export const elementSchema = z
  .object({
    id: z.string().min(1).max(200).optional().nullable(),
    type: z.string().optional().nullable(),
    x: z.number().optional().nullable(),
    y: z.number().optional().nullable(),
    width: z.number().optional().nullable(),
    height: z.number().optional().nullable(),
    text: z.string().optional().nullable(),
    link: z.string().optional().nullable(),
    // ... many more fields
  })
  .passthrough()
  .transform((element) => {
    const sanitized = { ...element };
    if (typeof sanitized.text === "string") {
      sanitized.text = sanitizeText(sanitized.text, 5000);
    }
    if (typeof sanitized.link === "string") {
      sanitized.link = sanitizeUrl(sanitized.link);
    }
    return sanitized;
  });

The .passthrough() is crucial—Excalidraw might add new fields in updates, and we don't want to break compatibility by rejecting unknown properties.

SQLite Database Import Security

The app supports importing/exporting the entire database. This is extremely dangerous—accepting arbitrary SQLite files could allow remote code execution through malformed database files or SQL injection.

First line of defense: validate the SQLite magic header:

const validateSqliteHeader = (filePath: string): boolean => {
  const buffer = Buffer.alloc(16);
  const fd = fs.openSync(filePath, "r");
  fs.readSync(fd, buffer, 0, 16, 0);
  fs.closeSync(fd);

  // SQLite files start with "SQLite format 3\0"
  const expectedHeader = Buffer.from([
    0x53, 0x51, 0x4c, 0x69, 0x74, 0x65, 0x20, 0x66,
    0x6f, 0x72, 0x6d, 0x61, 0x74, 0x20, 0x33, 0x00,
  ]);

  return buffer.equals(expectedHeader);
};
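The same check is easy to exercise at the buffer level (`hasSqliteHeader` is a hypothetical variant of the function above that skips the file I/O):

```typescript
// Buffer-level variant of the header check: no file I/O needed.
// SQLite files start with the 16 bytes "SQLite format 3\0".
const SQLITE_MAGIC = Buffer.from("SQLite format 3\0", "latin1");

const hasSqliteHeader = (data: Buffer): boolean =>
  data.length >= 16 && data.subarray(0, 16).equals(SQLITE_MAGIC);
```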

But a valid header doesn't mean a safe database. For integrity checks, I offload to a worker thread to avoid blocking the main event loop:

const verifyDatabaseIntegrityAsync = (filePath: string): Promise<boolean> => {
  if (!validateSqliteHeader(filePath)) {
    return Promise.resolve(false);
  }

  return new Promise((resolve) => {
    const worker = new Worker("./workers/db-verify.js", {
      workerData: { filePath },
    });

    let timeoutHandle: NodeJS.Timeout;

    worker.on("message", (isValid) => {
      clearTimeout(timeoutHandle);
      resolve(isValid);
    });

    worker.on("error", () => {
      clearTimeout(timeoutHandle);
      resolve(false); // a crashed worker means we can't trust the file
    });

    // 10 second timeout for integrity check
    timeoutHandle = setTimeout(() => {
      worker.terminate();
      resolve(false);
    }, 10000);
  });
};

Security Headers & Rate Limiting

Every response includes security headers:

app.use((req, res, next) => {
  res.setHeader("X-Content-Type-Options", "nosniff");
  res.setHeader("X-Frame-Options", "DENY");
  res.setHeader("X-XSS-Protection", "1; mode=block");
  res.setHeader("Referrer-Policy", "strict-origin-when-cross-origin");
  res.setHeader(
    "Permissions-Policy",
    "geolocation=(), microphone=(), camera=()"
  );
  res.setHeader(
    "Content-Security-Policy",
    "default-src 'self'; " +
    "script-src 'self' 'unsafe-inline' 'unsafe-eval' https://cdn.jsdelivr.net; " +
    "style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; " +
    "img-src 'self' data: blob: https:; " +
    "connect-src 'self' ws: wss:; " +
    "frame-ancestors 'none';"
  );
  next();
});

And rate limiting to prevent DoS attacks:

const RATE_LIMIT_WINDOW = 15 * 60 * 1000; // 15 minutes
const RATE_LIMIT_MAX_REQUESTS = 1000;
const requestCounts = new Map(); // ip -> { count, resetTime }

app.use((req, res, next) => {
  const ip = req.ip || req.connection.remoteAddress || "unknown";
  const clientData = requestCounts.get(ip);

  if (!clientData || Date.now() > clientData.resetTime) {
    requestCounts.set(ip, { count: 1, resetTime: Date.now() + RATE_LIMIT_WINDOW });
    return next();
  }

  if (clientData.count >= RATE_LIMIT_MAX_REQUESTS) {
    return res.status(429).json({
      error: "Rate limit exceeded",
      message: "Too many requests, please try again later",
    });
  }

  clientData.count++;
  next();
});
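One gap in the middleware above: requestCounts gains an entry per unique IP and never sheds them. A small interval sweep keeps the map bounded; this is my own sketch (the function name and interval are not from the repo), written against the same { count, resetTime } shape:

```typescript
// Periodic sweep: drop expired rate-limit entries so the per-IP map stays bounded.
type ClientData = { count: number; resetTime: number };

export const sweepExpiredClients = (
  store: Map<string, ClientData>,
  now: number = Date.now(),
): number => {
  let removed = 0;
  for (const [ip, data] of store) {
    // Entries past their reset time would be replaced on next request anyway
    if (now > data.resetTime) {
      store.delete(ip);
      removed++;
    }
  }
  return removed;
};
```

Wired up once at startup, e.g. `setInterval(() => sweepExpiredClients(requestCounts), 60_000).unref()`, so the timer never keeps the process alive on its own.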

CSRF Protection

Buried in the csrf-impl branch is a complete CSRF protection implementation:

// Cryptographically secure token generation
export const generateCsrfToken = (): string => {
  return crypto.randomBytes(32).toString("hex");
};

// Timing-safe validation to prevent timing attacks
export const validateCsrfToken = (clientId: string, token: string): boolean => {
  const stored = csrfTokenStore.get(clientId);
  if (!stored || Date.now() > stored.expiresAt) return false;
  
  // timingSafeEqual throws on buffers of different length, so compare lengths first
  const storedBuffer = Buffer.from(stored.token, "utf8");
  const providedBuffer = Buffer.from(token, "utf8");
  if (storedBuffer.length !== providedBuffer.length) return false;
  return crypto.timingSafeEqual(storedBuffer, providedBuffer);
};

The implementation includes:

  • Token endpoint (GET /csrf-token) with its own rate limiter
  • Client identification via IP + User-Agent hash
  • 24-hour token expiry with periodic cleanup
  • Store size limits (max 10,000 tokens) to prevent memory exhaustion
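To make the list above concrete, here is a minimal sketch of the client-ID derivation and the capped token store. The names (clientIdFor, issueCsrfToken) and the full-store behavior are my own assumptions, not necessarily what the csrf-impl branch does:

```typescript
import crypto from "node:crypto";

type StoredToken = { token: string; expiresAt: number };
const TOKEN_TTL_MS = 24 * 60 * 60 * 1000; // 24-hour expiry
const MAX_TOKENS = 10_000;                // store size limit
const csrfTokenStore = new Map<string, StoredToken>();

// Identify a client by hashing IP + User-Agent, so the raw values are never stored
export const clientIdFor = (ip: string, userAgent: string): string =>
  crypto.createHash("sha256").update(`${ip}|${userAgent}`).digest("hex");

export const issueCsrfToken = (clientId: string): string => {
  // Refuse brand-new clients when full, rather than exhaust memory
  if (csrfTokenStore.size >= MAX_TOKENS && !csrfTokenStore.has(clientId)) {
    throw new Error("CSRF token store full");
  }
  const token = crypto.randomBytes(32).toString("hex");
  csrfTokenStore.set(clientId, { token, expiresAt: Date.now() + TOKEN_TTL_MS });
  return token;
};
```

A periodic cleanup pass over csrfTokenStore (deleting entries whose expiresAt has passed) completes the picture.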

Part VIII: Testing Infrastructure

A project isn't complete without tests. I set up two testing layers:

Unit/Integration Tests with Vitest

The security module has extensive unit tests:

describe("Security Sanitization - Image Data URLs", () => {
  it("should allow dataURL under configured limit", () => {
    configureSecuritySettings({ maxDataUrlSize: 5000 });
    
    const smallDataUrl = "data:image/png;base64," + "A".repeat(100);
    const files = {
      "file-1": {
        id: "file-1",
        mimeType: "image/png",
        dataURL: smallDataUrl,
      },
    };
    
    const result = sanitizeDrawingData({
      elements: [],
      appState: {},
      files,
    });
    
    expect(result.files["file-1"].dataURL).toBe(smallDataUrl);
  });

  it("should block javascript: URLs", () => {
    const result = sanitizeUrl("javascript:alert('xss')");
    expect(result).toBe("");
  });
});

End-to-End Tests with Playwright

For UI testing, Playwright runs actual browser automation:

test("should move drawing to trash and permanently delete", async ({ page, request }) => {
  const drawingName = `Trash Workflow ${Date.now()}`;
  const createdDrawing = await createDrawing(request, { name: drawingName });

  await page.goto("/");
  await page.getByPlaceholder("Search drawings...").fill(drawingName);
  
  const card = page.locator(`#drawing-card-${createdDrawing.id}`);
  await card.waitFor();
  
  // Select and move to trash
  await page.getByTestId(`select-drawing-${createdDrawing.id}`).click();
  await page.getByTitle("Move to Trash").click();
  
  await expect(card).toHaveCount(0);
  
  // Navigate to Trash
  await page.getByRole("button", { name: /^Trash$/ }).click();
  const trashCard = page.locator(`#drawing-card-${createdDrawing.id}`);
  await expect(trashCard).toBeVisible();
  
  // Permanently delete
  await page.getByTestId(`select-drawing-${createdDrawing.id}`).click();
  await page.getByTitle("Delete Permanently").click();
  await page.getByRole("button", { name: /Delete \d+ Drawings/ }).click();
  
  await expect(trashCard).toHaveCount(0);
  
  // Verify via API
  const response = await request.get(`${API_URL}/drawings/${createdDrawing.id}`);
  expect(response.status()).toBe(404);
});

The collaboration tests spin up two browser contexts to simulate multiple users:

test("should show presence when multiple users view same drawing", async ({ browser, request }) => {
  const drawing = await createDrawing(request, { name: `Collab_Test` });

  const context1 = await browser.newContext();
  const context2 = await browser.newContext();

  const page1 = await context1.newPage();
  const page2 = await context2.newPage();

  await page1.goto(`/editor/${drawing.id}`);
  await page2.goto(`/editor/${drawing.id}`);

  // Wait for socket connections and presence updates
  await page1.waitForSelector("[class*='excalidraw']");
  await page2.waitForSelector("[class*='excalidraw']");

  // Both users should see each other's presence...
});

Part IX: Performance & Deployment

In-Memory Response Cache

As the number of drawings grew during testing, the /drawings endpoint became slow. Time for optimization:

const DRAWINGS_CACHE_TTL_MS = 5000;
const drawingsCache = new Map<string, { body: Buffer; expiresAt: number }>();

const buildDrawingsCacheKey = (keyParts) =>
  JSON.stringify([keyParts.searchTerm, keyParts.collectionFilter, keyParts.includeData]);

app.get("/drawings", async (req, res) => {
  const cacheKey = buildDrawingsCacheKey({ /* ... */ });
  
  const cachedBody = getCachedDrawingsBody(cacheKey);
  if (cachedBody) {
    res.setHeader("X-Cache", "HIT");
    return res.send(cachedBody);
  }

  const drawings = await prisma.drawing.findMany({ /* ... */ });
  
  const body = cacheDrawingsResponse(cacheKey, drawings);
  res.setHeader("X-Cache", "MISS");
  return res.send(body);
});
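The two helpers elided in the route above are small; here is my reconstruction (not the repo's exact code), repeating the TTL constant and cache map so the snippet stands alone:

```typescript
const DRAWINGS_CACHE_TTL_MS = 5000;
const drawingsCache = new Map<string, { body: Buffer; expiresAt: number }>();

// Return the cached body if present and fresh; evict stale entries on the way
export const getCachedDrawingsBody = (key: string): Buffer | null => {
  const entry = drawingsCache.get(key);
  if (!entry) return null;
  if (Date.now() > entry.expiresAt) {
    drawingsCache.delete(key);
    return null;
  }
  return entry.body;
};

// Serialize once, cache the bytes, and hand them back for the response
export const cacheDrawingsResponse = (key: string, drawings: unknown): Buffer => {
  const body = Buffer.from(JSON.stringify(drawings));
  drawingsCache.set(key, { body, expiresAt: Date.now() + DRAWINGS_CACHE_TTL_MS });
  return body;
};
```

Caching the serialized Buffer rather than the objects means a hit skips JSON.stringify entirely, which is where most of the latency win comes from.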

Benchmark results were significant:

  • Before caching: ~45ms p50 latency
  • After caching: ~7ms p50 latency, 668 req/s average throughput

Docker: From Development to Production

Multi-stage builds keep images small:

# Frontend Dockerfile
FROM node:20-alpine AS builder
WORKDIR /app
COPY package*.json ./
RUN npm ci
COPY . .
RUN npm run build

FROM nginx:alpine
COPY --from=builder /app/dist /usr/share/nginx/html
COPY nginx.conf.template /etc/nginx/templates/
COPY docker-entrypoint.sh /docker-entrypoint.d/

The nginx.conf.template gets dynamically configured at runtime to proxy API requests:

server {
    listen 80;
    
    location / {
        root /usr/share/nginx/html;
        try_files $uri $uri/ /index.html;
    }
    
    location /api/ {
        proxy_pass ${BACKEND_URL}/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
    
    location /socket.io/ {
        proxy_pass ${BACKEND_URL}/socket.io/;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
    }
}
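The ${BACKEND_URL} placeholder is filled in by the stock nginx image itself: its entrypoint runs envsubst over /etc/nginx/templates/*.template at startup, so the variable only has to exist in the container environment. For example (image tag, port mapping, and backend address here are illustrative placeholders, not the project's actual values):

```shell
# Hypothetical invocation; substitute your own image tag and backend address
docker run -d \
  -p 8081:80 \
  -e BACKEND_URL=http://backend:3001 \
  excalidash-frontend:latest
```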

Part X: Epilogue - The Numbers

Looking at the final state:

Metric                           Value
──────────────────────────────   ────────────────────────
Failed versions                  4 (v1, v2, v3, personal)
Final version commits            124
Days from idea to v1 MVP         4
Total lines of TypeScript        ~10,000
Security vulnerabilities fixed   7
Transformer names in codebase    150+

The evolution timeline:

Version                  Date              Status
──────────────────────   ───────────────   ──────────────────────────────
Excalidraw-Dashboard     Nov 18, 2025      Abandoned (no commits)
Excalidraw-Dashboardv2   Nov 18, 2025      Single checkpoint, abandoned
Excalidraw-Dashboardv3   Nov 18-21, 2025   Working, collaboration reverted
ExcaliDash (current)     Nov 22, 2025+

What I'd Do Differently

  1. Start with the NPM package approach — Forking was always the wrong choice
  2. Design for collaboration from day one — Adding reconciliation later is painful
  3. Write tests earlier — The image truncation bug would have been caught immediately
  4. Don't remove features you'll want later — Stripping collaboration from v3 was a dead end

What Worked

  1. SQLite for persistence — Simple, portable, easy to backup
  2. Prisma ORM — Type safety and migrations saved countless hours
  3. Socket.io for real-time — Reliable, well-documented, just works
  4. DOMPurify for sanitization — Battle-tested security

Why We're Not Using a Real CRDT (And That's OK)

At this point, you might ask: "Why not just use Yjs or Automerge?"

Three reasons:

  1. Excalidraw isn't CRDT-native. Its data model (an array of elements with nested objects) doesn't map cleanly to CRDT primitives.

  2. The network model is close to strong consistency. Socket.io preserves order within a connection. Using a CRDT here is like using a sledgehammer for a nail.

  3. Complexity cost. Yjs and Automerge are powerful but complex. For a whiteboard app where "we lost your last keystroke" is acceptable (you can undo), the pragmatic approach is better.
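To make "the pragmatic approach" concrete: element-level last-write-wins on a version counter covers most of what a whiteboard needs. This is a generic sketch of that idea, not the repo's actual reconciler:

```typescript
// Each drawing element carries a monotonically increasing version counter
type VersionedElement = { id: string; version: number; [key: string]: unknown };

// Merge a remote element list into the local one: higher version wins per element,
// and elements only one side knows about are kept.
export const reconcile = (
  local: VersionedElement[],
  remote: VersionedElement[],
): VersionedElement[] => {
  const byId = new Map(local.map((el) => [el.id, el]));
  for (const el of remote) {
    const existing = byId.get(el.id);
    if (!existing || el.version > existing.version) byId.set(el.id, el);
  }
  return [...byId.values()];
};
```

Concurrent edits to the same element lose one side's change, but since the loser can undo and retry, that trade-off is acceptable here.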


The Code

ExcaliDash is open source.

The dashboard works. The collaboration works. The previews work :)

