01

ArchiCheck

AI Code Compliance Checker

Type: Personal Project

Year: 2025

Team: Team of 2 (Arjun Khurana, Angad Khurana)

AEC TECH

AI

Retrieval Augmented Generation | Artificial Intelligence

The Idea

I worked as an architect for six months at one of India’s largest real estate developers, where much of my role involved manual, labor-intensive code compliance reviews.

It wasn’t until almost a year after leaving, when I saw a similar job description shared in my college WhatsApp group, that the inefficiency of these tasks truly struck me. That moment clarified the opportunity: much of this work could be streamlined with the right technology. It became the starting point for pursuing a solution.

I realized that nearly 50% of the job description wasn’t actually a skill, but a set of tasks that the right technology could handle efficiently.

Defining the Problem

Information Overload
Architects and engineers must navigate massive regulatory documents (building codes, bylaws, LEED, TOD), often thousands of pages long



Manual Bottlenecks
Searching these documents is slow, tedious, and highly error-prone

Traditional AI Limitations

  • Can only process a limited amount of text at once, and tend to overlook details buried in the middle of long inputs (the “lost-in-the-middle” problem)

  • Models often generate confident but incorrect information when context is missing or unclear

  • Models can’t retain uploaded information across sessions or manage large, changing document sets

The Solution

An intelligent document retrieval system that transforms how architecture professionals access regulatory information

All compliance documents for construction were grouped into four categories: building guidelines, local bylaws, sustainability policies, and miscellaneous standards (e.g., TOD). The goal was to organize them for intuitive, high-level selection while emphasizing discovery rather than traditional search

Working Principle Investigation | Conventional RAG Framework

Limitations of Conventional RAG System

Understanding Challenges of Document Structure

Complex table notes structure

Atypical table layout

Images embedded in tables

Conventional RAG proved challenging for two major reasons:

1. Text extraction was unreliable because the documents contained complex layouts, including multi-column and nested tables that existing extraction tools struggled to interpret accurately. This led to fragmented or incorrect text output.

2. Determining an effective chunking strategy was difficult. Smaller chunks required retrieving many segments, increasing noise, while larger chunks often lost important nuance and failed to surface the right information.

The combination of inconsistent extraction quality and ambiguity around optimal chunk size ultimately pushed us to explore more robust alternatives.
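The chunk-size trade-off described above can be illustrated with a minimal sketch. The clause text, the word-based splitting, and the function name `chunk_words` are all assumptions made for illustration; they are not taken from the project or from any real building code.

```python
def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of `size` words; consecutive chunks share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical clause, invented for illustration only.
clause = (
    "Stair width shall be at least 1.5 m in assembly buildings "
    "except where the occupant load is below 50 persons "
    "in which case 1.0 m is permitted"
)

small = chunk_words(clause, size=8)   # many fragments to retrieve and reassemble
large = chunk_words(clause, size=40)  # one chunk holding the whole clause

for i, chunk in enumerate(small):
    print(i, chunk)
```

With small chunks, the rule and its exception land in different fragments, so a retriever has to surface several of them to answer correctly. With a large chunk, everything stays together, but the chunk also carries whatever else shares the window, which is how nuance gets diluted.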

Final RAG Framework | Late Interaction Retrieval using ColQwen

Technical Innovation

This project implements late-interaction multi-vector retrieval using ColQwen-based embeddings, in which each document token gets its own vector representation instead of the whole document being compressed into a single vector.
At query time, the system matches every query-token vector against the document-token vectors, which substantially improves precision over single-vector approaches.
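The token-level matching can be sketched as a MaxSim score, the scoring rule commonly used in late-interaction retrieval: each query vector is compared with every document vector, the best match is kept, and the best matches are summed. This is a minimal pure-Python sketch with made-up two-dimensional vectors; the names `maxsim_score` and `rank_pages` are hypothetical, and real ColQwen embeddings are far higher-dimensional and produced by the model.

```python
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: list[Vector], doc_vecs: list[Vector]) -> float:
    """For each query vector keep its best-matching document vector, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank_pages(query_vecs: list[Vector], pages: dict[str, list[Vector]]) -> list[tuple[str, float]]:
    """Score every page against the query and sort best-first."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in pages.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy unit-length vectors, invented for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]
pages = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],  # has an exact match for each query token
    "page_b": [[0.6, 0.8], [0.6, 0.8]],              # only partial matches
}

ranking = rank_pages(query, pages)
print(ranking[0][0])
```

Because every query token finds its own best match, a page that covers each part of the question scores higher than one that is only loosely similar overall.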

Jina Embeddings v4 is an openly released model built on the Qwen2.5-VL backbone that supports the same ColQwen-style late-interaction multi-vector output, further boosting retrieval performance.

Weaviate served as the vector database, storing the multi-vector embeddings memory-efficiently and providing fast, accurate retrieval through its MUVERA encoding.

Orchestration & Scalability

LangGraph structures the retrieval pipeline as a directed graph. This graph-based approach enables conditional routing (e.g., switching strategies for single vs. multi-document queries) and robust state persistence across user sessions.
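The conditional-routing idea can be sketched in plain Python. This is a stand-in for illustration, not the LangGraph API and not the project's actual graph: the node names, the routing rule (number of selected documents), and the `run` helper are all assumptions.

```python
from typing import Callable, Optional, Union

State = dict  # shared state handed from node to node


def classify(state: State) -> State:
    # Hypothetical rule: route on how many documents the user selected.
    state["mode"] = "multi" if len(state["documents"]) > 1 else "single"
    return state


def retrieve_single(state: State) -> State:
    state["strategy"] = "search one document"
    return state


def retrieve_multi(state: State) -> State:
    state["strategy"] = "search each document, then merge results"
    return state


def answer(state: State) -> State:
    state["done"] = True
    return state


NODES: dict[str, Callable[[State], State]] = {
    "classify": classify,
    "retrieve_single": retrieve_single,
    "retrieve_multi": retrieve_multi,
    "answer": answer,
}

# An edge is either a fixed next node, a function that picks one from the state, or None (stop).
Edge = Union[str, Callable[[State], str], None]
EDGES: dict[str, Edge] = {
    "classify": lambda s: "retrieve_multi" if s["mode"] == "multi" else "retrieve_single",
    "retrieve_single": "answer",
    "retrieve_multi": "answer",
    "answer": None,
}


def run(documents: list[str], entry: str = "classify") -> State:
    """Walk the graph from `entry`, recording which nodes were visited."""
    state: State = {"documents": list(documents), "trace": []}
    node: Optional[str] = entry
    while node is not None:
        state = NODES[node](state)
        state["trace"].append(node)
        edge = EDGES[node]
        node = edge(state) if callable(edge) else edge
    return state


print(run(["NBC"])["trace"])
print(run(["NBC", "LEED"])["trace"])
```

The sketch covers only routing. Persisting state across user sessions, which the pipeline also relies on, would need a storage layer that is left out here.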

User Experience & Interface

Layout & Hierarchy

A familiar layout is maintained, consistent with standard LLM interfaces such as ChatGPT. Preserving this format allows users to rely on their existing muscle memory for “talking to AI,” making the transition to our system seamless and intuitive

Document Root: Single-click selection for main codes (NBC)

Nested Jurisdictions: Collapsible menus keep main navigation clean

Visualizing Logic

Building codes are logic puzzles. We reduce the cognitive load of tracing their conditions and cross-references by synthesizing text into diagrams.

Engineering the UX

Users

I recently showed the ArchiCheck prototype to some former colleagues and friends who work as architects for a large real estate developer. They offered thoughtful feedback and genuinely appreciated the product. It felt a bit surreal: just a year and a half ago, I was sitting in that same office, learning the ropes, and now something I’ve built is helping my friends who work there today. It was a simple but meaningful moment, seeing the tool make their work easier.

Builders

Contributions





I focused on the front-end development and overall user experience, while Angad managed the web deployment and performance optimization.
We both collaborated closely on the RAG orchestration and the broader AI pipeline, shaping the system’s intelligence together

Logo

The logo combines the letter ‘A’ with a simplified document icon, linking “Architect” with the documentation workflow at the heart of the product

Website Link

© Designed & Built by Arjun Khurana

© 2025
