AI Code Compliance Checker
Type: Personal Project
Year: 2025
Team: Team of 2 (Arjun Khurana, Angad Khurana)
AEC TECH
AI
Retrieval Augmented Generation | Artificial Intelligence
The Idea
I worked as an architect for six months at one of India’s largest real estate developers, where much of my role involved manual, labor-intensive code compliance reviews.
Almost a year after leaving, I saw a similar job description shared in my college WhatsApp group, and the inefficiency of these tasks truly struck me. That moment clarified the opportunity: much of this work could be streamlined with the right technology. It became the starting point for pursuing a solution.
I realized that nearly 50% of the job description wasn’t a skill at all, but a set of tasks that the right technology could handle efficiently.
Defining the Problem
Information Overload
Architects and engineers must navigate massive regulatory documents (building codes, bylaws, LEED, TOD), often thousands of pages long
Manual Bottlenecks
Searching these documents is slow, tedious, and highly error-prone
Traditional AI Limitations
Models can only process a limited amount of text at once, and tend to overlook details buried in the middle of a long input (the “lost-in-the-middle” problem)
Models often generate confident but incorrect information when context is missing or unclear
Models can’t retain uploaded information across sessions or manage large, changing document sets
The Solution
All compliance documents for construction were grouped into four categories: building guidelines, local bylaws, sustainability policies, and miscellaneous standards (e.g., TOD). The goal was to organize them for intuitive, high-level selection while emphasizing discovery rather than traditional search
Working Principle Investigation | Conventional RAG Framework
Limitations of Conventional RAG System
Understanding Challenges of Document Structure

Complex table notes structure

Atypical table layout

Images embedded in tables
Conventional RAG proved challenging for two major reasons:
1. Text extraction was unreliable because the documents contained complex layouts, including multi-column and nested tables that existing extraction tools struggled to interpret accurately. This led to fragmented or incorrect text output.
2. Determining an effective chunking strategy was difficult. Smaller chunks required retrieving many segments, increasing noise, while larger chunks often lost important nuance and failed to surface the right information.
The combination of inconsistent extraction quality and ambiguity around optimal chunk size ultimately pushed us to explore more robust alternatives.
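The chunk-size trade-off described above can be illustrated with a small sketch. It assumes a plain word-based splitter (the function name `chunk_words` and the sample clause are hypothetical and invented for illustration; they are not taken from any real code document or from the project's pipeline).

```python
def chunk_words(text: str, size: int, overlap: int = 0) -> list[str]:
    """Split text into chunks of `size` words; neighbours share `overlap` words."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


# Hypothetical clause, invented for illustration only.
clause = (
    "Every exit stairway shall have a minimum clear width as given in the table "
    "except where the note below the table permits a reduction"
)

small = chunk_words(clause, size=6)   # many short chunks
large = chunk_words(clause, size=40)  # one long chunk

# With small chunks, the rule ("minimum") and its exception ("reduction")
# land in different chunks, so both must be retrieved to answer correctly.
rule_and_exception_together = any(
    "minimum" in c and "reduction" in c for c in small
)
```

With small chunks the rule and its exception are separated, so a retriever has to surface several segments at once. The single large chunk keeps them together, but its one embedding has to represent everything in the clause.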
Final RAG Framework | Late Interaction Retrieval using ColQwen
Technical Innovation
This project implements late-interaction multi-vector retrieval using ColQwen-based embeddings, where each document token generates its own vector representation.
At query time, the system performs token-level similarity matching between the query's and the document's multi-vector embeddings. This improves precision over single-vector approaches, because fine-grained matches are not averaged away into a single embedding.
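A minimal sketch of late-interaction scoring, assuming the commonly used MaxSim rule (each query vector takes its best match among the document's vectors, and the matches are summed). The toy 2-D vectors and the names `maxsim_score` and `rank` are illustrative assumptions; real ColQwen-style embeddings have many more dimensions, and this is not the project's actual code.

```python
Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def maxsim_score(query_vecs: list[Vector], doc_vecs: list[Vector]) -> float:
    """For each query vector take its best-matching document vector, then sum."""
    return sum(max(dot(q, d) for d in doc_vecs) for q in query_vecs)


def rank(query_vecs: list[Vector], docs: dict[str, list[Vector]]) -> list[tuple[str, float]]:
    """Score every document against the query and sort best-first."""
    scored = [(name, maxsim_score(query_vecs, vecs)) for name, vecs in docs.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy unit vectors, invented for illustration.
query = [[1.0, 0.0], [0.0, 1.0]]  # two query tokens
docs = {
    "page_a": [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]],  # matches both tokens exactly
    "page_b": [[0.6, 0.8]],                          # one vector, partial match only
}

ranking = rank(query, docs)
```

Here `page_a` ranks first because each query token finds an exact counterpart, whereas `page_b` has to serve both tokens with a single compromise vector, which is the situation a single-vector embedding is always in.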

Jina Embeddings v4 is an openly released model built on the Qwen2.5-VL backbone. Like ColQwen2, it produces late-interaction (multi-vector) embeddings, with further improved retrieval performance.

Weaviate serves as the vector database, providing memory-efficient multi-vector embedding storage and fast, accurate retrieval through its implementation of the MUVERA encoding algorithm.
Orchestration & Scalability
LangGraph structures the retrieval pipeline as a directed graph. This graph-based approach enables conditional routing (e.g., switching strategies for single vs. multi-document queries) and robust state persistence across user sessions.
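The conditional routing idea can be sketched without any library. The following is a dependency-free illustration of a directed graph with a routing function, not LangGraph's actual API; all node names, the `route` rule, and the state keys are assumptions made for this example. Session persistence is not shown.

```python
from typing import Callable, Optional

State = dict  # keys used in this sketch: "question", "selected_docs", "trace"


def _visit(state: State, name: str) -> State:
    """Return a copy of the state with the visited node appended to the trace."""
    return {**state, "trace": state.get("trace", []) + [name]}


def single_doc_retrieval(state: State) -> State:
    return _visit(state, "single_doc_retrieval")


def multi_doc_retrieval(state: State) -> State:
    return _visit(state, "multi_doc_retrieval")


def answer(state: State) -> State:
    return _visit(state, "answer")


def route(state: State) -> str:
    """Conditional edge: pick the entry node from the number of selected documents."""
    if len(state["selected_docs"]) == 1:
        return "single_doc_retrieval"
    return "multi_doc_retrieval"


NODES: dict[str, Callable[[State], State]] = {
    "single_doc_retrieval": single_doc_retrieval,
    "multi_doc_retrieval": multi_doc_retrieval,
    "answer": answer,
}

# Fixed edges of the directed graph; None marks the end of the pipeline.
EDGES: dict[str, Optional[str]] = {
    "single_doc_retrieval": "answer",
    "multi_doc_retrieval": "answer",
    "answer": None,
}


def run(state: State) -> State:
    """Walk the graph from the routed entry node until an end edge is reached."""
    current: Optional[str] = route(state)
    while current is not None:
        state = NODES[current](state)
        current = EDGES[current]
    return state


one = run({"question": "q", "selected_docs": ["NBC"]})
many = run({"question": "q", "selected_docs": ["NBC", "LEED"]})
```

Keeping the routing decision in one function and the edges in one table means a new strategy can be added as a node plus an edge, without touching the existing retrieval steps.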
User Experience & Interface
Layout & Hierarchy
A familiar layout is maintained, consistent with standard LLM interfaces such as ChatGPT. Preserving this format allows users to rely on their existing muscle memory for “talking to AI,” making the transition to our system seamless and intuitive
Document Root:
Single-click selection for main codes (NBC)
Nested Jurisdictions:
Collapsible menus keep main navigation clean
Visualizing Logic
Building codes read like logic puzzles. We reduce the cognitive load of tracing their logic by synthesizing text into diagrams.
Engineering the UX
Users
I recently showed the ArchiCheck prototype to some former colleagues and friends who work as architects for a large real estate developer. They offered thoughtful feedback and genuinely appreciated the product. It felt a bit surreal: just a year and a half ago, I was sitting in that same office, learning the ropes, and now something I’ve built is actually helping my friends who work there today. It was a simple but meaningful moment, seeing the tool make their work easier.
Builders
Contributions
I focused on the front-end development and overall user experience, while Angad managed the web deployment and performance optimization.
We both collaborated closely on the RAG orchestration and the broader AI pipeline, shaping the system’s intelligence together
Logo
The logo combines the letter ‘A’ with a simplified document icon, linking “Architect” with the documentation workflow at the heart of the product
















