AutoFormFiller: Building a Chrome Extension with a Cloud Backend

5 min read by Eddie Chongtham
Python, Flask, Chrome Extension, GPT-4o Vision, AWS

The original motivation for AutoFormFiller was a real frustration: we were filling out the same information across dozens of government and insurance forms. Same address, same ID numbers, same answers to the same questions. Surely an AI could do this.

The Architecture

The system has two parts: a Chrome extension (the front end) and a Flask API (the backend). The extension captures a screenshot of the active tab and sends it to the backend. The backend uses GPT-4o's vision capability to identify form fields and determine what values to fill in. The filled values are sent back to the extension, which injects them into the page.
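To make that round trip concrete, here is a minimal sketch of the JSON contract between the extension and the backend. The post does not show the wire format, so the key names (`screenshot`, `user_data`, `fields`, `label`, `value`) and both helper functions are assumptions made for illustration, not the project's actual API.

```python
import base64
import json

# Every PNG file starts with this 8-byte signature.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_fill_request(body: str) -> dict:
    """Validate the JSON body the extension is assumed to send."""
    payload = json.loads(body)
    screenshot = payload.get("screenshot")
    if not isinstance(screenshot, str):
        raise ValueError("missing 'screenshot' (base64-encoded PNG)")
    # validate=True rejects characters outside the base64 alphabet.
    raw = base64.b64decode(screenshot, validate=True)
    if not raw.startswith(PNG_SIGNATURE):
        raise ValueError("screenshot is not a PNG image")
    return {"screenshot": screenshot, "user_data": payload.get("user_data", {})}


def build_fill_response(fields: dict) -> str:
    """Serialize detected values for the extension to inject into the page."""
    items = [{"label": label, "value": value} for label, value in fields.items()]
    return json.dumps({"fields": items})
```

Rejecting malformed screenshots before calling the model keeps obviously bad requests from costing an API call.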

Vision-Based Form Detection

Using vision instead of DOM parsing was a deliberate choice. DOM parsing is brittle: every site has different markup. Vision models are flexible enough to understand form intent from the visual layout, the same way a human would.

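# Backend: detect fields with GPT-4o vision
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# FIELD_DETECTION_PROMPT and parse_field_response are not shown in this post.
def detect_form_fields(screenshot_base64: str, user_data: dict) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                # The screenshot is passed inline as a base64 data URL
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot_base64}"}},
                # The prompt template is filled in with the user's stored data
                {"type": "text", "text": FIELD_DETECTION_PROMPT.format(data=user_data)}
            ]
        }]
    )
    return parse_field_response(response)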
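
The post does not show `parse_field_response`, so the following is only one plausible shape for it. It assumes the prompt asks the model to reply with a single JSON object mapping field labels to values, and it tolerates the Markdown fence that models often wrap around JSON. The `fake_response` helper is a hypothetical stand-in for the SDK's response object, included so the parser can be tried without an API call.

```python
import json
import re
from types import SimpleNamespace

FENCE = "`" * 3  # Markdown code fence, built this way to keep the source readable
FENCED_JSON = re.compile(FENCE + r"(?:json)?\s*(.*?)" + FENCE, re.DOTALL)


def parse_field_response(response) -> dict:
    """Extract a {label: value} dict from a chat completion, or {} on failure."""
    text = response.choices[0].message.content or ""
    # If the model wrapped its answer in a code fence, keep only the inside.
    match = FENCED_JSON.search(text)
    if match:
        text = match.group(1)
    try:
        fields = json.loads(text)
    except json.JSONDecodeError:
        return {}
    # Anything other than a JSON object is treated as "no fields found".
    return fields if isinstance(fields, dict) else {}


def fake_response(content: str) -> SimpleNamespace:
    """Hypothetical stand-in mimicking response.choices[0].message.content."""
    message = SimpleNamespace(content=content)
    return SimpleNamespace(choices=[SimpleNamespace(message=message)])
```

Returning an empty dict on unparseable output lets the extension fill nothing rather than fail; whether that is the right trade-off depends on how errors are surfaced to the user.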

Token Management

The extension manages OAuth tokens for the backend API. Token rotation, refresh flows, and storing tokens securely in Chrome's local storage were all non-trivial to get right. The tokens.json pattern we settled on works well for local development; in production it is replaced by a proper secrets manager.
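
As an illustration of the tokens.json pattern for local development, here is a minimal sketch of a file-backed token store that refreshes shortly before expiry. The file layout (`access_token`, `refresh_token`, `expires_at`), the 60-second margin, and the `refresh` callback are all assumptions; the post does not describe the real file format or refresh endpoint.

```python
import json
import time
from pathlib import Path
from typing import Callable, Optional

EXPIRY_MARGIN_SECONDS = 60  # assumed margin: refresh slightly before expiry


def load_tokens(path: Path) -> dict:
    """Read the token file from disk."""
    return json.loads(path.read_text())


def save_tokens(path: Path, tokens: dict) -> None:
    """Write to a temp file, then rename, so a crash cannot leave a partial file."""
    tmp = path.with_suffix(".tmp")
    tmp.write_text(json.dumps(tokens))
    tmp.replace(path)


def get_access_token(
    path: Path,
    refresh: Callable[[str], dict],
    now: Optional[float] = None,
) -> str:
    """Return a valid access token, calling refresh() if it is about to expire."""
    now = time.time() if now is None else now
    tokens = load_tokens(path)
    if tokens["expires_at"] - now <= EXPIRY_MARGIN_SECONDS:
        # refresh() is expected to return a full new token record,
        # including a rotated refresh token.
        tokens = refresh(tokens["refresh_token"])
        save_tokens(path, tokens)
    return tokens["access_token"]
```

Passing the refresh step in as a callback keeps the file handling testable without a network call, and makes it easier to swap the file for a secrets manager later.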

โ† Back to Blog
Building a Voice AI Assistant for Windows โ†’