The original motivation for AutoFormFiller was a real frustration: we were filling out the same information across dozens of government and insurance forms. Same address, same ID numbers, same answers to the same questions. Surely an AI could do this.
The Architecture
The system has two parts: a Chrome extension (the front end) and a Flask API (the backend). The extension captures a screenshot of the active tab and sends it to the backend. The backend uses GPT-4o's vision capability to identify form fields and determine what values to fill in. The filled values are sent back to the extension, which injects them into the page.
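The round trip described above can be sketched as a framework-independent handler. The payload keys (`screenshot`, `fields`), the `handle_fill_request` name, and the injected `detect` callable are illustrative assumptions rather than the project's actual contract; in the Flask backend, a function like this would sit behind a route.

```python
import base64
import binascii
from typing import Callable


def handle_fill_request(payload: dict, user_data: dict,
                        detect: Callable[[str, dict], dict]) -> tuple[dict, int]:
    """Validate an extension request and return (response_body, http_status).

    Hypothetical contract: the extension posts {"screenshot": "<base64 PNG>"}
    and receives {"fields": {...}} to inject into the page.
    """
    screenshot = payload.get("screenshot")
    if not isinstance(screenshot, str) or not screenshot:
        return {"error": "missing screenshot"}, 400

    # Reject anything that is not valid base64 before paying for a model call.
    try:
        base64.b64decode(screenshot, validate=True)
    except (binascii.Error, ValueError):
        return {"error": "screenshot is not valid base64"}, 400

    # `detect` stands in for the vision step, e.g. detect_form_fields.
    fields = detect(screenshot, user_data)
    return {"fields": fields}, 200
```

Passing the detector in as a parameter keeps the validation logic testable without a network call or an API key.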
Vision-Based Form Detection
Using vision instead of DOM parsing was a deliberate choice. DOM parsing is brittle: every site has different markup. Vision models are flexible enough to understand form intent from the visual layout, the same way a human would.
# Backend: detect fields with GPT-4o vision
def detect_form_fields(screenshot_base64: str, user_data: dict) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{screenshot_base64}"}},
                {"type": "text", "text": FIELD_DETECTION_PROMPT.format(data=user_data)}
            ]
        }]
    )
    return parse_field_response(response)
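The snippet above calls `parse_field_response` without showing it. One possible shape, as a minimal sketch: it assumes the prompt asks the model to answer with a single JSON object mapping field labels to values, which the post does not confirm. The body below is a guess at the behavior, not the project's actual implementation.

```python
import json
import re


def parse_field_response(response) -> dict:
    """Extract a {field label: value} mapping from a chat completion.

    Assumption: the model was prompted to reply with one JSON object.
    """
    text = response.choices[0].message.content or ""

    # Models often wrap JSON in a fence or add commentary around it,
    # so keep only the span from the first "{" to the last "}".
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        raise ValueError("no JSON object found in model reply")

    # json.loads raises json.JSONDecodeError (a ValueError) on malformed JSON.
    fields = json.loads(match.group(0))

    # Normalize values to strings, since form inputs take text.
    return {str(label): "" if value is None else str(value)
            for label, value in fields.items()}
```

Raising on an unparseable reply, rather than returning an empty mapping, lets the caller tell "no fields detected" apart from "the model answered in the wrong format".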
Token Management
The extension manages OAuth tokens for the backend API. Token rotation, refresh flows, and secure storage in Chrome's local storage were all non-trivial to get right. The tokens.json pattern we settled on works well for local development but gets replaced by a proper secrets manager in production.
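For the local-development case, the refresh flow might look something like the sketch below. The `tokens.json` layout, the `get_access_token` name, and the `refresh` callable (a stand-in for the real OAuth refresh request) are all assumptions made for illustration.

```python
import json
import time
from pathlib import Path
from typing import Callable


def get_access_token(path: Path, refresh: Callable[[str], dict],
                     skew_seconds: int = 60) -> str:
    """Return a usable access token, refreshing and persisting it if needed.

    Assumed tokens.json layout (hypothetical):
    {"access_token": "...", "refresh_token": "...", "expires_at": 1700000000}
    """
    tokens = json.loads(path.read_text())

    # Refresh slightly early so a token does not expire mid-request.
    if tokens["expires_at"] - skew_seconds <= time.time():
        # `refresh` returns the new token fields; with rotation enabled this
        # includes a new refresh_token, which update() stores as well.
        tokens.update(refresh(tokens["refresh_token"]))
        path.write_text(json.dumps(tokens))

    return tokens["access_token"]
```

Writing the rotated refresh token back to disk immediately matters: if the old one is reused after rotation, many OAuth servers reject it.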