Star Wars Data Explorer
An interactive data platform for exploring the Star Wars universe through analytics, visualisations, and AI-powered insights.
Vision
Traditional wiki browsing is linear — you read one article, click a link, read another. This project aims to surface the connections, patterns, and context that exist across the entire Star Wars universe but are impossible to see one page at a time.
By processing the full breadth of Wookieepedia content through automated pipelines and AI-powered analysis, we build structured datasets, knowledge graphs, and semantic indexes that power interactive explorations no traditional wiki can offer.
Data Source
All content is sourced from Wookieepedia (starwars.fandom.com), the comprehensive Star Wars wiki maintained by the fan community. This includes both Canon and Legends continuity material. Structured infobox data, article content, and category metadata are downloaded via the MediaWiki API and stored in MongoDB for processing.
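The retrieval step can be sketched roughly as follows. This is a minimal illustration, not the project's actual client. It assumes Fandom's standard MediaWiki endpoint at `https://starwars.fandom.com/api.php` and requests the current wikitext and categories for a batch of pages with `action=query`, sending a descriptive User-Agent. The contact string in `USER_AGENT` and the `Throttle` helper, which enforces a fixed minimum delay between calls, are hypothetical stand-ins for whatever rate-limiting policy the pipeline really uses.

```python
import time
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: the standard MediaWiki action API endpoint that Fandom wikis expose.
API_ENDPOINT = "https://starwars.fandom.com/api.php"
# Hypothetical descriptive User-Agent; a real one should name the project and a contact.
USER_AGENT = "StarWarsDataExplorer/0.1 (https://example.com; contact@example.com)"


def build_page_request(titles: list[str]) -> Request:
    """Build (without sending) a query for the current wikitext and categories of some pages."""
    params = {
        "action": "query",
        "prop": "revisions|categories",
        "rvprop": "content",
        "rvslots": "main",
        "cllimit": "max",
        "titles": "|".join(titles),  # pipe-separated list of page titles
        "format": "json",
        "formatversion": "2",
    }
    return Request(f"{API_ENDPOINT}?{urlencode(params)}", headers={"User-Agent": USER_AGENT})


class Throttle:
    """Enforce a minimum delay between consecutive requests (a hypothetical rate-limit policy)."""

    def __init__(self, min_interval: float) -> None:
        self.min_interval = min_interval
        self._last = float("-inf")

    def wait(self) -> None:
        # Sleep only for whatever part of the interval has not already elapsed.
        delay = self._last + self.min_interval - time.monotonic()
        if delay > 0:
            time.sleep(delay)
        self._last = time.monotonic()
```

With the standard library, `json.load(urllib.request.urlopen(req))` would fetch the response. With `formatversion=2`, pages are returned as a list under `query.pages`. Calling `throttle.wait()` before each request keeps the request rate bounded.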
How It Works
ETL Pipeline
An automated daily sync downloads pages from Wookieepedia, extracts structured infobox data, and builds categorised timeline events spanning every era of the Star Wars chronology.
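At its core, infobox extraction means pulling the `|key=value` parameters out of a template call in a page's wikitext. The sketch below uses a hypothetical template name (`Character`). It tracks only nested `{{...}}` and `[[...]]`, so that pipes inside links and sub-templates are not mistaken for field separators. Comments, `<nowiki>` and `<ref>` content are not handled. A production pipeline might instead use a dedicated wikitext parser such as mwparserfromhell.

```python
import re


def extract_infobox(wikitext: str, template: str) -> dict[str, str]:
    """Return the top-level |key=value parameters of the first {{template ...}} call."""
    start = re.search(r"\{\{\s*" + re.escape(template) + r"\s*(?=[|}])", wikitext)
    if start is None:
        return {}
    parts: list[str] = []
    buf: list[str] = []
    braces = links = 0  # depth of nested {{...}} and [[...]] inside the infobox
    i = start.end()
    while i < len(wikitext):
        pair = wikitext[i:i + 2]
        if pair == "}}" and braces == 0:
            parts.append("".join(buf))  # closing braces of the infobox itself
            break
        if pair in ("{{", "}}", "[[", "]]"):
            if pair == "{{":
                braces += 1
            elif pair == "}}":
                braces -= 1
            elif pair == "[[":
                links += 1
            else:
                links -= 1
            buf.append(pair)
            i += 2
            continue
        ch = wikitext[i]
        if ch == "|" and braces == 0 and links == 0:
            parts.append("".join(buf))  # a top-level pipe starts a new parameter
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields: dict[str, str] = {}
    for part in parts:
        key, sep, value = part.partition("=")
        if sep:  # skip positional (unnamed) parameters and the empty lead-in segment
            fields[key.strip()] = value.strip()
    return fields
```

Values are returned as raw wikitext (for example `[[Tatooine]]`), so link targets and nested lists still need a cleanup pass before they become table columns or timeline events.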
Knowledge Graph
An AI agent analyses every article to extract relationships between characters, factions, locations, and events, assembling them into a knowledge graph that maps the connections across the entire universe.
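Here is a minimal sketch of how per-article extractions could be merged into a graph. It assumes the AI step emits `(source_article, subject, predicate, object)` tuples. The tuple shape and predicate names such as `member_of` are hypothetical. When the same fact is found in several articles, it collapses into one edge, and that edge's provenance set records every source article. An undirected adjacency map is also kept for traversal.

```python
from collections import defaultdict
from typing import Iterable

Triple = tuple[str, str, str]  # (subject, predicate, object)


def build_graph(
    extractions: Iterable[tuple[str, str, str, str]],
) -> tuple[dict[Triple, set[str]], dict[str, set[str]]]:
    """Merge (source_article, subject, predicate, object) tuples into edges plus an adjacency map."""
    edges: dict[Triple, set[str]] = defaultdict(set)
    for article, subject, predicate, obj in extractions:
        # The same fact extracted from several articles becomes one edge with several sources.
        edges[(subject.strip(), predicate, obj.strip())].add(article)
    adjacency: dict[str, set[str]] = defaultdict(set)
    for subject, _predicate, obj in edges:
        # Store both directions so traversal can walk from either endpoint.
        adjacency[subject].add(obj)
        adjacency[obj].add(subject)
    return dict(edges), dict(adjacency)
```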
Semantic Search
Article content is chunked and embedded using OpenAI embedding models, enabling vector-based semantic search. Ask natural-language questions and get answers grounded in the actual wiki content.
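The chunk-then-embed approach can be illustrated as follows. This sketch splits text into overlapping word windows. The 200-word size and 40-word overlap are arbitrary, hypothetical defaults, and token-based chunking is common in practice. Stored chunk vectors are then ranked by cosine similarity to a query vector. Toy vectors stand in for the embeddings the project obtains from OpenAI. A real deployment would likely delegate the ranking to a vector index rather than a linear scan.

```python
import math


def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each repeating the last `overlap` words of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must satisfy 0 <= overlap < size")
    words = text.split()
    if not words:
        return []
    step = size - overlap
    # Only start a window if it contains words beyond the overlap already covered.
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query: list[float], chunk_vectors: list[list[float]], k: int = 3) -> list[int]:
    """Indices of the k chunks whose embeddings are most similar to the query embedding."""
    ranked = sorted(range(len(chunk_vectors)), key=lambda i: cosine(query, chunk_vectors[i]), reverse=True)
    return ranked[:k]
```

The overlap means a sentence cut at a window boundary still appears whole in at least one chunk, provided it is no longer than the overlap. The top-ranked chunks are what an answer would then be grounded in.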
AI Agent
An interactive AI agent powered by OpenAI can answer questions, generate character timelines, render charts and visualisations, and traverse the knowledge graph to find non-obvious connections.
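One way to surface a non-obvious connection is a breadth-first search for the shortest chain of entities linking two nodes. The sketch assumes the graph is an undirected adjacency map keyed by entity name. It illustrates the technique and is not the agent's actual traversal tool.

```python
from collections import deque
from typing import Mapping, Optional


def shortest_connection(graph: Mapping[str, set[str]], start: str, goal: str) -> Optional[list[str]]:
    """Breadth-first search: the fewest-hop chain of entities from start to goal, or None."""
    if start == goal:
        return [start]
    parents: dict[str, str] = {start: start}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(graph.get(node, ())):  # sorted so ties resolve deterministically
            if nxt in parents:
                continue
            parents[nxt] = node
            if nxt == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while path[-1] != start:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None
```

Because BFS explores nodes in order of hop count, the first time it reaches the goal is along a shortest path. On a very large graph, a depth limit or bidirectional search would keep the agent's tool calls cheap.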
Features
AI-Powered
Ask AI — Conversational agent with tool access to the knowledge graph, data tables, and wiki search. The agent creates chart and graph configurations that query the real underlying data — no data points are fabricated. The exception is the table component, which the agent populates with ad-hoc data to present textual information in a structured format when no direct query can produce the result.
Character Timelines — AI-generated biographical timelines for characters, synthesised from their full wiki articles.
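One way to enforce the rule that charts query real data, rather than carrying agent-written numbers, is to have the agent emit only a query specification. In this hypothetical sketch, `ChartSpec`, `ALLOWED_FIELDS` and the in-memory `records` stand in for the project's real schema and MongoDB collections. The agent picks a chart type, a collection and a grouping field. The counts are then computed from the stored records, and specs naming unknown fields are rejected.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical allow-list: which collection fields the agent is permitted to chart.
ALLOWED_FIELDS = {"characters": {"species", "homeworld", "gender"}}


@dataclass(frozen=True)
class ChartSpec:
    """What the agent emits: a chart type and a query description, never the data points."""
    chart: str       # e.g. "bar" or "pie"
    collection: str  # e.g. "characters"
    group_by: str    # field whose values are counted


def resolve_chart(spec: ChartSpec, records: dict[str, list[dict]]) -> list[tuple[str, int]]:
    """Compute the series from stored records; specs naming unknown fields are rejected."""
    if spec.group_by not in ALLOWED_FIELDS.get(spec.collection, set()):
        raise ValueError(f"cannot chart {spec.collection}.{spec.group_by}")
    counts = Counter(row.get(spec.group_by, "Unknown") for row in records[spec.collection])
    return counts.most_common()  # (value, count) pairs, largest first
```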
Data Exploration
The following features do not use AI at runtime. They are built on structured data extracted from Wookieepedia through the ETL pipeline.
Timeline — Browse thousands of events across all eras with category filtering. Events are extracted from Wookieepedia infoboxes.
Data Tables — Searchable, sortable tables for every infobox category: characters, ships, weapons, species, planets, and more.
Graph Explorer — Browse the relationship graph connecting characters, factions, and events. The graph itself was built using AI (via the OpenAI Batch API), but exploring it does not involve AI.
Grid Map — A galactic grid map with drill-down from regions to systems to planets.
Interactive Map — A galactic map with trade routes, nebulae, and location markers.
Events Heatmap — A heatmap overlay showing the density of events across the galaxy.
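The heatmap boils down to counting events per map cell. This sketch assumes each event carries a galactic grid square in Wookieepedia's letter-number style (for example `L-9`) under a hypothetical `grid` field. Events without a parseable square are skipped. Counts are then scaled to 0-1 intensities for colouring. The actual overlay may bin events differently, for instance by planet or region.

```python
import re
from collections import Counter
from typing import Optional

# Galactic grid squares are written as a column letter and a row number, e.g. "L-9".
GRID_RE = re.compile(r"^([A-Z])-(\d{1,2})$")


def parse_grid(coord: str) -> Optional[tuple[str, int]]:
    """Return (column, row) for a grid square such as "L-9", or None if it does not parse."""
    match = GRID_RE.match(coord.strip().upper())
    if match is None:
        return None
    return match.group(1), int(match.group(2))


def event_density(events: list[dict]) -> Counter:
    """Count events per grid square; events with no usable "grid" value are ignored."""
    counts: Counter = Counter()
    for event in events:
        square = parse_grid(event.get("grid", ""))  # "grid" is a hypothetical field name
        if square is not None:
            counts[square] += 1
    return counts


def intensities(counts: Counter) -> dict[tuple[str, int], float]:
    """Scale counts to 0..1 relative to the busiest square, ready for a colour ramp."""
    peak = max(counts.values(), default=0)
    if peak == 0:
        return {}
    return {square: n / peak for square, n in counts.items()}
```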
Built With
This project was built with assistance from AI coding tools, including GitHub Copilot and Claude Code. AI-assisted development is rapidly becoming a standard part of the software engineering workflow. The industry is increasingly competitive, and developers who can leverage these tools effectively are better positioned to remain hireable and desirable. Using AI in this context is no different from using any other productivity tool — the architecture, design decisions, and quality standards remain the developer's own. The AI accelerates the work; it does not replace the vision behind it.
Technology
Content Licensing
Wookieepedia's user-contributed text is published under the Creative Commons Attribution-ShareAlike 3.0 (CC-BY-SA 3.0) licence. This licence explicitly permits copying, redistributing, adapting, and building upon the material — including for commercial purposes — provided that attribution is given and derivative works are shared under the same or a compatible licence.
What we believe is within terms
Extracting structured data from infoboxes, building timelines, and displaying transformed or derived content are all forms of adaptation that CC-BY-SA 3.0 expressly permits. We attribute Wookieepedia as the source throughout the site, and derived textual content is made available under the same licence. No copyrighted images or media owned by Lucasfilm/Disney are stored or displayed — only text-based data.
Where there is a grey area
Fandom's Terms of Use prohibit automated access to their services without express written permission, including via robots, spiders, or scrapers. This project retrieves content through the MediaWiki API, which is a standard, publicly available interface provided by the platform. The Terms of Use themselves include an exception for content that is “submitted to particular Fandom communities as permitted as set forth at our licensing page” — acknowledging that CC-BY-SA text carries broader reuse rights than the general restrictions would suggest. However, the boundary of this exception is ambiguous, and the tension between the content licence (which permits reuse) and the platform terms (which restrict automated access) is not clearly resolved.
Additionally, Fandom's 2025 Terms of Use update prohibits using content “for the development of any software program, including, but not limited to, training a machine learning or artificial intelligence (AI) system” without prior written consent. This project generates vector embeddings for semantic search and uses AI to extract relationships and summarise articles. Whether retrieval-focused embeddings and structured extraction constitute “training” in the sense intended by this clause is an open question. Neither is used to train foundation models, but the clause is broadly worded.
Our position
This is a non-commercial fan project for educational and analytical purposes. We respect the Wookieepedia community and the work of its contributors. API requests are rate-limited with a descriptive User-Agent header. No copyrighted media is used. We attribute all content to Wookieepedia and its editors. If the Wookieepedia community or Fandom have concerns about this project, we are open to discussion and will act in good faith.