AI Modernization Powers OldNYC Expansion: 10,000 New Historic Photos Added Through GPT and OpenStreetMap
Key Takeaways
- OpenAI's GPT-4o and GPT-4o-mini enabled the geolocation of 6,000 additional historical photos by extracting location details from image descriptions with semantic understanding
- OCR coverage improved by 28% (25,000 to 32,000 images), with GPT outperforming the custom legacy OCR system in 75% of comparisons
- Integration of OpenStreetMap and historical street datasets increased geolocation accuracy for mapped images to 96%, addressing the limitations of modern geocoding services on defunct street intersections
Summary
OldNYC, a historical photo archive and map of New York City, has expanded from 39,000 to 49,000 photographs through a major 2024 rebuild leveraging modern AI tools and open-source technologies. The expansion was driven by three key improvements: better geolocation using OpenAI's GPT-4o to extract location details from photo descriptions, dramatically improved optical character recognition (OCR) using GPT-4o-mini to transcribe historical catalog text, and a switch from Google Maps to OpenStreetMap for more accurate historical street data. These enhancements raised geolocation accuracy to 87% for photos with usable location data (96% for the images that were ultimately mapped), while OCR coverage grew from 25,000 to 32,000 images, with GPT outperforming the previous custom pipeline in approximately 75% of cases.
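The switch away from a modern geocoder matters because many intersections in the collection no longer exist. The idea can be sketched as a lookup against a local table of historical intersections rather than a live geocoding call; the dataset, street names, and coordinates below are illustrative placeholders, not OldNYC's actual data model:

```python
# Hypothetical sketch: resolve an intersection against a local historical
# street dataset (e.g. derived from NYPL records) instead of a modern
# geocoder, which fails on streets that no longer exist.

HISTORICAL_INTERSECTIONS = {
    # (street_a, street_b) -> (lat, lng); both orders are tried at lookup time.
    ("north 6th street", "bedford avenue"): (40.7172, -73.9565),
    ("13th avenue", "gansevoort street"): (40.7399, -74.0097),  # defunct street
}

def normalize(street: str) -> str:
    """Lowercase and strip whitespace so lookups are case-insensitive."""
    return street.strip().lower()

def locate(street_a: str, street_b: str):
    """Return (lat, lng) for an intersection, or None if it is unknown."""
    a, b = normalize(street_a), normalize(street_b)
    return (HISTORICAL_INTERSECTIONS.get((a, b))
            or HISTORICAL_INTERSECTIONS.get((b, a)))

print(locate("Gansevoort Street", "13th Avenue"))  # -> (40.7399, -74.0097)
```

A real pipeline would fall back to a modern geocoder only when the historical table has no match, which is one plausible way to reach high accuracy on defunct streets.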
The project demonstrates how large language models can solve complex historical digitization challenges that require semantic understanding of context. GPT's ability to interpret ambiguous historical descriptions—such as understanding "North 6th" as "North 6th Street" and extracting relevant intersections while ignoring irrelevant details—enabled the automated geolocation of approximately 6,000 additional photos. The integration of historical street datasets from the New York Public Library further improved accuracy by correcting modern geocoding errors on streets that no longer exist in their historical configurations.
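The extraction step described above can be sketched as a prompt asking the model for normalized cross streets plus a parser for its reply. The prompt wording, JSON schema, and function names here are assumptions for illustration, not the project's actual implementation:

```python
import json

def build_prompt(description: str) -> str:
    """Build a hypothetical prompt asking the model for cross streets."""
    return (
        "Extract the two cross streets from this historical photo "
        "description. Expand abbreviations (e.g. 'North 6th' -> "
        "'North 6th Street') and ignore irrelevant details. Reply as JSON: "
        '{"street1": ..., "street2": ...}, using null values if no '
        "intersection is mentioned.\n\n"
        f"Description: {description}"
    )

def parse_reply(reply: str):
    """Parse the model's JSON reply into an intersection tuple, if any."""
    data = json.loads(reply)
    if data.get("street1") and data.get("street2"):
        return (data["street1"], data["street2"])
    return None

# Simulated model reply for a description mentioning "North 6th and Bedford Ave."
reply = '{"street1": "North 6th Street", "street2": "Bedford Avenue"}'
print(parse_reply(reply))  # -> ('North 6th Street', 'Bedford Avenue')
```

Asking for a rigid JSON shape keeps the downstream geocoding step simple: descriptions without an intersection parse cleanly to None instead of producing garbage coordinates.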
Editorial Opinion
This project exemplifies how modern generative AI can dramatically improve historical digitization projects that were previously bottlenecked by technical limitations. By combining GPT's sophisticated text understanding with open-source mapping infrastructure, the OldNYC team achieved what custom-built systems couldn't—accurate interpretation of ambiguous historical descriptions and reliable text extraction from degraded archival images. This approach could serve as a model for other digital humanities initiatives seeking to unlock historical collections at scale.