o3's New Trick Has Altman Gasping! Even a Worn-Out Old Photo Gets Pinpointed by AI, Gripping From Start to Finish | Prompt Included

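Whatever the task, once AI joins the fight, it doesn't take long for the contest to be over.
Most absurdly, it can now identify even a worn-out photo so blurry you can't tell what it shows. Don't doubt it: the photo really is this blurry.
From this degraded image, o3 offered several possibilities: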

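(1) Open ground about 5 km upstream on the Ganges
(2) A muddy stretch of the lower Mississippi
(3) A stretch of the Yellow River
(4) A stretch of the Mekong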
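If you were handed every tool available, could you work out exactly where this is?
The correct answer is the Mekong stretch, though the photo was taken in 2008, so the wear is genuine.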
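"Guess the location from a picture" is in fact a fairly popular game: GeoGuessr. The system shows a random Google Street View image, and you have to work out the exact location from the information in it.
The game has a sizeable following: many enthusiasts compete on its leaderboards, and there are even major tournaments.
One way ordinary players approach GeoGuessr is to run a Google image search to get the rough area, then confirm it bit by bit with Google Earth and Street View.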
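Now, though, GeoGuessr is no longer a game played only among humans: o3 has muscled in and beaten top players outright.
Sam Altman's reaction, in effect: honestly, I didn't see this coming either.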
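When image reasoning first launched, many users recognized its potential applications, location identification among them.
Recently, users found that o3 shows very strong reasoning even on extremely blurry inputs, and that it reaches accurate conclusions purely by reasoning over details in the image, with EXIF extraction and similar shortcuts disabled.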
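I have to say, the prompt is astonishing. Having studied it closely, I'd say it reads as though a veteran GeoGuessr player wrote down years of hard-won technique and handed it to o3, while barring it from "cheating" with tools such as Google Earth.
For example, the prompt demands that o3 be very, very, very, very careful: "Pay attention to sources of regional variation like sidewalk square length, curb type, contractor stamps and curb details, power/transmission lines, fencing and hardware." It also asks o3 to weigh daylight, shadows, and especially slope, among other factors.
In the tests that followed, all of these proved very valuable, and o3's overall performance improved greatly as a result.
Is it really that magical? I threw this absurdly long prompt at o3, and it replied: challenge accepted.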

The Guess-Where-I-Am Challenge

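I didn't start with anything too hard for the first image, though it is still fairly hard: a night shot of an elevated bridge with no buildings to use as reference, no legible license plates, and even the bus route number is blurry. The reason I can still call it "not hard" is that half of some metal lettering peeks out in the top right corner, though only half.
To make absolutely sure the model didn't read EXIF, I took a screenshot of the photo and uploaded that instead; the gray bars on both sides are left over from the screenshot.
The night shot caused plenty of difficulty, and many of o3's reasoning techniques could not be applied. Still, the correct answer had already appeared in the first-round shortlist, so I let it continue.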
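Unfortunately, it narrowly missed the correct answer in the end: it had clearly considered Haizhu Bridge in Guangzhou, yet still chose Waibaidu Bridge.
One possibility is that reading text, Chinese characters especially, is still somewhat difficult for o3. After all, the same weakness shows up in various image and poster generation tasks.
Either way, with half a Chinese character visible, this shouldn't count as hard. That performance briefly made me lose interest in the next task: the following image has no signage or reference buildings at all, not even half a character.
If it can't even recognize Haizhu Bridge, could this really work? Well, it left me stunned.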
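This was indeed the InD art festival held during this year's May Day holiday. However, the photo was taken during setup, so there is no obvious logo and the scene is a mess; I didn't expect it to be identified anyway.
This photo also clearly shows that chat history, together with the memory accumulated from the user over time, becomes part of the model's reasoning, and can even "contaminate" that reasoning to some degree.
For example, in the next task, the one I consider the hardest, memory instead became a distractor during reasoning.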
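This image not only lacks every cue you would want, it was also shot from indoors looking out. That makes working backward to the camera position harder still.
In fact, the first-round candidates included a fairly close answer, but in the reasoning that followed o3 was still led astray and firmly insisted this was, once again, near the TIT Creative Park. Even when I supplied another, clearer image, it would not budge.
How to put it: it was hard to keep a straight face.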
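As soon as o3's image recognition appeared, it was seen as a major privacy risk: doxxing has never been this easy or this accurate. Given how widespread data leaks are now, locating a real person from a single casual snapshot is not out of the question.
But this test exposed another problem: when the AI swears it is right, do you put it down to hallucination, or do you slowly let it persuade you?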
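Back to the Haizhu Bridge image from the first test: after it got the answer wrong, I gave it a hint: look at that half character, doesn't it resemble 「海」 (the first character of 海珠, Haizhu)?
The model did consider it, then produced a detailed table laying out its position, and firmly refused to change its answer.
When I saw that table, I hesitated, and even went back to recheck the image: had I uploaded the wrong file? Had I accidentally sent it a picture of Waibaidu Bridge?
So which of us was right, it or me?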
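An image that could clearly have served as an alibi can be turned into "proof of presence." A place I plainly never visited gets forced into my life, which is chilling the more you think about it. If a picture of me landing on the moon turned up one day, it could probably convince me: you really went.
Finally, you may want to try this magic yourself, so the full prompt is below. One caveat: it is for personal experimentation only; prying into other people's privacy is wrong.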
You are playing a one-round game of GeoGuessr. Your task: from a single still image, infer the most likely real-world location. Note that unlike in the GeoGuessr game, there is no guarantee that these images are taken somewhere Google's Streetview car can reach: they are user submissions to test your image-finding savvy. Private land, someone's backyard, or an offroad adventure are all real possibilities (though many images are findable on streetview).

Be aware of your own strengths and weaknesses: following this protocol, you usually nail the continent and country. You more often struggle with exact location within a region, and tend to prematurely narrow on one possibility while discarding other neighborhoods in the same region with the same features. Sometimes, for example, you'll compare a 'Buffalo New York' guess to London, disconfirm London, and stick with Buffalo when it was elsewhere in New England – instead of beginning your exploration again in the Buffalo region, looking for cues about where precisely to land. You tend to imagine you checked satellite imagery and got confirmation, while not actually accessing any satellite imagery.

Do not reason from the user's IP address. none of these are of the user's hometown.

**Protocol (follow in order, no step-skipping):**

Rule of thumb: jot raw facts first, push interpretations later, and always keep two hypotheses alive until the very end.

0. Set-up & Ethics

No metadata peeking. Work only from pixels (and permissible public-web searches). Flag it if you accidentally use location hints from EXIF, user IP, etc. Use cardinal directions as if “up” in the photo = camera forward unless obvious tilt.

1. Raw Observations – ≤ 10 bullet points

List only what you can literally see or measure (color, texture, count, shadow angle, glyph shapes). No adjectives that embed interpretation. Force a 10-second zoom on every street-light or pole; note color, arm, base type.

Pay attention to sources of regional variation like sidewalk square length, curb type, contractor stamps and curb details, power/transmission lines, fencing and hardware. Don't just note the single place where those occur most, list every place where you might see them (later, you'll pay attention to the overlap).

Jot how many distinct roof / porch styles appear in the first 150 m of view. Rapid change = urban infill zones; homogeneity = single-developer tracts.

Pay attention to parallax and the altitude over the roof. Always sanity-check hill distance, not just presence/absence. A telephoto-looking ridge can be many kilometres away; compare angular height to nearby eaves.

Slope matters. Even 1-2 % shows in driveway cuts and gutter water-paths; force myself to look for them.

Pay relentless attention to camera height and angle. Never confuse a slope and a flat. Slopes are one of your biggest hints – use them!

2. Clue Categories – reason separately (≤ 2 sentences each)

| Category | Guidance |
| Climate & vegetation | Leaf-on vs. leaf-off, grass hue, xeric vs. lush. |
| Geomorphology | Relief, drainage style, rock-palette / lithology. |
| Built environment | Architecture, sign glyphs, pavement markings, gate/fence craft, utilities. |
| Culture & infrastructure | Drive side, plate shapes, guardrail types, farm gear brands. |
| Astronomical / lighting | Shadow direction ⇒ hemisphere; measure angle to estimate latitude ± 0.5 °. |
| Separate ornamental vs. native vegetation | Tag every plant you think was planted by people (roses, agapanthus, lawn) and every plant that almost certainly grew on its own (oaks, chaparral shrubs, bunch-grass, tussock). Ask one question: “If the native pieces of landscape behind the fence were lifted out and dropped onto each candidate region, would they look out of place?” Strike any region where the answer is “yes,” or at least down-weight it. |

3. First-Round Shortlist – exactly five candidates

Produce a table; make sure #1 and #5 are ≥ 160 km apart.

| Rank | Region (state / country) | Key clues that support it | Confidence (1-5) | Distance-gap rule ✓/✗ |

3½. Divergent Search-Keyword Matrix

Generic, region-neutral strings converting each physical clue into searchable text. When you are approved to search, you'll run these strings to see if you missed that those clues also pop up in some region that wasn't on your radar.

4. Choose a Tentative Leader

Name the current best guess and one alternative you’re willing to test equally hard. State why the leader edges others. Explicitly spell the disproof criteria (“If I see X, this guess dies”). Look for what should be there and isn't, too: if this is X region, I expect to see Y: is there Y? If not why not?

At this point, confirm with the user that you're ready to start the search step, where you look for images to prove or disprove this. You HAVE NOT LOOKED AT ANY IMAGES YET. Do not claim you have. Once the user gives you the go-ahead, check Redfin and Zillow if applicable, state park images, vacation pics, etcetera (compare AND contrast). You can't access Google Maps or satellite imagery due to anti-bot protocols. Do not assert you've looked at any image you have not actually looked at in depth with your OCR abilities. Search region-neutral phrases and see whether the results include any regions you hadn't given full consideration.

5. Verification Plan (tool-allowed actions)

For each surviving candidate list:

| Candidate | Element to verify | Exact search phrase / Street-View target |

Look at a map. Think about what the map implies.

6. Lock-in Pin

This step is crucial and is where you usually fail. Ask yourself 'wait! did I narrow in prematurely? are there nearby regions with the same cues?' List some possibilities. Actively seek evidence in their favor. You are an LLM, and your first guesses are 'sticky' and excessively convincing to you – be deliberate and intentional here about trying to disprove your initial guess and argue for a neighboring city. Compare these directly to the leading guess – without any favorite in mind. How much of the evidence is compatible with each location? How strong and determinative is the evidence?

Then, name the spot – or at least the best guess you have. Provide lat / long or nearest named place. Declare residual uncertainty (km radius). Admit over-confidence bias; widen error bars if all clues are “soft”.

Quick reference: measuring shadow to latitude

Grab a ruler on-screen; measure shadow length S and object height H (estimate if unknown). Solar elevation θ ≈ arctan(H / S). On date you captured (use cues from the image to guess season), latitude ≈ (90° – θ + solar declination). This should produce a range from the range of possible dates. Keep ± 0.5–1 ° as error; 1° ≈ 111 km.
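
The shadow-to-latitude quick reference at the end of the prompt can be sketched in a few lines of Python. This is only an illustrative sketch and not part of the original prompt: the function names are hypothetical, and it assumes the shadow was cast near local solar noon with the sun to the south of the observer, which is the situation in which the prompt's formula (latitude ≈ 90° − θ + solar declination) holds. The declination values have to be supplied by the caller from a guess at the season.

```python
import math

# Rough conversion quoted in the prompt: 1 degree of latitude is about 111 km.
KM_PER_DEGREE = 111.0


def solar_elevation_deg(object_height, shadow_length):
    """Solar elevation theta = arctan(H / S), in degrees."""
    return math.degrees(math.atan2(object_height, shadow_length))


def latitude_estimate_deg(object_height, shadow_length, declination_deg):
    """Prompt's formula: latitude = 90 - theta + solar declination.

    Assumption (not stated in the prompt): the photo was taken near local
    solar noon and the sun is south of the observer.
    """
    theta = solar_elevation_deg(object_height, shadow_length)
    return 90.0 - theta + declination_deg


def latitude_range_deg(object_height, shadow_length, decl_min_deg, decl_max_deg):
    """Latitude range produced by a range of plausible declinations (dates)."""
    a = latitude_estimate_deg(object_height, shadow_length, decl_min_deg)
    b = latitude_estimate_deg(object_height, shadow_length, decl_max_deg)
    return (min(a, b), max(a, b))


def error_km(error_deg):
    """Convert a latitude error in degrees to kilometres."""
    return error_deg * KM_PER_DEGREE
```

For instance, a pole whose shadow is as long as the pole is tall gives θ of about 45°, so `latitude_estimate_deg(1.0, 1.0, 0.0)` (declination 0°, i.e. near an equinox) comes out at roughly 45°. Passing a declination range to `latitude_range_deg` reflects the prompt's advice that an uncertain date should produce a latitude range instead of a single value.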