Comment by shpx - Hacker Neue

shpx Jun 18, 2025 parent

It seems like a waste that you didn't use the 89,259 yurts that are already outlined in OpenStreetMap as input, though you probably would've had issues aligning the outlines with Google Maps imagery.

https://taginfo.geofabrik.de/asia:mongolia/tags/building=ger

I'm also guessing your model doesn't handle yurts that are on the border of a tile.

Finally, that's a much smaller number than I expected for a country of 3 million.

biorach Jun 18, 2025

> Finally, that's a much smaller number than I expected for a country of 3 million.

172.7k yurts. Assuming these are family residences for the most part, an average occupancy of 4 (which is probably too low, since the fertility rate is still quite high there) gives ~691k people living in yurts. That's approximately 20% of the population of 3.5 million, which sounds reasonable.
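For anyone who wants to rerun the numbers, here is a minimal sketch of that back-of-envelope estimate. The inputs are just the figures quoted in this comment; the occupancy of 4 is the commenter's assumption, not a measured value.

```python
# Back-of-envelope check using only the figures quoted in the comment above.
yurts = 172_700            # detected yurt count
people_per_yurt = 4        # assumed average occupancy (a guess, probably low)
population = 3_500_000     # approximate population of Mongolia

people_in_yurts = yurts * people_per_yurt
share = people_in_yurts / population

print(f"{people_in_yurts:,} people in yurts, {share:.1%} of the population")
```

Changing `people_per_yurt` to 6 is an easy way to see how sensitive the percentage is to the occupancy guess.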

ViscountPenguin Jun 18, 2025

About 45% in 2010 apparently https://unstats.un.org/unsd/demographic-social/census/docume...

pmontra Jun 18, 2025

My quick estimate before clicking the link was:

From memory: 3 million people, 1.5 million living in the capital.

Let's say 1 million are living outside cities.

4 people per yurt.

250,000 yurts.

Add some extra yurts, because there will be people who have more than one, people living in a house with a yurt in the garden, yurts used as warehouses, etc.

300,000, which is almost double the count from the ML app.

joshvm Jun 18, 2025

This is a nice idea that often comes up in geo/ml projects. (Why not just use OSM for all your labels?)

To start, OSM doesn't use Google Maps imagery for annotation due to licensing concerns. As someone else mentioned, it's rarely clear whether researchers have the right to use Maps imagery let alone download/re-publish it. Part of the reason is that Google sub-licenses imagery from several different providers who are usually extremely protective of IP. So immediately you'd have image/label alignment issues.

Even if you had access to the image that someone used for labeling, it's non-trivial. They might not even have used an image! For example you might walk around and take a GPS reading next to every object and use the keypoints as object centers. Sometimes the annotation quality is low, for example if you want to try using building outlines or roads as segmentation targets for aerial imagery. Or things are simply misaligned. Also since yurts are inherently mobile, you might not even be able to use those labels because objects have moved and there's no guarantee they'll be present in Google Maps.

Finally you'd have issues of omission/commission, because you would have to assume that OSM is complete. That's very sensitive to how active the local community is. Some places are accurate down to the fire hydrant. Where I live, there are plenty of unmapped businesses that have been here for years. Though you could definitely use it to cross-check your own labels + predictions.

The standard approach for detecting objects on tiles is to discard border predictions and rely on overlapping (sliding window) prediction plus non-maximum suppression (NMS) to handle duplicates. The overlap is usually something like 1x the receptive field of your model, and your "discard" region is a bit larger than your max expected object size.
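A rough sketch of that tiling scheme, in plain Python with no dependencies. This is not the pipeline from the article: the function names, the box format `(x0, y0, x1, y1)`, and the tile/overlap/margin values in the usage note are all hypothetical, chosen only to illustrate overlap, border discard, and greedy NMS.

```python
def tile_origins(extent, tile, overlap):
    """Start offsets of tiles of width `tile` covering [0, extent), sharing `overlap` px."""
    stride = tile - overlap
    origins = list(range(0, max(extent - tile, 0) + 1, stride))
    if origins[-1] + tile < extent:      # make sure the last tile reaches the far edge
        origins.append(extent - tile)
    return origins


def touches_border(box, tile, margin):
    """True if a tile-local box comes within `margin` px of any tile edge."""
    x0, y0, x1, y1 = box
    return x0 < margin or y0 < margin or x1 > tile - margin or y1 > tile - margin


def iou(a, b):
    """Intersection over union of two axis-aligned boxes."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def nms(detections, iou_threshold=0.5):
    """Greedy NMS over (box, score) pairs: keep the best box, drop its overlaps."""
    kept = []
    for box, score in sorted(detections, key=lambda d: d[1], reverse=True):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept


def merge_tile_detections(per_tile, tile, margin, iou_threshold=0.5):
    """per_tile maps a tile origin (ox, oy) to a list of (tile-local box, score)."""
    global_dets = []
    for (ox, oy), dets in per_tile.items():
        for box, score in dets:
            # Simplification: this also drops boxes at the true image edge,
            # where no neighbouring tile exists to pick them up.
            if touches_border(box, tile, margin):
                continue
            x0, y0, x1, y1 = box
            global_dets.append(((x0 + ox, y0 + oy, x1 + ox, y1 + oy), score))
    return nms(global_dets, iou_threshold)
```

For example, with 512 px tiles and 128 px overlap, `tile_origins(1000, 512, 128)` gives the x offsets to run the model at. A yurt cut off at the right edge of one tile is discarded there and picked up whole by the neighbouring tile, and any yurt detected in both tiles collapses to a single box after NMS.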

colkassad Jun 19, 2025

From experience, I agree with your points. One thing OSM data is OK for is land classification labels ("landuse", etc. tags), as accuracy is not as important at their scales and they require less effort to clean up. Most of the work is aggregating disparate landuses into buckets that make sense for your model.

rsynnott Jun 18, 2025

> Finally, that's a much smaller number than I expected for a country of 3 million.

172k of them? That still seems like quite a lot of yurts; certainly more yurts per capita than anyone else has.

shpx OP Jun 18, 2025

Wikipedia says 30% of 3.5 million are "nomadic or semi-nomadic", which would be about 6 people to a yurt. I couldn't figure out what percentage of the country was done, but if he did 270,559 of 37,258,617 zoom 17 tiles then there could be another 100k in the other 99% of the data.
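A quick sketch of the two ratios in this comment, using only the numbers quoted here. Whether 37,258,617 really is the full zoom 17 tile count for Mongolia is taken on trust from the comment.

```python
# Figures as quoted in the comment above; none are independently verified here.
tiles_done = 270_559
tiles_total = 37_258_617
coverage = tiles_done / tiles_total

population = 3_500_000
nomadic_share = 0.30       # "nomadic or semi-nomadic", per the Wikipedia figure cited
yurts = 172_700
people_per_yurt = population * nomadic_share / yurts

print(f"tile coverage: {coverage:.2%}")
print(f"people per yurt: {people_per_yurt:.1f}")
```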

Living away from other people and not next to anything in particular is what I associate with nomads, so the heuristic of searching a radius around landmarks doesn't make sense to me. I scrolled around a random remote desert area in Mongolia on Google Maps and found a yurt every couple of minutes.

shiandow Jun 18, 2025

I'm confused why you wouldn't just do some random sampling to get some statistical bounds. At least then you'll know if you are close.
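One way that could look, as a minimal sketch: pick tiles uniformly at random, count yurts in each (by hand or with the model), and scale the sample mean up to the whole country with a normal-approximation interval. The function names and the example counts below are hypothetical. Because most tiles contain zero yurts, the counts are heavily skewed and the interval is only trustworthy for fairly large samples.

```python
import math
import random
import statistics


def sample_tile_ids(n_tiles_total, n_sample, seed=0):
    """Uniform random sample of tile indices, without replacement."""
    return random.Random(seed).sample(range(n_tiles_total), n_sample)


def estimate_total(sample_counts, n_tiles_total, z=1.96):
    """Estimate the total count over all tiles from per-tile counts in a random sample.

    Returns (estimate, lower, upper) using a normal approximation with a
    finite population correction; z=1.96 corresponds to roughly 95% coverage.
    """
    n = len(sample_counts)
    mean = statistics.fmean(sample_counts)
    sd = statistics.stdev(sample_counts)          # sample standard deviation
    fpc = math.sqrt(1 - n / n_tiles_total)        # finite population correction
    se_total = n_tiles_total * sd / math.sqrt(n) * fpc
    total = n_tiles_total * mean
    return total, max(total - z * se_total, 0.0), total + z * se_total


# Hypothetical per-tile yurt counts for 8 sampled tiles out of 1,000.
counts = [0, 0, 1, 0, 2, 0, 0, 1]
total, low, high = estimate_total(counts, n_tiles_total=1_000)
print(f"estimated total: {total:.0f} (approx. 95% interval {low:.0f} to {high:.0f})")
```

If the full-coverage count from the model lands well outside the interval from a hand-labelled sample, that is a sign that either the model or the search heuristic is missing yurts.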
