OSM Contributions by Country

How do we determine which Country a mapper might be from?

There has been a decent amount of research on this question, and it's a moving target. One reasonable thought (if we ignore humanitarian or paid-editing teams), is that the Country in which a mapper makes the majority of their changesets is perhaps their home. If not their actual home, it is still clearly a location they posess knowledge of and continually map. Some of Pascal Neis's tools use this approach, so I will re-create it here roughly.

Quick Approach: Find the most occuring country that the min_lat, min_lon point of each of a mapper's changesets falls into.


Note, as of February 21, the most recent data available on the AWS public dataset is is 2/15/2021.

Thanks!

Jennings Anderson

Query:

First, extract data form the OSM changesets data on AWS. This relies on the OSM public dataset and a separate "countries_geom" table that has WKT outlines of world countries.

-- A user's most frequently edited Country in the past 5 years? 
WITH most_frequent_editing_country AS (
  WITH user_history AS (
     SELECT uid, code FROM changesets
       JOIN country_geoms
         ON ST_Contains(ST_GeometryFromText(geometry_wkt), ST_Point(min_lon, min_lat)) 
      WHERE ("created_at" >= ( (SELECT "max"("created_at") FROM changesets) - INTERVAL '5' YEAR))
  )
  SELECT uid,
         code AS most_edited_country
  FROM (SELECT uid, code, COUNT(*) AS cnt, ROW_NUMBER() OVER (PARTITION BY uid
           ORDER BY  COUNT(*) DESC) AS seqnum FROM user_history GROUP BY  uid, code ) t
  WHERE seqnum = 1
),

-- Calculate "active contributor" stats
active_contributor AS (
  SELECT uid,
         COUNT(id) AS changesets_in_last365,
         COUNT(DISTINCT(date_trunc('day', created_at))) AS editing_days_in_last365, 
         SUM(num_changes) AS total_edits_in_last365, MAX(created_at) AS most_recent_changeset
    FROM changesets
    WHERE created_at >= ( (SELECT MAX(created_at) FROM changesets) - INTERVAL '365' DAY)
    GROUP BY  uid
)
-- Finally, put it all together:         
SELECT changesets.id,
       changesets.uid,
       changesets.user,
       created_at,
       num_changes,
       country,
       code,
       editing_days_in_last365,
       most_edited_country,
       total_edits_in_last365,
       changesets_in_last365
  FROM changesets 
  LEFT JOIN active_contributors ON changesets.uid = active_contributors.uid
  LEFT JOIN most_frequent_editing_country ON most_frequent_editing_country.uid = changesets.uid
  JOIN country_geoms
    ON ST_Contains(ST_GeometryFromText(geometry_wkt), ST_Point(min_lon, min_lat))

WHERE created_at >= date '2020-01-01'
ORDER BY created_at DESC

Save this file as 2020_changesets_with_homes.csv

1. Aggregate Per Country

Counts the number of Changesets submitted per Country — as in, the Country that the changeset falls within. In this case, local refers to whether or not the mapper who submitted each changeset is likely from that Country.

Countries like the USA are going to be over-represented with local changesets because of mapping teams like Amazon Logistics that almost exclusively edit in the US, so they will appear local.

By actual edit count from num_changes field

Aggregated by Mapper

Breaking down the Total Mapper count by local/non-local mappers per Country

These numbers are massively inflated by one-time contributors in OSM. For reference, in the last year, there have been over 170k mappers who edited only on 1 day. This likely isn't really enough information to determine a "local country" so these numbers are being inflated.

To adjust for this, let's look only at the number of mappers active for more than 2 days:

Notice the change in ratios of local / non-local mappers, partially given our loose definition of "local." Overall, many of the non-local mappers disappeared in this figure (notice the major decrease on the y-axis)

Identifying the Active Contributors active per Country?

Save Per-Country Results as Table

Finally, Active Contributors per Country?