Where to live in Poland or where to move
Dec 3, 2017
TK
9 minute read

Background

Inspired by this post by Roel M. Hogervorst, which was inspired partly by this post by Maëlle Salmon inspired by this xkcd strip, I decided to add to this inspirationally-overloaded-cycle and by the way get around to starting my blogdown-based site. Let’s get to it then.

The original idea of Randall Munroe, the author of xkcd, was to show cities from all around the world in the plot that tells you “where to live based on your temperature preferences” - one of the axes shows temperatures in the winter and the second corresponds to humidex in the summer (we’ll get to what it means later).


This post will deal with several things: getting weather data from the web, scraping Wikipedia, choosing a city to live in Poland or elsewhere (based additionally on my personal choice of “freedom” metrics). Here I’ll include only parts of the code I wrote, but all of it can be found here.

Actual work

My whole analysis, as was the case with Roel’s and Maëlle’s, was done in R - the language that’s currently second on my “most used” list, just after Polish and before English.

Packages that I used:

  • magrittr, tibble, dplyr, stringr, stringi, readr, purrr - for all the data mumbo-jumbo,

  • rvest - for scraping Wikipedia,

  • rwunderground - for accessing Weather Underground API,

  • weathermetrics - for some weather-related metrics (will describe below),

  • ggplot2, ggrepel, extrafont, xkcd - for the plots.

Where to live in Poland

For starters, I needed the list of some biggest cities in Poland. Luckily, Wikipedia’s got it in a pretty nice table and rvest lets you easily scrape them.

cities_wiki <-
  read_html("https://pl.wikipedia.org/wiki/Miasta_w_Polsce_(statystyki)")
cities_wiki %<>% html_nodes(".sortable") %>% html_table()

cities_tbl <- as_tibble(cities_wiki[[1]]) %>%
  select(city = Miasto,
         region = `Województwo`,
         population = `Liczba ludności (01.01.2017)`) %>%
  mutate(population = as.integer(str_replace_all(population, "\\s", "")))

top_cities_tbl <-
  cities_tbl %>%
  top_n(25, population) %>%
  arrange(-population)

head(top_cities_tbl, 5)
## # A tibble: 5 x 3
##       city        region population
##      <chr>         <chr>      <int>
## 1 Warszawa   mazowieckie    1753977
## 2   Kraków   małopolskie     765320
## 3     Łódź       łódzkie     696503
## 4  Wrocław  dolnośląskie     637683
## 5   Poznań wielkopolskie     540372

Aaaand, it was the end of the easy part. Then, I realized I need weather data containing at least monthly averages of temperature and humidex values. Humidex is a pretty nifty index developed in Canada and used by meteorologists to indicate how the weather really “feels”, by combining temperature and humidity, which is in turn measured by the dew point temperature, which is “the temperature at which the air can no longer”hold" all of the water vapor which is mixed with it, and some of the water vapor must condense into liquid water“.

As I learned, one can get that kind of data reather easily from Weather Underground API, but only for cities that have airports. I checked the state of Polish airports - as it turns out, we have 15 of them, but unfortunately that didn’t cover my needs.

I had to turn to IMGW (Instytut Meteorologii i Gospodarki Wodnej) - which in English goes by “Institute of Meteorology and Water Management” and is Polish public entity that deals with collecting weather data (amongst other things). One day, probably, their data will be available through some friendly API at https://danepubliczne.gov.pl/, but right now all we got are CSVs locked in ZIP files. Better than nothing, I guess. I downloaded three files containing data from around 60 weather stations from 2014, 2015 and 2016 from here.

Unfortunately, those files didn’t contain dew point temperatures, so I had to throw away whole humidex approach. Luckily, there’s a second, similar index (this time from the USA) - heat index. There are some differences between them, but for the purpose of this analysis, it didn’t really matter.

After some data cleaning and learning few more things about dplyr and stringr, I got the data into some presentable and workable form.

## # A tibble: 5 x 3
##            city   avg_temp avg_heat_index
##           <chr>      <dbl>          <dbl>
## 1     Bialystok -0.6111111       17.25556
## 2 Bielsko-Biala  1.9555556       17.91250
## 3   Czestochowa  1.2222222       18.78571
## 4        Gdansk  1.2444444       17.52222
## 5      Katowice  1.4666667       17.96250

At this point I could use the code from the posts referred in the beggining to generate the xkcd-styled plot for the Polish cities.

xrange <- range(pl_data_final$avg_heat_index)
yrange <- range(pl_data_final$avg_temp)

set.seed(42)

ggplot(pl_data_final,
       aes(avg_heat_index, avg_temp)) +
  geom_point() +
  geom_text_repel(aes(label = city),
                  family = "xkcd", 
                  max.iter = 50000) +
  ggtitle("Where to live in Poland based on your temperature preferences",
          subtitle = "Data from IMGW, 2014-2016") +
  xlab("Heat index")+
  ylab("Winter temperature in Celsius degrees") +
  xkcdaxis(xrange = xrange,
           yrange = yrange)+
  theme_xkcd() +
  theme(text = element_text(size = 16, family = "xkcd"))

As expected, the differences between Polish cities aren’t that big, but if you want to be as warm as one can in this country, you should think about Wrocław!

Where to move

It’s currently not hard to notice that sociopolitical climate in Poland is not so warm either. This leads me to think more and more often about giving a try to move and experience a different place. In order to make this analysis somewhat more interesting and useful, I decided to include cities from other countries in it.

To paraphrase Mindless Self Indulgence’s biggest hit - I like my countries free, just like my jazz and when thinking about relocation can’t but include this sentiment in my considerations. Fortunately, some people and organizations share my feelings or at least want to know how this whole notion of “freedom” looks around the world. I went through most of the list of freedom indices from Wikipedia and decided to take a closer look at two of them, namely Press Freedom Index by Reporters Without Borders and World Index of Moral Freedom by Foundation for the Advancement of Liberty. You can read about specifics of their methodologies on their respective pages, but I’ll try to summarize the most important parts below.

Press Freedom Index aims to “reflect the degree of freedom that journalists, news organisations, and netizens have in each country” on a scale from 0 to 100, where the fewer points the better. Countries scoring 0 to 25 get “good” or “fairly good” mark, whereas 25-35 means the situation is “problematic”, 35-55 “bad” and 55-100 “very bad”. Poland got 26.47 points this year.

press_index_countries <- 
  read_html("https://en.wikipedia.org/wiki/Press_Freedom_Index") %>% 
  html_nodes(".sortable") %>% 
  html_table()

press_tbl <- as_tibble(press_index_countries[[1]]) %>% 
  select(1:2) %>%
  mutate(index = str_sub(`2017[4]`, 6)) %>%
  select(country = Country, index) %>%
  mutate(country = str_replace(country, "\\[.{1}\\]", ""),
         index = as.double(str_replace(index, "\\s", "")))

head(press_tbl, 5)
## # A tibble: 5 x 2
##       country index
##         <chr> <dbl>
## 1      Norway  7.60
## 2      Sweden  8.27
## 3     Finland  8.92
## 4     Denmark 10.36
## 5 Netherlands 11.28

World Index of Moral Freedom tries to answer the question: “how free from state-imposed moral constraints are human beings depending on their countries of residence”. It ranks countries in 5 categories: religious freedom, bioethical freedom, drugs freedom, sexual freedom and gender & family freedom. The scale is also 0-100, but other way round - 90+ points represents “highest moral freedom” and every country below 50 is deemed as having insufficient or low moral freedom. Poland scored 50,08 in the last ranking (2016).

moral_index_countries <- 
  read_html("https://en.wikipedia.org/wiki/World_Index_of_Moral_Freedom") %>% 
  html_nodes(".sortable") %>% 
  html_table()

moral_tbl <- as_tibble(moral_index_countries[[1]]) %>% 
  select(country = COUNTRY, index = `WIMF 2016`) %>%
  mutate(index = as.double(str_replace(index, ",", ".")))

head(moral_tbl, 5)
## # A tibble: 5 x 2
##          country index
##            <chr> <dbl>
## 1    Netherlands 91.70
## 2        Uruguay 88.75
## 3       Portugal 83.80
## 4 Czech Republic 80.50
## 5        Belgium 79.35

So, I looked at the tables and decided to put the threshold at <25 for press freedom and >60 for the moral one. I hesitated a moment when looking at Italy (26.26 in the first ranking) and had to convince myself that I can actually grow my grapevines in a few other places, but eventually said arrivederci and moved on. That’s what I was left with:

## # A tibble: 22 x 3
##           country index.x index.y
##             <chr>   <dbl>   <dbl>
##  1    Netherlands   91.70   11.28
##  2        Uruguay   88.75   17.43
##  3       Portugal   83.80   15.77
##  4 Czech Republic   80.50   16.91
##  5        Belgium   79.35   12.75
##  6          Spain   78.60   18.69
##  7  United States   78.20   23.88
##  8        Germany   78.03   14.97
##  9         Canada   76.58   16.53
## 10     Luxembourg   72.60   14.72
## 11    Switzerland   72.38   12.13
## 12        Austria   71.13   13.47
## 13       Slovenia   70.00   21.70
## 14         France   69.93   22.24
## 15        Estonia   69.40   13.55
## 16         Sweden   66.95    8.27
## 17        Denmark   66.33   10.36
## 18    New Zealand   65.25   13.98
## 19       Slovakia   62.33   15.51
## 20   South Africa   61.70   20.12
## 21      Australia   61.35   16.02
## 22        Finland   60.58    8.92

At this point, I didn’t want to complicate my life and this post any further. I found really cool world’s cities database at simplemaps.com, chose two biggest places (in terms of population) from each of the 22 countries and could finally use Weather Underground API. In order to do that, you have to sign in at their website and get an API key. The process is pretty straight-forward, so I won’t go into details here. When you get your key, you can use it straight from R thanks to the wunderground package, like that:

library(rwunderground)
rwunderground::set_api_key("YOUR_KEY_HERE")

planner(set_location(territory = "Poland", city = "Warsaw"),
        start_date = "0101",
        end_date = "0131")

The “Travel Planner” gives us 30+ variables for the requested city and time period, average temperatures and dew points being among them. It is a little bit enigmatic, as it doesn’t say anywhere in the documentation how many years of data it considers, but well, that’ll do.

The biggest downside of the free plan of their API is that it lets you make max. 10 calls per minute, so I had to put some “sleep time” (with Sys.sleep()) into my functions and wait around 20 minutes to get all the data I needed.

Finally, however, I could plot my final output, and now we can all choose some warm (or cold, if you prefer), big city located in one of the least authoritative countries in the world.

Final notes regarding the plot: ‘index’ value come from reversing the press freedom index and adding it to the moral one. I also left the Polish cities (Łódź and Warsaw) for reference. And I lost the axes somewhere along the way - they didn’t want to work with coloured points (I’ll work on my ggplot skills and fix that one day, I promise).

Enjoy.

xrange <- range(world_cities_tbl$humidex)
yrange <- range(world_cities_tbl$wint_avg_temp)

set.seed(42)

ggplot(world_cities_tbl,
       aes(humidex, wint_avg_temp, color = `freedom index`)) +
  geom_point() +
  geom_text_repel(aes(label = city),
                  family = "xkcd", 
                  max.iter = 50000) +
  ggtitle("Where to live based on your temperature preferences",
          subtitle = "Data from Weather Underground, simplemaps and Wikipedia") +
  xlab("Summer humidex") +
  ylab("Winter temperature in Celsius degrees") +
  theme_xkcd() +
  theme(text = element_text(size = 16, family = "xkcd"),
        legend.position = c(0.85, 0.2))



comments powered by Disqus