Want to share your content on R-bloggers? click here if you have a blog, or here if you don't.
In my last post, I found that experience has a significant impact on the salary of engineers. Is this effect of experience on wages unique to engineers, or are there similar correlations in other occupational groups?
I will use, in principle, the same model as in my previous post to calculate the significance of age. I will not use sex as an explanatory variable, since some occupational groups do not have enough data for both sexes. I will also use a third-degree polynomial in age, since this gives a significantly better fit for some occupational groups.
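To illustrate what "a significantly better fit" from the cubic term can mean, here is a minimal base-R sketch on simulated data. The S-shaped age profile, the noise level and the age-band midpoints are assumptions, not SCB numbers. It compares a quadratic and a cubic age term with a nested F-test via `anova()`; a small p-value in the second row would favour keeping the cubic term.

```r
# Hypothetical data for one occupational group: age-band midpoints x years,
# with an S-shaped age profile that a quadratic cannot follow well
set.seed(3)
sim <- expand.grid(age_n = c(21, 27, 32, 37, 42, 47, 52, 57, 62),
                   year_n = 2014:2018)
sim$salary <- exp(10 + 0.03 * (sim$year_n - 2014) +
                  0.3 * tanh((sim$age_n - 35) / 8) +
                  rnorm(nrow(sim), sd = 0.02))

m2 <- lm(log(salary) ~ year_n + poly(age_n, 2), data = sim)
m3 <- lm(log(salary) ~ year_n + poly(age_n, 3), data = sim)

# Nested F-test: does adding the cubic term reduce the residual sum of squares
# more than expected by chance?
cmp <- anova(m2, m3)
cmp
```

The same comparison could be run per occupational group to decide whether degree three is worth it there.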
Some occupational groups still have too little data for regression analysis. More than 30 observations are required to fit both age and year, so groups with 30 or fewer observations are skipped.
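The 30-observation rule can also be written without an explicit loop: split the data by group, drop the small groups and fit the rest. This is a base-R sketch on two hypothetical simulated groups called `big` (45 rows) and `small` (20 rows); it is not the code used in this post, which loops over groups with `filter()`.

```r
# Hypothetical long-format data with two groups of different size
set.seed(2)
big <- expand.grid(age_n = c(21, 27, 32, 37, 42, 47, 52, 57, 62),
                   year_n = 2014:2018)
big$group <- "big"
small <- big[1:20, ]
small$group <- "small"
d <- rbind(big, small)
d$salary <- exp(10 + 0.03 * (d$year_n - 2014) + 0.01 * d$age_n +
                rnorm(nrow(d), sd = 0.02))

# One data frame per group; keep only groups with more than 30 observations
groups <- split(d, d$group)
groups <- Filter(function(g) nrow(g) > 30, groups)

# Fit the model per remaining group and keep the year estimate (log scale)
year_coef <- sapply(groups, function(g) {
  fit <- lm(log(salary) ~ year_n + poly(age_n, 3), data = g)
  coef(fit)[["year_n"]]
})
year_coef
```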
The F-value for age from the type II ANOVA table is used as a single number to measure how strongly salary is associated with age. For exploratory analysis, the F-value seems good enough.
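Since the model has no interactions, the type II F-value for `poly(age_n, 3)` from `car::Anova(model, type = 2)` should be the same as the F-test for dropping all three age terms at once. The sketch below checks that idea with base R only (no car), on simulated data whose values are assumptions: `anova()` on two nested models and `drop1(..., test = "F")` on the full model should agree.

```r
# Hypothetical simulated data for one occupational group
set.seed(1)
sim <- expand.grid(age_n = c(21, 27, 32, 37, 42, 47, 52, 57, 62),
                   year_n = 2014:2018)
sim$salary <- exp(10 + 0.03 * (sim$year_n - 2014) +
                  0.02 * sim$age_n - 0.0002 * sim$age_n^2 +
                  rnorm(nrow(sim), sd = 0.02))

full    <- lm(log(salary) ~ year_n + poly(age_n, 3), data = sim)
reduced <- lm(log(salary) ~ year_n, data = sim)

# F-test for removing the whole age polynomial (3 df) from the full model
f_nested <- anova(reduced, full)$F[2]

# drop1() drops one term at a time with the other terms kept;
# without interactions this is a type II test
f_drop1 <- drop1(full, test = "F")["poly(age_n, 3)", "F value"]

c(f_nested = f_nested, f_drop1 = f_drop1)
```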
In the figure below, I also use the estimate for year to see how much salaries are raised each year in the different occupational groups, holding age constant.
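Because the model is fitted to log(salary), the year estimate is a growth rate on the log scale. This small sketch converts two of the year estimates reported in the table of this post (234 Primary- and pre-school teachers and 523 Cashiers and related clerks) into percent per year. For values this small, 100 times the estimate is a close approximation of the exact 100 * (exp(estimate) - 1).

```r
# Year estimates (log scale) copied from the results table in this post
b_year <- c(teachers_234 = 0.0345563, cashiers_523 = 0.0041737)

# Exact percentage change per year implied by a log-scale coefficient
pct_exact <- 100 * (exp(b_year) - 1)

# Common shortcut, good enough when the coefficient is small
pct_approx <- 100 * b_year

round(cbind(pct_exact, pct_approx), 2)
```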
```r
library(tidyverse)
library(broom)
library(car)
library(polynom)

# Read a csv export from Statistics Sweden and reshape it to one row per year
readfile <- function(file1) {
  read_csv(file1,
           col_types = cols(),
           locale = readr::locale(encoding = "latin1"),
           na = c("..", "NA")) %>%
    # the year columns (19xx, 20xx) become key/value pairs
    gather(starts_with("19"), starts_with("20"), key = "year", value = salary) %>%
    drop_na() %>%
    mutate(year_n = parse_number(year))
}
```
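For readers without the tidyverse, this is roughly what `readfile()` does after reading the file, sketched in base R on a tiny hypothetical wide table (group names and salaries are made up): year columns are stacked into long format, missing salaries are dropped, and a numeric year is added.

```r
# Hypothetical wide table: one row per group, one column per year,
# with ".." already read as NA (readfile() does this via na = c("..", "NA"))
wide <- data.frame(group = c("A", "B"),
                   `2014` = c(30000, NA),
                   `2015` = c(31000, 32000),
                   check.names = FALSE)

year_cols <- grep("^(19|20)", names(wide), value = TRUE)

# Stack the year columns into long format, similar to tidyr::gather()
long <- data.frame(
  group  = rep(wide$group, times = length(year_cols)),
  year   = rep(year_cols, each = nrow(wide)),
  salary = unlist(wide[year_cols], use.names = FALSE),
  stringsAsFactors = FALSE
)

# Drop missing salaries (like drop_na()) and add a numeric year
long <- long[!is.na(long$salary), ]
long$year_n <- as.numeric(long$year)
long
```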
The data table was downloaded from Statistics Sweden, http://www.statistikdatabasen.scb.se/pxweb/en/ssd/, and saved as a comma-delimited file without heading, 000000D2.csv.
The table: Average basic salary, monthly salary and women's salary as a percentage of men's salary by sector, occupational group (SSYK 2012), sex and age. Year 2014–2018. Selection: Monthly salary, All sectors.
```r
tb <- readfile("000000D2.csv") %>%
  rowwise() %>%
  # age bands such as "25-29": lower and upper bound as integers
  mutate(age_l = unlist(lapply(strsplit(substr(age, 1, 5), "-"), strtoi))[1],
         age_h = unlist(lapply(strsplit(substr(age, 1, 5), "-"), strtoi))[2]) %>%
  ungroup() %>%
  # midpoint of the band as numeric age
  mutate(age_n = (age_l + age_h) / 2)

# Collect per-group results in lists instead of rbind-ing onto a dummy 0
summary_table <- list()
anova_table <- list()

for (i in unique(tb$`occuptional (SSYK 2012)`)) {
  temp <- filter(tb, `occuptional (SSYK 2012)` == i)
  # only fit groups with more than 30 observations
  if (nrow(temp) > 30) {
    model <- lm(log(salary) ~ year_n + poly(age_n, 3), data = temp)
    summary_table[[i]] <- mutate(tidy(summary(model)), ssyk = i)
    anova_table[[i]] <- mutate(tidy(Anova(model, type = 2)), ssyk = i)
  }
}

summary_table <- bind_rows(summary_table)
anova_table <- bind_rows(anova_table)

merge(summary_table, anova_table, by = "ssyk", all = TRUE) %>%
  filter(term.y == "poly(age_n, 3)", term.x == "year_n") %>%
  ggplot() +
  # the year estimate is on the log scale; 100 * estimate is roughly % per year
  geom_point(mapping = aes(x = 100 * estimate, y = statistic.y)) +
  labs(x = "Increase in salaries (% / year)", y = "F-value for age")
```
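The `rowwise()` and `strsplit()` step above parses the age bands one row at a time. A vectorized alternative in base R is sketched below. It assumes every age label starts with "low-high" followed by text (for example "25-29 years"); the exact labels in the SCB file are an assumption here.

```r
# Hypothetical age labels in the "low-high years" form
age <- c("18-24 years", "25-29 years", "60-64 years")

# Capture the numbers on each side of the hyphen
age_l <- as.numeric(sub("^(\\d+)-(\\d+).*$", "\\1", age))
age_h <- as.numeric(sub("^(\\d+)-(\\d+).*$", "\\2", age))

# Midpoint of the band, used as numeric age in the model
age_n <- (age_l + age_h) / 2
age_n
```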
The table below lists all occupational groups, sorted by F-value for age in descending order.
```r
merge(summary_table, anova_table, by = "ssyk", all = TRUE) %>%
  filter(term.y == "poly(age_n, 3)", term.x == "year_n") %>%
  select(ssyk, estimate, statistic.y) %>%
  rename(`F-value for age` = statistic.y,
         `Increase in salary` = estimate) %>%
  arrange(desc(`F-value for age`)) %>%
  knitr::kable(booktabs = TRUE,
               caption = "Correlation for F-value (age) and the yearly increase in salaries with age held as constant")
```
Occupational group (SSYK 2012) | Increase in salary (log scale, per year) | F-value for age |
---|---|---|
234 Primary- and pre-school teachers | 0.0345563 | 1349.859088 |
233 Secondary education teachers | 0.0294574 | 861.331070 |
532 Personal care workers in health services | 0.0285338 | 800.259659 |
336 Police officers | 0.0284911 | 675.571576 |
223 Nursing professionals (cont.) | 0.0303955 | 625.404523 |
214 Engineering professionals | 0.0192393 | 612.414362 |
235 Teaching professionals not elsewhere classified | 0.0245885 | 578.686817 |
266 Social work and counselling professionals | 0.0316617 | 551.888399 |
221 Medical doctors | 0.0150176 | 449.792500 |
251 ICT architects, systems analysts and test managers | 0.0249600 | 415.590103 |
534 Attendants, personal assistants and related workers | 0.0191811 | 406.258604 |
231 University and higher education teachers | 0.0254827 | 404.602202 |
222 Nursing professionals | 0.0414071 | 371.107319 |
533 Health care assistants | 0.0205813 | 345.075594 |
531 Child care workers and teachers aides | 0.0219044 | 291.049608 |
351 ICT operations and user support technicians | 0.0211211 | 271.091961 |
159 Other social services managers | 0.0251218 | 191.570380 |
211 Physicists and chemists | 0.0207272 | 186.366824 |
321 Medical and pharmaceutical technicians | 0.0288946 | 177.137635 |
152 Managers in social and curative care | 0.0387001 | 164.636802 |
243 Marketing and public relations professionals | 0.0150173 | 154.310784 |
723 Machinery mechanics and fitters | 0.0204993 | 146.299981 |
125 Sales and marketing managers | 0.0187356 | 145.732333 |
141 Primary and secondary schools and adult education managers | 0.0346753 | 142.578762 |
341 Social work and religious associate professionals | 0.0255830 | 137.073911 |
133 Research and development managers | 0.0137728 | 135.323107 |
153 Elderly care managers | 0.0331514 | 132.163025 |
242 Organisation analysts, policy administrators and human resource specialists | 0.0223881 | 132.013557 |
332 Insurance advisers, sales and purchasing agents | 0.0176134 | 128.196288 |
218 Specialists within environmental and health protection | 0.0258110 | 120.206634 |
311 Physical and engineering science technicians | 0.0213202 | 119.371812 |
422 Client information clerks | 0.0175877 | 117.057208 |
411 Office assistants and other secretaries | 0.0250406 | 115.401389 |
264 Authors, journalists and linguists | 0.0158766 | 107.667527 |
226 Dentists | 0.0230213 | 99.061845 |
232 Vocational education teachers | 0.0298647 | 93.534293 |
122 Human resource managers | 0.0365348 | 86.595103 |
342 Athletes, fitness instructors and recreational workers | 0.0162825 | 86.085107 |
515 Building caretakers and related workers | 0.0188443 | 85.346469 |
123 Administration and planning managers | 0.0423650 | 81.886461 |
137 Production managers in manufacturing | 0.0267995 | 80.958767 |
227 Naprapaths, physiotherapists, occupational therapists | 0.0212967 | 78.930141 |
132 Supply, logistics and transport managers | 0.0135557 | 78.186301 |
817 Wood processing and papermaking plant operators | 0.0289197 | 75.983376 |
441 Library and filing clerks | 0.0210449 | 75.872685 |
131 Information and communications technology service managers | 0.0431537 | 75.423080 |
343 Photographers, interior decorators and entertainers | 0.0339142 | 75.132287 |
241 Accountants, financial analysts and fund managers | 0.0270620 | 71.204029 |
216 Architects and surveyors | 0.0241267 | 68.945982 |
134 Architectural and engineering managers | 0.0236760 | 68.279874 |
228 Specialists in health care not elsewhere classified | 0.0272838 | 64.426085 |
213 Biologists, pharmacologists and specialists in agriculture and forestry | 0.0144849 | 63.378555 |
831 Train operators and related workers | 0.0177987 | 55.404356 |
334 Administrative and specialized secretaries | 0.0292702 | 52.477105 |
335 Tax and related government associate professionals | 0.0227003 | 49.850281 |
224 Psychologists and psychotherapists | 0.0270655 | 47.653074 |
511 Cabin crew, guides and related workers | 0.0069736 | 47.413185 |
812 Metal processing and finishing plant operators | 0.0176743 | 47.395879 |
331 Financial and accounting associate professionals | 0.0229113 | 45.186053 |
261 Legal professionals | 0.0292942 | 44.569161 |
819 Process control technicians | 0.0232825 | 43.919550 |
333 Business services agents | 0.0263028 | 43.327180 |
961 Recycling collectors | 0.0225031 | 42.772133 |
312 Construction and manufacturing supervisors | 0.0322029 | 41.767797 |
516 Other service related workers | 0.0202784 | 41.325733 |
262 Museum curators and librarians and related professionals | 0.0228651 | 40.378111 |
265 Creative and performing artists | 0.0252235 | 39.119906 |
741 Electrical equipment installers and repairers | 0.0221901 | 38.176541 |
524 Event seller and telemarketers | 0.0203373 | 36.349688 |
941 Fast-food workers, food preparation assistants | 0.0199578 | 35.998201 |
815 Machine operators, textile, fur and leather products | 0.0128372 | 33.582965 |
962 Newspaper distributors, janitors and other service workers | 0.0141958 | 32.540073 |
136 Production managers in construction and mining | 0.0264825 | 31.006282 |
834 Mobile plant operators | 0.0251599 | 30.439935 |
816 Machine operators, food and related products | 0.0198706 | 29.543569 |
129 Administration and service managers not elsewhere classified | 0.0171682 | 29.032377 |
212 Mathematicians, actuaries and statisticians | 0.0240773 | 28.949679 |
352 Broadcasting and audio-visual technicians | 0.0067079 | 28.725776 |
513 Waiters and bartenders | 0.0214795 | 28.455515 |
813 Machine operators, chemical and pharmaceutical products | 0.0254550 | 26.563325 |
151 Health care managers | 0.0211530 | 24.870942 |
611 Market gardeners and crop growers | 0.0089573 | 23.602904 |
732 Printing trades workers | 0.0191704 | 23.581610 |
432 Stores and transport clerks | 0.0217702 | 22.969527 |
217 Designers | 0.0252062 | 22.823943 |
161 Financial and insurance managers | 0.0518758 | 21.908728 |
711 Carpenters, bricklayers and construction workers | 0.0136555 | 20.268520 |
541 Other surveillance and security workers | 0.0239438 | 19.245270 |
179 Other services managers not elsewhere classified | 0.0272448 | 17.108091 |
911 Cleaners and helpers | 0.0176513 | 16.284355 |
512 Cooks and cold-buffet managers | 0.0278549 | 15.787404 |
814 Machine operators, rubber, plastic and paper products | 0.0245275 | 15.256042 |
267 Religious professionals and deacons | 0.0268407 | 11.266331 |
761 Butchers, bakers and food processors | 0.0153660 | 11.168879 |
722 Blacksmiths, toolmakers and related trades workers | 0.0192713 | 10.890741 |
121 Finance managers | 0.0276643 | 9.785317 |
752 Wood treaters, cabinet-makers and related trades workers | 0.0269102 | 9.779896 |
713 Painters, Lacquerers, Chimney-sweepers and related trades workers | 0.0259098 | 9.415854 |
932 Manufacturing labourers | 0.0266336 | 9.113769 |
522 Shop staff | 0.0267679 | 8.247675 |
818 Other stationary plant and machine operators | 0.0237780 | 6.983074 |
344 Driving instructors and other instructors | 0.0286480 | 6.971261 |
523 Cashiers and related clerks | 0.0041737 | 4.970851 |
833 Heavy truck and bus drivers | 0.0188392 | 4.786235 |
912 Washers, window cleaners and other cleaning workers | 0.0382761 | 4.701424 |
821 Assemblers | 0.0286219 | 1.405402 |
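Let's check what we have found by looking at four groups from the table: 234 Primary- and pre-school teachers (highest F-value for age), 821 Assemblers (lowest F-value for age), 161 Financial and insurance managers (largest yearly increase) and 523 Cashiers and related clerks (smallest yearly increase).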
```r
# Salaries over time by age band and sex for primary- and pre-school teachers
temp <- tb %>%
  filter(`occuptional (SSYK 2012)` == "234 Primary- and pre-school teachers")

temp %>%
  ggplot() +
  geom_point(mapping = aes(x = year_n, y = salary, colour = age)) +
  facet_grid(. ~ sex) +
  labs(x = "Year", y = "Salary (SEK/month)")
```
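```r
# Refit with a raw polynomial so the estimates multiply age, age^2 and age^3 directly
model <- lm(log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)
summod <- tidy(summary(model))

temp %>%
  ggplot() +
  # age part of the fitted log(salary); estimates 3-5 belong to age, age^2, age^3
  geom_point(mapping = aes(x = age_n,
                           y = summod$estimate[3] * age_n +
                             summod$estimate[4] * age_n^2 +
                             summod$estimate[5] * age_n^3)) +
  labs(x = "Age", y = "Age effect on log(salary)")
```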
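The loop earlier fits `poly(age_n, 3)` (orthogonal polynomials), while this step uses `raw = TRUE`. A sketch on simulated data (hypothetical values, not SCB data) suggests why the switch does not change the model: both versions span the same column space, so the fitted values agree, but only the raw estimates can be plugged straight into age, age^2 and age^3 as the plot does.

```r
# Hypothetical simulated data for one group
set.seed(4)
sim <- expand.grid(age_n = c(21, 27, 32, 37, 42, 47, 52, 57, 62),
                   year_n = 2014:2018)
sim$salary <- exp(10 + 0.03 * (sim$year_n - 2014) +
                  0.02 * sim$age_n - 0.0002 * sim$age_n^2 +
                  rnorm(nrow(sim), sd = 0.02))

m_orth <- lm(log(salary) ~ year_n + poly(age_n, 3), data = sim)
m_raw  <- lm(log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = sim)

# Same fitted values (up to floating-point error) ...
all.equal(fitted(m_orth), fitted(m_raw), tolerance = 1e-6)

# ... but only these coefficients multiply age, age^2 and age^3 directly
coef(m_raw)[3:5]
```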
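```r
# Derivative of the age polynomial: coefficients estimate[3], 2 * estimate[4], 3 * estimate[5]
pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot() +
  # yearly raise = year estimate + slope of the age polynomial, x 100 for approx. percent
  geom_point(mapping = aes(x = age_n,
                           y = 100 * (summod$estimate[2] + pdx[1] +
                                        pdx[2] * age_n + pdx[3] * age_n^2))) +
  labs(x = "Age", y = "Salary raise (% / year)")
```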
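The last plot combines the year estimate with the derivative of the age polynomial, which is the expected yearly raise for a person who gets one year older each calendar year. With year estimate b2 and raw age estimates b3, b4, b5, `deriv()` from polynom turns (0, b3, b4, b5) into (b3, 2 * b4, 3 * b5), so the plotted quantity is b2 + b3 + 2 * b4 * age + 3 * b5 * age^2 on the log scale. This base-R sketch uses made-up coefficients (not fitted to the SCB data) and checks that closed form against a small finite step in year and age together.

```r
# Hypothetical raw-polynomial coefficients: intercept, year, age, age^2, age^3
b <- c(10, 0.03, 0.05, -0.0008, 0.000004)

# Fitted log(salary) for a given year and age
log_salary <- function(year, age) {
  b[1] + b[2] * year + b[3] * age + b[4] * age^2 + b[5] * age^3
}

# Closed-form yearly raise when year and age both increase by one per year
yearly_raise <- function(age) {
  b[2] + b[3] + 2 * b[4] * age + 3 * b[5] * age^2
}

# Numerical check: move a small step h along year and age at the same time
h <- 1e-6
age <- c(25, 40, 55)
numeric_raise <- (log_salary(2015 + h, age + h) - log_salary(2015, age)) / h

cbind(age, analytic = yearly_raise(age), numeric = numeric_raise)
```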
```r
# Salaries over time by age band and sex for assemblers
temp <- tb %>%
  filter(`occuptional (SSYK 2012)` == "821 Assemblers")

temp %>%
  ggplot() +
  geom_point(mapping = aes(x = year_n, y = salary, colour = age)) +
  facet_grid(. ~ sex) +
  labs(x = "Year", y = "Salary (SEK/month)")
```
```r
model <- lm(log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)
summod <- tidy(summary(model))

temp %>%
  ggplot() +
  # age part of the fitted log(salary)
  geom_point(mapping = aes(x = age_n,
                           y = summod$estimate[3] * age_n +
                             summod$estimate[4] * age_n^2 +
                             summod$estimate[5] * age_n^3)) +
  labs(x = "Age", y = "Age effect on log(salary)")
```
```r
pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot() +
  # yearly raise = year estimate + slope of the age polynomial, in approx. percent
  geom_point(mapping = aes(x = age_n,
                           y = 100 * (summod$estimate[2] + pdx[1] +
                                        pdx[2] * age_n + pdx[3] * age_n^2))) +
  labs(x = "Age", y = "Salary raise (% / year)")
```
```r
# Salaries over time by age band and sex for financial and insurance managers
temp <- tb %>%
  filter(`occuptional (SSYK 2012)` == "161 Financial and insurance managers")

temp %>%
  ggplot() +
  geom_point(mapping = aes(x = year_n, y = salary, colour = age)) +
  facet_grid(. ~ sex) +
  labs(x = "Year", y = "Salary (SEK/month)")
```
```r
model <- lm(log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)
summod <- tidy(summary(model))

temp %>%
  ggplot() +
  # age part of the fitted log(salary)
  geom_point(mapping = aes(x = age_n,
                           y = summod$estimate[3] * age_n +
                             summod$estimate[4] * age_n^2 +
                             summod$estimate[5] * age_n^3)) +
  labs(x = "Age", y = "Age effect on log(salary)")
```
```r
pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot() +
  # yearly raise = year estimate + slope of the age polynomial, in approx. percent
  geom_point(mapping = aes(x = age_n,
                           y = 100 * (summod$estimate[2] + pdx[1] +
                                        pdx[2] * age_n + pdx[3] * age_n^2))) +
  labs(x = "Age", y = "Salary raise (% / year)")
```
```r
# Salaries over time by age band and sex for cashiers
temp <- tb %>%
  filter(`occuptional (SSYK 2012)` == "523 Cashiers and related clerks")

temp %>%
  ggplot() +
  geom_point(mapping = aes(x = year_n, y = salary, colour = age)) +
  facet_grid(. ~ sex) +
  labs(x = "Year", y = "Salary (SEK/month)")
```
```r
model <- lm(log(salary) ~ year_n + poly(age_n, 3, raw = TRUE), data = temp)
summod <- tidy(summary(model))

temp %>%
  ggplot() +
  # age part of the fitted log(salary)
  geom_point(mapping = aes(x = age_n,
                           y = summod$estimate[3] * age_n +
                             summod$estimate[4] * age_n^2 +
                             summod$estimate[5] * age_n^3)) +
  labs(x = "Age", y = "Age effect on log(salary)")
```
```r
pdx <- deriv(as.polynomial(c(0, summod$estimate[3], summod$estimate[4], summod$estimate[5])))

temp %>%
  ggplot() +
  # yearly raise = year estimate + slope of the age polynomial, in approx. percent
  geom_point(mapping = aes(x = age_n,
                           y = 100 * (summod$estimate[2] + pdx[1] +
                                        pdx[2] * age_n + pdx[3] * age_n^2))) +
  labs(x = "Age", y = "Salary raise (% / year)")
```