Regional GVA and job count data: easy squeezy versions

What’s this?

  • Easy-access ONS regional gross value added (GVA) by SIC sector data, supplied as tidy CSVs: ITL2 & 3 geographies, and SIC industrial codes at 2-digit, Section (20 categories) and Production/Construction/Services levels, from 1998 to the most recent available year, 2022. (Original ONS Excel, code, example use in R: location quotients / proportion plots.)
  • Same for BRES (the Business Register and Employment Survey) region-by-sector job count data, with extra-bonus higher accuracy values (see below). 2015 to latest year, 2023. (NOMIS page, code for API & wrangling.)
  • Those two linked together into single GVA/jobcount datasets (2015 to 2022). (Code.)
See below the data for ‘Why’s this?’

GET THE DATA —>


REGIONAL GVA

Table: regional GVA
            | ITL2                               | ITL3
3GROUPS     | long CP, wide CP; long CV, wide CV | long CP, wide CP; long CV, wide CV
SIC_SECTION | long CP, wide CP; long CV, wide CV | long CP, wide CP; long CV, wide CV
SIC_2DIGIT  | long CP, wide CP; long CV, wide CV | long CP, wide CP; long CV, wide CV


These CSVs are closest to the original ONS Excel sheet. Code going from the xls to these is here.

GVA values are all in millions of pounds, same as the original.

As well as ITL2 and ITL3 geographical zones, there’s:

  • “3GROUPS”: SIC sectoral groupings for production, construction and services separately.
  • “SIC_SECTION”: 20 SIC top-level sections (SIC codes with letters A to T) (see SIC hierarchy).
  • “SIC_2DIGIT”: 2 digit codes, the most granular level available in the regional GVA data. NOTE HOWEVER: here, ONS combine a few of them, so these are a smaller bespoke list in the GVA data. For ITL2 zones, there are 72 SIC categories. For ITL3 zones, there are 48 SIC categories.

There are a couple of other choices of CSV type:

  • CP is “current prices”, CV is “chained volume”.
  • “Current prices” gives pound values at the time the data was collected, without inflation accounted for. Useful for analysing relative change and comparing places. And because it’s the actual price values at the time, your own groupings of places and sectors can be summed without problems.
  • “Chained volume” adjusts for inflation - “by calculating the production volume for each year in the prices of a reference year. The resultant time-series of production figures has the effects of price changes removed.” Prices are set relative to 2019. So, absolute change over time can be analysed - e.g. “has sector x in place y grown at a more rapid rate than others?” (With current price data, we can only ask, “has sector x / place y grown relative to others?” An entire sector may have shrunk, but one sector/place may still have relatively increased its proportion.)
  • Chained volume data cannot be summed by the user: you can’t add places or sectors together, as each has been chained together to produce values relative to the reference year. This means that for some of the data below, it’s not possible to supply a chained volume (CV) version because it relies on being able to add up values. CV is included where possible.
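To make the “current prices can be summed” point concrete, here’s a minimal Python sketch that adds up CP values for a user-chosen group of zones. The zone names, sector names and values are made up for illustration, and the tuple layout is an assumption rather than the CSVs’ actual column layout.

```python
from collections import defaultdict

# Hypothetical long-format current-price rows: (zone, sector, year, GVA in £m).
# All names and values are invented for illustration.
rows = [
    ("Zone A", "Manufacturing", 2021, 120.0),
    ("Zone A", "Construction", 2021, 80.0),
    ("Zone B", "Manufacturing", 2021, 200.0),
    ("Zone B", "Construction", 2021, 50.0),
]

def sum_current_prices(rows, zones):
    """Sum CP GVA by (sector, year) across a chosen set of zones."""
    totals = defaultdict(float)
    for zone, sector, year, gva in rows:
        if zone in zones:
            totals[(sector, year)] += gva
    return dict(totals)

# Build a bespoke "Zone A + Zone B" area from current-price values.
combined = sum_current_prices(rows, {"Zone A", "Zone B"})
print(combined)
```

The same addition applied to chained volume values would not give a valid result, for the reasons in the last bullet above.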

Lastly:

  • Long data has a single “year” column containing all years.
  • Wide data is in the same format as the original Excel sheet, with a column for each year.
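If it helps to see the difference, this is a small sketch of turning wide data into long data using only the Python standard library. The column names (“region”, “sector”) and the values are assumptions for illustration, not the real CSV headers.

```python
import csv
import io

# Hypothetical wide-format extract: one column per year.
wide_csv = """region,sector,2020,2021,2022
Zone A,Manufacturing,100,110,120
Zone B,Manufacturing,200,190,210
"""

def wide_to_long(text, id_cols=("region", "sector")):
    """Reshape wide rows (a column per year) into long rows (a single year column)."""
    long_rows = []
    for row in csv.DictReader(io.StringIO(text)):
        ids = {col: row[col] for col in id_cols}
        for col, value in row.items():
            if col not in id_cols:
                long_rows.append({**ids, "year": int(col), "value": float(value)})
    return long_rows

long_rows = wide_to_long(wide_csv)
for r in long_rows:
    print(r)
```

Long data tends to be the easier shape for filtering and grouping by year; wide is closer to how the spreadsheet reads.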

REGIONAL GVA excluding imputed rent

Table: regional GVA excluding imputed rent
            | ITL2   | ITL3
3GROUPS     | CP, –  | CP, –
SIC_SECTION | CV, CP | CV, CP


The only difference from the datasets above is that imputed rent has been removed.

There are various reasons to leave imputed rent out, especially for productivity and jobs analysis. For the data here, it’s necessary to remove it for linking to BRES: job count data doesn’t include it by default, as of course “imputed rent” has no jobs.

This list doesn’t include 2 digit SICs as you can just remove the “imputed rent” row yourself from the first lot of data if you don’t want it. For SIC Sections, the real estate section (L 68) only has two sub-rows, one with and one without imputed rent - so the ‘excluding imputed rent’ row can just be dropped in without having to sum. The 3GROUPS groupings, however, need summing, which isn’t valid for chained volume prices, so those aren’t present.


BRES JOB COUNT DATA (with more granular count values)


These CSVs are collated from the BRES dataset (using the API and wrangled here). They are broken into the following categories, a file for each combination:

  • Count of jobs for full time, part time and “in employment” [1]. Full and part-time summed is the same as the “employees” category, so that isn’t included - see the footnote for details on what “employment” includes (a slightly larger number than ‘employees’).
  • ITL2 and ITL3 geography (2022 to 2023), and NUTS2 and 3 geography (2015 to 2022). [2]
  • The same three SIC sector groupings used above: “Production, construction and services” in 3GROUPS, then SIC Sections and 2-digit SIC codes. The BRES data by itself has the full list of 2-digit sectors, not the cut-down version used in the GVA data. (Apart from imputed rent - this is true of the underlying data too.)

This version of the BRES data has slightly more accurate job counts [3] than the original data, at these SIC group levels.
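As a toy illustration of why summing up from more detailed sectors can land closer to the underlying count, here’s a short Python sketch. The rounding rule in it is HYPOTHETICAL (coarser rounding for bigger numbers) and is not the actual BRES rule; the job counts are made up too.

```python
def round_to(value, base):
    """Round a non-negative integer to the nearest multiple of base (halves round up)."""
    return base * ((value + base // 2) // base)

def toy_rounding(value):
    """HYPOTHETICAL rule, not the real BRES one: bigger counts get coarser rounding."""
    return round_to(value, 100) if value >= 1000 else round_to(value, 10)

# Made-up 'true' job counts for five detailed sectors within one broader sector.
detailed_counts = [234, 418, 377, 152, 296]
true_total = sum(detailed_counts)

# What a rounded broad-sector figure would look like under the toy rule...
published_total = toy_rounding(true_total)
# ...versus summing the (more finely rounded) detailed figures.
summed_from_detail = sum(toy_rounding(c) for c in detailed_counts)

print(true_total, published_total, summed_from_detail)
```

In this made-up case the sum of the detailed figures is closer to the true total than the directly rounded total. That’s one example under an invented rule, not a proof for the real data.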


COMBINED GVA + BRES JOB COUNT DATA

Table: combined GVA and BRES data
            | ITL2   | ITL3
3GROUPS     | CP, –  | CP, –
SIC_SECTION | CV, CP | CV, CP
SIC_2DIGIT  | CV, CP | CV, CP


These datasets combine regional GVA and BRES job count data into single CSVs. They contain the same fields as the separate datasets above, but note these two points:

  1. The newer post-Brexit ITL geographies for 2021 are the same as the previous EU-wide NUTS 2016 ones, apart from one small difference: ITL3 has “Bournemouth, Christchurch and Poole”, NUTS 3 has “Bournemouth and Poole”. Christchurch moved from “Dorset” in the data in 2021. Due to the “chained volume GVA can’t be summed” issue, the solution used here is this: the ITL3 zones for “Bournemouth, Christchurch and Poole” / “Dorset” / “Somerset” are replaced with the ITL2 zone that overlaps them all (“Dorset and Somerset”). Apologies to those zones affected; this is a less-than-ideal solution to keep the time series [4].
  2. BRES job count data has been summed to match the bespoke SIC 2 digit lists in the GVA data. For ITL2 zones, there are 72 SIC categories. For ITL3 zones, there are 48 SIC categories.

In the repo, there is a bespoke ITL3 geo-file reflecting point #1 with the Dorset / Somerset / Bournemouth+Poole zones replaced with the ITL2 for Dorset/Somerset. The geojson file (~1.6mb) is here on github (download button, top right) or raw here. It will join on either zone ID or zone name to the BRES + GVA combo CSVs above.
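One thing the combined data makes easy is GVA per job. Here’s a minimal Python sketch; the field names (“gva_millions”, “jobs”) and the values are assumptions for illustration, not the real column names. The only thing taken from the data description is that GVA is in millions of pounds.

```python
# Hypothetical combined rows; names and values are invented for illustration.
combined_rows = [
    {"zone": "Zone A", "sector": "Manufacturing", "year": 2022, "gva_millions": 450.0, "jobs": 9000},
    {"zone": "Zone B", "sector": "Manufacturing", "year": 2022, "gva_millions": 120.0, "jobs": 4000},
    {"zone": "Zone C", "sector": "Manufacturing", "year": 2022, "gva_millions": 5.0, "jobs": 0},
]

def gva_per_job(row):
    """GVA per job in pounds, or None where there are no jobs to divide by."""
    if not row["jobs"]:
        return None
    # GVA is in £ millions, so convert to pounds before dividing.
    return row["gva_millions"] * 1_000_000 / row["jobs"]

for row in combined_rows:
    print(row["zone"], gva_per_job(row))
```

Returning None for zero-job rows is just one design choice; dropping those rows before the calculation would work equally well.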

Other ways to get the data –>

(For the original ONS / NOMIS datasets, see the links at the top of this page.)

  • To get ALL the data in one go: download (or clone) the entire repo. To download - see the green “CODE” button on the main page? Use that for the ‘download ZIP’ option, then look in the data folder after unzipping.

Or to get individual files via github:

  • Look in the data folder on the repo - there are separate subfolders for each group of data. Details for each of those are explained above.
  • Github has 4 options to access/download the files: download a single CSV, download the whole thing, get raw data (and its URL), copy to clipboard.
  • So e.g. after clicking on a CSV there are three small buttons in the top right:
  • Use ‘raw’ to get the URL for raw CSV data (to e.g. get directly from URL in code)
  • The rightmost button will download the CSV.
  • The middle button will copy the data to the clipboard.
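For the “get directly from URL in code” route, a minimal Python sketch using only the standard library might look like this. The function name is made up, and the URL in the comment is a placeholder, not a real address: paste in whatever the ‘raw’ button gives you.

```python
import csv
import io
import urllib.request

def read_csv_url(url):
    """Fetch a CSV from a URL and return its rows as a list of dicts."""
    with urllib.request.urlopen(url) as response:
        text = response.read().decode("utf-8")
    return list(csv.DictReader(io.StringIO(text)))

# Placeholder only: replace with the URL copied from the 'raw' button.
# rows = read_csv_url("PASTE_RAW_CSV_URL_HERE")
# print(rows[0])
```

All values come back as strings with this approach, so numeric columns need converting before any sums.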

Why?

  • Main reason: contribute to reducing the effort / wrangling / wheel-reinventing needed to analyse UK regional economic data, including making sure all national data is together in one place (because any place needs comparing to others to understand it properly).
  • The ONS publish regional GVA data for the UK, covering a range of SIC sectors (2022 is the latest available year - the data’s here). It’s amazing data, presented in one handy place - but it needs some wrangling to get into an analysable form, especially for use in R or similar.
  • This page provides easy-to-access copies from that dataset, with each useful dataset in its own CSV. It’ll be updated when the latest GVA data comes out.
  • Examples of analysing this data in R are in the left-hand bar. For instance, UK Sectors: LQs and proportion plots walks through using the data here to produce location quotients, maps and other plots.
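For anyone who hasn’t met location quotients: an LQ is a sector’s share of a place’s economy divided by that sector’s share of the national economy, so values above 1 mean the place is relatively concentrated in that sector. A minimal Python sketch, with made-up zones and current-price values (CP, since it relies on summing), and assuming one row per zone/sector pair:

```python
from collections import defaultdict

# Hypothetical current-price GVA rows: (zone, sector, GVA in £m).
rows = [
    ("Zone A", "Manufacturing", 25.0),
    ("Zone A", "Services", 75.0),
    ("Zone B", "Manufacturing", 100.0),
    ("Zone B", "Services", 800.0),
]

def location_quotients(rows):
    """LQ = (sector share of zone total) / (sector share of national total)."""
    zone_totals = defaultdict(float)
    sector_totals = defaultdict(float)
    grand_total = 0.0
    for zone, sector, value in rows:
        zone_totals[zone] += value
        sector_totals[sector] += value
        grand_total += value
    lqs = {}
    for zone, sector, value in rows:
        local_share = value / zone_totals[zone]
        national_share = sector_totals[sector] / grand_total
        lqs[(zone, sector)] = local_share / national_share
    return lqs

lqs = location_quotients(rows)
print(lqs[("Zone A", "Manufacturing")])
```

Here “national” just means the sum of all zones in the toy data, which is why having every zone in one file matters.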

Also:

  • UK regional/sectoral job count data is available via BRES (the Business Register and Employment Survey) on NOMIS. NOMIS is a brilliant platform, but again there are repeated steps usually needed before it’s in good shape for use. (And here we can also use NOMISR to get it directly.)
  • It’s also possible to extract more accurate job count numbers by aggregating up from the most granular sector counts. The BRES data above uses these more accurate counts.
  • So this page also has quick-access versions of all the most useful versions of the latest BRES data, including previously available years.

Also also:

  • It’s useful to have both regional GVA and job count values combined (for e.g. GVA per job analysis, or to have GVA and job analysis using the same SIC categories and places) - so this page also has linked GVA and job count data.
  • There are several wrinkles in getting these datasets to play nicely together that anyone using them should be aware of - see details above.
  • Any thoughts on things that could be improved or that I’ve stuffed up, please let me know - either in the github issues or via email at danolner at gmail dot com (or any other contact methods listed top right on my code blog).

Automate!

Making data processing like this as automated and repeatable as possible makes it as open and useable as possible, which is a step towards quicker, more shareable analyses. As soon as new GVA or BRES data comes out, code from the repo can (hopefully!) be re-run painlessly [5].

Quarto doc for this page is here.

Footnotes

  1. The split between full and part-time is more/less than 30 hours a week. The “in employment” category “includes employees plus the number of working owners. BRES therefore includes self-employed workers as long as they are registered for VAT or Pay-As-You-Earn (PAYE) schemes. Self employed people not registered for these, along with HM Forces and Government Supported trainees are excluded.”

  2. Apart from one tiny difference - see the section describing the GVA/BRES joined datasets for more info - ITL and the most recent NUTS zones are the same, but the NUTS zones don’t have data past 2022, and the ITL zones have none prior to 2022 (they overlap one year).

  3. Or I think so anyway. How come? The BRES rounding rules are applied in the same way at every level of the SIC hierarchy, so 5-digit SICs are rounded the same way as sections. Summing up from the 5-digit level, where smaller average values mean smaller roundings are applied, thus gives more granular job values. I’ve tested this by checking the spread of the newer job counts around the original values, e.g. comparing 2-digit SIC sums from this version with the 2-digit job values in BRES itself. The new values spread evenly either side of the old, suggesting no introduced bias. Code for that test is in this script, in this section: ‘CHECK HOW THE “SECTION AND 2 DIGIT JOBCOUNTS SUMMED FROM 5 DIGIT” compare to the BRES original versions of 2 DIG and Sections’.

  4. The ‘current price’ data could be summed, but the best that could be achieved there to keep a consistent time series would be a zone combining Dorset with Bournemouth, Christchurch and Poole. For chained volume values, it has to be the larger ITL2 zone, for the reasons given. Rather than do both, here I’ve just used the one approach so all data is comparable directly.

  5. Well, that’s the theory. The underlying data structures might stay the same, you never know…