1 Introduction

This document describes the development of the Longitudinal Educational Outcomes for Northern Ireland (LEO NI) dataset and the data linkage methods employed. The LEO NI project aims to link Department of Education (DE) School Census data with Department for the Economy (DfE) datasets, including Training for Success/Skills for Life and Work (TfS), Apprenticeships NI (AppsNI), Further Education (FE), and Higher Education Statistics Agency (HESA) data.

Statisticians in NISRA’s Census Office act as a Trusted Third Party (TTP) for ADR NI research projects. Using limited personal information (such as name, date of birth, and postcode), the TTP securely creates linkage keys for the datasets involved. These keys are then provided to NISRA’s Research Support Unit (RSU), which uses them to assemble anonymised research datasets. The TTP never has access to attribute data, which protects privacy and confidentiality.

This document illustrates how LEO NI has been built as a collection of datasets in which records are grouped together by a LEO ID, a unique identifier used to identify the same individual across all datasets. The methodology was planned as far as possible before the datasets were received, then tested and refined through the feasibility assessment to ensure the linkage is of high quality.

Note: Figures in this report are rounded to the nearest 1,000. As a result, totals may not sum due to rounding.

2 Data Included

2.1 DE School Census 2016/17 to 2023/24

Information on pupil enrolments is collected annually from schools through the School Census. This exercise provides an annual snapshot of pupil and school level data for each pre-school centre, nursery school, special, primary (including nursery units, reception and preschool age pupils enrolled in specialist provision), post-primary, EOTAS (educated other than at school) centre and independent school in Northern Ireland (NI). The date of the School Census is the Friday of the first full week in October. All schools are legally required to return a fully completed School Census return under the Education and Libraries (NI) Order 2003 Article 37. The LEO NI currently contains records for post-primary and special school pupils from the academic years 2016/17 to 2023/24 who are aged 14 years and over prior to the commencement of the academic year.

Information available to the TTP for linkage purposes -

  • Unique Pupil Number (UPN) - anonymised version
  • Unique Learner Number (ULN) - anonymised version
  • Name information of the pupil which includes Forename, Midname and Surname
  • Address information
  • Sex
  • Date of birth
  • Year of school census
  • School Type

Industrial action led to the unavailability of pupil level data for approximately 25% of schools in 2023/24. This significantly reduced the records available for inclusion in the LEO NI for this academic year.

2.2 DfE Further Education 2017/18 to 2023/24

Further Education (FE) Colleges are the main providers of further education and training in Northern Ireland. The sector plays a central role in raising literacy and numeracy levels and in upskilling and reskilling the population through a broad range of courses leading to qualifications, particularly at National Qualifications Framework (NQF) and Qualifications and Credit Framework (QCF) levels 2 and 3 and their equivalents. Higher Education (at level 4 and above) is also delivered across the FE colleges.

The Consolidated Data Return (CDR) is a combined dataset populated with information from all six FE colleges in Northern Ireland and includes enrolment details for each student. The CDR is facilitated by data consultants employed by the Department. Data are entered into the administrative system by the FE colleges, who are responsible for ensuring the accuracy of the information recorded.

Information available to the TTP for linkage purposes -

  • Student ID - suffixed with FE college code and anonymised
  • Unique Learner Number - anonymised version
  • Student Name which includes Forename(s), Surname and Previous Surname
  • Date of Birth
  • Sex
  • Home Address Postcode
  • FE College Attended

2.3 DfE Higher Education 2017/18 to 2022/23

The Higher Education Statistics Agency (HESA) collects enrolment data and supplies this to DfE. This data contains details of all NI-domiciled students, of all ages, who are enrolled in a government-funded Higher Education Institution (HEI) within NI, as well as students enrolled at any HEI elsewhere in the United Kingdom who have a home address in NI. Data is available for academic years 2017/18 to 2022/23. Data for the academic year 2023/24 is not yet available and is expected to be added to the LEO NI build in the next phase. The data includes details of:

  • Full-time students
  • Part-time students
  • Foreign students, including visiting and exchange students (students based at an institution abroad who study at a UK institution for at least eight weeks)

Enrolment data is collected through returns submitted by HEIs based on their student records. Each HEI is responsible for providing accurate data to HESA for all students registered at their institution.

Information available to the TTP for linkage purposes -

  • HUSID - anonymised version
  • Student Name which includes Forename(s), Surname and Previous Surname at Age 16
  • Date of Birth
  • Sex
  • Home Address Postcode
  • Term-time Address Postcode
  • Course Commencement Date

2.4 DfE Training for Success/Skills for Life and Work and Apprenticeships NI 2017/18 to 2023/24

Training for Success (TfS) is a programme designed for young people, providing training to equip them with the tools and skills needed to gain employment. The programme closed to new entrants in 2020/21 and was replaced by Skills for Life and Work for Entry Level and Level 1, which commenced for new participants in 2021/22. As such, references to TfS throughout this document refer to both the Training for Success and Skills for Life and Work programmes.

The aim of Apprenticeships NI (AppsNI) is to offer participants the opportunity to achieve a Level 2 or Level 3 Apprenticeship. Apprentices are in paid employment from day one and work towards meeting the requirements of an industry-approved Level 2 or Level 3 Apprenticeship Framework. The TfS/AppsNI database, maintained by the Department for the Economy (DfE), is based on extracts taken every four weeks from DfE’s Client Management System (CMS) up to 31st July 2022 and monthly extracts from DfE’s Trainee and Apprentice Management System (TAMS) from 1st August 2022.

Information available to the TTP for linkage purposes -

  • ClientRef - anonymised version
  • Unique Learner Number (ULN) - anonymised version
  • Name Information which includes Forename(s) and Surname
  • Home Address information
  • Postcode
  • Sex
  • Date of Birth
  • Start Date and End Date

3 Methodology overview

3.1 Stage 1: Cleaning the datasets

Prior to linking records across datasets, the data received was standardised to ensure consistency and accuracy. This process involved quality assurance checks and cleaning of variables to identify and correct errors or inconsistencies in the raw data. Standardisation ensures that variables are formatted uniformly across datasets, with variable strings separated where required and any erroneous characters replaced. Clean data is crucial for the match-key linking process to be successful.
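The kind of standardisation described above can be sketched in Python. This is a minimal illustration only: the function names and the specific rules (upper-casing, stripping accents, replacing non-letter characters, removing spaces from postcodes) are assumptions for the example and are not the cleaning rules actually applied by the TTP.

```python
import re
import unicodedata


def clean_name(raw):
    """Upper-case a name, strip accents and replace erroneous characters (hypothetical rules)."""
    # Decompose accented characters and discard the combining marks (e.g. "é" -> "e").
    decomposed = unicodedata.normalize("NFKD", raw)
    unaccented = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Replace anything that is not a letter, hyphen or space with a space.
    letters = re.sub(r"[^A-Za-z\- ]", " ", unaccented)
    # Collapse runs of whitespace, trim and upper-case.
    return re.sub(r"\s+", " ", letters).strip().upper()


def clean_postcode(raw):
    """Upper-case a postcode and remove all whitespace so that formats compare equal."""
    return re.sub(r"\s+", "", raw).upper()
```

Applying the same functions to every source before linkage means that, for example, "bt1  1aa" and "BT1 1AA" compare as equal.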

3.2 Stage 2: Pre-processing

Each data source contains an anonymised unique student identifier. Within the LEO Index, these identifiers enable linkage within datasets and across different academic years. Additionally, where populated, the Unique Learner Number (ULN) enables linkage between different datasets.

The datasets were initially de-duplicated to create a list of unique records based on their specific anonymised student identifiers and core demographic variables.

Some datasets contain additional fields that can be used in the matching process, such as multiple postcodes (e.g. term-time and home address) or surname variations in cases of name changes.
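A minimal sketch of this pre-processing step follows. The field names (`student_id`, `previous_surname`, `term_postcode` and so on) are hypothetical, and the approach of emitting one row per surname and postcode alternative is an assumption about how the additional fields are carried into matching.

```python
from itertools import product

FIELDS = ("student_id", "forename", "surname", "dob", "sex", "postcode")


def preprocess(records):
    """De-duplicate records and emit one row per surname/postcode alternative."""
    seen = set()
    rows = []
    for rec in records:
        # Collect the populated alternatives; fall back to a single blank value.
        surnames = [s for s in (rec.get("surname"), rec.get("previous_surname")) if s] or [""]
        postcodes = [p for p in (rec.get("home_postcode"), rec.get("term_postcode")) if p] or [""]
        for surname, postcode in product(surnames, postcodes):
            row = (rec["student_id"], rec["forename"], surname, rec["dob"], rec["sex"], postcode)
            if row not in seen:  # drops repeats, e.g. the same student in several years
                seen.add(row)
                rows.append(dict(zip(FIELDS, row)))
    return rows
```

A student with two surnames and two postcodes therefore contributes four rows, however many academic years they appear in, which is one reason the number of pre-processed records can differ from the number received.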

3.3 Stage 3: Combine datasets

Once pre-processing is complete, all records are appended together to form the LEO Index. When records are later linked to the same individual, these records are assigned the same LEO ID. This ensures that records from different datasets can be connected via this LEO ID. Further detail on how the LEO ID is updated is provided in Section 4.

3.4 Stage 4: Linkage via ULN, student identifiers and exact demographic matches

The initial linkage stage is divided into three parts:

1. Exact Match Using Demographics: Initial matches are identified based on exact matches of full demographic variables: forename, surname, date of birth, sex, and postcode.

2. Linking via Unique Learner Number (ULN): Records are matched using the ULN where available.

3. Linking Within Datasets Using Student Identifiers: Records are linked within each dataset using the student identifiers specific to each dataset type.

When a link is identified between two or more records, the LEO ID for these records is updated so that all matching records share the same value. This ensures that linked records are grouped under a single individual.

The subsequent stage will involve more complex linkage using match-keys.

3.5 Stage 5: Linkage using match-keys

Further linkage methods are required as records for the same individual across different sources may not be identical. Variations can occur due to inaccuracies in names or dates of birth, the use of middle names instead of forenames, aliases, or surname changes. Additionally, individuals may have more than one student identifier within data sources.

To address these challenges, a ‘rule based’ or deterministic data matching approach, referred to as match-keys, is used. Match-keys are created by putting together pieces of information to create unique keys that can be used for automated matching.

Demographic information can be recorded differently across datasets in several ways, and no single match-key can resolve every difference between data sources. A series of match-keys has therefore been developed, each designed to resolve a particular inconsistency between match pairs. The strongest level of matching, known as exact matching, links pairs of records that are identical on all matching fields. An example of a non-exact match-key is one constructed from the first three characters of an individual’s forename and surname (trigrams), combined with their date of birth and postcode district.

Typically, the match-keys are processed in a stepwise manner, starting with the most exact match-key and working down to the last match-key, which allows for the greatest variation in demographic information. More information on data linkage can be found in Data Matching Using Northern Ireland Administrative Data: A Worked Example.
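The stepwise use of match-keys can be sketched as follows. This is an illustration under stated assumptions rather than the production method: the record fields are hypothetical, only two match-keys are shown (the exact key and the trigram example described above), and the rule of accepting a match only when a key identifies exactly one candidate is an assumption made for the sketch.

```python
def postcode_district(postcode):
    """Outward part of a cleaned UK postcode: everything before the final three characters."""
    return postcode[:-3]


# Ordered from most exact to most permissive (two illustrative keys only).
MATCH_KEYS = [
    lambda r: (r["forename"], r["surname"], r["dob"], r["sex"], r["postcode"]),
    lambda r: (r["forename"][:3], r["surname"][:3], r["dob"], postcode_district(r["postcode"])),
]


def stepwise_match(left, right):
    """Return (left_index, right_index, key_number) links, trying each key in turn."""
    links = []
    unmatched = set(range(len(left)))
    for key_no, make_key in enumerate(MATCH_KEYS, start=1):
        # Index the right-hand records by this match-key.
        index = {}
        for j, rec in enumerate(right):
            index.setdefault(make_key(rec), []).append(j)
        # Only records not linked by an earlier, stricter key are tried.
        for i in sorted(unmatched):
            candidates = index.get(make_key(left[i]), [])
            if len(candidates) == 1:  # assumption: accept unambiguous matches only
                links.append((i, candidates[0], key_no))
                unmatched.discard(i)
    return links
```

Recording which key produced each link makes it possible to sample matches from each key separately for clerical checking.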

4 Summary of linkage

4.1 Records added to LEO Index

The LEO Index contains all pre-processed records from the four data sources. Combined, this amounts to 1.8 million records. Table 1 shows the records received from each data source before pre-processing. The number of records added to the LEO Index differs from the number received because duplicate records are removed and additional records are created for those with alternative postcodes or surnames during the pre-processing stage, as detailed in Section 3.2.

Table 1: Records to be processed for LEO Index
Dataset          Records received    Unique Student IDs
School Census             608,000               242,000
FE                        873,000               317,000
TfS/AppsNI                 61,000                47,000
HESA                      570,000               235,000
Total                   2,112,000               842,000
Note: Values have been rounded to the nearest 1,000. Totals may not sum due to rounding.

4.2 How records are grouped

The LEO Index contains demographic characteristics (name, date of birth, sex, and postcode) along with the ULN, student identifier, a unique dataset ID (prefixed by data source and academic year), and the LEO ID. All records in a group that has been linked using the methods outlined in Sections 3.4 and 3.5 share the same LEO ID.

The linkage process groups records together by identifying matches between two or more records. For example, if Record A is matched to Record B, and Record B is matched to Record C, then all three records will share the same LEO ID, even though a direct match between A and C may not be found. This scenario can occur due to missing student identifiers, postcode differences, or spelling errors. By leveraging indirect links, the process ensures that all related records are grouped under a single LEO ID. Any additional records linked to A, B or C during subsequent matching steps will also be included in the same group.
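The grouping behaviour described above is that of connected components, which can be sketched with a union-find (disjoint-set) structure. This is one common way to implement such grouping and is offered as an illustration; the document does not state how the LEO ID update is implemented, and the class and method names are hypothetical.

```python
class LeoGroups:
    """Union-find over record positions; each root stands for one LEO ID."""

    def __init__(self, n_records):
        # Every record starts in its own group.
        self.parent = list(range(n_records))

    def find(self, i):
        """Return the representative record of the group containing record i."""
        while self.parent[i] != i:
            self.parent[i] = self.parent[self.parent[i]]  # path halving
            i = self.parent[i]
        return i

    def link(self, a, b):
        """Merge the groups containing records a and b."""
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            # Keep the smaller index as the representative of the merged group.
            self.parent[max(root_a, root_b)] = min(root_a, root_b)

    def leo_ids(self):
        """Group representative for every record, usable as a LEO ID."""
        return [self.find(i) for i in range(len(self.parent))]
```

Linking record 0 to record 1 and record 1 to record 2 places all three in one group even though 0 and 2 were never compared directly, which mirrors the A, B, C example.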

4.3 Linkage using ULN, student identifier and exact matches

Links established during the initial linkage stage (Section 3.4) connect individuals within datasets and across datasets using exact matches, ULN, or student identifiers. There will be links between the School Census records and the DfE datasets, as expected, and there will also be links between different DfE dataset types as students may move from Further Education to Higher Education or between TfS/AppsNI and Further Education etc.

The number of such links to expect is not known in advance, so it cannot readily be used as a benchmark for the linkage rate. One way to measure the linkage rate is to count the School Census individuals who link to at least one DfE dataset.

Testing this linkage in the LEO NI feasibility assessment found that 63.7% of the School Census records had a link to a DfE record based on exact matches, ULN or student ID. Restricting to those born before 2003, who would have been expected to have left school by the last academic year included in the feasibility dataset (2021/22), 79.1% of this target cohort of School Census students had a link.

4.4 Linkage using match-keys

When linkage based on the initial matching of ULN, Student Identifier, and exact matches is supplemented with match-keys, the overall linkage rate increases. During the LEO NI feasibility assessment, this approach improved the initial linkage rate from 63.7% to 69.3%, an increase of 5.6 percentage points.

The direct impact is not so easily measured given the multiple years of School Census data now included in the LEO NI compared to the single year utilised in the feasibility assessment. However, a similar positive effect on the linkage would be expected when match-keys are applied. Detailed breakdowns of linkage rate by age and academic year are explored in Section 6.

4.5 Development of methodology

Initial clerical checks informed the development of match-keys and adaptations to standard matching processes to prevent records being linked in error. One viable way to identify records linked in error is to look within the School Census cohort. As all individuals in the School Census are unique, any two School Census individuals grouped together via a link to another dataset should be considered erroneous.

Such links can occur because the records:

  • Have very similar demographics, for example a similar forename, surname and date of birth and, during the LEO lifetime, an address in the same postcode area;
  • Have been issued with the same ULN in error in one of the applicable datasets: School Census, Further Education or TfS/AppsNI.
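The check described in this section can be sketched as a scan of the LEO Index for groups containing more than one distinct School Census pupil. The row layout and the source label "SC" are hypothetical, and the sketch assumes that the anonymised pupil identifier is unique to one individual.

```python
from collections import defaultdict


def suspect_groups(index_rows):
    """Return LEO IDs whose group contains more than one distinct School Census pupil.

    index_rows: iterable of (leo_id, source, student_id) tuples (hypothetical layout).
    """
    pupils = defaultdict(set)
    for leo_id, source, student_id in index_rows:
        if source == "SC":  # only School Census records count towards the check
            pupils[leo_id].add(student_id)
    return sorted(leo_id for leo_id, ids in pupils.items() if len(ids) > 1)
```

Groups flagged in this way are candidates for clerical review rather than confirmed errors.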

5 Match-key overview

5.2 Match-key variations

The following match-key variations were used alongside standard automated match-keys to improve linkage accuracy:

  • Trigrams, quad-grams, etc. – These are used in non-exact match-keys where only the first few characters of a name are used for matching rather than the full name. For example, Deborah and Debbie match on their first three letters (trigram), and Alex and Alexander match on their first four letters (quad-gram).
  • Character Index – This method of matching ascertains a match when one name string is fully contained within another. This can resolve inconsistencies that a trigram and a quad-gram cannot, such as Elizabeth and Beth, Victoria and Tori. For this to match, all letters of one name must appear in the other.
  • Levenshtein Standard Distance – This matching method accounts for inconsistencies in the spelling of names that would not be picked up by the above. It is based on the number of single-character edits required to transform one string into another, expressed as a similarity score between 0 and 1 in which higher values indicate closer strings. Following previous discussions with ONS on the methodology used with Levenshtein distance, a threshold of 0.7 was used. If the score is above this threshold, the two values are considered to match. Some examples are Johnathan and Jonathan (0.889) and Michael and Miceal (0.714); both scores are consistent with 1 minus the edit distance divided by the length of the longer name.
  • Forename Variation – This method refers to variations of common names used. It references a lookup table of forenames and their variations so that the full forename in one dataset can match a variation on another, for example, Thomas will match to Tommy.
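The four variations above can be sketched in Python as follows. The function names and the lookup table contents are hypothetical. The similarity formula (1 minus the edit distance divided by the length of the longer name) is an assumption, chosen because it reproduces the example scores quoted above; the document does not state the exact normalisation used.

```python
def ngram_match(a, b, n=3):
    """True when both names share the same first n characters."""
    return len(a) >= n and len(b) >= n and a[:n].lower() == b[:n].lower()


def contained_match(a, b):
    """True when one name is fully contained within the other."""
    a, b = a.lower(), b.lower()
    return a in b or b in a


def levenshtein(a, b):
    """Number of single-character insertions, deletions or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                   # deletion
                current[j - 1] + 1,                # insertion
                previous[j - 1] + (ch_a != ch_b),  # substitution (free if equal)
            ))
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalisation: 1 - distance / length of the longer name."""
    a, b = a.lower(), b.lower()
    longest = max(len(a), len(b))
    return 1.0 if longest == 0 else 1 - levenshtein(a, b) / longest


# Hypothetical lookup of forename variations.
VARIANTS = {"thomas": {"tommy", "tom"}}


def variation_match(a, b):
    """True when one forename is listed as a variation of the other."""
    a, b = a.lower(), b.lower()
    return b in VARIANTS.get(a, set()) or a in VARIANTS.get(b, set())
```

Under these assumptions, `similarity("Johnathan", "Jonathan")` rounds to 0.889 and `similarity("Michael", "Miceal")` rounds to 0.714, so both pairs exceed the 0.7 threshold.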

5.3 Match-key clerical checks

Clerical checks enabled the development of the match-keys. After any alterations to match-keys, the verification process was repeated to ensure accuracy. For those match-keys with a volume of matches under 1,000, all matches were checked. If the volume was higher, a sample of several hundred was taken. Based on the clerical checking, the accuracy of the final matching process was over 99%.

6 Results by data source

6.1 Linkage between data sources

The LEO NI index resulted in 1.8 million records being attributed to 552,000 unique individuals. Given there are multiple links developed between data sources as well as within data sources, it can be complicated to build a picture of the quality of the linkage.

In the first instance, we will calculate general linkage rates where one data source is linked to at least one other data source. Figure 1 illustrates the overall linkage rates between the LEO source datasets with the volume of individuals present in one data source where a link was identified in another source.

Figure 1: Linkage rates for LEO data sources

Figure 1 shows the varying linkage rates between the different data sources with the majority of School Census (57%) and TfS/AppsNI (80%) students linking to at least one other data source. In comparison, 45% of Further Education and 39% of Higher Education students linked to another data source.

LEO NI demonstrates the diverse educational pathways available to students. For example, a pupil enrolled at school level may also study a course at an FE institution and later progress to Higher Education (HESA). Mature students may also undertake various Further and Higher Education programmes (DfE pathways), even if their school attendance predates the LEO NI timespan.

Considering the interconnections of students throughout all data sources, Figure 2 illustrates the profile of each individual (LEO ID) in the LEO NI index where a link has been identified. It shows how individuals present in one data source connect to at least one other data source.

The x-axis represents mutually exclusive combinations of data sources linked to each LEO ID. For example, a bar for ‘School Census + FE + HESA’ indicates individuals linked across all three sources. These combinations are exclusive, meaning an individual linked to three sources will not appear in subsets showing only two of these sources. The bars display the volume of LEO IDs for each combination, highlighting the intersections identified within the LEO NI linkage.

The green bars (to the left of the main chart) indicate the total number of records in each data source that have at least one link to another source, as referenced in Figure 1. Blue dots and connecting lines below the x-axis identify which data sources are included in each intersection. This visualisation highlights the complexity of educational pathways, showing how individuals move across different stages of education and training.

Figure 2: Linkage between LEO data sources

Across all LEO IDs, the most common combination was School Census and FE only (49,000). This excludes LEO IDs in both School Census and FE that were additionally linked to TfS/AppsNI (20,000), to HESA (15,000), or to both TfS/AppsNI and HESA (1,000).
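The mutually exclusive combinations plotted in Figure 2 can be derived by collecting the set of data sources present for each LEO ID and counting identical sets. A minimal sketch, assuming a hypothetical layout of (LEO ID, source) pairs and hypothetical source labels:

```python
from collections import Counter, defaultdict


def combination_counts(index_rows):
    """Count LEO IDs by the exact set of data sources they appear in.

    index_rows: iterable of (leo_id, source) tuples (hypothetical layout).
    Only LEO IDs present in at least two sources are counted.
    """
    sources = defaultdict(set)
    for leo_id, source in index_rows:
        sources[leo_id].add(source)
    return Counter(frozenset(s) for s in sources.values() if len(s) > 1)
```

Because each LEO ID contributes to exactly one set, an individual found in School Census, FE and HESA is counted under that three-way combination only and not under "School Census + FE".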

6.2 Assessing the linkage

A large proportion of individuals across DfE data sources do not link to a School Census record: 65% of FE, 39% of TfS/AppsNI and 71% of HESA. This is expected to some extent, as the DfE datasets contain the population of all students at an institution or on a programme of study during the academic year. Within these datasets there will be a volume of students who would not have been expected to appear in the School Census during the timespan of this phase of the LEO NI project. Many of these can be identified through their age; however, for students who may have entered from another country, this cannot be determined conclusively.

Similarly, a large proportion of School Census records (43%) have no link to any of the three DfE data sources. Within the School Census dataset, a number of younger students, particularly in later years, would still be expected to be attending school by the end of the LEO NI timespan.

The most meaningful way to assess the linkage is to examine School Census records that link to one of the three DfE data sources, and conversely, individual DfE data records that link back to a record in the School Census. Further categorising the linkage rates by age within the academic year, we can see that the records that do not link to another dataset, in most cases, would not have been expected to. To explain this, the following evaluations of the rate of the linkage are discussed:

  • School Census records linked to a DfE record by age cohort
  • Further Education records linked to School Census by age cohort
  • Training for Success and Apprenticeships NI records linked to School Census by age cohort
  • Higher Education records linked to School Census by age cohort.

These evaluations help to identify expected and unexpected non-linkages, providing insight into data completeness and population coverage across the LEO NI project.

6.3 School Census

The number of School Census students that matched to at least one of the DfE datasets during any academic year was 139,000 representing 57% of the School Census data. This linkage rate reflects expected patterns, as not all students in the School Census would have left school within the timeframe of this phase of the LEO NI project. Consequently, a complete match is not anticipated.

In addition, many pupils transition from school into pathways outside of DfE datasets, such as direct employment, unemployment, or relocation outside the United Kingdom. These scenarios explain why a proportion of School Census records remain unmatched. Understanding these patterns is essential for interpreting linkage coverage and assessing the representativeness of the linked dataset.

Table 2: School Census records linked to a DfE dataset by academic year
Academic year    Proportion linked
2016/17                        90%
2017/18                        89%
2018/19                        86%
2019/20                        73%
2020/21                        58%
2021/22                        43%
2022/23                        21%
2023/24                         9%

Table 2 shows that for School Census records linking to at least one DfE data source across all years in the LEO NI linkage, the academic year 2016/17 recorded the highest linkage rate (90%). The lowest linkage rate occurred in 2023/24, with only 9% of the School Census records linking to a DfE source. This variation is expected, as more recent School Census years include a higher proportion of pupils who would still be in school and therefore less likely to have progressed into a DfE pathway.

A breakdown of linkage rates by each academic year and pupil age within the School Census provides a clearer understanding of the quality of the linkage and expected patterns. Table 3 presents linkage rates based on the ages of all School Census pupils at the end of each academic year in order to align with reported school leaving ages. Therefore, age is calculated as the pupil’s age on 1st July of the relevant academic year (e.g. age at 1st July 2018 for 2017/18).
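The age calculation described above can be sketched as follows. The function name and the "2017/18" string format for the academic year are assumptions made for the example.

```python
from datetime import date


def age_at_end_of_year(dob, academic_year):
    """Completed years of age on 1st July at the end of an academic year such as '2017/18'."""
    end_year = int(academic_year[:4]) + 1  # 2017/18 ends in calendar year 2018
    reference = date(end_year, 7, 1)
    # Has the birthday already occurred on or before 1st July of that year?
    had_birthday = (dob.month, dob.day) <= (reference.month, reference.day)
    return reference.year - dob.year - (0 if had_birthday else 1)
```

For example, a pupil born on 1st July 2002 is counted as 16 for 2017/18, while a pupil born one day later is counted as 15.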

Table 3 employs a colour scale to indicate linkage rates, where green represents higher linkage rates, yellow indicates mid-range values, and shades closer to red correspond to lower rates. It is important to note that low cohort volumes can lead to volatile linkage rates, most likely to be observed in the lowest and highest age brackets. For this analysis, a link is defined as a School Census student in a specific academic year being successfully linked to any record from a DfE pathway across any academic year from 2017/18 to 2023/24.

Table 3: School Census records linked to a DfE dataset by age at end of academic year
Age 2016/17 2017/18 2018/19 2019/20 2020/21 2021/22 2022/23 2023/24
15 90% 88% 81% 47% 39% 35% 10% 8%
16 90% 90% 88% 81% 47% 39% 35% 9%
17 90% 89% 89% 87% 78% 29% 16% 9%
18 90% 91% 90% 90% 88% 78% 23% 8%
19 80% 83% 87% 83% 81% 71% 38% 27%
20+ 57% 77% 78% 79% 89% 64% 53% 29%

The linkage rate is expected to be lowest among younger pupils in later academic years, as they may not yet have progressed to a DfE educational pathway. Matches within this group are more likely to occur with TfS/AppsNI or FE, which are typically entered after Year 12 (age 16), whereas entry to Higher Education usually occurs after Year 14 (age 18).

Data availability also affects linkage rates. The latest HESA data is for 2022/23, meaning pupils expected to enter HESA in 2023/24 are not yet captured. This gap is evident in Table 3, where linkage rates decline sharply for certain age groups. For example, 15 year olds in the 2019/20 School Census would reach Year 14 in 2022/23 and likely enter HESA in 2023/24, so the absence of this data contributes to the drop. A cascading effect occurs in later years, where linkage rates are impacted both by missing HESA data and by pupils still being of school age.

Examining individual ages, linkage rates for those aged 15 were highest in the earliest years: 90% for 2016/17, 88% for 2017/18 and 81% for 2018/19. The rate declines to 10% for 2022/23, reflecting that a large proportion of these 15 year olds would not be expected to have left school by the 2023/24 academic year. For older age groups, linkage rates can appear erratic due to small cohort sizes. For example, the rate for 19 year olds in 2016/17 was 80% compared to 38% in 2022/23. In terms of volume, ages 15 and 16 make up the largest cohorts, followed by 17 year olds and 18 year olds. Those aged 19 and above are outliers, resulting in smaller samples and less stable rates.

A more meaningful measure of the linkage performance can be obtained by limiting the rate calculation to only those pupils in the School Census who could reasonably be expected to appear in the available DfE data. This requires accounting for differences in expected school leaving ages across DfE pathways (16 or 18) and the fact that the last academic year for which all datasets are complete is 2022/23. Any pupils who may enter HESA in 2023/24 should be excluded from the calculation, as linkage to this key source is not yet possible. For this reason, each age group is included only up to the last year in which linkage to HESA is feasible: up to 2018/19 for 15 year olds, 2019/20 for 16 year olds, 2020/21 for 17 year olds and 2021/22 for those aged 18 and above. When focusing on the pupils who fall within these ranges, i.e. those expected to link given the data available, an overall linkage rate of 87% to any DfE source has been achieved.
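The age and academic year cut-offs listed above follow a simple pattern: the last eligible year moves forward by one for each year of age, and stops at 2021/22 for those aged 18 and above. A minimal sketch of that rule, with hypothetical function names and with ages below 15 treated as out of scope (an assumption, since the stated rule starts at age 15):

```python
def last_eligible_start_year(age):
    """Starting calendar year of the last academic year included for a given age."""
    # 15 -> 2018 (2018/19), 16 -> 2019, 17 -> 2020, 18 and above -> 2021 (2021/22)
    return 2018 + (min(age, 18) - 15)


def expected_to_link(age, academic_year):
    """True when a School Census pupil of this age and year is inside the stated ranges."""
    if age < 15:
        return False  # assumption: the stated rule does not cover ages below 15
    return int(academic_year[:4]) <= last_eligible_start_year(age)
```

Restricting both the numerator and the denominator to records for which `expected_to_link` is true gives a rate of the kind quoted above; the cohort volumes needed to reproduce the 87% figure are not included in this report.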

6.4 Further Education

To assess the linkage of FE students against the School Census, it is important to focus on age groupings where a link would be expected. Students can enter FE directly from school or return at a later stage. According to DfE official statistics, around one third of students each year will be aged 25 or older. As such, this group would be unlikely to link to School Census records. By examining cohorts across a range of ages, we gain a clearer understanding of linkage rates.

Table 4 shows, for students aged between 11 and 27 at the end of each FE academic year, the proportion linked to the School Census. For this analysis, a link is defined as an FE student during the specific academic year being successfully linked to a record from any point in the School Census from 2016/17 to 2023/24.

It is important to note that FE non-regulated courses can be offered to students aged 11 and over, covering areas such as social needs. Linkage rates for earlier academic years are high due to concurrent or subsequent attendance within the School Census population. The sharp reduction to 0% for later years reflects that only learners aged at least 14 appear in the LEO NI School Census data in any academic year. Consequently, those who were aged 11 at the end of 2020/21 would not reach the minimum age for inclusion in the School Census until 2024/25, which is not yet included in the dataset.

Table 4: Further Education records linked to School Census by age at end of academic year

Age 2017/18 2018/19 2019/20 2020/21 2021/22 2022/23 2023/24
11 91% 90% 62% 0% 0% 0% 0%
12 87% 84% 81% 80% 0% 0% 0%
13 92% 95% 89% 94% 94% 0% 0%
14 78% 85% 76% 78% 74% 83% 5%
15 95% 96% 95% 94% 94% 96% 77%
16 96% 93% 94% 91% 93% 95% 95%
17 96% 96% 96% 97% 96% 95% 96%
18 35% 95% 95% 96% 95% 95% 93%
19 38% 54% 94% 94% 95% 94% 94%
20 7% 45% 56% 93% 92% 92% 93%
21 0% 10% 43% 56% 91% 90% 91%
22 0% 0% 10% 46% 57% 89% 88%
23 0% 0% 0% 8% 47% 55% 88%
24 0% 0% 0% 0% 8% 44% 53%
25 0% 0% 0% 0% 0% 7% 44%
26 0% 0% 0% 0% 0% 0% 8%
27 0% 0% 0% 0% 0% 0% 0%

The volume of students by age within the FE records shows that those aged 17-19 at the end of the academic year are most common, comprising over 47% of students annually. This reflects the fact that the majority of FE courses are primarily targeted at these age groups and older. For pupils enrolled in school who transition to FE, this typically occurs after Year 12, when they will have turned 16 during the academic year, or possibly after completing Years 13 or 14.

The linkage rates for students aged 17-19 are generally high (over 94%), with the exception of 2017/18 for 18 and 19 year olds and 2018/19 for 19 year olds. Many students within these cohorts would have completed Year 12 schooling prior to the first school year of the LEO project (2016/17), and so these cohorts have a lower proportion of linked students. The cascading lower rates in subsequent years follow the same logic: many FE students of older ages would not have been attending school during the School Census timespan. A more meaningful overall linkage rate can be obtained by restricting the calculation to FE learners who fall within the age range for which a School Census record should realistically exist. Since the School Census only includes pupils aged 14 and above, and the earliest year available in the dataset is 2016/17, only learners who would have reached this age threshold during or before that year can be considered eligible to link.

Under this approach, the first FE group eligible for linkage comprises those aged 17 in 2017/18, as they would have been old enough to appear in the 2016/17 School Census. In the years that follow, the pool of eligible learners includes both those progressing into older ages (18, 19, and so on) and those newly reaching age 17, continuing through to 2023/24.

When the linkage rate is calculated using only this age-appropriate subset, i.e. learners for whom a corresponding School Census entry should be present, the overall match rate achieved is 94%.
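One reading of the eligibility rule described above is that the eligible pool starts with 17 year olds in 2017/18 and its upper age limit rises by one with each later academic year. A minimal sketch under that interpretation, which is an assumption about how the subset was defined; the function name is hypothetical:

```python
def fe_expected_to_link(age, academic_year):
    """True when an FE learner falls in the age range assumed eligible for School Census linkage."""
    start_year = int(academic_year[:4])
    if not 2017 <= start_year <= 2023:  # FE data covers 2017/18 to 2023/24
        return False
    # Aged 17 in 2017/18, 17-18 in 2018/19, and so on up to 17-23 in 2023/24.
    return 17 <= age <= 17 + (start_year - 2017)
```

Under this reading, a 23 year old in 2023/24 is eligible while a 24 year old in the same year is not, which matches the point in Table 4 at which linkage rates begin to fall away.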

Overall, Table 4 illustrates that linkage rates are highest for the cohorts expected to have a link with the School Census. Moving further from the target cohort, linkage rates decrease as the proportion of students expected to appear in a School Census falls.

6.5 Training for Success/Skills for Life and Work and Apprenticeships NI

Entrance into Training for Success/Skills for Life and Work (TfS) and Apprenticeships NI (AppsNI) typically occurs following Year 12, when pupils turn 16 during the previous academic year. For example, TfS targets young people aged 16‑17, with extended eligibility for those with a disability (under 22) or in care (under 24).

Because these programmes primarily serve older school leavers, the number of 16 year olds is low, and with such small counts the linkage rates for this age can be erratic. Across all academic years, 17 year olds form the largest cohort, followed by 18 and 19 year olds. This pattern aligns with the typical pathway from school into vocational and apprenticeship routes.
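The effect of small counts on a linkage rate can be illustrated with a short Python sketch. The function names and the counts used here are hypothetical and are not taken from the TfS/AppsNI data; the sketch only shows how much a single record moves the rate for a group of a given size.

```python
def linkage_rate(linked: int, total: int) -> float:
    """Percentage of records in a group that are linked."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * linked / total


def shift_per_record(total: int) -> float:
    """Percentage-point change in the rate when one more record links."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 / total


# Hypothetical group sizes, for illustration only.
small_group_shift = shift_per_record(4)     # 25 percentage points per record
large_group_shift = shift_per_record(1000)  # 0.1 percentage points per record
```

In a hypothetical group of four learners, one additional link moves the rate from 25% to 50%, whereas in a group of 1,000 the same change is barely visible.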

Table 5: TfS/AppsNI records linked to School Census by age at end of academic year

Age 2017/18 2018/19 2019/20 2020/21 2021/22 2022/23 2023/24
16 80% 75% 25% 92% 88% 83% 68%
17 96% 97% 97% 97% 97% 96% 95%
18 38% 95% 95% 95% 97% 96% 94%
19 22% 33% 96% 96% 98% 97% 97%
20 6% 30% 52% 96% 95% 95% 96%
21 0% 3% 39% 55% 93% 94% 94%
22 0% 0% 6% 42% 51% 93% 92%
23 0% 0% 0% 6% 39% 57% 89%
24 0% 0% 0% 0% 5% 36% 49%
25 0% 0% 0% 0% 0% 6% 28%
26 0% 0% 0% 0% 0% 0% 4%
27 0% 0% 0% 0% 0% 0% 0%

As with Further Education, a similar overall trend is observed within TfS/AppsNI. Linkage rates are generally high (exceeding 94%) for those aged 17, 18, and 19 with the exception of 2017/18 for 18 and 19 year olds and 2018/19 for 19 year olds. These exceptions are explained by the fact that many students in these groups would have completed Year 12 schooling prior to the first school year of the LEO project (2016/17), reducing the likelihood of a link to the School Census.

Similarly, the cascading lower linkage rates in subsequent years can be explained by the age profile of students entering TfS or AppsNI. Older entrants would have left school before the earliest School Census year available, meaning no corresponding record could exist. This naturally results in declining linkage rates as we move further from the years in which learners would have been present in the School Census.

To provide a fair assessment of linkage performance, the calculation is therefore restricted to learners who are old enough to have appeared in at least one of the School Census years included. The first group meeting this criterion comprises those aged 17 in 2017/18, as they would have been 16 in the earliest School Census year, 2016/17. In each subsequent year, the age eligible group consists of both learners carried forward from earlier years (now aged 18, 19, and above) and those newly reaching age 17, continuing through to 2023/24.

When focusing only on learners who fall within this age eligible range, i.e. those for whom a School Census record should reasonably exist, the overall linkage rate achieved is 96%.
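One way such an overall rate could be calculated is to pool the linked and total counts across all age eligible cells, rather than averaging the published percentages. The Python sketch below shows this under stated assumptions: the function name `pooled_linkage_rate` is hypothetical, academic years are keyed by their starting calendar year, and the counts are invented for illustration and are not figures from this report.

```python
def pooled_linkage_rate(cells: dict, base_age: int = 17, first_year: int = 2017) -> float:
    """Pool linked/total counts over age eligible (start_year, age) cells.

    `cells` maps (start_year, age) -> (linked_count, total_count).
    """
    linked_sum = 0
    total_sum = 0
    for (start_year, age), (linked, total) in cells.items():
        # Oldest eligible age grows by one for each year after the first.
        oldest_eligible = base_age + (start_year - first_year)
        if start_year >= first_year and base_age <= age <= oldest_eligible:
            linked_sum += linked
            total_sum += total
    if total_sum == 0:
        raise ValueError("no age eligible records")
    return 100.0 * linked_sum / total_sum


# Hypothetical counts, NOT taken from the LEO NI data.
HYPOTHETICAL_CELLS = {
    (2017, 17): (190, 200),
    (2017, 18): (40, 100),   # not eligible: older than 17 in 2017/18
    (2018, 17): (285, 300),
    (2018, 18): (90, 100),
    (2018, 19): (30, 100),   # not eligible: older than 18 in 2018/19
}

rate = pooled_linkage_rate(HYPOTHETICAL_CELLS)
```

With these invented counts the pooled rate over eligible cells is about 94.2% (565 of 600), compared with about 79.4% (635 of 800) if every cell were included, which illustrates why the restriction gives a fairer measure of linkage performance.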

Overall, Table 5 illustrates that linkage rates are highest for those cohorts expected to have a connection to the School Census. As the age profile shifts away from the target cohort of typical school leaver ages, linkage rates decrease. This reflects the diminishing likelihood of overlap with the School Census population.

6.6 Higher Education

The age range of the HESA datasets is very broad, with many older students not expected to have recently attended school. For example, in 2019/20, 60% of NI-domiciled students in HESA were aged 21 or over, and 29% were aged 25 or over. This demographic profile is reflected in the low overall match rate for each academic year.

In most cases, pupils progress to HESA only after the completion of Year 14 of school. At the end of that school year, these students would typically be 18 years old. When isolating this cohort, linkage rates are substantially higher. For instance, among HESA students in 2021/22, approximately 30% had a match to the School Census. Narrowing this to only those expected to have been in Year 14 in the previous year (turning 18 during that academic year), the linkage rate rises to 90%.

Table 6 presents the number of students aged between 17 and 27 at the end of the HESA academic year and the corresponding linkage rate for this cohort. For this analysis, a link is defined as a HESA student during a specific academic year being successfully linked to a record from any point in the School Census from 2016/17 to 2023/24.

Table 6: HESA records linked to School Census by age at end of academic year

Age 2017/18 2018/19 2019/20 2020/21 2021/22 2022/23
17 16% 9% 31% 19% 21% 12%
18 11% 17% 13% 11% 14% 8%
19 87% 87% 91% 91% 90% 86%
20 6% 82% 85% 90% 88% 87%
21 0% 6% 80% 83% 87% 86%
22 0% 0% 7% 78% 79% 84%
23 0% 0% 0% 8% 66% 69%
24 0% 0% 0% 0% 7% 53%
25 0% 0% 0% 0% 0% 5%
26 0% 0% 0% 0% 0% 0%
27 0% 0% 0% 0% 0% 0%

A slightly different approach is required when identifying the group of learners for whom a School Census record should exist in relation to HESA data. Since the earliest School Census year included is 2016/17 and higher education entrants would typically need to have been at least 18 during that year to be observed, the first age group eligible for linkage comprises those aged 19 in 2017/18.

In the years that follow, the age eligible population is made up of learners progressing into older ages (now 20, 21, and above), together with those newly reaching age 19. This continues up to 2022/23, which is the most recent year for which HESA data are available, as 2023/24 has not yet been received.

Restricting the linkage calculation to only those learners who fall within these age ranges, i.e. those who could reasonably have appeared in a School Census year covered by the dataset, results in an overall School Census linkage rate of 84%.
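The HESA eligibility rule can be applied to the rounded percentages in Table 6 to show which cells fall inside the age eligible group. The Python sketch below is illustrative only: the function name `eligible_rates` is hypothetical, and academic years are assumed to be keyed by their starting calendar year. The overall rate of 84% cannot be reproduced from these percentages alone, because that calculation needs the underlying record counts.

```python
YEARS = [2017, 2018, 2019, 2020, 2021, 2022]  # 2017/18 to 2022/23

# Linkage rates (%) from Table 6, by age at end of academic year.
TABLE_6 = {
    17: [16, 9, 31, 19, 21, 12],
    18: [11, 17, 13, 11, 14, 8],
    19: [87, 87, 91, 91, 90, 86],
    20: [6, 82, 85, 90, 88, 87],
    21: [0, 6, 80, 83, 87, 86],
    22: [0, 0, 7, 78, 79, 84],
    23: [0, 0, 0, 8, 66, 69],
    24: [0, 0, 0, 0, 7, 53],
    25: [0, 0, 0, 0, 0, 5],
    26: [0, 0, 0, 0, 0, 0],
    27: [0, 0, 0, 0, 0, 0],
}

HESA_BASE_AGE = 19  # first eligible group: aged 19 in 2017/18


def eligible_rates(table: dict, years: list, base_age: int, first_year: int = 2017) -> dict:
    """Return {(start_year, age): rate} for the age eligible cells only."""
    selected = {}
    for age, rates in table.items():
        for year, rate in zip(years, rates):
            # Eligible ages run from base_age up to base_age plus years elapsed.
            if base_age <= age <= base_age + (year - first_year):
                selected[(year, age)] = rate
    return selected


hesa_eligible = eligible_rates(TABLE_6, YEARS, HESA_BASE_AGE)
```

Under this reading, 21 cells of Table 6 are age eligible, with rates ranging from 53% (age 24 in 2022/23) to 91% (age 19 in 2019/20 and 2020/21).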

7 Conclusion

This document has outlined the methodology and results of the linkage developed by TTP to build the LEO NI dataset, linking the Department of Education (DE) School Census data to three Department for the Economy (DfE) data sources. The assessment of linkage quality, through clerical checks and refinements to the methodology as outlined in the feasibility study, demonstrates that the linkage is of high quality, with erroneous links kept to a minimum.

As discussed, the results from the matching process are complex to interpret as a straightforward linkage rate for the reasons listed below:

  • Many pupils in the School Census remained in school during the study period and therefore would not be expected to link to a DfE record.
  • Not all individuals in the DfE datasets were of school age during the 2016/17 to 2023/24 school years, so they would not be expected to link to a School Census record.
  • DfE data sources, particularly HESA, include students that will not have attended school in NI.

Despite these complexities, the interpreted results show that the linkage achieved high success rates for the cohorts expected to link. The makeup of these target cohorts varies depending on the DfE dataset, as discussed in Section 6, but they are generally those who would have been the correct age to appear in the School Census during a preceding year. Some key results include:

  • 87% of School Census pupils who would have been expected to leave school by 2021/22 linked to a DfE dataset record.
  • 96% of learners in the TfS and AppsNI datasets who were expected to link had a link to a School Census record.
  • 94% of learners in the FE dataset who were expected to link had a link to a School Census record.
  • 84% of learners in the HESA dataset who were expected to link had a link to a School Census record.

8 Contact Details

Published by: Trusted Third Party, Census Office, Northern Ireland Statistics and Research Agency

Email:

Accessibility contact

Please contact Dissemination Branch for assistance with accessibility requirements or alternative formats. Contact details are:

Email:

Telephone: +44 (0)300 200 7836

Dissemination Branch
NISRA
Colby House
Stranmillis Court
BELFAST
BT9 5RR