Updated: 8th August 2023
This methodology outlines the steps that have been undertaken to get IATI data into a format that is useful for partner country governments. It identifies how data is retrieved, reprocessed and harmonised, and finally output. The steps broadly align with the steps undertaken in the previous work outlined in subsection 1.3. This methodology will be further refined and updated during the course of this work, in agreement with the IATI Secretariat.
After initial experiments with various APIs, the agreed approach is to download all data and then process it without using the IATI Datastore or another API. This approach is preferable because this exercise downloads substantially all IATI data anyway, and it yields significant performance gains.
Downloading the IATI Data Dump, a zipped file of 537 MB (9 GB unzipped) that contains all IATI data, takes 53 seconds.
The data is retrieved once per day.
All activities in IATI version 2.01 or above are included. 94% of files currently published on the IATI Registry use version 2.01 or above. Limiting processing to these files reduces the cost of maintaining the software going forward, and is likely to exclude only a very small amount of out-of-date or poor-quality data.
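The version check can be sketched as below. This is a minimal illustration, assuming the version is read from the version attribute of each file's root iati-activities element (where the IATI standard records it); the function names are hypothetical.

```python
import xml.etree.ElementTree as ET

MIN_VERSION = (2, 1)  # "2.01" parses to (2, 1)

def parse_version(version: str) -> tuple[int, ...]:
    """Turn an IATI version string such as '2.03' into a comparable tuple (2, 3)."""
    return tuple(int(part) for part in version.strip().split("."))

def is_supported(xml_text: str) -> bool:
    """Return True if the root <iati-activities> declares version 2.01 or above."""
    root = ET.fromstring(xml_text)
    version = root.get("version")
    if version is None:
        return False  # assumption: files without a declared version are skipped
    try:
        return parse_version(version) >= MIN_VERSION
    except ValueError:
        return False  # assumption: unparseable versions are treated as unsupported
```

Comparing tuples of integers rather than strings keeps "2.10" above "2.03" if such a version ever appears.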
The data is not subjected to any validation processes. That is, we use both valid and invalid data. Where data quality issues arise, these will generally be raised with the relevant publisher, rather than attempting to implement technical workarounds.
Each file is processed to extract a number of fields from each transaction or budget. In some cases, we fall back to data provided at the activity level where it is not provided in the transaction or budget.
The IATI Identifier and the reporting organisation are extracted from the activity in all cases.
The unique identifier for the activity:
iati-activity/iati-identifier/text()
The title of the activity. We have attempted to get the relevant language version where available, for our supported languages (English, French, Spanish and Portuguese). For each language, we fall back to the English-language title, or alternatively, the first title available:
iati-activity/title/narrative[not(@xml:lang) or @xml:lang='en']/text()
The description of the activity. We have attempted to get the relevant language version where available, for our supported languages (English, French, Spanish and Portuguese). For each language, we fall back to the English-language description, or alternatively, the first description available:
iati-activity/description/narrative[not(@xml:lang) or @xml:lang='en']/text()
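The language fallback for titles and descriptions might look like the following minimal sketch using the standard library's ElementTree. The function name pick_narrative is hypothetical; it treats a narrative with no xml:lang as English, mirroring the XPath predicates above, and ignores any default language declared on the activity (an assumption).

```python
import xml.etree.ElementTree as ET

# ElementTree exposes xml:lang under the XML namespace URI
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def pick_narrative(parent, lang: str):
    """Return narrative text in `lang`, else English, else the first narrative."""
    if parent is None:
        return None
    narratives = [n for n in parent.findall("narrative") if (n.text or "").strip()]
    if not narratives:
        return None
    # 1. Exact match on the requested language
    for n in narratives:
        if n.get(XML_LANG) == lang:
            return n.text.strip()
    # 2. English, counting a narrative without xml:lang as English
    for n in narratives:
        if n.get(XML_LANG) in (None, "en"):
            return n.text.strip()
    # 3. Whatever narrative comes first
    return narratives[0].text.strip()
```

The same function serves both fields, e.g. pick_narrative(activity.find("description"), "pt").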
The name of the organisation publishing this IATI data. Names are mapped from the list of reporting organisations recorded on the IATI Registry, which is made available in the (unofficial) ReportingOrganisation codelist:
iati-activity/reporting-org/@ref
The type of the reporting organisation publishing this IATI data:
iati-activity/reporting-org/@type
The following fields are extracted from each transaction. Where any of these four fields is missing, the transaction is not processed.
The transaction value in the published currency:
transaction/value/text()
The date of the transaction (which is used to aggregate transactions and in the output):
transaction/transaction-date/@iso-date
The value date of the transaction (which is used as the date for currency conversion):
transaction/value/@value-date
The transaction type (incoming fund, outgoing commitment, disbursement, expenditure):
transaction/transaction-type/@code
Note: initially, transactions other than incoming funds, commitments, disbursements and expenditure have been discarded. This decision could be revised subsequently depending on demand and subject to the need to keep the processing time at a reasonable level.
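A minimal sketch of extracting the four required transaction fields and discarding other transaction types, assuming transactions are ElementTree elements from an IATI 2.x file; the function and constant names are hypothetical.

```python
import xml.etree.ElementTree as ET

# Transaction types kept: 1 incoming funds, 2 outgoing commitment, 3 disbursement, 4 expenditure
KEPT_TYPES = {"1", "2", "3", "4"}

def extract_transaction(txn):
    """Return the four required fields, or None if any is missing or the type is discarded."""
    value_el = txn.find("value")
    date_el = txn.find("transaction-date")
    type_el = txn.find("transaction-type")
    fields = {
        "value": (value_el.text or "").strip() if value_el is not None else "",
        "transaction_date": date_el.get("iso-date", "") if date_el is not None else "",
        "value_date": value_el.get("value-date", "") if value_el is not None else "",
        "transaction_type": type_el.get("code", "") if type_el is not None else "",
    }
    if not all(fields.values()):
        return None  # a required field is missing, so the transaction is skipped
    if fields["transaction_type"] not in KEPT_TYPES:
        return None  # e.g. interest payments (5) or loan repayments (6) are discarded
    try:
        fields["value"] = float(fields["value"])
    except ValueError:
        return None  # assumption: non-numeric values are skipped rather than repaired
    return fields
```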
For some fields, the data comes from either the transaction or the activity, depending on the publisher’s data.
The transaction currency, or the activity default currency:
transaction/value/@currency or iati-activity/@default-currency
The transaction aid type, or the activity default aid type (only DAC aid types are included):
transaction/aid-type[not(@vocabulary) or @vocabulary='1']/@code or iati-activity/default-aid-type[not(@vocabulary) or @vocabulary='1']/@code
The transaction finance type, or the activity default finance type:
transaction/finance-type/@code or iati-activity/default-finance-type/@code
The transaction flow type, or the activity default flow type:
transaction/flow-type/@code or iati-activity/default-flow-type/@code
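The transaction-then-activity fallback could be sketched as follows. The helper names are hypothetical, and the currency lookup assumes, as in the IATI 2.x schema, that the transaction currency sits on the currency attribute of the transaction's value element.

```python
import xml.etree.ElementTree as ET

def dac_only(elements):
    """Keep elements with no vocabulary or vocabulary '1', as in the XPath predicates above."""
    return [el for el in elements if el.get("vocabulary") in (None, "1")]

def code_with_fallback(txn, activity, txn_tag, activity_tag, dac_filter=False):
    """Return the first @code on the transaction element, else on the activity default element."""
    for parent, tag in ((txn, txn_tag), (activity, activity_tag)):
        elements = parent.findall(tag)
        if dac_filter:
            elements = dac_only(elements)
        for el in elements:
            if el.get("code"):
                return el.get("code")
    return None

def currency_with_fallback(txn, activity):
    """Currency from the transaction's value element, else the activity default currency."""
    value_el = txn.find("value")
    if value_el is not None and value_el.get("currency"):
        return value_el.get("currency")
    return activity.get("default-currency")
```

For example, code_with_fallback(txn, activity, "aid-type", "default-aid-type", dac_filter=True) ignores a non-DAC aid type on the transaction and falls back to the activity default.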
The transaction provider organisation, or the activity reporting organisation. The name of the organisation is followed by the organisation identifier in [square brackets], where available. Note that we follow a similar process to that used for the Title to get names in English, French, Spanish and Portuguese, where available:
Name of the organisation
transaction/provider-org/narrative[not(@xml:lang) or @xml:lang='en']/text()
Organisation identifier
transaction/provider-org/@ref
The transaction receiver organisation, or the activity implementing organisation(s). The name of the organisation is followed by the organisation identifier in [square brackets], where available. Note that we follow a similar process to that used for the Title to get names in English, French, Spanish and Portuguese, where available:
Name of the organisation
transaction/receiver-org/narrative[not(@xml:lang) or @xml:lang='en']/text()
Organisation identifier
transaction/receiver-org/@ref
Where there is no transaction-level provider or receiver organisation, we use an organisation from another part of the activity. We use different fallbacks depending on which transaction type we are processing:
Transaction Type | Provider org | Receiver org |
---|---|---|
1 - Incoming Funds | Funding Org | Reporting Org |
2 - Outgoing Commitment | Reporting Org | Implementing Org |
3 - Disbursement | Reporting Org | Implementing Org |
4 - Expenditure | Reporting Org | Implementing Org |
Where there are multiple funding or implementing organisations, these are concatenated (joined) together with commas.
For reporting organisation, we use:
iati-activity/reporting-org/narrative/text()
For funding organisation:
iati-activity/participating-org[@role='1']/narrative/text()
For implementing organisation:
iati-activity/participating-org[@role='4']/narrative/text()
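The fallbacks in the table above can be expressed as a lookup keyed by transaction type code. This sketch is illustrative only: it assumes organisation names are held in narrative child elements (as in IATI 2.x), takes the first narrative rather than applying the language fallback, and uses hypothetical function names.

```python
import xml.etree.ElementTree as ET

# Fallback sources per transaction type code, following the table above
FALLBACK = {
    "1": ("funding", "reporting"),       # Incoming Funds
    "2": ("reporting", "implementing"),  # Outgoing Commitment
    "3": ("reporting", "implementing"),  # Disbursement
    "4": ("reporting", "implementing"),  # Expenditure
}

def org_label(el):
    """Format an organisation as 'Name [identifier]', dropping the brackets if there is no @ref."""
    narrative = el.find("narrative")
    name = (narrative.text or "").strip() if narrative is not None else ""
    ref = el.get("ref")
    return f"{name} [{ref}]".strip() if ref else name

def activity_orgs(activity, kind):
    """Comma-join activity-level orgs of one kind (participating-org role 1 = funding, 4 = implementing)."""
    if kind == "reporting":
        elements = activity.findall("reporting-org")
    else:
        role = "1" if kind == "funding" else "4"
        elements = activity.findall(f"participating-org[@role='{role}']")
    return ", ".join(label for label in map(org_label, elements) if label)

def provider_and_receiver(txn, activity, txn_type):
    """Use transaction-level orgs where present, else the activity-level fallback for this type."""
    provider_kind, receiver_kind = FALLBACK[txn_type]
    provider = txn.find("provider-org")
    receiver = txn.find("receiver-org")
    return (
        org_label(provider) if provider is not None else activity_orgs(activity, provider_kind),
        org_label(receiver) if receiver is not None else activity_orgs(activity, receiver_kind),
    )
```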
Finally, two fields (recipient country/region and sector) are extracted either from the transaction or activity. At the activity level, these can be published multiple times with percentage splits. The methodology for handling multiple values is described in the following section.
The transaction recipient country, or the list of activity recipient countries (where there are multiple countries, a column has been added to indicate that the transaction is part of a multi-country project):
transaction/recipient-country/@code or iati-activity/recipient-country/@code
Alternatively, if there are no recipient countries, we look for DAC regions:
transaction/recipient-region[not(@vocabulary) or @vocabulary='1']/@code or iati-activity/recipient-region[not(@vocabulary) or @vocabulary='1']/@code
The transaction sector, or the list of activity sectors (NB only DAC sectors are included):
transaction/sector[not(@vocabulary) or @vocabulary='1']/@code or iati-activity/sector[not(@vocabulary) or @vocabulary='1']/@code
The transaction humanitarian flag, or the activity humanitarian flag:
transaction/@humanitarian or iati-activity/@humanitarian
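If there are transaction-level flags:
- if the transaction flag is true (1), then humanitarian will be marked as 1;
- if the transaction flag is false (0), then humanitarian will be marked as 0.
If there are no transaction-level flags:
- if the activity flag is true (1), then humanitarian will be marked as 1;
- if the activity flag is false (0), then humanitarian will be marked as 0.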
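A minimal sketch of the humanitarian flag logic, assuming flags are read from the humanitarian attribute on the transaction and activity elements. Because the IATI schema types this attribute as a boolean, the sketch also accepts true and false (an assumption beyond the 1/0 values described here); when neither level carries a flag it returns None, which is also an assumption.

```python
import xml.etree.ElementTree as ET

TRUE_VALUES = {"1", "true"}
FALSE_VALUES = {"0", "false"}

def humanitarian_flag(txn, activity):
    """Return 1 or 0 from the transaction flag, else from the activity flag, else None."""
    for el in (txn, activity):
        raw = (el.get("humanitarian") or "").strip().lower()
        if raw in TRUE_VALUES:
            return 1
        if raw in FALSE_VALUES:
            return 0
    return None  # assumption: no flag at either level leaves the field empty

activity = ET.fromstring('<iati-activity humanitarian="1"><transaction/></iati-activity>')
print(humanitarian_flag(activity.find("transaction"), activity))  # → 1
```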
As described in the previous section, individual transactions may map to multiple countries and sectors. In such cases, the transaction is split into one row per country and sector, with each row's value proportional to the percentages allocated to that country and that sector. (NB: Where there are no countries or DAC regions, the transaction is discarded. Where there are no sectors, the sector is output as blank, depending on which approach is clearer.)
In some cases, the published percentages may also be incorrect. For example, they may not add up to 100, or there may be multiple sectors with no percentage specified. In these cases, the percentages are adjusted and rebased so that they add up to 100%. For example:
Sector | Percentage (published) | Percentage (corrected) |
---|---|---|
12220 Basic health care | 100% | 50% |
11220 Primary education | 100% | 50% |
A single transaction of USD 100 would then be split into two rows: one row for USD 50 for basic health care and a second row of USD 50 for primary education. If the same activity were classified with two recipient countries, it would be split again, now into four rows.
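Rebasing and splitting can be sketched as below, assuming percentages are held in a dict keyed by code, with None where no percentage was published, and that a missing percentage means an equal split. The function names are hypothetical, and the sketch does not attempt the country/region reconciliation that follows.

```python
from itertools import product

def rebase(shares):
    """Rescale percentages to sum to 100; if any is missing (None), split equally instead."""
    if not shares:
        return {}
    pcts = list(shares.values())
    if any(p is None for p in pcts) or sum(pcts) <= 0:
        # assumption: missing or zero percentages are treated as an equal split
        return {code: 100 / len(shares) for code in shares}
    total = sum(pcts)
    return {code: p * 100 / total for code, p in shares.items()}

def split_transaction(value, countries, sectors):
    """Return one (country, sector, value) row per country/sector pair."""
    countries = rebase(countries)
    if not countries:
        return []  # no countries or regions: the transaction is discarded
    sectors = rebase(sectors) or {"": 100.0}  # no sectors: output a blank sector
    return [
        (country, sector, value * c_pct / 100 * s_pct / 100)
        for (country, c_pct), (sector, s_pct) in product(countries.items(), sectors.items())
    ]
```

Calling split_transaction(100, {"KE": 100}, {"12220": 100, "11220": 100}) reproduces the table above: two rows of 50. Adding a second country at 50% each yields four rows of 25.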
Care needs to be taken when correcting percentages for countries. The IATI Guidance has been interpreted differently by different organisations. Some have interpreted the Guidance as stating that all countries plus all regions must add up to 100%, whereas others have understood that countries must add up to 100% and regions must (separately) add up to 100%. The following logic is used:
<recipient-country code="LR" />
<recipient-region code="298" />
<recipient-country code="TD" percentage="70" />
<recipient-country code="LR" percentage="30" />
<recipient-region code="298" percentage="100" />
<recipient-country code="TD" percentage="50" />
<recipient-region code="298" percentage="50" />
As data is published in different currencies (depending on the publisher), individual transactions need to be converted to USD, Euro and local currencies, using the exchange rate whose date is closest to the transaction value date. Monthly exchange rates for 169 currencies are sourced from the IMF's International Financial Statistics[2].
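Nearest-date conversion might be sketched as follows. It assumes a hypothetical in-memory table mapping each currency code to a date-sorted list of (date, units of that currency per USD) pairs, and converts to EUR and local currencies as cross rates via USD; both the table layout and the cross-rate approach are assumptions, not a description of the actual pipeline.

```python
from bisect import bisect_left
from datetime import date

def nearest_rate(rates, when):
    """rates: date-sorted list of (date, units per USD); return the rate dated closest to `when`."""
    dates = [d for d, _ in rates]
    i = bisect_left(dates, when)
    candidates = rates[max(i - 1, 0): i + 1]  # the neighbours either side of `when`
    return min(candidates, key=lambda r: abs((r[0] - when).days))[1]

def convert(value, currency, target, when, table):
    """Convert `value` from `currency` to `target` via USD, using nearest-dated rates."""
    usd = value if currency == "USD" else value / nearest_rate(table[currency], when)
    return usd if target == "USD" else usd * nearest_rate(table[target], when)
```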
Forward spending data is also important to capture. It is more challenging because, unlike transactions, budgets are not classified by sector or country, so it is not possible to state specifically what proportion of a budget is going to a particular country or sector. To make this assessment, certain data from the transaction or activity level needs to be applied to the budget data. For example, where there are no activity-level sectors, the proportion of the value of commitment transactions going to each sector is used to apply sector splits to budgets, in the same way as described in section 2.3 above.
This process of calculating the proportion of commitments is used for:
For the Provider Organisation field, the activity reporting organisation is used. For the Receiver Organisation field, the activity implementing organisation(s) are used.
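The commitment-based sector split for budgets can be sketched as below, assuming commitment transactions have already been converted to one currency and reduced to (sector code, value) pairs. The function names are hypothetical.

```python
from collections import defaultdict

def commitment_shares(commitments):
    """commitments: list of (sector code, value); return each sector's percentage of the total."""
    totals = defaultdict(float)
    for sector, value in commitments:
        totals[sector] += value
    grand_total = sum(totals.values())
    if grand_total <= 0:
        return {}  # no usable commitments: no split can be derived
    return {sector: 100 * v / grand_total for sector, v in totals.items()}

def split_budget(budget_value, shares):
    """Apply commitment-derived percentages to a budget value, one row per sector."""
    return [(sector, budget_value * pct / 100) for sector, pct in shares.items()]
```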
Where budgets span more than one quarter, they are split into multiple rows, each mapping to exactly one quarter. The value is split proportionately[3]. This is necessary to maintain comparability between transactions (which are marked with a single date) and budgets (which span a period, and which may not align with the government's fiscal year).
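A sketch of the quarterly split, assuming calendar quarters and a pro rata split by number of days; the footnote referenced above may define the proportion differently, and the function names are hypothetical.

```python
from datetime import date, timedelta

def quarter_end(d):
    """Last day of the calendar quarter containing d."""
    last_month = 3 * ((d.month - 1) // 3 + 1)
    first_of_next = date(d.year + (last_month == 12), last_month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)

def split_by_quarter(value, start, end):
    """Split a budget over [start, end] (inclusive) into quarters, pro rata by days."""
    total_days = (end - start).days + 1
    rows = []
    cursor = start
    while cursor <= end:
        q_end = min(quarter_end(cursor), end)
        days = (q_end - cursor).days + 1
        rows.append((quarter_end(cursor), value * days / total_days))
        cursor = q_end + timedelta(days=1)
    return rows
```

For a budget of 590 running from 15 February to 14 April, 45 of its 59 days fall in the first quarter, giving rows of 450 and 140.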
Where revised and original budgets are both published for the same period, revised budgets are used instead of original budgets.
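Choosing revised over original budgets might look like this, assuming budgets are dicts carrying the IATI BudgetType code (1 = Original, 2 = Revised) and that "the same period" means identical start and end dates. The field names are hypothetical.

```python
def select_budgets(budgets):
    """Keep one budget per (start, end) period, preferring type '2' (revised) over '1' (original)."""
    chosen = {}
    for b in budgets:
        period = (b["start"], b["end"])
        current = chosen.get(period)
        if current is None or (current["type"] == "1" and b["type"] == "2"):
            chosen[period] = b  # first budget seen, or a revision replacing an original
    return list(chosen.values())
```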
Transactions are aggregated up into one row per quarter, where the following other fields are all identical:
The transaction date is set to the last day of the quarter.
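Quarterly aggregation can be sketched as a group-by on the quarter-end date plus the identifying fields. The list of fields that must match is not reproduced here, so key_fields is a caller-supplied assumption, as are the dict field names.

```python
from collections import defaultdict
from datetime import date, timedelta

def quarter_end(d):
    """Last day of the calendar quarter containing d."""
    last_month = 3 * ((d.month - 1) // 3 + 1)
    first_of_next = date(d.year + (last_month == 12), last_month % 12 + 1, 1)
    return first_of_next - timedelta(days=1)

def aggregate(transactions, key_fields):
    """Sum values per quarter where all key_fields match; the date becomes the quarter end."""
    totals = defaultdict(float)
    for t in transactions:
        key = (quarter_end(t["date"]),) + tuple(t[f] for f in key_fields)
        totals[key] += t["value"]
    return [dict(zip(("date",) + tuple(key_fields), key), value=v) for key, v in totals.items()]
```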
The target currencies are set as USD and Euro for all countries. An additional local currency (e.g. Kenyan shillings for the Kenya output) is also included; the currency is determined by the recipient country. The exchange rate date is the last day of the quarter.
The data is available in English, French, Spanish and Portuguese. All available Titles in these languages are pulled into the outputs, along with all codes. Some Titles and Provider and Receiver Organisations are only available in English.
The data is processed on a server each day with the following steps:
iati-flattener (carrying out the above methodology): extracting relevant parts from XML files, creating transaction and budget CSV files per country, and activity CSV files per reporting organisation.
The files were previously processed on GitHub Actions and stored on GitHub Pages (both free services). However, this caused problems with GitHub's usage limits.
All outputs are published on GitHub and are openly licensed under the GNU Affero General Public License (AGPL) v3.0[4].