Meso Level: Technological Capabilities of Nations

Jupyter Notebook link

In the second part of the fourth section, the focus lies on the technological capabilities of countries. The methods used here are closely related to the methodology presented in the third section and to the analyses in the first part of this section, with some small differences and additional analyses.

1. Characterisation of Countries

Capability Matrices

The first result represents the biofuel research ecosystem of a country as a capability matrix. This matrix results from applying the same term-pair methodology presented at the macro level, but the documentation is filtered by location instead of by year. For example, the capability matrix for Denmark is built from the technological assets (patents, publications, projects) located in Denmark or owned (or co-owned) by Danish organizations. To introduce this concept, the normalized capability matrices of Sweden and Denmark were produced. The following figure and table show both matrices side by side, together with some indicative properties.
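The location filter can be illustrated with a minimal sketch. It assumes that each asset carries a set of countries and a list of dictionary terms, and that the term-pair matrix counts the assets in which two terms appear together. The field names, the toy assets and the function `capability_matrix` are hypothetical, and the normalization step is left out because it is not described in this section.

```python
from itertools import combinations

def capability_matrix(assets, terms, country):
    """Count term-pair co-occurrences over the assets located in `country`."""
    index = {term: k for k, term in enumerate(terms)}
    matrix = [[0] * len(terms) for _ in terms]
    for asset in assets:
        if country not in asset["countries"]:
            continue  # location filter (the macro level filtered by year instead)
        present = sorted(index[t] for t in set(asset["terms"]) if t in index)
        for i, j in combinations(present, 2):
            matrix[i][j] += 1
            matrix[j][i] += 1  # mirror the count so the matrix stays symmetric
    return matrix

# Toy data (hypothetical): a co-owned asset counts for both countries
terms = ["ethanol", "sugarcane", "straw"]
assets = [
    {"countries": {"Denmark"}, "terms": ["straw", "ethanol"]},
    {"countries": {"Denmark", "Sweden"}, "terms": ["straw", "ethanol", "sugarcane"]},
    {"countries": {"Sweden"}, "terms": ["sugarcane", "ethanol"]},
]
denmark = capability_matrix(assets, terms, "Denmark")
sweden = capability_matrix(assets, terms, "Sweden")
```

Because every country uses the same `terms` dictionary, all resulting matrices share the same dimensions and can be compared entry by entry.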

When looking at the properties of both matrices, some observations can be made:

  • Both matrices are symmetrical and have equal dimensions, which is expected given that both use the same dictionary of biofuel-related terms.

  • The maximum, minimum and mean values of both matrices are generally similar.

  • The standard deviation of Denmark’s capability matrix is 40% higher than Sweden’s, which could indicate that Denmark uses a wider range of term pairs, whereas Sweden’s capabilities are more “focused”.
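The properties discussed above can be computed with a few lines of code. This is a minimal sketch, assuming a capability matrix is stored as a nested list of floats; the helper name `matrix_properties` and the toy values are illustrative and not taken from the study.

```python
from statistics import mean, pstdev

def matrix_properties(matrix):
    """Return basic descriptive properties of a square capability matrix."""
    n = len(matrix)
    values = [v for row in matrix for v in row]
    symmetric = all(matrix[i][j] == matrix[j][i] for i in range(n) for j in range(n))
    return {
        "shape": (n, len(matrix[0])),
        "symmetric": symmetric,
        "max": max(values),
        "min": min(values),
        "mean": mean(values),
        "std": pstdev(values),  # population standard deviation over all entries
    }

# Toy normalized matrix (hypothetical values)
toy = [
    [1.0, 0.2, 0.0],
    [0.2, 1.0, 0.5],
    [0.0, 0.5, 1.0],
]
props = matrix_properties(toy)
```

Running the same helper on two countries and comparing the `std` entries gives the kind of comparison made in the last bullet.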

Capability Lists

Following the same approach as the macro analysis, a capability matrix can be transformed into a capability list by taking its upper triangle and adding each entry to a vector. At this level of the analysis, and as a proof of concept, the capability lists of the United States of America and China are presented side by side. However, due to the large number of entries in these lists (58482), the differences are difficult to visualize. Consequently, this concept is revisited in more detail in the final part of this subsection.
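The transformation from matrix to list can be sketched as follows. Whether the diagonal is part of the upper triangle is not stated here, so the `include_diagonal` flag is an assumption, as is the function name `capability_list`.

```python
def capability_list(matrix, include_diagonal=True):
    """Flatten the upper triangle of a square matrix into a list, row by row."""
    n = len(matrix)
    start = 0 if include_diagonal else 1  # offset 1 skips the diagonal entries
    return [matrix[i][j] for i in range(n) for j in range(i + start, n)]

# Toy symmetric matrix (hypothetical values)
toy = [
    [1, 2, 3],
    [2, 4, 5],
    [3, 5, 6],
]
with_diagonal = capability_list(toy)
without_diagonal = capability_list(toy, include_diagonal=False)
```

Since every country matrix has the same dimensions and the same term order, position k in one country's list refers to the same term pair as position k in any other country's list.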

2. Country Correlation Matrix and Profiles

Country Correlation Matrices

To apply the same engineering systems approach at the meso level as at the macro level, the Pearson correlation index was used as an indicator of the similarity between the capability lists of two countries. For example, the Pearson correlation index between the US and China lists is approximately 0.65, or 65%. This can be read as the biofuel research of these two countries being roughly 65% similar, although strictly the index measures how linearly related the two capability lists are. To visualize this for all countries, the country correlation matrix was created.
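As an illustration, the correlation between two capability lists and the resulting country correlation matrix can be sketched in plain Python. The function names and the toy lists are hypothetical; a country whose list is constant (for example all zeros) would need special handling, since its variance is zero.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long lists of numbers."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)  # fails if either list has zero variance

def correlation_matrix(lists_by_country):
    """Pairwise Pearson correlation between the capability lists of all countries."""
    names = list(lists_by_country)
    matrix = [
        [pearson(lists_by_country[a], lists_by_country[b]) for b in names]
        for a in names
    ]
    return names, matrix

# Toy capability lists (hypothetical values)
lists = {"A": [1, 2, 3, 4], "B": [2, 4, 6, 8], "C": [4, 3, 2, 1]}
names, corr = correlation_matrix(lists)
```

In this toy case A and B are perfectly correlated and A and C are perfectly anti-correlated, which marks the two extremes of the index.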

After creating this matrix, and just as was done for each year, a hierarchical clustering algorithm was applied to it as a way of identifying possible clusters of countries that are more similar to each other. This clustering technique also produced a dendrogram as a way of quickly identifying the countries that are most related to one another. For example, if this dendrogram were cut at the level n=2, forming clusters of two countries, Denmark would be connected to Portugal, and the United States would be clustered with Taiwan. Interestingly, one can observe three main cluster areas in the ordered matrix:

  • On the top left side of the matrix, an area of highly related countries that range from France to Serbia (see the axis of the second figure).

  • On the bottom right side of the matrix, an area of related countries that are on average less related than those in the top left and form a separate group (Belgium, Hong Kong, Hungary, Tunisia...).

  • In the middle, a cross-like area of countries that are not particularly related to each other or to any other country (El Salvador, UAE, Scotland...).
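The reordering of the matrix by clustering can be sketched with a small agglomerative procedure. The linkage method and the distance are not stated in this section, so average linkage on the distance 1 - correlation is an assumption, and the function name and toy values are hypothetical.

```python
def cluster_order(names, corr):
    """Agglomerative clustering with average linkage on distance = 1 - correlation.

    Returns the final leaf order and the list of merges, earliest first.
    """
    clusters = [[i] for i in range(len(names))]

    def distance(a, b):
        # Average pairwise distance between the members of two clusters
        return sum(1.0 - corr[i][j] for i in a for j in b) / (len(a) * len(b))

    merges = []
    while len(clusters) > 1:
        # Pick the two closest clusters (ties resolved by position)
        _, x, y = min(
            (distance(a, b), x, y)
            for x, a in enumerate(clusters)
            for y, b in enumerate(clusters)
            if x < y
        )
        merges.append(([names[i] for i in clusters[x]], [names[i] for i in clusters[y]]))
        merged = clusters[x] + clusters[y]
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return [names[i] for i in clusters[0]], merges

# Toy correlation matrix (hypothetical values): A-B and C-D are close pairs
names = ["A", "B", "C", "D"]
corr = [
    [1.0, 0.9, 0.1, 0.1],
    [0.9, 1.0, 0.1, 0.1],
    [0.1, 0.1, 1.0, 0.8],
    [0.1, 0.1, 0.8, 1.0],
]
order, merges = cluster_order(names, corr)
```

The first merges correspond to the pairs obtained when a dendrogram is cut at clusters of two countries, and `order` gives the row and column order of the rearranged matrix.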

Country Correlation Profiles

While clustering is an interesting way of visualizing general trends between countries, it does not explicitly show which countries are most related to each other. To visualize this, country profiles were created. A country profile is built by “cutting” the country correlation matrix at a particular row (country) and ordering the results. In the figure below, the country profile of Denmark is presented. On the y axis, the Pearson correlation index (x100) is used as a measure of similarity between countries.

This graph is a simple way of quickly visualizing the countries most similar to Denmark in terms of biofuel research. For example, the country most related to Denmark is Spain, with an index of about 60%, followed by Portugal with an index of ~58%. Interestingly, the most similar countries are not necessarily geographically close to Denmark, although they are close to each other. Sweden, for example, has a correlation index of about 50% with Denmark, and Norway only ~30%. Following this method, one could say that, in terms of biofuel-related capability matrices, Norway is as similar to Denmark as Colombia is.
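Building a profile amounts to reading one row of the country correlation matrix and sorting it. In the sketch below, only the Denmark row uses the approximate values quoted in the text; the other rows are placeholders, and the function name `country_profile` is hypothetical.

```python
def country_profile(names, corr, country):
    """Return the other countries sorted by decreasing correlation with `country`."""
    row = corr[names.index(country)]
    others = [(name, value) for name, value in zip(names, row) if name != country]
    return sorted(others, key=lambda item: item[1], reverse=True)

names = ["Denmark", "Norway", "Portugal", "Spain", "Sweden"]
corr = [
    [1.00, 0.30, 0.58, 0.60, 0.50],  # Denmark row, approximate values from the text
    [0.30, 1.00, 0.00, 0.00, 0.00],  # remaining rows are placeholders
    [0.58, 0.00, 1.00, 0.00, 0.00],
    [0.60, 0.00, 0.00, 1.00, 0.00],
    [0.50, 0.00, 0.00, 0.00, 1.00],
]
profile = country_profile(names, corr, "Denmark")
```

Multiplying the values by 100 gives the scale used on the y axis of the profile figure.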

3. Contextual Relationships

GDP per capita

Using the World Bank as the data source for Gross Domestic Product (GDP) per capita in $US, the GDP per capita difference was calculated for every country pair. The goal is to understand whether the GDP per capita of two countries is indicative of their technological similarity. In the first plot, presented below, each data point is a pair of countries. The x axis shows the Pearson correlation (0-1) between the countries of each pair, and the y axis shows the absolute GDP per capita difference of the same pair. For readability, country pairs with a capability similarity of less than 10%, or 0.1, are excluded from the graph.

When observing the graph, one can notice that most country pairs have a capability correlation of less than 40% and a GDP per capita difference of less than 40000 $US. Looking at the dashed guidelines in the graph, the further a guideline is from the origin, the fewer country pairs appear. Moreover, countries that are more related (higher capability correlation) generally have a more similar GDP per capita. For example, Brazil and Zimbabwe have a capability correlation of 88.60% (0.886) and a GDP per capita difference of 7620.87 $US, which is rather low.

However, the graph above loses an important dimension: it is hard to distinguish country pairs from the GDP per capita difference alone. Take, for example, the country pairs Sweden-Singapore and Romania-Brazil. Both pairs have a low GDP per capita difference; however, the first pair is made up of economically developed countries and the second of generally underdeveloped countries. The graph above treats them equally.

To add an extra dimension to this visualization, the graph below was produced. Here, the average GDP per capita of each country pair is also shown as a color scale. For instance, Sweden-Singapore is light blue, and Romania-Brazil is red.
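The data points behind both GDP plots can be assembled as in the following sketch. The GDP figures and correlations are made-up toy values, and the function name and dictionary keys are hypothetical; only the 0.1 cut-off comes from the text.

```python
from itertools import combinations

def gdp_pair_points(names, corr, gdp, min_corr=0.1):
    """Build one data point per country pair with enough capability correlation."""
    points = []
    for i, j in combinations(range(len(names)), 2):
        if corr[i][j] < min_corr:
            continue  # pairs below the cut-off are excluded for readability
        a, b = names[i], names[j]
        points.append({
            "pair": (a, b),
            "correlation": corr[i][j],                 # x axis
            "gdp_difference": abs(gdp[a] - gdp[b]),    # y axis
            "gdp_average": (gdp[a] + gdp[b]) / 2,      # color scale of the second plot
        })
    return points

# Toy inputs (hypothetical values)
names = ["A", "B", "C"]
corr = [
    [1.0, 0.6, 0.05],
    [0.6, 1.0, 0.3],
    [0.05, 0.3, 1.0],
]
gdp = {"A": 50000.0, "B": 42000.0, "C": 5000.0}
points = gdp_pair_points(names, corr, gdp)
```

Keeping the average next to the difference is what separates a pair of two high-income countries from a pair of two low-income countries with the same difference.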

Collaboration

The second contextualization does not come from an external data source; instead, it was obtained from the database itself. By querying the database, it was possible to retrieve, for each country pair, the number of technological assets on which the two countries collaborated.

By taking the number of shared assets between a country pair and the capability correlation between that same country pair, the graph below was produced. In it, 4 different areas can be observed (in italic, example pairs):

On one hand, most country-pairs are located in the “Different and not collaborating” quadrant. On the other hand, there is a high number of country pairs that are similar in terms of capability but are not collaborating.

When looking at the above graph, one can consider the number of shared assets an unfair index, because not all countries possess the same number of assets. For example, the US has an extremely high number of documents, while other countries such as Costa Rica or Lebanon have very few. For this reason, a new index, the normalized number of shared assets, was created as a way of valuing collaborations as a percentage of the total documents produced by the country pair. Its definition follows:

  • Old collaboration definition: Country i and country j have z assets that list both countries as location.

  • New normalized collaboration definition: normalized collaboration = (number of shared assets between country i and j)/(number of total possible collaborations between i and j)

For example, for the country pair Portugal-Denmark:

  • Number of assets Denmark: 351

  • Number of assets Portugal: 180

  • Number of shared assets: 25

  • Number of normalized shared assets: 0.14 (= 25/180, where 180 is the asset count of Portugal, the country of the pair with fewer assets)
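The Portugal-Denmark example can be reproduced with a short function. The denominator is taken to be the smaller of the two asset counts, which is an assumption inferred from the 25/180 calculation rather than a stated definition; the function name is hypothetical.

```python
def normalized_collaboration(shared, assets_i, assets_j):
    """Shared assets as a fraction of the maximum possible number of collaborations."""
    # Assumption: two countries cannot share more assets than the smaller one owns
    possible = min(assets_i, assets_j)
    return shared / possible if possible else 0.0

# Portugal-Denmark figures from the example above
value = normalized_collaboration(shared=25, assets_i=351, assets_j=180)
```

With this denominator the index stays between 0 and 1, so a pair involving a small country is no longer dwarfed by pairs involving countries with very large document counts.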

Below, the same graph is presented, but with the normalized shared assets between each country pair. One can notice that the points are generally less saturated and that the country pairs are more evenly distributed. Moreover, some outliers appear, such as France-Lebanon.

4. Comparing Countries

Coming back to the more general analysis, in the same way as two years were compared in terms of capability, two countries will now be compared in terms of term-pair usage. It is worth noting that this approach is simply a deep dive into the capability matrices of two different countries, looking at the most common term pairs in each of them.

As an example, Brazil and Denmark will be compared; their capability correlation is around 30%. The first result is the list of top term pairs for each of these countries, presented in two tables side by side. One can note that the top term pairs of Brazil include a high number of term pairs related to sugar, sugarcane and cellulose. On the other hand, the Denmark table puts more stress on processing technologies (digestion, fermentation, hydrolysis) and outputs.

Top terms for Denmark:

Top terms for Brazil:

Similarly to what was done in the macro-level analysis, a table of the most important term-pair usage differences was produced. One can note a high number of term pairs that are used in Brazil and not used at all by Denmark: “sugar-sugarcane”, “advanced biofuel-cellulosic ethanol”, “sugarcane-ethanol”. On the other hand, there is a lower number of term pairs that are only used in Denmark (“straw-hydrolysis”). Moreover, feedstocks and related term pairs are common in this table, with terms such as sugar, sugarcane or straw being divisive between countries.
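A table of usage differences can be derived by subtracting the usage values of one country from those of the other and ranking by magnitude. This is a minimal sketch: how the study measures the difference is not specified here, so the plain subtraction is an assumption, and the usage values and function name are hypothetical.

```python
def usage_differences(usage_a, usage_b, top=10):
    """Rank term pairs by the absolute difference in usage between two countries."""
    all_pairs = set(usage_a) | set(usage_b)
    diffs = [
        (pair, usage_a.get(pair, 0.0) - usage_b.get(pair, 0.0))  # missing pair = unused
        for pair in all_pairs
    ]
    # Largest absolute difference first; ties are ordered alphabetically by pair
    return sorted(diffs, key=lambda item: (-abs(item[1]), item[0]))[:top]

# Toy normalized usage values (hypothetical)
brazil = {("sugar", "sugarcane"): 0.5, ("ethanol", "fermentation"): 0.25}
denmark = {("straw", "hydrolysis"): 0.25, ("ethanol", "fermentation"): 0.5}
table = usage_differences(brazil, denmark)
```

A positive difference marks a term pair used more by the first country, and a negative one marks a term pair used more by the second.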

Top term pairs usage differences in Denmark and Brazil:

5. Country Spectrums

Representing Country Spectrums

As a way of diving deeper into the country capability spectrums and understanding their composition, the country spectrum concept is developed further in this section, drawing an analogy between term pairs and the base pairs of a DNA representation.

Instead of focusing on how frequently a certain term pair appears, let us focus on whether or not a term pair appears in the capability list of a country.

Below, the first 45 term pairs of the capability spectrum are represented for 7 countries. Even though this is a very small part of the spectrum (<1%), one can already see some term pairs that appear in several countries. “Natural Gas / Anaerobic Digestion”, for instance, appears in Finland and Denmark. Moreover, there is a wide range of term pairs that appear in only one country, such as those related to “animal fats” in the case of Spain.

Generalizing this capability spectrum concept to all of the countries and all of the term pairs is a good way of visualizing the biofuels capability “DNA” of each of them. However, to improve the quality of this visualization, two adjustments were made:

  • The order of the countries on the left-hand side was adjusted to reflect the result of the clustering in the country correlation matrix.

  • Only term pairs that were used by at least 2 countries were represented. This allowed the reduction of the original size of the capability spectrum from 58482 values, to 6236 values.
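The presence/absence spectrum and the second adjustment can be sketched as follows, assuming each country's capability list holds one non-negative value per term pair. The function name and the toy lists are hypothetical.

```python
def binary_spectrum(capability_lists, min_countries=2):
    """Mark each term pair as used (1) or unused (0) per country.

    Only positions used by at least `min_countries` countries are kept.
    """
    names = list(capability_lists)
    length = len(next(iter(capability_lists.values())))
    kept = [
        k for k in range(length)
        if sum(capability_lists[name][k] > 0 for name in names) >= min_countries
    ]
    spectrum = {
        name: [int(capability_lists[name][k] > 0) for k in kept] for name in names
    }
    return kept, spectrum

# Toy capability lists (hypothetical values)
lists = {
    "A": [0.3, 0.0, 0.1, 0.0],
    "B": [0.2, 0.5, 0.0, 0.0],
    "C": [0.0, 0.0, 0.4, 0.0],
}
kept, spectrum = binary_spectrum(lists)
```

In the toy case, position 1 is dropped because only one country uses it, and position 3 is dropped because no country uses it.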

The uniqueness of countries

Taking the capability spectrum of a country as a starting point, the next and final step of the analysis seeks to understand how unique each country is in terms of its usage of term pairs. Denmark, for example, uses a total of 256 different term pairs in its capability spectrum. Of these 256 term pairs, a total of 21 are only used by documents located in Denmark:

Taking this approach and applying it to all of the countries in the database, a uniqueness index was developed. The uniqueness index of a country is the ratio between the number of term pairs that are unique to that country and the total number of term pairs used by that country. In the case of Denmark, this value is equal to 21/256 = 0.082.
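The index can be computed from the set of term pairs each country uses, as in the Denmark calculation (21 unique pairs out of 256). The sketch below uses toy sets with integer identifiers standing in for term pairs; the function name and data are hypothetical.

```python
def uniqueness_index(used_pairs_by_country):
    """Share of each country's term pairs that no other country uses."""
    result = {}
    for country, pairs in used_pairs_by_country.items():
        # Union of the term pairs used by every other country
        others = set().union(
            *(p for c, p in used_pairs_by_country.items() if c != country)
        )
        unique = pairs - others
        result[country] = len(unique) / len(pairs) if pairs else 0.0
    return result

# Toy data (hypothetical): integers stand in for term pairs
used = {"A": {1, 2, 3, 4}, "B": {3, 4}, "C": {4, 5}}
index = uniqueness_index(used)
```

A country with very few term pairs can reach a high index with a single unique pair, which is worth keeping in mind when reading a ranking based on this ratio.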

With this approach, a table of the top 20 most unique countries was created. In this table, presented below, one can see that the most unique country is the US, with an index of almost 0.50. This means that about half the term pairs used by the US are only used by the US (!). The rest of the countries in the ranking have a relatively low number of term pairs: Lebanon, for example, has only 5 term pairs, of which 1 is unique. In other words, the top 20 most unique countries have either a very large or a very small number of term pairs.

Uniqueness ranking:
