Machine learning (ML) is gaining ground in just about every industry sector, including intellectual property (IP).


SwissCognitive Guest Blogger: Sébastien Ragot, Patent Attorney – “How original is this stuff? Feature extractors help us assess the originality of items”


Various applications are being reported. Of particular interest are ML extraction techniques, which allow digital assets to be mapped onto vectors in an unsupervised manner. This allows “distances” to be computed between IP assets. Now, the originality of IP assets such as copyrighted works or designs can be sensibly formulated as a function of the distances between these assets, using concepts of maximum entropy and surprisal analysis.

As a result, the originality of the compared assets can be automatically quantified, which can notably be useful to assess the validity and potential infringement of the corresponding IP rights.

What’s more, ML and cryptographic solutions can be complementarily exploited for the lifecycle management of IP rights. For example, a given party may interact with a blockchain to certify IP assets and then rely on ML to compare such assets with third-party assets, with a view to detecting possible infringement, as discussed in this paper.


The following describes an example application, in which distances between assets are calculated to determine their originality. The originality of IP assets can be formulated as a function of the distances between these assets, as discussed in this paper. Since ML feature extractors can be used to compute such distances, as noted above, the originality of the compared assets can be quantified automatically. This is useful to IP professionals because originality is a criterion used in assessing the validity of IP rights such as copyright and design rights. Beyond IP applications, the same approach can be used to compare any type of item.

What is originality?

The concept of originality plays an important role in human activities. People naturally tend to notice originality, whether in the arts, at work, or in stores. In turn, the perceived originality of a thing (an object, an item, or a concept) often affects its fate, i.e., how people react to it, whether they approve of it, buy it, etc. In particular, originality can act as an amplifier, e.g., intensifying word-of-mouth phenomena, in either a positive or negative way.

By definition, the perceived originality is subjective: the human brain is naturally inclined to recognize originality, albeit without measuring it objectively. However, human capacities for perceiving originality are limited; they fade as the number of compared objects increases. Therefore, it is sometimes necessary to establish rules to assess originality. This is notably true in IP, where originality is a criterion that is used in the appreciation of the validity of copyright, industrial design rights, and related IP rights.

However, the concept of originality is neither universally nor unambiguously defined in the IP arena. In particular, there can be significant differences between the originality definitions used for assessing the validity of IP rights, even within the same jurisdiction. For instance, the originality of an IP asset is sometimes assessed objectively, insofar as this asset is compared to prior art. In other cases, originality is assessed subjectively, i.e., as a quality that is primarily tied to the personality of the author of the work or the designer.

Can it be measured?

Measuring originality is generally considered a challenging problem, regardless of the field of study. A major difficulty already arises in the definition of originality in the specific context of the study. General dictionaries define originality as the quality of being novel or unusual, i.e., the quality of being special and not the same as anything else. Based on this definition, one understands that an object can be considered original if it deviates significantly from comparable objects. Thus, the originality of a thing may be regarded as a function of the distances between this thing and other comparable things.

The greater the differences (or dissimilarities) between two assets, the larger the distance between them. Thus, distances computed with ML feature extractors can be used to evaluate the originality of the underlying assets. However, comparing such distances by hand quickly becomes intractable as the number of assets increases, since the number of pairs grows quadratically. Analytics are therefore needed. Ideally, a simple formula would measure the originality of each IP asset based on its distances to other assets.
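As a minimal sketch of the distance step, assume each asset has already been mapped to a numeric feature vector by some extractor. The vectors below are hypothetical, and the Euclidean distance is only one possible choice; the article does not say which metric is used.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length feature vectors."""
    if len(u) != len(v):
        raise ValueError("feature vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def distance_matrix(features):
    """Symmetric matrix of pairwise distances, with zeros on the diagonal."""
    n = len(features)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = euclidean(features[i], features[j])
    return dist

# Hypothetical 2-D feature vectors for four assets (real extractors
# typically output hundreds or thousands of dimensions).
features = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0), (6.0, 8.0)]
dist = distance_matrix(features)
print(dist[0][1])  # 5.0
```

For n assets the matrix holds n(n-1)/2 distinct distances, which is why a single summary score per asset is easier to work with than the raw distances.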

The above paper proposes a simple statistical measure of the degree of originality of IP assets, based on concepts of maximum entropy and surprisal analysis. The resulting function can be applied to any type of digital content, provided that distances between the corresponding objects can be evaluated.

Examples of applications

The picture below shows originality scores obtained for selected cartoons, which are in fact different versions of the same base image.

Originality of items_1

The number of compared images is deliberately small, for the sake of intelligibility. The originality values are evaluated for each cartoon in view of all of the others in the selection, based on distances computed from semantic features extracted from the images. The resulting originality scores are relative, by construction: they depend on the set of comparands considered. The scores obtained illustrate that both elaborating and simplifying a design can increase its originality: sometimes, less is more.

The same approach can be applied to any type of digital content, whether images (e.g., typeface designs, logos, paintings, or other artwork), technical datasheets, 3D printing files, sounds (e.g., songs, jingles), or text (e.g., domain names, brand names, novel titles, lyrics, books). The only prerequisite is to be able to compute distances between the compared assets, and a variety of algorithms are available for that purpose.

For example, word embedding algorithms allow distances to be computed between textual documents. This, in turn, makes it possible to compare the originality of such documents, as illustrated in the table below. In this very simple example, only four documents are considered, each consisting of just a few words, to allow an immediate comparison by the reader. The word embedding scheme transforms words into vectors that capture the meaning of the words: the closer the vectors, the more similar the meanings. Here, the documents all relate to cats and dogs, with one exception (“Jazz is music”), which is logically found to be the most original document in the set.

Originality of items_2
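The document comparison can be sketched as follows, under strong simplifying assumptions. The word vectors are invented 2-D toy values rather than the output of a trained embedding model, the three cat-and-dog documents are hypothetical (only “Jazz is music” is taken from the example), and the mean cosine distance to the other documents is used as a crude proxy, not as the originality score itself.

```python
import math

# Toy 2-D word vectors, invented for illustration (not from a trained model).
EMBEDDINGS = {
    "cats": (1.0, 0.1),
    "dogs": (0.9, 0.2),
    "pets": (1.0, 0.0),
    "jazz": (0.1, 1.0),
    "music": (0.0, 0.9),
}

def document_vector(text):
    """Average the vectors of the known words; unknown words are skipped."""
    vectors = [EMBEDDINGS[w] for w in text.lower().split() if w in EMBEDDINGS]
    if not vectors:
        raise ValueError(f"no known words in {text!r}")
    return tuple(sum(component) / len(vectors) for component in zip(*vectors))

def cosine_distance(u, v):
    """1 - cosine similarity: 0 for parallel vectors, 1 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm

# Hypothetical documents, except for the last one.
documents = ["Cats are pets", "Dogs are pets", "Cats like dogs", "Jazz is music"]
vectors = [document_vector(d) for d in documents]

def mean_distance_to_others(i):
    """Average cosine distance from document i to every other document."""
    others = [cosine_distance(vectors[i], v) for j, v in enumerate(vectors) if j != i]
    return sum(others) / len(others)

most_distant = max(range(len(documents)), key=mean_distance_to_others)
print(documents[most_distant])  # Jazz is music
```

Averaging word vectors is a common baseline for building a document vector. It ignores word order, which is acceptable for documents of a few words.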

Originality values can be computed for each item in view of all the others, in a time-agnostic manner, i.e., irrespective of potential creation dates or other applicable dates, as people typically do when comparing items. However, originality values may also be evaluated in a time-ordered fashion, so that only the prior art is taken into account, as IP players do to estimate the originality of a given IP asset.

This is illustrated in the figure below, which shows the evolution over time of typical mobile phone designs (front views), together with the corresponding originality values, evaluated in a time-ordered fashion. The originality values of the first two designs cannot be evaluated because at least two comparands are needed. The third design is thus the first that can be scored, with respect to the first two designs. Next, the originality of the fourth design is evaluated in view of the first three designs, and so on.

Originality of items_3
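The time-ordered evaluation can be sketched as a loop that scores each item against earlier items only. The scoring rule used here is an assumption: an unweighted ratio of two harmonic means (distances from the item to its comparands, over distances among the comparands). The weighting and bounding of the published measure are not reproduced, and all function names and distances are hypothetical.

```python
def harmonic_mean(values):
    """Harmonic mean of strictly positive numbers."""
    values = list(values)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one strictly positive distance")
    return len(values) / sum(1.0 / v for v in values)

def originality(dist, target, comparands):
    """Assumed scoring rule (weight = 1): harmonic mean of target-to-comparand
    distances divided by harmonic mean of the distances among comparands."""
    if len(comparands) < 2:
        raise ValueError("at least two comparands are needed")
    to_target = [dist[target][j] for j in comparands]
    among = [dist[j][k]
             for a, j in enumerate(comparands)
             for k in comparands[a + 1:]]
    return harmonic_mean(to_target) / harmonic_mean(among)

def time_ordered_scores(dist):
    """Score each item against earlier items only.

    Rows and columns of `dist` are assumed to be sorted by date. The first
    two items get None because fewer than two comparands precede them.
    """
    scores = []
    for k in range(len(dist)):
        prior = list(range(k))
        scores.append(originality(dist, k, prior) if k >= 2 else None)
    return scores

# Hypothetical 1-D "features" for four designs, in chronological order.
positions = [0.0, 1.0, 2.0, 10.0]
dist = [[abs(p - q) for q in positions] for p in positions]
scores = time_ordered_scores(dist)
print(scores)
```

In this toy run, the fourth design lies far from the three earlier ones and therefore scores higher than the third. Later designs never influence the score of an earlier one.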

What is the formula for originality?

The originality score of an IP asset can be formulated as a function of the distances between this asset and a selected set of comparands, whether prior art or not. A particularly simple and suitably bounded expression can be devised, in which the originality of a given asset is written as the ratio of two average distances. More precisely, the originality can be expressed as a weighted ratio of two harmonic means: that of the distances between this asset and its comparands, and that of the distances among the comparands themselves, as explained here in detail.
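Read literally, this description can be transcribed as follows. The notation is introduced here for illustration and is not taken from the paper: asset $i$ is compared with $n$ comparands indexed by $j$ and $k$, $d_{ij}$ is the distance between the asset and comparand $j$, $d_{jk}$ is the distance between two comparands, and $w$ stands for the weight, whose exact form is not given in this article.

```latex
O_i = w \, \frac{H_i}{H_{\mathrm{c}}},
\qquad
H_i = \frac{n}{\sum_{j=1}^{n} 1/d_{ij}},
\qquad
H_{\mathrm{c}} = \frac{n(n-1)/2}{\sum_{1 \le j < k \le n} 1/d_{jk}}
```

The denominator $H_{\mathrm{c}}$ requires at least one pair of comparands, hence $n \ge 2$. Because harmonic means are dominated by the smallest distances, a single close neighbour lowers $H_i$ sharply.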

This formulation is inspired by the Boltzmann distribution in statistical mechanics. IP assets are likened to particles that electrostatically repel each other, like same-sign point charges. The underlying idea is that, given human creativity, it would be surprising to find two creations very close to each other, including in the space spanned by the outputs of a feature extraction algorithm.


A merit of the above approach is that it allows reproducible, objective assessments (i.e., measurements based on objective, deterministic functions) to be achieved, irrespective of the number of comparands considered, which can be large in practice. In particular, such assessments can safely disregard posterior designs, which is useful where one needs to assess the originality of a given design or artwork in view of its prior art only, something that is difficult for the human mind to achieve.

Obviously, automatic originality assessments cannot substitute for the judgment of IP professionals, especially as originality is not unambiguously defined in IP law. Rather, such assessments should be regarded as providing additional facts in the form of statistical measurements, which IP professionals may want to take into consideration when assessing the validity of IP rights.

Beyond automatic originality assessments, a variety of ML-based solutions are being developed for IP matters, which have the potential to help IP practitioners perform IP-related tasks. The goal is to design data processing pipelines that adequately capture the legal reasoning and yield results that match the expectations of IP professionals. This is not an easy task, due to intricacies in legal reasoning and remaining discrepancies between IP laws. Still, a very promising aspect of ML extractors is that they make it possible to objectively compute distances between IP assets, something that may be leveraged for normative purposes.


About the Author:

Sébastien Ragot is a partner with E. Blum & Co. AG in Zurich. He holds a PhD in materials science and is registered as a Swiss and European Patent Attorney. His legal practice focuses on the drafting and prosecution of patent applications, with core technical skills in information technology and physics. As a technical computing enthusiast, he also develops machine learning solutions for intellectual property.