
Science and Vision

Where we're going with this.

Affinitomics™ essentially works by converting a flat feature space into dimensional signals that can be compared much more rapidly. What does this mean?

Most systems process text by simply building an index. An index is a data structure that records every word in a system and the documents in which each word occurs; sophisticated indexes also record the location of every word within each document. In scientific terms, these systems treat words as “features.” Indexes are good for data retrieval when a collection is limited or context is unimportant. They are not good when a collection is large or the documents all contain the same features – under those circumstances, a query returns too many answers to be useful.
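As a rough sketch of the kind of index described above (the document names and contents here are invented), a word-and-position inverted index takes only a few lines:

```python
from collections import defaultdict

# Two toy documents; names and text are invented for illustration.
docs = {
    "doc1": "the quick brown fox",
    "doc2": "the lazy brown dog",
}

# word -> list of (document, position) pairs
index = defaultdict(list)
for name, text in docs.items():
    for pos, word in enumerate(text.split()):
        index[word].append((name, pos))

# A query is just a lookup -- fast, but entirely context-free:
print(index["brown"])  # both documents match, with positions
```

Note what the lookup cannot do: it finds every document containing a word, but it has no notion of which document the word actually matters in – exactly the "too many answers" problem described above.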

Traditionally this problem has been solved by inserting more features into the document (meta-data) to impart context or type. On the web, these emerged as keywords, and then tags and categories. This strategy works for collections a little larger than those served by a simple index, but it still returns too many answers unless a query is incredibly complex or specific – such queries usually have to be constructed by a professional who understands SQL and regular expressions. To address the problem further, companies have turned to schemas and ontologies. These are complex tree structures that constrain the answers a query returns by mapping key features (generally recognized by a natural language processing algorithm) to contextually accurate branches of the tree structure. In these cases it is critically important that the schema is architected correctly and unambiguously. There are highly paid computing professionals who spend their entire careers as database architects for such systems.

But in the world of cloud computing, billions of users, and big data, even these systems start to be inefficient and return too much “noise,” too many answers of little or no value.

To combat this further, big companies (Google, eBay, Amazon, Facebook, Alibaba) have had to add complex text-processing algorithms and more hardware to their technology stacks. The drawback of these algorithms is that they are either closely held secrets or must evolve too quickly to combat new exploits and uses. For end users and developers alike, this means constant education and maintenance, both of which are expensive and time consuming.

Because of this, there is a shift back toward AI to solve big-data problems. This makes sense for tech giants that have specific domain issues to conquer, and budgets in the hundreds of millions of dollars to solve them. But these efforts are almost all point solutions that work for specific jobs – speech recognition, trend analysis, facial recognition, image classification. Up to this point there has been no solution that addresses all these issues. What’s needed is a scalable, agnostic (meaning you can apply it anywhere, to any type of system) AI layer that is effective, easy to deploy, and cheap to maintain. That’s why we built Affinitomics™.

Instead of trying to add ever-increasing amounts of meta-data or complexity to determine what a particular piece of data matches, or at what level of acuity a query should be interpreted, Affinitomics works to impart contextual relevance into the data itself, with as little extra work as possible – and it doesn’t care what type of data it is.

In this illustration (left), the rectangles in each collection (document) represent a “flat” feature; the size of a feature can be thought of as analogous to a word. An index looking at these documents sees very little difference and returns all of them. Quickly – which one is different? We’ve helped by making it darker.

So let’s make a change. Let’s convert the feature space into the realm of signal processing. Let’s give key factors amplitude. Now we can look at a collection as a signal. This lets our brain (and computers) take a shortcut when determining the fitness (literally where something fits or if it is a fit) of an individual piece of data or collection when it comes to context and matching.
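A hedged sketch of that shift: give each feature an amplitude and compare collections as weighted vectors (signals) rather than flat sets. The feature names and weights below are invented for illustration.

```python
import math

# Three collections containing exactly the same features,
# but with different amplitudes (importance weights).
a = {"dog": 5.0, "snow": 3.0, "cars": 1.0}
b = {"dog": 4.5, "snow": 2.5, "cars": 1.0}   # a similar "signal" to a
c = {"dog": 1.0, "snow": 1.0, "cars": 5.0}   # same features, different signal

def similarity(x, y):
    """Cosine similarity of two weighted feature vectors."""
    feats = set(x) | set(y)
    dot = sum(x.get(f, 0.0) * y.get(f, 0.0) for f in feats)
    nx = math.sqrt(sum(v * v for v in x.values()))
    ny = math.sqrt(sum(v * v for v in y.values()))
    return dot / (nx * ny)
```

A flat index sees `a`, `b`, and `c` as identical (same features), while the signal view immediately separates `c` from the other two – which is the shortcut described above.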

Now, looking at the same collection, it’s easy to see which belong, and which don’t – and the only difference is how the features were perceived in the same data.

In fact, collections one through three share rectangles of the same sizes, in lines with the same patterns, but in different orders. Collection four’s only commonality with the others is the number of patterned lines. The signal is easy to see when we’re not so busy with the questionable minutiae. When you understand this, you understand the power of Affinitomics™.

Affinitomics™ helps a system understand the signals and meaning behind the content through contextual relevance.

It does this by adding a very simple concept: every feature either describes a thing, imparts context, or distinguishes where it doesn’t belong. We call these Affinitomic elements Descriptors, Draws, and Distances (3D – interestingly, we didn’t plan that). When features are divided into these spaces, they dimensionalize a contextual space. This gives a document or collection of data – and the system it’s being used in – an “awareness” of where the data fits and what it goes with.

And it does this without complex tree structures, neural nets, or multi-phase algorithms. Nearly any data can rapidly be mapped to an Affinitomic space. Even without amplitude (how important an individual data element is), Affinitomics rapidly matches, ranks, and relates data and collections of data within any knowledge space.

“Even without amplitude…” Amplitude is a measure of how important a feature is when describing data. In an article about dogs, “dog” has high amplitude. This concept spreads beyond text documents, too. In a picture of your face, the vertices that make up the shape of your eye create a different signal – because of amplitude – than the vertices that make up the shape of your dog’s eye.

How complex is it to map features to the Affinitomic space? It’s actually really easy. The explanation below applies to web pages, but it extends equally to a knowledge base, database, or any collection of data – big or small.

When people tag documents or web pages, they generally put everything deemed pertinent to the web page or article into the tags or keywords. And since search engines rely on these tags, people involved in search engine optimization often attach quite a number of tags and keywords to the document. These tags – all stored in the same place and separated by commas – are what scientists call a “bag-of-words” or “bag-of-features.” They call them that because there is no structure to the meta-data that the tags provide. Scientists also call this “flat” because all the tags have the same value, and are all used the same way. When a page is searched, the search algorithm usually awards the tags and keywords a higher value if they are also found within the structure of the document. This is called concordance.

In the world of intelligent systems, concordance is barely a passing grade. It’s ok for sorting a response from a search engine, but not much else.
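A minimal sketch of concordance scoring as just described (the base value and bonus weights here are our own invention, not a real engine's): a tag is worth more when it also occurs in the document body.

```python
def concordance_score(tags, body, base=1.0, bonus=1.0):
    """Award each tag a base value, plus a bonus if the tag
    also occurs in the document text (concordance)."""
    words = set(body.lower().split())
    return {t: base + (bonus if t.lower() in words else 0.0) for t in tags}

scores = concordance_score(
    ["dog", "snow", "cats"],
    "A big dog that loves snow",
)
# "dog" and "snow" score higher than "cats", which is tag-only.
```

Even at its best, this only tells you that a tag is *consistent* with the document – not what role it plays, which is the gap Affinitomics addresses.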

Affinitomics™ makes a simple change to this paradigm – the same tags are simply sorted based on their relationship to the subject matter of the document (or picture, or video, or song, etc.). This simple change makes a world of difference. It changes tags from a “bag-of-words” to a “dimensional feature space” – making them much more valuable and useful to any number of machine learning and artificial intelligence algorithms.

How are the tags sorted? That’s a good question with a deceptively simple answer. If you look at any set of tags you’ll discover that there are usually two, and sometimes three types.

  1. Some tags describe the subject’s particular features;
  2. Some describe what the subject goes with or occurs with; and,
  3. Sometimes, there are tags that describe what the subject doesn’t go with, conflicts with, or dislikes.

By dividing these tags into Descriptors (what it is), Draws (as in drawing closer), and Distances (as in keeping a distance from), the feature space becomes multi-dimensional, thus imparting more information for sorting and classification algorithms. Essentially this makes information self-aware – understanding what it is, what it matches, and what it doesn’t match or is antithetical to.
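The three-way split above can be sketched in a few lines. The tag labels below are our own hand-assigned guesses, purely for illustration:

```python
# Flat tags, hand-labeled with the three Affinitomic classes.
labeled_tags = {
    "dog": "descriptor",
    "furry": "descriptor",
    "kids": "draw",
    "snow": "draw",
    "cats": "distance",
}

def split_tags(labeled):
    """Group flat tags into Descriptors, Draws, and Distances."""
    out = {"descriptor": [], "draw": [], "distance": []}
    for tag, kind in labeled.items():
        out[kind].append(tag)
    return out

space = split_tags(labeled_tags)
# space["draw"] now holds what the subject goes with,
# space["distance"] what it conflicts with.
```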

Affinitomics™ in a nutshell – understanding the how and why in 2 minutes

“Affinitomics™ sounds intimidating – It must take lots of training or a PhD to comprehend.”

This couldn’t be further from the truth. If you know how tags and keywords work, you can use Affinitomics. Skim this article, and you’ll understand Affinitomics and know how to use them. And we promise, no classes, no visits to MIT, and no scientists are required.


As an example, the following are tags for a St. Bernard Dog: dog, big, k9, furry, eats a lot, good with kids, likes snow, chases cars, chases cats.

It’s easy to derive Affinitomics from these tags. “dog, big, k9, furry” are all easily recognizable as Descriptors. The Draws are easy to recognize as well, and we can take a shortcut in writing them that will differentiate them from Descriptors. They become: +eating, +kids, +snow. We also take a shortcut on what are easy to spot as Distances, and they become: -cars, -cats. By separating the tags into three types of Affinitomics, not only have they become more useful for the computer system, they are actually easier to write and take up less space.

Traditional Tags look like this: dog, big, k9, furry, eats a lot, good with kids, likes snow, chases cars, chases cats

Whereas the features in an Affinitomic Archetype look like this: dog, big, k9, furry, +eating, +kids, +snow, -cars, -cats

Here’s what it looks like on a WordPress page or post using aimojo
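The shorthand above is easy to parse mechanically. Here is a minimal sketch (the function name is ours, not part of any Affinitomics API):

```python
def parse_archetype(spec):
    """Split an Affinitomic archetype string into Descriptors,
    Draws (+ prefix), and Distances (- prefix)."""
    result = {"descriptors": [], "draws": [], "distances": []}
    for token in (t.strip() for t in spec.split(",")):
        if token.startswith("+"):
            result["draws"].append(token[1:])
        elif token.startswith("-"):
            result["distances"].append(token[1:])
        else:
            result["descriptors"].append(token)
    return result

arch = parse_archetype("dog, big, k9, furry, +eating, +kids, +snow, -cars, -cats")
# One pass over the string recovers the full three-way split.
```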

Affinitomics are even more valuable with attenuation – telling the system how much to value Draws and Distances. For example: how much does the dog like to eat? Or which does it dislike more, cars or cats? The attenuated Affinitomics for the St. Bernard answer those questions like this: dog, big, k9, furry, +eating2, +kids, +snow4, -cars2, -cats5.

You’ll notice that this is still less data than the tags, even though the Affinitomics now represent a three-dimensional feature space, which is far more valuable for knowledge retrieval, discovery, and machine learning. Because of this, Affinitomics can be evaluated, sorted, and grouped much faster and more accurately than tags. In addition, since Affinitomics essentially make information self-ranking and self-sorting, systems that use them don’t require categories.

There you have it. You now know how to create Affinitomic Archetypes – a fancy way of saying that you understand how and why you should sort your laundry, errr, tags.
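To make the attenuation idea concrete, here is one way the weighted notation might be read and used for ranking. The parsing convention (a missing number defaults to 1) and the additive scoring rule are our own illustration, not the product's actual algorithm:

```python
import re

def parse_attenuated(spec):
    """Parse tokens like '+eating2' or '-cats5' into name -> signed
    weight; a Draw/Distance with no number defaults to weight 1."""
    weights = {}
    for token in (t.strip() for t in spec.split(",")):
        m = re.fullmatch(r"([+-])(\D+?)(\d*)", token)
        if not m:
            continue  # plain Descriptor, carries no weight
        sign = 1 if m.group(1) == "+" else -1
        weights[m.group(2)] = sign * int(m.group(3) or 1)
    return weights

w = parse_attenuated("dog, big, k9, furry, +eating2, +kids, +snow4, -cars2, -cats5")

def affinity(archetype, context):
    """Score a context (the set of features present) against an
    archetype: Draws add to the score, Distances subtract."""
    return sum(wt for feat, wt in archetype.items() if feat in context)

# A snowy, food-filled context scores high; a cat-filled one scores low.
```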