
Identifying Subjective Attributes Of Entities


Identifying UGC Subjective Attributes Of Entities

This recently granted patent is about identifying subjective attributes of entities.

I haven’t seen a patent before about subjective attributes of entities, or about users’ responses to those entities.

One important aspect of it is that it relies on user-generated content.

We are told that user-generated content (UGC) is becoming more common on the Web because of the increasing popularity of social networks, blogs, review websites, and so on.

We often see user-generated content in the form of comments and other reactions, such as:

  • A comment by a first user about content shared by a second user within a social network
  • User comments in response to an article in a columnist’s blog
  • A comment on a video clip posted on a content hosting website
  • Reviews (such as of products or movies)
  • Actions (such as Like!, Dislike!, +1, sharing, bookmarking, playlisting, etc.)
  • So forth

This patent provides a way to identify and predict subjective attributes for entities (such as media clips, images, newspaper articles, blog entries, persons, organizations, commercial businesses, etc.).

It starts with:

  • Identifying a first set of subjective attributes for a first entity based on a reaction to the first entity, such as comments on a website, a demonstration of approval of the first entity (such as “Like!”), sharing the first entity, bookmarking the first entity, or adding the first entity to a playlist
  • Training a classifier (such as a support vector machine, AdaBoost, a neural network, or a decision tree) on a set of input-output mappings, where the set includes an input-output mapping whose input is a feature vector for the first entity and whose output is based on the first set of subjective attributes
  • Providing a feature vector for a second entity to the trained classifier to get a second set of subjective attributes for the second entity

The patent also describes a system with a memory and a processor that identify and predict subjective attributes for entities.

A computer readable storage medium has instructions that cause a computer system to perform operations including:

  • Identifying a first set of subjective attributes for a first entity based on a reaction to the first entity
  • Obtaining a first feature vector for the first entity
  • Training a classifier on a set of input-output mappings, wherein the set of input-output mappings comprises an input-output mapping whose input is based on the first feature vector and whose output is based on the first set of subjective attributes
  • Obtaining a second feature vector for a second entity
  • Providing to the classifier, after the training, the second feature vector to get a second set of subjective attributes for the second entity

This patent on identifying subjective attributes for entities can be found at:

Identifying subjective attributes by analysis of curation signals
Inventors: Hrishikesh Aradhye and Sanketh Shetty
Assignee: Google LLC
US Patent: 11,328,218
Granted: May 10, 2022
Filed: November 6, 2017

Abstract:

A system and method for identifying and predicting subjective attributes for entities (such as media clips, movies, television shows, images, newspaper articles, blog entries, persons, organizations, commercial businesses, etc.) is disclosed.

In one aspect, subjective attributes for a first media item are identified based on a reaction to the first media item, and relevancy scores for the subjective attributes with respect to the first media item are determined.

A classifier is trained using (i) a training input comprising a set of features for the first media item, and (ii) a target output for the training input, the target output comprising the respective relevancy scores for the subjective attributes of the first media item.

Identifying And Predicting Subjective Attributes For Entities

The patent describes ways of identifying and predicting subjective attributes for entities (such as media clips, images, newspaper articles, blog entries, persons, organizations, commercial businesses, etc.).

Subjective attributes (such as “cute,” “funny,” “awesome,” etc.) get defined, and subjective attributes for a particular entity get identified based on user reaction to the entity, such as:

  • Comments on a website
  • Like!
  • Sharing the first entity with other users
  • Bookmarking the first entity
  • Adding the first entity to a playlist
  • Etc

Relevancy Scores For The Subjective Attributes Get Determined About The Entity

If the subjective attribute “cute” appears in a significant proportion of comments for a video clip, then “cute” may get assigned a high relevancy score.

The entity is then associated with the identified subjective attributes and relevancy scores (such as via tags applied to the entity, via entries in a table of a relational database, etc.).

The above procedure gets performed for each entity in a given set of entities (such as video clips in a video clip repository, etc.), and an inverse mapping from subjective attributes to entities in the set gets generated based on the subjective attributes and relevancy scores.

The inverse mapping can then get used to identify all entities in the set that match a given subjective attribute (such as all entities that have gotten associated with the subjective attribute “funny”, etc.), thereby enabling:

  • Rapid retrieval of relevant entities for processing keyword searches
  • Populating playlists
  • Delivering advertisements
  • Generating training sets for the classifier
  • So forth
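A minimal Python sketch of the inverse mapping described above, assuming each entity’s subjective attributes and relevancy scores are already available as a dictionary. The function name and sample data are hypothetical; the patent does not say how the mapping is stored.

```python
from collections import defaultdict


def build_inverse_mapping(entity_scores):
    """Map each subjective attribute to the entities carrying it, best match first.

    entity_scores: {entity_id: {attribute: relevancy_score}} (hypothetical layout).
    """
    inverse = defaultdict(list)
    for entity_id, scores in entity_scores.items():
        for attribute, score in scores.items():
            if score > 0:  # a zero score means the attribute was discarded
                inverse[attribute].append((score, entity_id))
    # Sort each list by descending relevancy score, breaking ties by entity id.
    return {
        attribute: [eid for _, eid in sorted(pairs, key=lambda p: (-p[0], p[1]))]
        for attribute, pairs in inverse.items()
    }


entity_scores = {
    "clip_a": {"cute": 0.5, "awesome": 0.2},
    "clip_b": {"funny": 0.7, "cute": 0.1},
    "clip_c": {"funny": 0.4, "awesome": 0.0},
}
index = build_inverse_mapping(entity_scores)
print(index["funny"])  # ['clip_b', 'clip_c']
```

Looking up `index["funny"]` then answers “all entities associated with funny” without scanning every entity, which is what makes keyword search and playlist population fast.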

A classifier (such as a support vector machine [SVM], AdaBoost, a neural network, a decision tree, etc.) gets trained by providing a set of training examples, where the input for a training example comprises a feature vector obtained from a particular entity, such as a feature vector for a video clip.

It may contain numerical values about:

  • Color
  • Texture
  • Intensity
  • Metadata tags associated with the video clip
  • Etc

The output for that training example comprises the particular entity’s relevancy scores for each subjective attribute in the vocabulary.

The trained classifier can then predict subjective attributes for entities not in the training set (such as a newly-uploaded video clip, a news article that has not yet received any comments, etc.).

This patent can classify entities according to subjective attributes such as “funny”, “cute”, etc. based on user reaction to the entities.

This patent can improve the quality of entity descriptions, such as tags for a video clip, which can improve the quality of searches and the targeting of advertisements.

A System Architecture To Identify Subjective Attributes

The system architecture includes:

  • A server machine
  • An entity store
  • Client machines

These are all connected to a network.

The network may be a public network (such as the Internet), a private network (such as a local area network (LAN) or wide area network (WAN)), or a combination thereof.

The client machines may be wireless terminals (such as smartphones, etc.), personal computers (PC), laptops, tablet computers, or any other computing or communication devices.

The client machines may run an operating system (OS) that manages the hardware and software of the client machines.

A browser (not shown) may run on the client machines (such as on the OS of the client machines).

The browser may be a web browser that can access web pages and content served by a web server.

The client machines may also upload:

  • Web pages
  • Media clips
  • Blog entries
  • Links to articles
  • So forth

The server machine includes a web server and a subjective attribute manager. The web server and subjective attribute manager may run on different machines.

The entity store is persistent storage that is capable of storing entities such as media clips (such as video clips, audio clips, clips containing both video and audio, images, etc.) and other types of content items (such as webpages, text-based documents, restaurant reviews, movie reviews, etc.), as well as data structures to tag, organize, and index the entities.

The entity store may be hosted by storage devices, such as main memory, magnetic or optical storage-based disks, tapes or hard drives, network-attached storage (NAS), storage area networks (SAN), and so forth.

In some implementations the entity store is hosted by a network-attached file server, while in other implementations it may be hosted by some other type of persistent storage, such as that of the server machine or of different machines coupled to the server machine via the network.

The entities stored in the entity store may include user-generated content that gets uploaded by client machines and may include content provided by service providers such as:

  • News organizations
  • Publishers
  • Libraries
  • So on

The web server may serve web pages and content from the entity store to the client machines.

The subjective attribute manager:

  • Identifies subjective attributes for entities based on user reaction (such as comments, Like!, sharing, bookmarking, playlisting, etc.)
  • Determines relevancy scores for subjective attributes about entities
  • Associates subjective attributes and relevancy scores with entities
  • Extracts features (such as image features like color, texture, and intensity; audio features like amplitude and spectral coefficient ratios; textual features like word frequencies, average sentence length, and formatting parameters; metadata associated with the entity; etc.) from entities to generate feature vectors
  • Trains a classifier based on the feature vectors and the subjective attributes’ relevancy scores
  • Uses the trained classifier to predict subjective attributes for new entities based on feature vectors of the new entities

A Subjective Attribute Manager

The subjective attribute manager described here may be the same as the one in the system architecture above, and may include a:

  • Subjective attribute identifier
  • Relevancy scorer
  • Feature extractor
  • Classifier
  • Data store

The components can get combined together or separated into further components.

The data store may be the same as the entity store or a different data store (such as a temporary buffer or a permanent data store) to hold a subjective attribute vocabulary, entities that are to get processed, feature vectors associated with entities, subjective attributes and relevancy scores associated with entities, or some combination of these data.

The data store may be hosted by storage devices, such as main memory, magnetic or optical storage-based disks, tapes or hard drives, and so forth.

The subjective attribute manager notifies users of the types of information that get stored in the data store and entity store and provides the users the opportunity to choose not to have such information collected and shared with the subjective attribute manager.

The Subjective Attribute Identifier

The subjective attribute identifier identifies subjective attributes for entities based on user reaction to the entities.

The subjective attribute identifier may identify subjective attributes via text processing of users’ comments on an entity that a user has posted on a social networking website.

The subjective attribute identifier may also identify subjective attributes for entities based on other types of user reactions to entities, such as:

  • ‘Like!’ or ‘Dislike!’
  • Sharing the entity
  • Bookmarking the entity
  • Adding the entity to a playlist
  • So forth

The subjective attribute identifier may apply thresholds in determining which subjective attributes are associated with an entity (such as a subjective attribute should appear in at least N comments, etc.).
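The identification and threshold step might look like the following sketch. It uses plain word matching against a small hypothetical vocabulary and treats N as the parameter `min_comments`; a production system would likely add the syntactic and semantic analysis the patent mentions, so this only illustrates the thresholding idea.

```python
import re
from collections import Counter

# Hypothetical seed vocabulary; the patent leaves the actual vocabulary open.
VOCABULARY = {"cute", "funny", "awesome", "disgusting"}


def identify_attributes(comments, vocabulary=VOCABULARY, min_comments=3):
    """Return {attribute: number of comments mentioning it}, keeping only
    attributes that appear in at least `min_comments` comments (the N threshold)."""
    counts = Counter()
    for comment in comments:
        words = set(re.findall(r"[a-z']+", comment.lower()))
        counts.update(words & vocabulary)  # each attribute counted once per comment
    return {attr: n for attr, n in counts.items() if n >= min_comments}


comments = [
    "So cute!", "cute and funny", "Awesome video", "cute cat",
    "funny", "this is so funny lol", "meh",
]
print(identify_attributes(comments))
```

Here “awesome” appears in only one comment, so it falls below the threshold of three and is not associated with the entity.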

The relevancy scorer determines relevancy scores for subjective attributes about entities.

For example, when the subjective attribute identifier has identified the subjective attributes “cute”, “funny”, and “awesome” based on comments on a media clip posted on a social networking website, the relevancy scorer may determine relevancy scores for each of these three subjective attributes based on:

  • The frequency with which these subjective attributes appear in comments
  • The particular users that provided the subjective attributes
  • So forth

For example, if there are 40 comments and “cute” appears in 20 comments and “awesome” appears in 8 comments, then “cute” may get assigned a relevancy score that is higher than “awesome”.

The relevancy scores may be assigned based on the proportion of comments that a subjective attribute appears in (such as a score of 0.5 for “cute” and a score of 0.2 for “awesome”, etc.).

The relevancy scorer may keep only the k most relevant subjective attributes and discard other subjective attributes.

For example, if the subjective attribute identifier identifies seven subjective attributes that each appear in user comments at least three times, the relevancy scorer may retain only the five subjective attributes with the highest relevancy scores and discard the other two (such as by setting their relevancy scores to zero, etc.).
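Here is a small sketch of proportion-based relevancy scoring with top-k retention, using the 40-comment example from the text. Scoring by proportion is one option the patent gives; the function name and the tie-breaking by attribute name are assumptions.

```python
def relevancy_scores(mention_counts, total_comments, k=5):
    """Score each attribute by the fraction of comments mentioning it, then keep
    only the k highest-scoring attributes; the rest are zeroed rather than deleted."""
    scores = {attr: n / total_comments for attr, n in mention_counts.items()}
    # Rank by descending score; ties broken alphabetically (an assumption).
    keep = set(sorted(scores, key=lambda a: (-scores[a], a))[:k])
    return {attr: (s if attr in keep else 0.0) for attr, s in scores.items()}


# The worked example from the text: 40 comments, "cute" in 20, "awesome" in 8.
print(relevancy_scores({"cute": 20, "awesome": 8}, total_comments=40))
# {'cute': 0.5, 'awesome': 0.2}
```

Because discarded attributes stay in the result with a score of zero, the output mirrors the patent’s option of discarding an attribute by setting its relevancy score to zero.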

In some aspects, the relevancy score is a real number between 0.0 and 1.0 inclusive.

The feature extractor obtains a feature vector for an entity using techniques such as:

  • Principal components analysis
  • Semidefinite embeddings
  • Isomaps
  • Partial least squares
  • So forth

In some aspects, the computations associated with extracting features of an entity are performed by the feature extractor itself.

In some other aspects, these computations are performed by another entity, such as an executable library hosted by the server machine (not depicted in the patent’s figures) that provides:

  • Image processing routines
  • Audio processing routines
  • Text processing routines
  • Etc

The results then get provided to the feature extractor.

The classifier is a learning machine (such as support vector machines [SVMs], AdaBoost, neural networks, decision trees, etc.) that accepts as input a feature vector associated with an entity, and outputs relevancy scores (such as a real number between 0 and 1 inclusive, etc.) for each subjective attribute of the subjective attribute vocabulary.

In some aspects, the classifier consists of a single classifier.

The classifier may include multiple classifiers (such as a classifier for each subjective attribute in the subjective attribute vocabulary, etc.).

A set of positive examples and a set of negative examples get assembled for each subjective attribute in the subjective attribute vocabulary.

The set of positive examples for a subjective attribute may include feature vectors for entities that have gotten associated with that particular subjective attribute.

The set of negative examples for a subjective attribute may include feature vectors for entities that have not gotten associated with that particular subjective attribute.

When the set of positive examples and the set of negative examples are unequal in size, the larger set may get sampled to match the size of the smaller set.
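A sketch of assembling balanced positive and negative examples for one attribute’s classifier. It assumes an entity counts as a positive example when the attribute has a non-zero relevancy score, and it randomly down-samples the larger set; the data layout and the fixed seed are hypothetical.

```python
import random


def balanced_examples(entities, attribute, seed=0):
    """Build (feature_vector, label) pairs for one attribute's binary classifier.

    entities: list of (feature_vector, {attribute: relevancy_score}) pairs.
    An entity is positive if the attribute has a non-zero score, else negative.
    The larger set is randomly sampled down to the size of the smaller one.
    """
    positives = [fv for fv, scores in entities if scores.get(attribute, 0.0) > 0]
    negatives = [fv for fv, scores in entities if scores.get(attribute, 0.0) == 0]
    rng = random.Random(seed)
    n = min(len(positives), len(negatives))
    positives = rng.sample(positives, n)
    negatives = rng.sample(negatives, n)
    return [(fv, 1) for fv in positives] + [(fv, 0) for fv in negatives]


entities = [
    ([0.9, 0.1], {"cute": 0.5}),
    ([0.2, 0.8], {"funny": 0.6}),
    ([0.1, 0.7], {}),
]
print(balanced_examples(entities, "cute"))  # one positive, one sampled negative
```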

After training, the classifier may predict subjective attributes for other entities not in the training set by providing feature vectors for these entities as input to the classifier.

A set of subjective attributes may be obtained from the output of the classifier by including all subjective attributes with non-zero relevancy scores. Alternatively, a set of subjective attributes may be obtained by applying a minimum threshold to the numerical scores (such as by treating all subjective attributes that have a score of at least, say, 0.2 as members of the set).
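Both ways of reading a set of attributes off the classifier’s output fit in a few lines. The 0.2 threshold is the example value from the text; the function name and scores are hypothetical.

```python
def attribute_set(predicted_scores, threshold=None):
    """Turn classifier output {attribute: score} into a set of attributes.

    With no threshold, keep every attribute with a non-zero score; otherwise
    keep only attributes scoring at least `threshold`.
    """
    if threshold is None:
        return {a for a, s in predicted_scores.items() if s != 0}
    return {a for a, s in predicted_scores.items() if s >= threshold}


scores = {"cute": 0.55, "funny": 0.2, "awesome": 0.05, "disgusting": 0.0}
print(sorted(attribute_set(scores)))       # ['awesome', 'cute', 'funny']
print(sorted(attribute_set(scores, 0.2)))  # ['cute', 'funny']
```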

Identifying Subjective Attributes Of Entities

The method gets performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine), or a combination of both.

In some implementations, the method is performed by the server machine, while in other implementations it may be performed by another machine.

Various components of the subjective attribute manager may run on separate machines (such as the subjective attribute identifier and relevancy scorer running on one machine while the feature extractor and classifier run on another machine, etc.).

For simplicity of explanation, the method gets depicted and described as a series of acts.

However, acts can occur in various orders and/or concurrently, and with other acts not presented and described herein.

Furthermore, not all illustrated acts may be required to implement the methods in accordance with the disclosed subject matter.

In addition, those skilled in the art will understand and appreciate that the method could alternatively be represented as a series of interrelated states via a state diagram or events.

Additionally, it should be appreciated that the methods disclosed in this specification are capable of being stored on an article of manufacture to ease transporting and transferring such methods to computing devices.

The term article of manufacture, as used herein, is intended to encompass a computer program accessible from any computer-readable device or storage media.

A vocabulary of subjective attributes gets generated.

In some aspects, the subjective attribute vocabulary may be defined manually, while in other aspects it may be generated in an automated fashion by collecting terms and phrases that get used in users’ reactions to entities, and in yet other aspects it may be generated by a combination of manual and automated techniques.

In one aspect, the vocabulary is seeded with a small number of subjective attributes expected to apply to entities, and it is expanded over time as more terms or phrases that appear in user reactions get identified via automated processing of the reactions.

The subjective attribute vocabulary may be organized in a hierarchical manner, possibly based on “meta-attributes” associated with the subjective attributes (such as the subjective attribute “funny” may have a meta-attribute “positive”, while the subjective attribute “disgusting” may have a meta-attribute “negative”, etc.).

A set S of entities (such as all the entities in the entity store, a subset of entities in the entity store, etc.) is pre-processed.

Under one aspect, pre-processing of the set of entities comprises identifying user reactions to the entities, and then training a classifier based on the reactions.

When An Entity Is An Actual Physical Entity

It should be noted that when an entity is an actual physical entity (such as a person, a restaurant, etc.), the pre-processing of the entity is performed via a “cyber proxy” associated with the physical entity (such as a fan page for an actor on a social networking website, a restaurant review on a website, etc.); however, the subjective attributes are considered to be associated with the entity itself (such as the actor or restaurant, not the actor’s fan page or the restaurant review).

An example of a method for performing this pre-processing is described in detail below.

An entity E that is not in set S is received (such as a newly-uploaded video clip, a news article that has not yet received any comments, an entity in the entity store that was not included in the training set, etc.).

Subjective attributes and relevancy scores for entity E get obtained.

Two example methods for obtaining them are described in detail below.

The subjective attributes and relevancy scores obtained are associated with entity E (such as by applying corresponding tags to the entity, by adding a record in a table of a relational database, etc.).

Execution then continues back at the step of receiving a new entity.

It should be noted that the classifier may be re-trained (such as after every 100 iterations of the loop, every N days, etc.) by a re-training process that may execute concurrently.

Pre-Processing A Set Of Entities

The method gets performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software (such as is run on a general-purpose computer system or a dedicated machine), or a combination of both.

In some implementations, the method is performed by the server machine, while in other implementations it may be performed by another machine.

The training set gets initialized to the empty set. An entity E gets selected and removed from the set S of entities.

Subjective attributes for entity E are identified based on user reactions to entity E (such as user comments, Like!, bookmarking, sharing, adding to a playlist, etc.).

The identification of subjective attributes may include processing user comments, such as by:

  • Matching words in user comments against subjective attributes in the vocabulary
  • Combining word matching and other natural language processing techniques such as syntactic and semantic analysis
  • Etc

Entities That Occur In Many Locations

User reactions may get aggregated for entities that occur in many locations, such as:

  • Entities that appear in many users’ playlists
  • Entities that have gotten shared and appear in a plurality of users’ “newsfeeds” on a social networking website
  • Etc

The different locations may be weighted in their contribution to relevancy scores based on a variety of factors, such as:

  • The particular user associated with the location (such as a particular user may be an authority on classical music, and thus comments about an entity in his or her newsfeed may get weighted more than comments in another newsfeed, etc.)
  • Non-textual user reactions (such as “Like!”, “Dislike!”, “+1”, etc.)

In addition, the number of locations where the entity appears may also be used in determining subjective attributes and relevancy scores (such as relevancy scores for a video clip may be increased when the video clip is in hundreds of user playlists, etc.).
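The patent does not give a formula for combining reactions from different locations, so the following is only one possible sketch. It assumes each location carries a hypothetical numeric weight (which could reflect the user’s authority or non-textual reactions like “Like!”), computes a weighted proportion of comments mentioning each attribute, and applies a small logarithmic boost for the number of locations, clipped to 1.0.

```python
import math
from collections import defaultdict


def aggregate_locations(locations, popularity_boost=0.1):
    """Combine per-location attribute mentions into one score per attribute.

    locations: list of dicts, one per place the entity appears, with
    hypothetical keys "weight", "comments" (comment count), and
    "mentions" ({attribute: comments mentioning it}).
    The weighting and boost formula are assumptions, not the patent's method.
    """
    weighted_mentions = defaultdict(float)
    weighted_total = 0.0
    for loc in locations:
        weighted_total += loc["weight"] * loc["comments"]
        for attr, n in loc["mentions"].items():
            weighted_mentions[attr] += loc["weight"] * n
    if weighted_total == 0:
        return {}
    # More locations (e.g. more playlists) nudge every score upward.
    boost = 1.0 + popularity_boost * math.log10(len(locations))
    return {attr: min(1.0, boost * m / weighted_total)
            for attr, m in weighted_mentions.items()}


locations = [
    {"weight": 2.0, "comments": 10, "mentions": {"cute": 6}},  # e.g. an expert's newsfeed
    {"weight": 1.0, "comments": 10, "mentions": {"cute": 1, "funny": 4}},
]
print(aggregate_locations(locations))
```

Doubling the first location’s weight raises “cute” relative to an unweighted count, which is the effect the patent describes for authoritative users.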

This step may be performed by the subjective attribute identifier.

Relevancy scores for the subjective attributes get determined about entity E.

A relevancy score is determined for a particular subjective attribute based on the frequency with which the subjective attribute appears in user comments, and/or on the particular users that provided the subjective attribute in their comments (such as some users may be known from past experience to be more accurate in their comments than other users, etc.).

For example, if there are 40 comments and “cute” appears in 20 comments and “awesome” appears in 8 comments, then “cute” may get assigned a relevancy score that is higher than “awesome.”

The relevancy scores may be assigned based on the proportion of comments in which a subjective attribute appears (such as a score of 0.5 for “cute” and a score of 0.2 for “awesome”, etc.).

Under one aspect, the relevancy scores get normalized to fall in the interval [0, 1].

In accordance with some aspects, some of the identified subjective attributes may be discarded based on their relevancy scores (such as retaining only the k subjective attributes with the highest relevancy scores, discarding any subjective attribute whose relevancy score is below a threshold, etc.).


It should be noted that in some aspects, a subjective attribute may be discarded by setting its relevancy score to zero.

Subjective Attributes And Relevancy Scores Are Associated With The Entities

The subjective attributes and relevancy scores are associated with the entities (such as via tagging, via entries in a table in a relational database, etc.).

A feature vector for entity E gets obtained.

In one aspect, the feature vector for a video clip or still image may contain numerical values about color, texture, intensity, etc., while the feature vector for an audio clip (or a video clip with sound) may include numerical values about amplitude, spectral coefficients, etc., while the feature vector for a text document may include:

  • Numerical values about word frequencies
  • Average sentence length
  • Formatting parameters
  • So forth

This may get performed by the feature extractor.

The feature vector and the relevancy scores obtained for entity E get added to the training set as one training example.

The method then checks whether the set S of entities is empty; if S is non-empty, execution continues back at the step of selecting an entity, otherwise execution proceeds to training.

The classifier is trained on all the examples of the training set, such that the feature vector of a training example gets provided as input to the classifier, and the subjective attribute relevancy scores get provided as the target output.
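The training step can be illustrated with a deliberately simple stand-in. Instead of the SVM, AdaBoost, neural network, or decision tree named in the patent, this sketch fits one logistic-regression model per subjective attribute by batch gradient descent, which keeps each predicted relevancy score in the range (0, 1). The toy features and hyperparameters are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def train(training_set, attributes, epochs=2000, lr=0.5):
    """Fit one logistic model per subjective attribute.

    training_set: list of (feature_vector, {attribute: relevancy_score}) pairs;
    an attribute missing from an example is treated as a score of 0.0.
    Returns {attribute: (weights, bias)}.
    """
    dim = len(training_set[0][0])
    n = len(training_set)
    models = {}
    for attr in attributes:
        w, b = [0.0] * dim, 0.0
        for _ in range(epochs):
            grad_w, grad_b = [0.0] * dim, 0.0
            for x, scores in training_set:
                p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
                # Cross-entropy gradient w.r.t. the logit, using the score as a soft target.
                err = p - scores.get(attr, 0.0)
                for i in range(dim):
                    grad_w[i] += err * x[i]
                grad_b += err
            w = [wi - lr * g / n for wi, g in zip(w, grad_w)]
            b -= lr * grad_b / n
        models[attr] = (w, b)
    return models


def predict(models, x):
    """Predicted relevancy score for each attribute, given a feature vector."""
    return {attr: sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            for attr, (w, b) in models.items()}


# Toy training set with two hypothetical numeric features per entity.
training_set = [
    ([1.0, 0.0], {"cute": 0.9}),
    ([0.9, 0.1], {"cute": 0.8}),
    ([0.0, 1.0], {"funny": 0.9}),
    ([0.1, 0.9], {"funny": 0.7}),
]
models = train(training_set, ["cute", "funny"])
scores = predict(models, [0.95, 0.05])  # a new entity resembling the "cute" ones
print(scores)
```

Training one model per attribute corresponds to the patent’s option of a separate classifier for each subjective attribute in the vocabulary; a single multi-output model would also fit its description.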

Obtaining Subjective Attributes And Relevancy Scores For An Entity

A feature vector for entity E gets generated.

As described above, the feature vector for a video clip or still image may contain numerical values about color, texture, intensity, etc., while the feature vector for an audio clip (or a video clip with sound) may include numerical values about amplitude, spectral coefficients, etc., while the feature vector for a text document may include numerical values about word frequencies, average sentence length, formatting parameters, and so forth.

The feature vector gets provided to the trained classifier to get predicted subjective attributes and relevancy scores for entity E.

The predicted subjective attributes and relevancy scores get associated with entity E (such as via tags applied to entity E, via entries in a table of a relational database, etc.).

A Second Method For Obtaining Subjective Attributes And Relevancy Scores For An Entity

The method gets performed by processing logic that may comprise hardware (circuitry, dedicated logic, etc.), software, or a combination of both.

In some implementations, the method is performed by the server machine, while in others it may be performed by another machine.

A feature vector for entity E gets generated. The feature vector gets provided to the trained classifier to get predicted subjective attributes and relevancy scores for entity E.

The predicted subjective attributes obtained get suggested to a user (such as the user who uploaded the entity). A refined set of subjective attributes is obtained from the user, such as via a web page in which the user selects from among the suggested attributes and possibly adds new attributes.

A Default Relevancy Score For Entities

A default relevancy score gets assigned to any new subjective attributes that got added by the user.

The default relevancy score may be 1.0 on a scale from 0.0 to 1.0. Alternatively, the default relevancy score may be based on the particular user (such as a score of 1.0 when the user is known from past history to be very good at suggesting attributes, a score of 0.8 when the user is known to be somewhat good at suggesting attributes, etc.).

The method then branches based on whether the user removed any of the suggested subjective attributes (such as by not selecting the attribute).

If so, entity E gets stored as a negative example of the removed attribute(s) for future re-training of the classifier. The refined set of subjective attributes and corresponding relevancy scores are associated with entity E (such as via tags applied to entity E, via entries in a table of a relational database, etc.).
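The refinement step could be sketched like this, assuming suggestions arrive as a dictionary of predicted scores and the user’s choices arrive as sets. The function and parameter names are hypothetical; the 1.0 and 0.8 default scores are the example values from the text.

```python
def refine_with_user(predicted, selected, added, user_default=1.0):
    """Apply a user's edits to the suggested attributes.

    predicted:    {attribute: score} suggested by the trained classifier
    selected:     suggested attributes the user kept
    added:        new attributes the user typed in
    user_default: default score for added attributes (e.g. 1.0, or 0.8 for a
                  user who is only somewhat good at suggesting attributes)
    Returns (refined {attribute: score}, attributes for which the entity should
    be stored as a negative example for future re-training).
    """
    refined = {a: s for a, s in predicted.items() if a in selected}
    for attr in added:
        refined.setdefault(attr, user_default)  # keep the predicted score if already present
    negatives = sorted(a for a in predicted if a not in selected)
    return refined, negatives


refined, negatives = refine_with_user(
    {"cute": 0.7, "funny": 0.4, "awesome": 0.3},
    selected={"cute", "funny"},
    added={"heartwarming"},
    user_default=0.8,
)
print(refined)    # {'cute': 0.7, 'funny': 0.4, 'heartwarming': 0.8}
print(negatives)  # ['awesome']
```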
