General Overview of Google NLP
...
- Syntax Analysis
- Sentiment Analysis
- Entity Analysis
- Entity Sentiment Analysis
- Text Classification
Syntax Analysis
For a given text, Google’s syntax analysis will return a breakdown of all words with a rich set of linguistic information for each token. The information can be divided into two parts:
Part of speech. This part contains information about the morphology of each token. For each word, a fine-grained analysis is returned containing its type (noun, verb, etc.), gender, grammatical case, tense, grammatical mood, grammatical voice, and much more.
Example sentence: “A computer once beat me at chess, but it was no match for me at kickboxing.”A tag: DET 'computer' tag: NOUN number: SINGULAR 'once' tag: ADV 'beat' tag: VERB mood: INDICATIVE tense: PAST 'me' tag: PRON case: ACCUSATIVE number: SINGULAR person: FIRST at tag: ADP 'chess' tag: NOUN number: SINGULAR ',' tag: PUNCT 'but' tag: CONJ 'it' tag: PRON case: NOMINATIVE gender: NEUTER number: SINGULAR person: THIRD 'was' tag: VERB mood: INDICATIVE number: SINGULAR person: THIRD tense: PAST 'no' tag: DET 'match' tag: NOUN number: SINGULAR 'for' tag: ADP 'kick' tag: NOUN number: SINGULAR 'boxing' tag: NOUN number: SINGULAR '.' tag: PUNCT - Dependency trees. The second part of the return is called a dependency tree, which describes the syntactic structure of each sentence.
Sentiment Analysis
Google’s sentiment analysis will provide the prevailing emotional opinion within a provided text. The API returns two values: The “score” describes the emotional leaning of the text from -1 (negative) to +1 (positive), with 0 being neutral.
The “magnitude” measures the strength of the emotion.
Input Sentence | Sentiment Results | Interpretation |
---|---|---|
The train to London leaves at four o'clock | Score: 0.0 Magnitude: 0.0 | A completely neutral statement, which doesn't contain any emotion at all. |
This blog post is good. | Score: 0.7 Magnitude: 0.7 | A positive sentiment, but not expressed very strongly. |
This blog post is good. It was very helpful. The author is amazing. | Score: 0.7 Magnitude: 2.3 | The same sentiment, but expressed much stronger. |
This blog post is very good. This author is a horrible writer usually, but here he got lucky. | Score: 0.0 Magnitude: 1.6 | The magnitude shows us that there are emotions expressed in this text, but the sentiment shows that they are mixed and not clearly positive or negative. |
Entity Analysis
Entity Analysis is the process of detecting known entities like public figures or landmarks from a given text. Entity detection is very helpful for all kinds of classification and topic modeling tasks.
A salience score is calculated. This score for an entity provides information about the importance or centrality of that entity to the entire document text.
Example: “Robert DeNiro spoke to Martin Scorsese in Hollywood on Christmas Eve in December 2011.”.
Detected Entity | Additional Information |
---|---|
Robert De Niro | type : PERSON salience : 0.5869118 wikipedia_url : https://en.wikipedia.org/wiki/Robert_De_Niro |
Hollywood | type : LOCATION salience : 0.17918482 wikipedia_url : https://en.wikipedia.org/wiki/Hollywood |
Martin Scorsese | type : LOCATION salience : 0.17712952 wikipedia_url : https://en.wikipedia.org/wiki/Martin_Scorsese |
Christmas Eve | type : PERSON salience : 0.056773853 wikipedia_url : https://en.wikipedia.org/wiki/Christmas |
December 2011 | type : DATE Year: 2011 Month: 12 salience : 0.0 wikipedia_url : - |
2011 | type : NUMBER salience : 0.0 wikipedia_url : - |
Entity Sentiment Analysis
If there are models for entity detection and sentiment analysis, it’s only natural to go a step further and combine them to detect the prevailing emotions towards the different entities in a text.
Example: “The author is a horrible writer. The reader is very intelligent on the other hand.”
Entity | Sentiment |
---|---|
author | Salience: 0.8773350715637207 Sentiment: magnitude: 1.899999976158142 score: -0.8999999761581421 |
reader | Salience: 0.08653714507818222 Sentiment: magnitude: 0.8999999761581421 score: 0.8999999761581421 |
Text Classification
Classifies the input documents into a large set of categories. The categories are structured hierarchical, e.g. the Category “Hobbies & Leisure” has several sub-categories, one of which would be “Hobbies & Leisure/Outdoors” which itself has sub-categories like “Hobbies & Leisure/Outdoors/Fishing.”
Example: “The D3500’s large 24.2 MP DX-format sensor captures richly detailed photos and Full HD movies—even when you shoot in low light. Combined with the rendering power of your NIKKON lens, you can start creating artistic portraits with smooth background blur. With ease.”
Category | Confidence |
---|---|
Arts & Entertainment/Visual Art & Design/Photographic & Digital Arts | 0.95 |
Hobbies & Leisure | 0.94 |
Computers & Electronics/Consumer Electronics/Camera & Photo Equipment | 0.85 |
Anotate text
A convenience method that provides all the features that analyzeSentiment, analyzeEntities, and analyzeSyntax provide in one call.
https://cloud.google.com/natural-language/docs/reference/rest/v1/documents/annotateText
...
No Format |
---|
{ "entities": [ { # record "name": "1600 Pennsylvania Ave NW, Washington, DC", "type": "ADDRESS", "country": "US", "metadata": { # These fields can vary a lot for different entities, we need toI think it makes it cleaner if we leave this asif nested map<stringrecord, string>instead and not flattenof flattening. "country": "US", "sublocality": "Fort Lesley J. McNair", "locality": "Washington", "street_name": "Pennsylvania Avenue Northwest", "broad_region": "District of Columbia", "narrow_region": "District of Columbia", "street_number": "1600" }, "salience": 0, "mentions": [ { # record "content": "1600 Pennsylvania Ave NW, Washington, DC", "beginOffset": 60 "type": "TYPE_UNKNOWN" } ] } } ... ], "language": "en" } |
...