NoSQL Graph-based Database

Create a graph-based database.

You are to design a graph-based database in Neo4j that connects tweets to the ideas contained in their text, with those ideas expressed as a concept map. Your database will be the basis for an application that monitors tweets and creates or updates the concept map as new tweets are generated for a topic or by an organization or individual. You do not need to write the application; you only need to design the database that it will need.

These sites provide some information on what a concept map is. Essentially, it is a generic set of nodes that can be connected by relationships of any type or meaning.

https://www.lucidchart.com/pages/concept-map and https://en.wikipedia.org/wiki/Concept_map
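As a minimal sketch of that definition, a concept map can be held as a list of (source, relationship, target) triples. The concept names and relationship labels below are invented for illustration and are not taken from real tweets.

```python
from collections import defaultdict

# Hypothetical concept map: each triple is
# (source concept, relationship label, target concept).
triples = [
    ("tariffs", "RAISES", "import prices"),
    ("import prices", "AFFECTS", "inflation"),
    ("tariffs", "AFFECTS", "GDP"),
]


def neighbors(triples):
    """Map each source concept to the (label, target) pairs leaving it."""
    out = defaultdict(list)
    for source, label, target in triples:
        out[source].append((label, target))
    return dict(out)


def concepts(triples):
    """Every concept that appears at either end of a relationship."""
    return sorted({c for s, _, t in triples for c in (s, t)})


concept_map = neighbors(triples)
```

Because the relationship label is ordinary data, any type or meaning can be attached to an edge, which is the property the definition above relies on.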

Your design must include the original text of the tweet and other relevant attributes, e.g., date, hash tags, URLs, tweet-id, and twitterer-id. The Russian Twitter Trolls example in the Neo4j sandbox contains a good description of the data available for tweets: https://neo4j.com/sandbox-v2/ (Note: You might not need all the attributes of the tweets to meet the requirements for this project.)

Design criteria. (30 points)

A tweet that includes its origin date, the Twitter id of its originator, the text of the tweet, hash tags, and any URLs associated with the tweet.

A concept that corresponds to an idea derived from the text of a tweet. For example, a tweet might reference a company, stock, or location in its text; if so, this company, stock, or location could become a node in the concept map. More likely, the text references an idea, e.g., tariffs or GDP, which should become a concept in the map.

A relationship that relates tweets to the concepts contained in their text. Every tweet that includes a concept in its text should have a relationship with the concept in the map.

A set of relationships between concept nodes derived from the text of the tweets. See the links on concept maps to understand how to build one.
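One way to read the four criteria above is as two kinds of nodes and two kinds of relationships. The sketch below models that reading in plain Python so the shape can be checked before it is built in Neo4j. The class names, field names, and sample tweet are assumptions made for illustration, not a required schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Tweet:
    tweet_id: str
    user_id: str       # Twitter id of the originator
    created_at: str    # ISO 8601 timestamp (assumed format)
    text: str
    hashtags: tuple = ()
    urls: tuple = ()


@dataclass
class ConceptGraph:
    tweets: dict = field(default_factory=dict)    # tweet_id -> Tweet
    concepts: set = field(default_factory=set)    # concept names
    mentions: set = field(default_factory=set)    # (tweet_id, concept)
    relations: set = field(default_factory=set)   # (source, rel_type, target)

    def add_tweet(self, tweet):
        self.tweets[tweet.tweet_id] = tweet

    def mention(self, tweet_id, concept):
        """Link a tweet to a concept found in its text."""
        if tweet_id not in self.tweets:
            raise KeyError(f"unknown tweet {tweet_id}")
        self.concepts.add(concept)
        self.mentions.add((tweet_id, concept))

    def relate(self, source, rel_type, target):
        """Add a typed relationship between two concepts."""
        self.concepts.update((source, target))
        self.relations.add((source, rel_type, target))


# Hypothetical sample tweet, invented for this sketch.
graph = ConceptGraph()
graph.add_tweet(Tweet("t1", "u42", "2019-05-01T12:00:00Z",
                      "New tariffs could slow GDP growth #trade",
                      hashtags=("trade",), urls=("https://example.com/a",)))
graph.mention("t1", "tariffs")
graph.mention("t1", "GDP")
graph.relate("tariffs", "AFFECTS", "GDP")
```

In Neo4j the same shape would typically become node labels and relationship types; whether hash tags and URLs are better kept as tweet properties or as separate nodes is a design choice left open here.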

Data population. (30 points)

Collect a set of at least 10 tweets and enter their data into your database.

Analyze each tweet to identify the concepts referenced. This means that you manually apply the concept mapping techniques to the text of the tweet. Add each concept to your database with the appropriate relationships to the tweets that include that concept.

Analyze each tweet to identify any relationships among the concepts referenced. Add each relationship discovered to your database. These are relationships only among concepts and correspond to creating a concept map from the knowledge contained in the tweets.
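The population steps above could be scripted by turning each manually analyzed tweet into parameterized Cypher. The sketch below only builds the statements and parameters; it does not connect to Neo4j. The labels (Tweet, Concept), the MENTIONS relationship, the property names, and the sample tweet are all assumptions for illustration. It also assumes relationship types are written into the query text rather than passed as parameters, so the type is validated first.

```python
import re

# Assumed schema: (:Tweet)-[:MENTIONS]->(:Concept), with hash tags and URLs
# stored as list properties on the tweet. All names here are hypothetical.
TWEET_QUERY = """
MERGE (t:Tweet {tweet_id: $tweet_id})
SET t.user_id = $user_id,
    t.created_at = $created_at,
    t.text = $text,
    t.hashtags = $hashtags,
    t.urls = $urls
WITH t
UNWIND $concepts AS name
MERGE (c:Concept {name: name})
MERGE (t)-[:MENTIONS]->(c)
""".strip()

_REL_TYPE = re.compile(r"[A-Z][A-Z0-9_]*")


def tweet_statement(tweet, concepts):
    """Return (query, params) that upserts one tweet and its concept links."""
    params = dict(tweet, concepts=sorted(set(concepts)))
    return TWEET_QUERY, params


def relation_statement(source, rel_type, target):
    """Return (query, params) for one concept-to-concept relationship."""
    if not _REL_TYPE.fullmatch(rel_type):
        raise ValueError(f"unsafe relationship type: {rel_type!r}")
    query = (
        "MERGE (a:Concept {name: $source})\n"
        "MERGE (b:Concept {name: $target})\n"
        f"MERGE (a)-[:{rel_type}]->(b)"
    )
    return query, {"source": source, "target": target}


# Hypothetical tweet and the concepts identified in it by hand.
tweet = {"tweet_id": "t1", "user_id": "u42",
         "created_at": "2019-05-01T12:00:00Z",
         "text": "New tariffs could slow GDP growth #trade",
         "hashtags": ["trade"], "urls": []}
query, params = tweet_statement(tweet, ["tariffs", "GDP", "tariffs"])
rel_query, rel_params = relation_statement("tariffs", "AFFECTS", "GDP")
```

MERGE is used instead of CREATE so that entering the same tweet or concept twice does not duplicate nodes. The resulting statements can be pasted into the Neo4j Browser with the parameters filled in, or sent through a driver.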

Queries (Note: each of these queries can be implemented through a series of steps using the Neo4j system) (30 points)

  1. List the tweets in order of occurrence.
  2. List the concepts in descending order by the number of tweets that reference them.
  3. List all of the tweets associated with any node in the concept map.
  4. Identify the most central node in the concept map for all relationships. Note: This link has examples of centrality: https://neo4j.com/docs/graph-algorithms/current/algorithms/centrality/
  5. Identify the tweet that references the most concepts.
  6. List all of the tweets associated with the most central node in the concept map.
  7. Identify the most central node in the concept map for a particular type of relationship among concepts.
  8. List the tweets related to a particular hash tag.
  9. List the concepts related to tweets with a particular hash tag.
  10. List the URLs from the tweets for any node in the concept map.
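Before writing the queries in Neo4j, it can help to work out the expected answers on a small sample. The sketch below does that in plain Python for several of the queries above. The sample data is invented, and degree (the number of relationships touching a concept) is assumed as the centrality measure; it is only one of several measures, and another may suit the project better.

```python
from collections import Counter

# Hypothetical sample data, invented for this sketch.
tweets = {
    "t1": {"created_at": "2019-05-03T09:00:00Z", "hashtags": ["trade"]},
    "t2": {"created_at": "2019-05-01T12:00:00Z", "hashtags": ["economy"]},
    "t3": {"created_at": "2019-05-02T08:30:00Z", "hashtags": ["trade"]},
}
mentions = [("t1", "tariffs"), ("t1", "GDP"), ("t1", "inflation"),
            ("t2", "GDP"), ("t3", "tariffs"), ("t3", "GDP")]
relations = [("tariffs", "AFFECTS", "GDP"),
             ("tariffs", "AFFECTS", "inflation"),
             ("GDP", "MEASURES", "economy"),
             ("GDP", "INCLUDES", "consumption")]


def tweets_in_order():
    """Query 1: tweet ids by creation time (same-format UTC ISO strings sort chronologically)."""
    return sorted(tweets, key=lambda tid: tweets[tid]["created_at"])


def concepts_by_popularity():
    """Query 2: (concept, tweet count) pairs, most referenced first."""
    counts = Counter(concept for _, concept in set(mentions))
    return sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))


def most_central(rel_type=None):
    """Queries 4 and 7: highest-degree concept, optionally for one relationship type."""
    degree = Counter()
    for source, kind, target in relations:
        if rel_type is None or kind == rel_type:
            degree[source] += 1
            degree[target] += 1
    if not degree:
        return None
    # Ties are broken alphabetically so the result is deterministic.
    return min(degree, key=lambda c: (-degree[c], c))


def busiest_tweet():
    """Query 5: the tweet that references the most concepts."""
    counts = Counter(tid for tid, _ in set(mentions))
    return min(counts, key=lambda tid: (-counts[tid], tid))


def tweets_for(concept):
    """Queries 3 and 6: tweets that mention a given concept."""
    return sorted(tid for tid, c in set(mentions) if c == concept)


def tweets_with_hashtag(tag):
    """Query 8: tweets carrying a given hash tag."""
    return sorted(tid for tid, t in tweets.items() if tag in t["hashtags"])
```

Query 6 is then `tweets_for(most_central())`, which mirrors the note that each query can be built from a series of steps.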
