1. Create an author-to-author tweet edge file from the original data set, stocktwit_graph_input.csv.
A graph needs only two columns per edge: the source (Vertex 1) and the target (Vertex 2). Select all rows (tweets) for column K, “from_person”, and column M, “to_person” (or columns J and L for the numerical author IDs), and save the result as “stocktwit_from_to” or another name you prefer.
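The same column selection can also be scripted rather than done by hand in a spreadsheet. The following is a minimal Python sketch, not part of the required workflow. It assumes the header names are exactly from_person and to_person, and it reads a tiny made-up sample (the body column and all values are invented) in place of the real file. The function name extract_edges is hypothetical.

```python
import csv
import io

# Hypothetical three-row stand-in for stocktwit_graph_input.csv; the real
# file has many more columns and rows.
SAMPLE = """from_person,to_person,body
alice,bob,long $AAPL
bob,carol,short $TSLA
alice,carol,watching $MSFT
"""

def extract_edges(infile, outfile, source_col="from_person", target_col="to_person"):
    """Copy only the source and target columns of each tweet into an edge file."""
    reader = csv.DictReader(infile)
    writer = csv.writer(outfile, lineterminator="\n")
    # "Source" and "Target" are the column names Gephi's spreadsheet import
    # conventionally expects for an edge table.
    writer.writerow(["Source", "Target"])
    count = 0
    for row in reader:
        writer.writerow([row[source_col], row[target_col]])
        count += 1
    return count

out = io.StringIO()
n_edges = extract_edges(io.StringIO(SAMPLE), out)
edge_csv = out.getvalue()
print(edge_csv)
```

For the real data, the two StringIO objects would be replaced by files opened with open(..., newline="").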
2. Use Gephi to generate and save author (node) metrics. Select the metrics you would like to explore and use for building models later, and include at least 5 different metrics. Save the metrics in a file named stocktwit_node_yourname.csv and submit this file. Include answers to the following questions in HW6_yourname.doc for submission.
a. Which three authors have the highest betweenness centrality?
b. Which three authors have the highest total degree?
c. Which three authors have the highest closeness centrality?
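Of these metrics, total degree is the easiest to check by hand: in a directed graph it is a node's in-degree plus its out-degree. The Python sketch below illustrates the count on a made-up edge list. It assumes every row of the edge file counts as one edge; Gephi may instead merge repeated edges into a weighted edge, depending on the import settings, so its numbers can differ. The function name total_degree is hypothetical.

```python
from collections import Counter

# Made-up directed edges (source, target).
edges = [("alice", "bob"), ("bob", "carol"), ("alice", "carol"), ("carol", "alice")]

def total_degree(edge_list):
    """Return in-degree + out-degree for every node in a directed edge list."""
    out_deg = Counter(src for src, _ in edge_list)
    in_deg = Counter(dst for _, dst in edge_list)
    return out_deg + in_deg  # Counter addition sums the per-node counts

degree = total_degree(edges)
# The three highest-degree nodes; the order among ties is not meaningful.
top_three = degree.most_common(3)
print(top_three)
```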
3. Build the Node Table for Prediction
(1). Open the stocktwit_node.csv file in Excel and create a new variable: Expert (i.e., “suggested”). This is the target variable we aim to classify or predict.
(2). Keep the stocktwit_node.csv file open. Open the stocktwit_graph_input.csv file as well, and then switch back to stocktwit_node.csv.
(3). Note that the unit of analysis in the stocktwit_node.csv file is a node (i.e., an individual author), whereas the unit in the stocktwit_graph_input.csv file is a tweet (i.e., a single message). To transfer the expert value from the stocktwit_graph_input table to the stocktwit_node table, we therefore need to transform the data from tweet level to author level.
For Expert, each author must receive exactly one value indicating whether they are an expert (1 stands for yes; 0 stands for no).
Use the VLOOKUP function to copy the value of “suggested” from the stocktwit_graph_input table into the “Expert” column of the stocktwit_node table. The formula for the first data row should look like this:
= VLOOKUP(A2, stocktwit_graph_input.csv!$K$1:$AB$38200,18,FALSE),
where “A2” is the node name; “stocktwit_graph_input.csv!$K$1:$AB$38200” is the table range to search; 18 is the position, within that range, of the column whose value is returned (column AB, the 18th column counting from K); and “FALSE” requests an exact match. With an exact match, VLOOKUP returns the value from the first matching row, so an author with several tweets receives the value from the first of them.
(4). Save the stocktwit_node.csv file. You can delete the rows that have a missing value in Expert: these nodes appear only in the “to_person” column, so they have no tweets of their own.
Use Excel's filter function to find and remove the #N/A rows.
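The lookup-and-filter logic of step 3 can be summarized in a short sketch. This is a Python illustration with made-up data, not the Excel procedure itself. Like VLOOKUP with FALSE, it takes the “suggested” value from the first tweet found for each author, and it drops nodes that never appear as a tweet author. The names tweets, nodes, and build_expert_column are hypothetical.

```python
# Made-up tweet rows: (from_person, suggested).
tweets = [("alice", 1), ("bob", 0), ("alice", 1)]
# Made-up node names, as they would appear in column A of the node table.
nodes = ["alice", "bob", "carol"]

def build_expert_column(node_names, tweet_rows):
    """Map each node to the 'suggested' value of its first tweet, if it has one."""
    first_value = {}
    for author, suggested in tweet_rows:
        # setdefault keeps the first value seen, mirroring an exact-match VLOOKUP.
        first_value.setdefault(author, suggested)
    # Nodes without any tweet (the #N/A rows in Excel) are left out.
    return {name: first_value[name] for name in node_names if name in first_value}

expert = build_expert_column(nodes, tweets)
print(expert)  # carol is dropped because she only receives tweets
```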
4. In R, build and evaluate a classification model that uses the metrics in stocktwit_node_yourname.csv from step 2 as features to classify each author as an “expert” stocktwit author (“suggested” = 1) or not (“suggested” = 0). “suggested” is the target label variable.
(1). Using a seed of 100, randomly assign 60% of the rows to a training set (e.g., traindata). Divide the remaining 40% of the rows evenly between two holdout test/validation sets (e.g., testdata1 and testdata2).
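The 60/20/20 split can be pictured as shuffling the row indices once and cutting the shuffled list at two points. The sketch below is in Python purely for illustration. The assignment itself is in R, where the same seed produces a different but equally valid permutation. The function name split_indices and the row count of 10 are assumptions.

```python
import random

def split_indices(n_rows, seed=100, train_frac=0.6):
    """Shuffle row indices reproducibly, then cut into train / test1 / test2."""
    rng = random.Random(seed)      # private generator, so the seed is self-contained
    idx = list(range(n_rows))
    rng.shuffle(idx)
    n_train = int(round(n_rows * train_frac))
    n_test1 = (n_rows - n_train) // 2   # half of the remaining rows
    train = idx[:n_train]
    test1 = idx[n_train:n_train + n_test1]
    test2 = idx[n_train + n_test1:]     # takes the extra row if the remainder is odd
    return train, test1, test2

train_idx, test1_idx, test2_idx = split_indices(10)
print(len(train_idx), len(test1_idx), len(test2_idx))
```

Because the three index lists are disjoint slices of a single permutation, no row can end up in more than one set.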
(2). Build the tree using the C5.0 function from the C50 package with default settings.
(3). Generate predictions (i.e. estimations) of the values of the target variable for the testing instances.
Generate a confusion matrix that shows the counts of true-positive, true-negative, false-positive and false-negative predictions for both testdata1 and testdata2. Treat 1 as the positive class.
Generate seven performance metrics: accuracy (the percentage of all test instances that are classified correctly), plus precision (the percentage of instances predicted to belong to a class that actually belong to it), recall (also called the true positive rate) and F-measure (also called F-score) for each of the two Expert classes.
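To make the seven metrics concrete, here is a minimal Python sketch that derives them from the four confusion-matrix counts. It is a worked illustration of the arithmetic, not the R code the assignment asks for. The label vectors are made up, the helper names are hypothetical, and returning 0.0 when a denominator is zero is an assumed convention.

```python
def confusion_counts(actual, predicted, positive=1):
    """Count TP, TN, FP, FN with `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)
    tn = sum(1 for a, p in pairs if a != positive and p != positive)
    fp = sum(1 for a, p in pairs if a != positive and p == positive)
    fn = sum(1 for a, p in pairs if a == positive and p != positive)
    return tp, tn, fp, fn

def precision_recall_f(hits, false_alarms, misses):
    """Precision, recall and F-measure for one class (0.0 if undefined)."""
    precision = hits / (hits + false_alarms) if hits + false_alarms else 0.0
    recall = hits / (hits + misses) if hits + misses else 0.0
    total = precision + recall
    f_measure = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_measure

def seven_metrics(actual, predicted):
    tp, tn, fp, fn = confusion_counts(actual, predicted, positive=1)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    p1, r1, f1 = precision_recall_f(tp, fp, fn)
    # For class 0 the roles swap: TN are its hits, FN its false alarms, FP its misses.
    p0, r0, f0 = precision_recall_f(tn, fn, fp)
    return {"accuracy": accuracy,
            "precision_1": p1, "recall_1": r1, "f_1": f1,
            "precision_0": p0, "recall_0": r0, "f_0": f0}

# Made-up labels: TP = 2, FN = 1, FP = 1, TN = 4.
actual    = [1, 1, 1, 0, 0, 0, 0, 0]
predicted = [1, 1, 0, 1, 0, 0, 0, 0]
metrics = seven_metrics(actual, predicted)
print(metrics)
```

In this example, accuracy is 6/8 = 0.75, all three class-1 metrics equal 2/3, and all three class-0 metrics equal 0.8.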
(4). Would you recommend using the features from network analysis to identify experts in the Stocktwit community? Why or why not? Include your answer in HW6_yourname.doc for submission.