Social Network Analysis for Computer Scientists

This is the old course page! See the new page for recent information!

This course deals with computer science (CS) aspects of social network analysis (SNA), and is open to all students in the master computer science programme at Leiden University.
If you want to participate and are in a different programme, then you should contact the lecturer in advance.

Next edition of the course is in Fall 2019. Information will be available near the end of summer break.
Comments from your evaluation forms of the fall 2018 edition of the course are available.
All grades have been submitted to the student administration on December 21, 2018.
If you did not get a sufficient grade for the project, you have received an e-mail about this.
If you did not pass the (retake) assignments, you did not receive a grade in uSis.

Course information

Lectures: Fridays from 11:00 to 12:45 (Sep. 7 - Dec. 7) in Snellius room 174
Lab sessions: Fridays from 9:00 to 10:45 in room 306/308
Prerequisites: a CS bachelor with courses on Algorithms, Data Structures and Data Mining
Literature: provided papers and book chapters (free and digitally available)
Examination: based on presentation, paper, programming, peer review and participation (no exam)
Study points: 6 ECTS

Lecturer: dr. Frank Takes - f.w.takes@liacs.leidenuniv.nl, room 157b
Assistant lecturer: Anna Latour MSc - a.l.d.latour@liacs.leidenuniv.nl, room 123
Course assistant: Antonio Barata MSc - a.p.pereira.barata@liacs.leidenuniv.nl, room 150
Student assistant: Hanjo Boekhout Bsc - h.d.boekhout@umail.leidenuniv.nl

Need help? Ask your questions during the lab sessions. Otherwise contact snacs@liacs.leidenuniv.nl

Network with 1458 nodes and 1948 edges.

Course schedule

	Date	First slot (9:00-10:45)	Second slot (11:00-12:45)
1.	Sep 7, 2018	No activities	Lecture 0: Course organization Lecture 1: Introduction and small world phenomenon
2.	Sep 14, 2018	Gephi tutorial Work on Assigment 1	Lecture 2: Advanced concepts & centrality (including course project information)
3.	Sep 21, 2018	NetworkX tutorial Work on Assigment 1	Lecture 3: Network projection & community detection Lecture 3.1: Presentation 0
4.	Sep 28, 2018	Work on Assigment 1	Lecture 4: Networks on the web
	Oct 1, 2018	Deadline for Assignment 1: Click here to upload your assignment
5.	Oct 5, 2018	DS Lab, Course project planning Work on Assigment 2	Lecture 5: Network dynamics & Processes on networks
6.	Oct 12, 2018	Work on Assigment 2	Lecture 6: Network motifs & Network science applications
7.	Oct 19, 2018	Work on Assigment 2	Student presentation 1 (11:00): Network sampling Student presentation 2 (12:00): Community detection 2
8.	Oct 26, 2018	Course project paper writing Work on Assigment 2	Student presentation 3 (11:00): Influence spread and virality 1 Student presentation 4 (room 174, 12:00): Top-k closeness centrality Student presentation 5 (room 313, 12:00): Shortest paths 1
	Oct 29, 2018	Deadline for Assignment 2: Click here to upload your assignment
9.	Nov 2, 2018	Work on course project paper	Student presentation 6 (room 174, 11:00): Influence spread and virality 2 Student presentation 7 (room 174, 12:00): Betweenness centrality Student presentation 8 (room 408, 11:00): Neighborhoods Student presentation 9 (room 408, 12:00): Diameter computation
	Nov 9, 2018	Deadline for the first half of the course project paper (for the peer review session)
10.	Nov 9, 2018	Work on course project	Peer review session
11.	Nov 16, 2018	Work on course project	Student presentation 10 (room 174, 11:00): Counting triangles Student presentation 11 (room 174, 12:00): Visualization algorithms 2 Student presentation 12 (room 408, 11:00): Community detection 1 Student presentation 13 (room 408, 12:00): Link prediction
	Nov 19, 2018	Deadline (optional) for preliminary version of course project paper: Click here to upload
12.	Nov 23, 2018	Work on course project code and experimental pipeline for parameter tuning, data processing, results, plotting, etc.	Student presentation 14 (room 174, 11:00): Shortest paths 2 Student presentation 15 (room 174, 12:00): Graph compression Student presentation 16 (room 408, 11:00): De-anonymization of networks Student presentation 17 (room 408, 12:00): Network data errors
	Nov 30, 2018	Deadline for a substantial amount of code / experimental pipeline of the course project (for the code review session)
13.	Nov 30, 2018	Code review session	Plenary student presentation 21 (room ~~408~~ 174, 11:00): Closeness centrality directly followed by course evaluation from 11:45 to 12:00 ~~Student presentation 18 (room 174, 11:00): Link prediction in signed networks~~ Student presentation 19 (room 174, 12:00): Personalized PageRank Student presentation 22 (room 408, 12:00): Visualization algorithms 1
	Dec 7, 2018	Work on course project paper	No lecture
	Dec 12, 2018	Deadline for the final version of the course project paper: Click here to upload
	Dec 19, 2018	Retake deadline for assignments (hand in via e-mail to the lecturer)
	Dec 19, 2018	Evaluation meetings in room 157b: 9:00-9:30 Closeness centrality 9:30-10:00 Betweenness centrality 10:00-10:30 Community detection 1 10:30-11:00 Closeness top-k 11:00-11:30 Virality 2 11:30-12:00 Network data errors 12:00-12:30 Community detection 2 12:30-13:00 Graph compression 13:00-13:30 Personalized Pagerank 13:30-14:00 Sampling 14:00-14:30 Shortest paths 2 ~~15:00-15:30 Signed link prediction~~ 15:30-16:00 Triangles 16:00-16:30 Visualisation 2
	Dec 20, 2018	Evaluation meetings in room 157b: 10:30-11:00 De-anonymization 11:30-12:00 Diameter computation 12:00-12:30 Link prediction 14:00-14:30 Neighborhoods 15:00-15:30 Shortest paths 1 15:30-16:00 Visualisation 1 16:00-16:30 Virality 1
	Dec. 21, 2018	Course end. Grades will be submitted to the student administration
	Jan. 31, 2019	Deadline for all remaining course project retakes and assignments

Lab session Friday November 30 in room 306/308 »

The main goal of this lab session is to in pairs of teams learn from the other team in terms of how to organize code, experiments and data for the course project.
Ask a course assistant to be assigned to a team upon entering the computer room. We start at 9.00.

What is code review about?

Peer review: evaluation of work by one or more individuals with similar competence
Pair programming
Four eyes see more than two
Go beyond your current knowledge and skills
Output: list of best practices

What to do?

Explain your work to the other team
Mention what you have done and not yet done
Introduce the other team to your code
Ask questions about the other team's code
Explain to the other team positive and less positive constructive points about their work
Together, derive useful ``best practices'' and add them to the Google Doc (see website)

Evaluation criteria

Correctness guarantees
Time and memory constraints
Are input and output data validated for consistency?
Is the output easily reusable for result tables or diagrams?
Is there a pipeline of experiments to run different algorithms on different datasets?
Is the data relevant and sufficient?
Is the data ``diverse'' in relevant dimensions?
What do you measure in each experiment? \\ Quality, running time, error?
Why is this data good for these experiments?
Is the data possibly biased and how may this affect the experiments?

Done? Contribute your findings to the list of best practices

Lab session Friday October 26 in room 306/308 »

The main goal of this lab session is to get started with the course project paper.

Course project
During the lab session three weeks ago, you have made a project planning. One upcoming deadline is the first version of your paper for the peer review session.

Template. Download the paper template and make sure you can compile it and have a way of collaborating on the paper with your team partner.
Related work. If you have not yet done so, spend some time at Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and c) what important references are presented in your paper. You will likely want to include some of these in your paper as well.
Project contribution. For the course project, you have to do something original. Ideally, this goes beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Try to get this written down in the appropriate section(s) of your paper.
Writing. Make serious progress with writing the first 3 to 4 sections of your course project paper (also work on this in the coming two weeks).

Assignment 2
Take the opportunity to ask final questions about Assignment 2.

Lab session Friday October 5 in room 306/308 »

The main goal of this lab session is to get started with both the course project and with Assignment 2, and to get to know the data science lab.

Data Science Lab
The data science lab website provides necessary information and documentation. Become familiar with the lab and how to run code on it, and how to place data within the lab. Remember: your homedirectory is for your own code, /local is for local storage on the current machine you are are on, and /data is for storing data across data science lab machines. You may want to use the lab's facilities for course assignments and your course project.

Course project
Below are some topics you can discuss and investigate together with your project team partner.

Project schedule. Have a look at the deadlines for the course project in the schedule on this website, and create a sensible planning with your team partner.
Paper. Read your course project paper, and spend some time at Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and c) what important references are presented in your paper.
Data repositories. Check out the data repositories of real (social) networks such as SNAP and KONECT and think of datasets suitable for your course project.
Project contribution. For the course project, you have to do something original. Ideally, this goes beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Write down in at most 300 words what you plan to do for your course project, and discuss this with the lecturer or an assistant for feedback.

Assignment 2
Get started with the practical part of Assignment 2.
Note that the large Twitter dataset are also available in the data science lab in the folder /data/SNACS/.

Lab session Friday September 21 in room 306/308 »

We will again use the UNIX/Linux workstations in Snellius room 306/304 for this lab session.
There is again no deliverable for this lab session.
The main goal of this lab session is to become familiar with NetworkX (a Python package to analyze networks for research purposes).
All relevant information on NetworkX can be found in the NetworkX online documentation.

Python instructions to use NetworkX: NetworkX should be installed on all machines.
Unfortunately, the past years have learned us that machines sometimes exhibit diversity in consistency in package availability on these machines. In some years, instructions to get NetworkX running could be found in /vol/share/software/datascience/2016-README.
If you run into issues, it may be productive to forget about preinstalled packages and use Anaconda and install required packages yourself.

To use Python in your own environment:
wget http://repo.continuum.io/archive/Anaconda3-4.3.0-Linux-x86_64.sh bash Anaconda3-4.3.0-Linux-x86_64.sh
Accept questions with 'yes'.
source anaconda3/bin/activate
You should now have your own environment in which you can install packages as you wish.

Take some time to do the NetworkX tutorial.
Have a look at functionality to read and write graphs from/to disk.
A lot of the common network metrics you may want to compute are implemented as NetworkX function. Become familiar with these.
Try to load the network from the first lab session (/vol/share/groups/liacs/scratch/SNACS/small-gephiready.tsv) into NetworkX, and investigate some characteristic properties of the network, such as the degree distribution. Use the read_edgelist function.
Get the Epinions network from the SNAP repository. Compute some common characteristics, and visualize them using appropriate distributions.
What are the differences between NetworkX and Gephi in terms of visualization and analysis capabilities?
Proceed with Exercise 2 of Assignment 1.

Looking for a challenge? Check out Graph-Tool, a python graph analysis toolkit that leaves the hard computation to parallel (OpenMP) C++ code. Or snap, which is entirely written in C++ (although there is now also a Python plugin).

Lab session Friday September 14 in room 306/308 »

We will use the UNIX/Linux Ubuntu workstations in the Snellius computer rooms for this lab session.
This ICT FAQ may provide links to more information on how to use these workstations.
If you really must, you can, at the risk of bugs and problems that course personnel can probably not help you solve, attempt to use Windows or the even more unstable version of Gephi for Macintosh.

Learning goals

The main goal of this lab session is to become familiar with Gephi (experimental beta-software to visualize networks for research purposes) and its input format. At the end of this session you should be able to import and visualize raw network data with labeled nodes and labeled and/or weighted edges (directed or undirected), and you should understand how to map edge and node size and color to structural network properties such as the node degree. You should be able to export computed node data and be able to export a vector graphic PDF of your network. There is no deliverable for this lab session.

Instructions

Extract and run Gephi locally on a student workstation (running it from your homedirectory may be slow), for example on the local disk in /scratch/gephi. Installation instructions, if needed, can be found on the Gephi website.
Run Gephi by starting the Linux executable bin/gephi. If this does not immediately work, you may have to adjust one line in etc/gephi.conf so that Gephi can find the right Java version (and remove the # in front of the line):
jdkhome="/usr/lib/jvm/java-8-openjdk-amd64/"
The Gephi website has various official tutorials that you can walk through; in particular Quick start, Visualization and Layouts.
Note: The Quick Start Guide is for a slightly older version of Gephi. On p. 9-10, contrary to what is indicated in the getting started guide, the ranking module is part of the appearance module (since Gephi 0.9), and does not appear as a separate tab. The “Ranking result” table no longer exists, and you can skip that part of the tutorial. In the tutorial on Visualization, on p. 9 the 3-D visualisation option was bugged and has been removed in Gephi 0.9. You can skip that part of the tutorial.
You may instead benefit from using a more modern tutorial (courtesy of Derek Geene at UCD).
Try out some of the Gephi datasets and see if you can understand how they differ in terms of density, degree distribution, giant components, clustering and average distance.
Learn how to import raw data from this short data import tutorial. Visualize the network represented by the edge list /vol/share/groups/liacs/scratch/SNACS/small-gephiready.tsv.
Compute the degree for each node and export it to a file (which you can for example reuse in other plotting tools).
Make a new, very simple, edge list input file of a network with say 10 nodes and 20 edges, and import it into Gephi. Now add to the input file (in some order) link (un)direction, link weights, node labels and link labels. For example, have nodes represent students, and link represent friendships, and choose node and edge labels accordingly. Make sure that you are able understand the difference in file input and visualization output.
Play around with some different visualization algorithms such as ForceAtlas 2 (for example, with scaling set to a higher value and with stronger gravity checked) and Fruchterman Reingold. Export your colored visualization to PDF so that you have it as a vector graphic.
Get started with Exercise 2 (the practical part) of Assignment 1.

Course project

Teams work on a course project for 60% of the course grade, is about a certain topic related to social network analysis, and consists of:

Giving a 30-minute presentation (+15 minutes for questions) on one of the papers on their topic. Some topics include overview papers, these should not be presented. A paper presenting at least some experiments, should be chosen. At least a Powerpoint/PDF presentation and some demonstration of an implementation or visualization will be required. Teams are also expected to provide feedback on some of the presentations given by their fellow students during the lectures.
Gathering and implementing the algorithms and/or techniques from the different papers, and running experiments on at least five large real-world network datasets. Thus, some programming has to be done. Teams will also give feedback on the code produced by other teams in the code review session. Some papers introduce multiple techniques. In that case, choose a logical subset to compare, and motivate your choice.
Datasets can for example be found at SNAP, KONECT, BigDND, Networks Repository, ICON, LASAGNE and ASU. Certain topics need particular datasets (e.g. with timestamps, signed links, etc.), which should of course be taken into account when selecting datasets.
As a rule, the course project paper should always have one novel (new) aspect compared to the papers on which it is based. For example, a new tweak to an existing algorithm, a large number of datasets to test the algorithms to find a relation between the network characteristics and the performance, a new performance metric to evaluatie the algorithms, a new type of visualization of the algorithm or results, an improvement of a proof related to the algorithm, etc. In case of doubt about the contribution, contact the course staff; you can always ask for help.
Writing one 6 to 10 page paper. In the paper the different techniques are analyzed and compared in detail using extensive experiments. The paper, to be written in LaTeX, has to follow the format of an actual scientific paper. Students will also give feedback on the paper produced by another team in the peer review session.

Instructions for the course project paper are available.

Project topics

This list of projects is shown below. Take a look at the paper, before choosing a topic. E-mail the lecturer at f.w.takes@liacs.leidenuniv.nl with your topic preference and names of the two team members. First come, first serve. Your topic is not confirmed until you receive an e-mail from me. You have until September 28 to e-mail your preferred topic. After that, a final division of topics and formation of teams will be made on October 6 during the lecture. Prefix [Chosen] indicates a topic is no longer available.

[Chosen] Shortest paths 1
[Chosen] Shortest paths 2
[Chosen] Diameter computation
[Chosen] Network sampling
[Chosen] Graph compression
[Chosen] Betweenness centrality 1
Betweenness centrality 2
[Chosen] Closeness centrality
[Chosen] Top-k closeness centrality
[Chosen] Link prediction
[Chosen] Influence spread and virality 1
[Chosen] Influence spread and virality 2
[Chosen] Community detection 1
[Chosen] Community detection 2
[Chosen] Visualization algorithms 1
[Chosen] Visualization algorithms 2
Network motifs 1
Temporal network motifs 2
[Chosen] De-Anonymization of networks
[Chosen] Neighborhoods
[Chosen] Link prediction in signed networks
[Chosen] Personalized PageRank
[Chosen] Network data errors
[Chosen] Counting triangles

This list is still updated regularly as teams are formed and topics are chosen.

Note: scientific papers (ACM, Elsevier, etc.) can often only be opened from within the university domain (or from home via SSH/VPN/etc.).
IEEE Explore papers can often be opened by looking them up via computer.org. Alternative links and preprints of papers can often be found through Google Scholar by typing in the title of the paper.

Reading material

Some students have expressed interest in additional reading material to help freshen up on skills and knowledge required for this course.

To brush up on your Python programming skills, you could do Lecture 0, 1 and 4 of these Scientific Python Lectures.
To freshen up your LaTeX skills, you could do for example this tutorial.
MIT has a nice online course on discrete mathematics. Relevant chapters: 5, 6 and 7.
The (Dutch) Leiden courses Algoritmiek (Algorithms) and Datastructuren (Data structures) might be also interesting to browse through.
Basic knowledge of the concepts in machine learning and data mining are provided, for example, at the Wikipedia machine learning portal.

Course description

See the e-Studyguide for a more general description.

Topics include: SNA from a CS perspective (graph representation, complexity issues, examples), Graph Structure (power law, small world phenomenon, clustering coefficient, hierarchies), Paths and Distances (neighborhoods, radius, diameter), Spidering and Sampling (BFS, forest fire, random walks), Graph Compression (graph grammars, bitwise tricks, encryption, hashing), Centrality (degree centrality, closeness centrality, betweenness centrality, rating and ranking), Centrality and Webgraphs (HITS, PageRank, structure of the web), Community Detection (spectral clustering, modularity), Visualization (force-based algorithms, Gephi, NodeXL), Graph Models (random graphs, preferential attachment), Link Prediction (structure, semantics, prediction algorithms, graph mining), Contagion (diffusion of information, spreading activation, gossipping) and Privacy and Anonymity ((de-)anonymizing graphs, ethical aspects, privacy issues) and various other topics that have been added over the years but are not yet in the list above.

The course was also given in 2014, 2015, 2016 and 2017.