This course deals with computer science (CS) aspects of social network analysis (SNA), and is open to all students in the master computer science programme at Leiden University. If you want to participate in the 2023 edition and are in a different programme, then you should contact the lecturer in advance.


Course information

Lectures: Fridays from 11:00 to 12:45 in Gorlaeus room C1
Lab sessions: Fridays from 9:00 to 10:45 in Snellius rooms 302/304, 306/308, etc.
Prerequisites: a CS bachelor with courses on Algorithms, Data Structures and Data Mining
Literature: provided papers and book chapters (free and digitally available)
Examination: based on presentation, paper, programming, peer review and participation (no exam)
Brightspace link: 2324-S1 Social Network Analysis for Computer Scientists
Study points: 6 ECTS

Lecturer: dr. Frank Takes (f.w.takes@liacs.leidenuniv.nl, room 157b)
Assistants: Hanjo Boekhout MSc (room 126), Rachel de Jong MSc (room 152),
Marton Menyhert MSc, Shaoxuan (Anthony) Zhang, Chao Zhao.

Need help? Ask your questions during the lab sessions. If it is more urgent, walk by the lecturer or assistant's offices. If they are not around, contact snacs@liacs.leidenuniv.nl.


[Network visualization image]

Network with 1458 nodes and 1948 edges.

Course schedule

  Date Lecture (11:00-12:45) Lab session (9:00-10:45)
1. Fri Sep 8, 2023 No lecture No lab session
2. Fri Sep 15, 2023 Lecture 0: Course information
Lecture 1: Introduction
Introduction to Gephi
Work on Assignment 1
3. Fri Sep 22, 2023 Lecture 2: Advanced concepts and centrality
Lecture 2.5: Course project
Introduction to NetworkX
Work on Assignment 1
4. Fri Sep 29, 2023 Lecture 3: Network projection and community detection Work on Assignment 1

Mon Oct 2, 2023


Deadline for Assignment 1 (AoE; hand in via Brightspace)

5. Fri Oct 6, 2023 Lecture 4: Structure of the web and propagation-based centrality
Example presentation
Course project planning session
Work on Assignment 2
6. Fri Oct 13, 2023 Lecture 5: Network evolution and model extensions Data Science Lab introduction
Work on Assignment 2
7. Fri Oct 20, 2023 No lecture Work on Assignment 2

Mon Oct 23, 2023


Deadline for Assignment 2 (AoE; hand in via Brightspace)

Session 1 (11:00-11:50)Session 2 (12:10-13:00)Lab session (9:00-10:45)
8. Fri Oct 27, 2022 Track A (Snellius 174, Frank)
1. Anomaly detection
33. Network embeddings
Track C (Snellius 174, Frank)
7. Anonymity in networks
15. Community detection
Course project paper writing tutorial
Work on course project
Track B (Snellius 412, Hanjo)
18. Core/periphery structure
47. Visualization algorithms
Track D (Snellius 412, Hanjo)
32. Network motifs
44. Shortest paths
9. Fri Nov 3, 2023 Track A (Snellius 174, Frank)
13. Community detection
21. Influence spread and virality
Track C (Snellius 174, Frank)
11. Centrality estimation
31. Network motifs
Work on course project
Track B (Snellius 412, Hanjo)
6. Anonymity in networks
14. Community detection
Track D (Snellius 412, Hanjo)
12. Centrality estimation
24. Influence spread and virality

Fri Nov 10, 2023


Deadline for a first (printed) draft version of the Course Project paper (for today's peer review session)

10. Fri Nov 10, 2023 Peer review session (starts 11:00 in Gorlaeus C1) Work on course project
Session 1 (11:00-11:50)Session 2 (12:10-13:00)Lab session (9:00-10:45)
11. Fri Nov 17, 2023 Track A (Snellius 174, Frank)
_
45. Temporal networks
Track C (Snellius 174, Frank)
27. Link prediction
35. Network embeddings
Work on course project
Track B (Snellius 412, Hanjo)
2. Anomaly detection
30. Network motifs
Track D (Snellius 412, Hanjo)
4. Anomaly detection
16. Community detection
Extra Track (402; 10:15):
18. Core/periphery structure
12. Fri Nov 24, 2023 Work on course project
Track C (Snellius 412, Hanjo)
19. Graph compression
23. Influence spread and virality
Track A (Snellius 412, Hanjo)
9. Centrality estimation
17. Core/periphery structure
Extra Track A (Snellius 401, Hanjo); 10:00
25. Link prediction
50. Graph compression
Track D (Snellius 174, Frank)
8. Anonymity in networks
28. Link prediction
Track B (Snellius 174, Frank)
10. Centrality estimation
42. Shortest paths
Extra Track D (Snellius 174, Frank); 10:00
51. Network embeddings
40. Sampling from networks

Fri Nov 26 24, 2023


Deadline for a preliminary version of the Course Project paper for course staff feedback (AoE; hand in via Brightspace)


Fri Dec 1, 2023


Deadline for completing a substantial part of the Course Project code & experimental pipeline (for today's code review session)

13. Fri Dec 1, 2023 Code review session; see instructions Work on course project
Session 1 (11:00-11:50)Session 2 (12:10-13:00)Lab session (9:00-10:45)
14. Fri Dec 8, 2023 Track C (Snellius 412, Hanjo)
3. Anomaly detection
43. Shortest paths
Track A (Snellius 412, Hanjo)
5. Anonymity in networks
29. Network motifs
Work on course project
Track D (Snellius 174, Frank/Rachel)
20. Graph compression
48. Visualization algorithms
Track B (Snellius 174, Frank/Rachel)
26. Link prediction
38. Sampling from networks
Session 1 (10:00-10:50)Session 2 (11:10-12:30)Lab session (9:00-10:45)
15. Fri Dec 15, 2023 Track C (Snellius 313, Hanjo)
39. Sampling from networks
46. Temporal networks
Track A (Snellius 412, Hanjo)
41. Shortest paths
36. Network embeddings
45. Temporal networks
Work on course project
Track D (Snellius 174, Frank)
52. Influence spread and virality
53. Visualization algorithms
Track B (Snellius 174, Frank)
22. Influence spread and virality
37. Sampling from networks

Dec 17, 2023


Deadline for final Course Project paper and accompanying code (AoE; hand in via Brightspace)

Dec 19, 2023 Deadline for retake assignment to replace failed assignment(s) (AoE; hand in via email)
Dec 22, 2023 Course end
Jan 31, 2024 Course project retake deadline

Code review (week 13). The main goal of this code review session is to is conduct a review of another team's code and experimental setup, and to receive feedback on one's own code.
What to do? Follow the instructions in the code review session slides. We use the same division into teams as for the peer review in week 10 (see below). In case team pairs are not complete, contact the course staff in Snellius 174 after the introductory lecture.
Deliverable. The mandatory deliverable for today is a contribution to the collaborative document (see link below).

» Enter your Best Practices from the Code Review 2023 in this Collaborative Document


Peer review (week 10).
The main goal of this peer review session is to conduct a review of another team's draft project paper, and to receive feedback on one's own paper. The mandatory deliverable for today is a short report on the feedback you were able to give the other team, to be handed in via Brightspace at the end of the session.

Team matching

Please find the team you are matched with, and ask help from the assistants if needed. Each line lists two teams that are matched up.

1. Anomaly detection2. Anomaly detection
3. Anomaly detection4. Anomaly detection
5. Anonymity in networks6. Anonymity in networks
7. Anonymity in networks8. Anonymity in networks
9. Centrality estimation10. Centrality estimation
11. Centrality estimation12. Centrality estimation
13. Community detection14. Community detection
15. Community detection16. Community detection
17. Core/periphery structure18. Core/periphery structure
19. Graph compression20. Graph compression
21. Influence spread and virality22. Influence spread and virality
23. Influence spread and virality24. Influence spread and virality
25. Link prediction26. Link prediction
27. Link prediction28. Link prediction
29. Network motifs30. Network motifs
31. Network motifs32. Network motifs
33. Network embeddings34. Network embeddings
51. Network embeddings
37. Sampling from networks38. Sampling from networks
39. Sampling from networks40. Sampling from networks
41. Shortest paths42. Shortest paths
43. Shortest paths44. Shortest paths
45. Temporal networks46. Temporal networks
47. Visualization algorithms48. Visualization algorithms
50. Graph compression52. Influence spread and virality
35. Network embeddings53. Visualization algorithms

The main goal of this lab session is to ensure that your team is all set for making serious progress writing the course project paper in the coming weeks.

Course project paper
Earlier, you have made a project planning. One upcoming deadline is the first version of your paper for the peer review session in 2 weeks.

  1. Template. If you have not yet done so, download the paper template and make sure you can compile it and understand how to use it. Remember to fill in the meta information.
  2. Related work. If you have not yet done so, spend some time at Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and c) what important references are presented in your paper. You will likely want to include some of these in your paper as well. At least make sure that you are aware of the major works that cite your paper, and that you reference some of these works in your introduction when you sketch the context of your work.
  3. Project contribution. For the course project, you go beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Try to get your intended contribution written down in the appropriate section(s) of your paper.
  4. Collaboration. How will you work together on the paper? Git? Overleaf? (the university provides free pro licenses). How will you share code, data and results with each other, and ensure integration of it all?
  5. What is a good paper?. What defines a good paper? Discuss with your project partner to align your expections. When are you "done"? When is the paper "sufficient"? When is it "good"? And how about "excellent"?
  6. Writing. Make serious progress with writing the first 3 to 4 sections of your course project paper in the coming two weeks, so that you have something substantial for the peer review session.
  7. Planning. Look at the planning you made early October. Have you reserved time to review each other's texts? Are adjustments necessary?

As always: please ask course staff for help if you are still unsure about certain aspects of your project.

Need to freshen up or improve academic writing skills? See Academic writing: a practical guide and the The academic phrasebank. There is also the Leiden Science Skills platform.


The main goal of this lab session is to get to know the data science lab.

If you are working remotely, learn how-to set up remote access to the LIACS Research and Education Laboratory (REL)

Data Science Lab
The data science lab website provides necessary information and documentation. The emailadres to contact in case of access issues or other technical problems is rel@liacs.leidenuniv.nl.

Become familiar with the lab and how to run code on it, and how to place data within the lab. This may be handy for Assignment 2 and/or the course project. Remember:

  • /home/sXXXXXX, your homedirectory ~, is for your own code (don't put large stuff there),
  • /local is for local storage on the current machine you are are on,
  • /data is for storing data across data science lab machines.

Think of setting up passwordless login, consider using ProxyJump to avoid the gateway, and perhaps use sshfs to mount your remote homedirectory. Some IDEs also offer all of this functionality.

Assignment 2
Continue with the practical part of Assignment 2.


The main goal of this lab session is to get started with both the course project and with Assignment 2.

Course project
Below are some topics you can discuss and investigate together with your project team partner.

  1. Project schedule. Have a look at the deadlines for the course project in the schedule on this website, and create a sensible planning with your team partner.
  2. Paper. Read your course project paper, and spend some time at Google Scholar investigating a) what other papers exist on this topic, b) which relevant papers cite your paper, and c) what important references are presented in your paper.
  3. Data repositories. Check out the data repositories of real (social) networks such as SNAP and KONECT and think of datasets suitable for your course project.
  4. Project contribution. For the course project, you have to do something original. Ideally, this goes beyond the one paper that you were assigned, comparing techniques from multiple papers, for example comparing different algorithms or methods, using different validation metrics, or testing on (a) larger (number of diverse) datasets. Write down in at most 150 words what you plan to do for your course project, and feel free to discuss this with the lecturer or an assistant for feedback, this week or the next. Also see the generic instructions for the Course Project.

Done? Get started with Assignment 2.


networkx

The main goal of this lab session is to become familiar with NetworkX (a Python package to analyze networks for research purposes).
All relevant information on NetworkX can be found in the NetworkX online documentation.
For this lab session, you need a working Python environment. For this, there are two options:

  1. Use your own self-installed Python environment, and choose your own editor and way of running code (via the command line, an IDE or an interactive notebook). Proceed to Instructions for today
  2. Alternatively, you can use the desktop machines in the student computer rooms,
    or the university's remote SSH functionality to remotely work on these machines via the gateway sshgw.leidenuniv.nl, and then connecting to a student computer (see the REL page on SSH access, only accessible internally, for a list of computer names).
    To create a Python environment on the university computers or servers, it is easiest to use conda.
    1. Download Miniconda by running the following command in your terminal: wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    2. Install Miniconda: bash Miniconda3-latest-Linux-x86_64.sh
      The installer will ask you some questions. Normally, the standard location suffices and you do not have to add conda to path.
    3. After installation finishes, close your terminal so that the changes take effect. Upon opening a new terminal, activate the conda virtual environment with: source miniconda3/bin/activate
      If all goes well, you will see that (base) appears in front of your shell prompt. This means that conda's base environment is active.
    4. Next, create a new conda environment: conda create --name snacs. And activate it with: conda activate snacs
      You might use conda for other courses as well, and version conflicts can arise when you install many packages. Using separate environments will prevent this issue.
      NOTE: some packages can make your conda environment take up a lot of space. So try not to use an unnecessary number of environments with duplicate packages. You can also remove an environment that you don't need anymore, see the conda cheatsheet.
    5. Packages can easily be installed with conda. To install NetworkX, run: conda install networkx
    6. Getting started
      • Running code from the command line. Now you should be able to run your code from the command line by typing
        python3 scriptname.py
      • Running code in interactive Jupyter notebook from the university environment:
        conda install -c conda-forge jupyterlab
        jupyter-notebook
        When using Jupyter through SSH, make sure to use port forwarding whilst SSH-ing, using options -g -L 8888:127.0.0.1:8888. Also add options --no-browser and --port=8888 when starting up Jupyter.

Instructions for today: Lab session on NetworkX

  1. Take some time to do the NetworkX tutorial.
  2. Have a look at functionality to read and write graphs from/to disk, and in particular learn how to import an edge list. Understand the input format and ways of including things like Weight and other attributes of edges.
  3. A lot of the common network metrics you may want to compute are implemented as NetworkX function or NetworkX algorithm. Become familiar with these, for example by computing measures such as degree assortativity, clustering, density, diameter and average distance.
  4. While at it, why not take a look at how NetworkX relates to other data formats?
  5. Try to load the network from the first lab session (small-gephiready.tsv) into NetworkX, and investigate some characteristic properties of the network, such as the degree distribution and distance distribution. Use theread_edgelist function and remember to select the correct separator (tab), and pay attention to the header of the file.
  6. Get the Epinions network from the SNAP repository. You may need to fiddle with the precise header, but it is already in edge list format. Compute some common characteristics such as the degree distribution, and visualize them using appropriate figures of distributions, for example using Matplotlib.
  7. What are the differences between NetworkX and Gephi in terms of visualization and analysis capabilities?

Done? Proceed with Exercise 2 of Assignment 1.

Looking for a challenge? Check out these three alternatives (that you can also use instead of NetworkX throughout the course, if you prefer (but for which there is less help available)):

  • Graph-Tool, a python graph analysis toolkit that leaves the hard computation to parallel (OpenMP) C++ code.
  • SNAP, which is entirely written in C++ and has many interesting features.
  • Python igraph, a Python version of the R igraph package.

gephi

Note: as a result of lecture 1 not going through, you might not be familiar with all terms on concepts just yet. Ask questions from the assistants where needed, and otherwise wait for the upcoming lecture for more explanation and details.

About social network analysis tools and packages. There exist different tools and package for social network analysis. In this course, we will introduce you to two of them, with complementary advantages:

  1. Gephi, an easy-to-use tool with a graphical interface useful for visualization and quick analysis of relatively small network data (this week, see below).
  2. NetworkX, an extensive Python package for network analysis that can handle larger network datasets and computations (next week).

Learning goals. The main goal of this lab session is to become familiar with Gephi (experimental beta-software to visualize networks for research purposes) and its input format. At the end of this session you should be able to:

  • Know how to use Gephi for social network analysis
  • Import and visualize raw network data with labeled nodes and labeled and/or weighted edges (directed or undirected),
  • Understand how to map edge and node size and color to structural network properties such as the node degree and edge type,
  • Know how to apply filters to the visualization, for example to focus only on the giant component,
  • Export a vector graphic PDF of your network for reuse in for example a presentation or paper,
  • Export computed node data for reuse in another program.

There is no deliverable for this lab session, but you are assumed to know the tool afterwards. Practice more at home if needed.


Instructions for today: Lab session on Gephi.
The following steps and tutorials will help you get to know Gephi.

  1. Install and run Gephi locally on your computer. Installation instructions, if needed, can be found on the Gephi website.
    You will need Java JRE installed. (If you use the Windows machines in the Snellius computer rooms, install Gephi in D:\ and ignore registry key warnings). An older version may be preinstalled in Linux.
  2. Walk through the Gephi tutorials. The Gephi website has various official tutorials that you can walk through; in particular Quick start, Visualization and Layouts. However, you may benefit from using a more modern tutorial (courtesy of Derek Geene at UCD).
    Note: One reason to use a more modern tutorial is that the official Gephi Quick Start Guide is for a slightly older version of Gephi. On p. 9-10, contrary to what is indicated in the getting started guide, the ranking module is part of the appearance module (since Gephi 0.9), and does not appear as a separate tab. The “Ranking result” table no longer exists, and you can skip that part of the tutorial. In the tutorial on Visualization, on p. 9 the 3-D visualisation option was bugged and has been removed in Gephi 0.9. You can skip that part of the tutorial.

  3. Compute relevant real-world network dataset properties. Try out some of the Gephi datasets and see if you can understand how they differ in terms of density, degree distribution, giant components, clustering and average distance.

  4. Walk through a tutorial on data import and export. Learn how to import raw data from this short data import tutorial. Visualize the network represented by the edge list small-gephiready.tsv. Compute the degree for each node and export it to a node list file (which you can for example reuse in other plotting tools).

  5. Visualization and exporting as vector graphic. Play around with some different visualization algorithms such as ForceAtlas 2 (for example, with scaling set to a higher value and with stronger gravity checked) and Fruchterman Reingold. Try to visualize the network such that labels remain readable.
    Hints: From a random initialization, use the "ForceAtlas 2" visualization algorithm perhaps with "Stronger gravity" and usually with "Scaling" set to a much higher than default value. Increase the overall node size if needed. Once stable, you can improve the layout by enabling parameter "Prevent overlap". Run "Expansion" to make some more space between the nodes, then enable the display of node labels, properly size them, and finally run "Label Adjust" to prevent label overlap.
    Export your visualization to PDF so that you have it as a vector graphic.

  6. Create, import and export your own social network data. Make a new, very simple, edge list input file of a fictive network with say 10 nodes and 20 edges, and import it into Gephi. Now add to the edge input file (in some order) link (un)direction, link weights and link labels.
    Also import a node list file with node labels and other properties. Be sure to append it to your existing workspace and to use identifiers.
    For example, have nodes represent students, and link represent friendships, and choose node and edge labels accordingly. Make sure that you are able understand the difference in file input and visualization output, and that you are comfortable with importing and exporting data into and from Gephi.

Done? Get started with the practical part of Assignment 1. You can download the smaller datafiles medium.tsv and large.tsv. If you want to analyze huge.tsv, you will have to get it from the shared folder in the ISSC Linux or LIACS DS lab environment, as stated in the assignment.


Course project

Teams work on a course project for 60% of the course grade. The project is about a certain topic (see list below) related to social network analysis, and the project consists of:

  • Giving a 20 minute presentation (+10 minutes for questions) of a paper corresponding to the topic. At least a Powerpoint/PDF presentation and if already possible some demonstration of an implementation or visualization has to be given. Teams are also expected to provide feedback on some of the presentations given by their fellow students during the lectures.
    Presentation pre-check: it is highly recommended to gather feedback on a draft version of the slides of your presentation the Tuesday before your presentation, between 15:30 and 17:00 in room 126 (Hanjo) or 157b (Frank; who might sometimes be available from 16:00 onwards only).
  • Making a small contribution, i.e., doing something new compared to the paper on which the project is based. For example, a new tweak to an existing algorithm, a large number of datasets to test the algorithms to find a relation between the network characteristics and the performance, a new performance metric to evaluatie the algorithms, a new type of visualization of the algorithm or results, an improvement of a proof related to the algorithm, etc. In case of doubt about the contribution, contact the course staff well in advance; you can always ask for help.
  • Gathering and implementing the algorithms and/or techniques from the different papers, and running experiments on at least five large real-world network datasets. Teams will also give feedback on the code produced by other teams in the code review session. Some papers introduce multiple techniques. In that case, choose a logical subset to compare, and motivate your choice.
    Datasets can for example be found at SNAP, KONECT, BigDND, Networks Repository, ICON and Netzschleuder
    . Certain topics need particular datasets (e.g., with timestamps, signed links, etc.), which should of course be taken into account when selecting datasets.
  • Writing one 6 to 10 page paper. In the paper the different techniques are analyzed and compared in detail using extensive experiments. The paper, to be written in LaTeX, has to follow the format of an actual scientific paper. Students will also give feedback on the paper produced by another team in the peer review session. There is a mandatory SNACS course paper template.

Topics

This list of project topics is shown below. Please choose a topic (so, choose a number from 1 to 12) with your team (consisting of two students). Register your choice in Brightspace. Topics can be chosen up to 4 or 5 times; we will split the group into various parallel tracks.
A next step (somewhere around week 6) is that you choose which paper on the chosen topic you present.
If you are retaking the course because you failed the project last year, you cannot choose the same topic as last year.


  1. Anomaly detection:
  2. Anonymity in networks:
  3. Centrality estimation:
  4. Community detection:
  5. Core/periphery structure:
  6. Graph compression:
  7. Influence spread and virality:
  8. Link prediction:
  9. Network motifs:
  10. Network embeddings:
  11. Sampling from networks:
  12. Shortest paths:
  13. Temporal network analysis:
  14. Visualization algorithms:

Note: scientific papers (ACM, Elsevier, etc.) can often only be opened from within the university domain (or from home via university SSH/Citrix/VPN/etc.). IEEE Explore papers can often be opened by looking them up via computer.org. Alternative links and preprints of papers can often be found through Google Scholar by searching for "Title of the paper". Contact course staff if you have tried all of these options and are still not able to access the paper (do not pay!).


Reading material

Some students have expressed interest in additional reading material to help freshen up on skills and knowledge required for this course.


Course description

See the Studyguide / prospectus for a more general description.

Topics include: SNA from a CS perspective (graph representation, complexity issues, examples), Graph Structure (power law, small world phenomenon, clustering coefficient, hierarchies), Paths and Distances (neighborhoods, radius, diameter), Spidering and Sampling (BFS, forest fire, random walks), Graph Compression (graph grammars, bitwise tricks, encryption, hashing), Centrality (degree centrality, closeness centrality, betweenness centrality, rating and ranking), Centrality and Webgraphs (HITS, PageRank, structure of the web), Community Detection (spectral clustering, modularity), Visualization (force-based algorithms, Gephi), Graph Models (random graphs, preferential attachment), Link Prediction (structure, semantics, prediction algorithms, graph mining), Contagion (diffusion of information, spreading activation, gossipping) and Privacy and Anonymity ((de-)anonymizing graphs, ethical aspects, privacy issues) and various other topics that have been added over the years but are not yet in the list above.

The course was also given in 2014, 2015, 2016, 2017, 2018, 2019, 2020, 2021 and 2022.