An implementation of the PageRank algorithm written in C.
  • C 89.3%
  • Gnuplot 5.9%
  • Makefile 2.8%
  • Shell 2%
Find a file
2022-02-21 19:47:31 +01:00
datasets renamed a dataset 2022-02-21 19:46:24 +01:00
.gitignore updated .gitignore 2022-02-14 11:16:26 +01:00
benchmark.c updated benchmark output format 2022-02-14 14:32:15 +01:00
benchmark.h first commit 2022-02-13 23:36:03 +01:00
gnuplot_script.gnu updated plotting sript 2022-02-21 19:47:07 +01:00
main.c added "the" 2022-02-21 19:47:31 +01:00
Makefile added optimization flag 2022-02-14 11:18:36 +01:00
pagerank.c updated benchmark output format 2022-02-14 14:32:15 +01:00
pagerank.h updated benchmark output format 2022-02-14 14:32:15 +01:00
README.md updated Readme and added french version 2022-02-21 19:43:52 +01:00
README_FR.md updated Readme and added french version 2022-02-21 19:43:52 +01:00
run_script.sh updated gnuplot script 2022-02-14 14:32:31 +01:00
structures.h updated benchmark output format 2022-02-14 14:32:15 +01:00
utils.c first commit 2022-02-13 23:36:03 +01:00
utils.h first commit 2022-02-13 23:36:03 +01:00

PageRank Implementation in C

Description

C implementation of the PageRank algorithm. Used as an individual practical work for the High Performance Computing course held at UVSQ-ISTY, engineering's degree in Computer Science.

Dataset format

This program will run on any dataset that respects the following dataset format:

# commented line
NODE_1[TAB]NODE_2
NODE_1[TAB]NODE_2
NODE_1[TAB]NODE_2
NODE_1[TAB]NODE_2

NODE_x: should be an integer respecting the datatype uint (unsigned integer) of 4 bytes.

Benchmark

The benchmark is done by running the pagerank algorithm on multiple dumping factor values ranging from 0 to 1 with a step equals to 0.025.
The performance critrea is the time taken for each run. The performance written on the output file is the average of 10 (MAX_AVG_ITER) runs.

Benchmark output data format

The benchmark output data file will have the following format (this is sample data):

# damping; time; iters; node; prob
0.000; 0.00000080; 1; 6; 0.142857
0.025; 0.00000110; 4; 1; 0.145243
0.050; 0.00000160; 4; 1; 0.147638
0.075; 0.00000180; 5; 1; 0.150042
. . .
0.925; 0.00003570; 149; 5; 0.224650
0.950; 0.00005390; 226; 5; 0.243999
0.975; 0.00010860; 457; 5; 0.287968
1.000; 0.00011870; 500; 6; 0.525783

The 1st value describes: the damping factor chosen for the run.
The 2nd value describes: the time took for the algorithm to find the solution or reachs it's maximum iterations.
The 3rd value describes: the numbers of iterations took to converge or the maximum if it didn't.
The 4th value describes: the most important node found.
The 5th value describes: the highest probability found (equivalent to the most important node).

Code

utility scripts

  • gnuplot_script.gnu: a gnuplot script to plot the performance graph.
  • run_script.sh: a bash script that installs the gnuplot library if it's not installed, compiles the project, runs it using the input filename (passed as an argument) and creates the performance graph (plot.png).

main program

There are 3 different compilation versions:

  1. default (used to have a simple summary output)
  2. debug (used to verbose debugging outputs)
  3. benchmark (used to run a benchmark)

Compilation & Execution

Compilation

  • default version: make clean && make
  • debug version: make clean && make debug
  • benchmark version: make clean && make benchmark

Execution

  • default version: ./program [filename] [max_iter] [damping_factor]
  • debug version: ./program [filename] [max_iter] [damping_factor]
  • benchmark version: ./benchmark [filename]

Extra

You are able to run the plotting script using the gnuplot script on a data file respecting the benchmark data output format: gnuplot gnuplot_script.gnu