

The BBQUE_RTLIB_OPTS environment variable lets you activate a number of RTLib features that are very helpful during application characterization.

The Unmanaged Mode

Running an integrated application while bypassing the Barbeque scheduler can be very effective, especially during application profiling. For example, an application can be executed on a custom set of CPUs, with a custom CPU quota or memory limit. The Application Working Mode (AWM) can also be selected explicitly.


Unmanaged mode is activated simply by exporting the BBQUE_RTLIB_OPTS variable with the U flag. Let's try a real example by running the bbque-testapp application.

  1. To run the testapp just by typing bbque-testapp, you need to source the BOSPShell.
  2. Unmanaged mode lets you execute Barbeque-integrated applications even when the daemon is not running, so starting the daemon is not mandatory. The only exception is the C flag, which will be introduced shortly.

[BOSPShell BOSP] \> export BBQUE_RTLIB_OPTS='U'
[BOSPShell BOSP] \> bbque-testapp -c 8 -w 2
16:33:17,450 - INFO   rtlib_testapp   : Barbeque RTLib TestApp (ver. HEAD-HASH-NOTFOUND) ::
16:33:17,450 - INFO   rtlib_testapp   : Built: Jun 12 2014 13:30:57
16:33:17,450 - INFO   rtlib_testapp   : STEP 0. Initializing RTLib library, application [bbque-testapp]...
16:34:28,756 - WARN   rpc             : Enabling UNMANAGED mode, selected AWM [0]
16:34:28,756 - WARN   rpc             : Running in UNMANAGED MODE
16:33:17,452 - INFO   rtlib_testapp   : STEP 1. Registering [001] EXCs, using recipe [BbqRTLibTestApp]...
16:33:17,452 - INFO   rtlib_testapp   : STEP 3. Starting [001] EXCs control threads...
16:33:17,452 - INFO   rtlib_testapp   : STEP 4. Running [001] control threads...
16:33:23,962 - INFO   rtlib_testapp   : STEP 5. Disabling [001] EXCs...

As the log shows, when the U flag is exported the application automatically configures itself as described by AWM 0. That is, the onConfigure() method receives the argument awm_id=0 and behaves accordingly. To select a different AWM, append its ID to the flag. For example, to select the AWM with ID 3, use the U3 flag.

[BOSPShell BOSP] \> export BBQUE_RTLIB_OPTS="U3"
[BOSPShell BOSP] \> bbque-testapp -c 8 -w 2
16:36:25,027 - INFO   rtlib_testapp   : Barbeque RTLib TestApp (ver. HEAD-HASH-NOTFOUND) ::
16:36:25,027 - INFO   rtlib_testapp   : Built: Jun 12 2014 13:30:57
16:36:25,027 - INFO   rtlib_testapp   : STEP 0. Initializing RTLib library, application [bbque-testapp]...
16:36:25,027 - WARN   rpc             : Enabling UNMANAGED mode, selected AWM [3]
16:36:25,027 - WARN   rpc             : Running in UNMANAGED MODE
16:36:25,028 - INFO   rtlib_testapp   : STEP 1. Registering [001] EXCs, using recipe [BbqRTLibTestApp]...
16:36:25,028 - INFO   rtlib_testapp   : STEP 3. Starting [001] EXCs control threads...
16:36:25,028 - INFO   rtlib_testapp   : STEP 4. Running [001] control threads...
16:36:33,024 - INFO   rtlib_testapp   : STEP 5. Disabling [001] EXCs...

This feature is also useful if you want to run the application with each of the existing AWMs, for instance to test the application under certain environmental conditions or to profile the AWMs further. Here, we run the application with AWM IDs ranging from 0 to 3.

[BOSPShell BOSP] \> for AWM in 0 1 2 3;do
> export BBQUE_RTLIB_OPTS="U$AWM"
> bbque-testapp -c 8 -w 2
> done
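
The loop above exports the variable, so the last value stays set in the shell afterwards. As a minimal sketch of an alternative, the helper below (sweep_awms is a hypothetical name, not part of BOSP) sets BBQUE_RTLIB_OPTS only in the environment of each single invocation. It assumes the AWM IDs are consecutive integers and that the command is an external program.

```shell
# sweep_awms FIRST LAST COMMAND [ARGS...]
# Hypothetical helper: runs COMMAND once per AWM ID in FIRST..LAST, setting
# BBQUE_RTLIB_OPTS only for that single invocation (nothing is exported).
sweep_awms() {
    first=$1
    last=$2
    shift 2
    awm=$first
    while [ "$awm" -le "$last" ]; do
        # "U<id>" selects unmanaged mode with the given AWM
        BBQUE_RTLIB_OPTS="U$awm" "$@"
        awm=$((awm + 1))
    done
}
```

With the BOSPShell sourced, the sweep over AWMs 0 to 3 would then be: sweep_awms 0 3 bbque-testapp -c 8 -w 2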

The Unmanaged Mode with CGroups control

This feature lets you manually tune the cgroup that will host the application. This is very helpful during application profiling because, with the scheduler bypassed, the application is free to run on all the CPUs of your device. You will therefore want to do part of the scheduler's job by hand and enforce the allocation yourself. If you don't need this level of control, the best practice is to at least place the application in the MDEV (the managed device), isolating it from the non-integrated applications. To do this, start the daemon and place your shell in the MDEV by writing its PID to the tasks interface of the res cgroup.

  1. You need to be root to write in the cgroup interfaces
  2. If you need to write in the res mount point, the daemon has to be active. Otherwise, res would not be mounted
  3. You could verify, for example with htop, that the application is running on the managed device. However, it is recommended to try this example with other integrated applications, because the bbque-testapp doesn't execute code (it just sleeps), resulting in near-zero CPU utilization.

[BOSPShell BOSP] \> export BBQUE_RTLIB_OPTS='U'
[BOSPShell BOSP] \> echo $$ > out/mnt/cgroup/bbque/res/tasks
[BOSPShell BOSP] \> bbque-testapp -c 8 -w 2

If you need more control over the cgroups, use the C flag. Essentially, this feature creates a custom cgroup with the characteristics you specify. As discussed in this tutorial (have you read it?), the most important cgroup interfaces are:

  • cpuset.cpus: the list of CPUs exploitable by the tasks running in this cgroup
  • cpuset.mems: the memory nodes exploitable by the tasks running in this cgroup
  • cpu.cfs_period_us / cpu.cfs_quota_us: the maximum CPU quota, expressed in QUOTA microseconds every PERIOD microseconds
  • memory.limit_in_bytes: the maximum amount of memory exploitable by the tasks running in this cgroup

Using the C flag is straightforward: specify these attributes after the flag, separated by spaces, in the following order:

  1. cpuset.cpus
  2. cpu.cfs_period_us
  3. cpu.cfs_quota_us
  4. cpuset.mems
  5. memory.limit_in_bytes

For example, let's run bbque-testapp on CPUs 2 and 6, with a CPU quota of at most 50% and a memory limit of 10 MB.

  1. Are you root?
  2. Did you source the BOSPShell after you gained root privileges?
  3. Is your BPL file well suited for this example? Consider selecting different CPUs from your MDEV
  4. Again, if you want to check the execution with htop or similar tools, consider running the example with a real application

[BOSPShell BOSP] \> bbque-startd
[BOSPShell BOSP] \> export BBQUE_RTLIB_OPTS="U:C 2,6 100000 50000 0 10485760"
[BOSPShell BOSP] \> bbque-testapp -c 8 -w 2
17:37:55,723 - INFO   rtlib_testapp   : Barbeque RTLib TestApp (ver. HEAD-HASH-NOTFOUND) ::
17:37:55,723 - INFO   rtlib_testapp   : Built: Jun 12 2014 13:30:57
17:37:55,723 - INFO   rtlib_testapp   : STEP 0. Initializing RTLib library, application [bbque-testapp]...
17:37:55,723 - WARN   rpc             : Enabling UNMANAGED mode, selected AWM [0]
17:37:55,723 - WARN   rpc             : Enabling CGroup FORCING mode
17:37:55,723 - WARN   rpc             : Running in UNMANAGED MODE
17:37:55,726 - INFO   rtlib_testapp   : STEP 1. Registering [001] EXCs, using recipe [BbqRTLibTestApp]...
17:37:55,726 - NOTICE rpc             : Setup CGroup [/bbque/25805:bbque-testapp]...
17:37:55,726 - NOTICE rpc             : Create kernel CGroup [/bbque/25805:bbque-testapp]
17:37:55,728 - NOTICE rpc             : Forcing EXC [0] into CGroup [/bbque/25805:bbque-testapp]:
17:37:55,728 - NOTICE rpc             :    cpuset.cpus............. 2,6
17:37:55,728 - NOTICE rpc             :    cpuset.mems............. 0
17:37:55,728 - NOTICE rpc             :    cpu.cfs_period_us....... 100000
17:37:55,728 - NOTICE rpc             :    cpu.cfs_quota_us........ 50000
17:37:55,728 - NOTICE rpc             :    memory.limit_in_bytes... 10485760
17:37:55,731 - WARN   exc             : New TestWorkload with: cycle time 250[ms], cycles count 8
17:37:55,731 - INFO   rtlib_testapp   : STEP 3. Starting [001] EXCs control threads...
17:37:55,731 - INFO   rtlib_testapp   : STEP 4. Running [001] control threads...
17:37:57,726 - INFO   rtlib_testapp   : STEP 5. Disabling [001] EXCs...
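
The numbers in the option string used above follow from simple arithmetic: a 50% quota over a 100000 us period is 50000 us, and 10 MB is 10 * 1024 * 1024 = 10485760 bytes. The sketch below derives the string from those higher-level values. build_cgroup_opts is a hypothetical helper, and expressing the quota as a percentage and the memory limit in MiB is a choice made here for illustration, not something the RTLib accepts directly.

```shell
# build_cgroup_opts CPUS PERIOD_US QUOTA_PCT MEMS MEM_MIB
# Hypothetical helper: prints a "U:C ..." option string.
build_cgroup_opts() {
    cpus=$1        # cpuset.cpus, e.g. "2,6"
    period_us=$2   # cpu.cfs_period_us
    quota_pct=$3   # CPU quota as a percentage of the period
    mems=$4        # cpuset.mems, e.g. "0"
    mem_mib=$5     # memory limit in MiB
    quota_us=$((period_us * quota_pct / 100))
    mem_bytes=$((mem_mib * 1024 * 1024))
    # Field order: cpuset.cpus, cpu.cfs_period_us, cpu.cfs_quota_us,
    # cpuset.mems, memory.limit_in_bytes
    printf 'U:C %s %s %s %s %s\n' \
        "$cpus" "$period_us" "$quota_us" "$mems" "$mem_bytes"
}
```

For the example in this section, export BBQUE_RTLIB_OPTS="$(build_cgroup_opts 2,6 100000 50 0 10)" yields the same value as the export shown above.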

Custom application duration

The D flag stops an application after a certain amount of time or a certain number of cycles. Depending on the variant of the flag, the execution is stopped:

  1. when reaching a certain cycle number (flag Dc)
  2. when a cycle is finished and the total execution time has surpassed a threshold expressed in seconds (flag Ds).

Let's try this feature with bbque-testapp. The application is run three times: the first run is stopped after 5 cycles, the second after 2 seconds of execution, and the third after 5 cycles or 2 seconds of execution, whichever comes first.

  1. Dc specifies the number of the cycle at the beginning of which the application will be stopped. Thus, specifying 5 cycles means executing 4 full cycles.
  2. Ds doesn't stop the application while a cycle is executing. Once the time threshold has passed, the application is stopped when it reaches the end of its current cycle. Thus, the application won't be stopped exactly after 2 seconds.

[BOSPShell BOSP] \> export BBQUE_RTLIB_OPTS='Dc5'
[BOSPShell BOSP] \> bbque-testapp -c 8 -w 2
17:55:01,286 - INFO   rtlib_testapp   : Barbeque RTLib TestApp (ver. HEAD-HASH-NOTFOUND) ::
17:55:01,286 - INFO   rtlib_testapp   : Built: Jun 12 2014 13:30:57
17:55:01,286 - INFO   rtlib_testapp   : STEP 0. Initializing RTLib library, application [bbque-testapp]...
17:55:01,286 - WARN   rpc             : Enabling DURATION timeout 5 [cycles]
17:55:01,288 - INFO   rtlib_testapp   : STEP 1. Registering [001] EXCs, using recipe [BbqRTLibTestApp]...
17:55:01,288 - INFO   rtlib_testapp   : STEP 3. Starting [001] EXCs control threads...
17:55:01,288 - INFO   rtlib_testapp   : STEP 4. Running [001] control threads...
17:55:05,543 - INFO   rtlib_testapp   : STEP 5. Disabling [001] EXCs...
17:55:05,543 - NOTICE rpc             : Execution statistics:
 
Cumulative execution stats for 'exc_00':
  TotCycles    :       4
  StartLatency :     506 [ms]
  AwmWait      :     506 [ms]
  Configure    :       0 [ms]
  Process      :    2996 [ms]
[BOSPShell BOSP] \> export BBQUE_RTLIB_OPTS='Ds2'
[BOSPShell BOSP] \> bbque-testapp -c 8 -w 2
17:57:13,591 - INFO   rtlib_testapp   : Barbeque RTLib TestApp (ver. HEAD-HASH-NOTFOUND) ::
17:57:13,592 - INFO   rtlib_testapp   : Built: Jun 12 2014 13:30:57
17:57:13,592 - INFO   rtlib_testapp   : STEP 0. Initializing RTLib library, application [bbque-testapp]...
17:57:13,592 - WARN   rpc             : Enabling DURATION timeout 2 [s]
17:57:13,595 - INFO   rtlib_testapp   : STEP 1. Registering [001] EXCs, using recipe [BbqRTLibTestApp]...
17:57:13,596 - INFO   rtlib_testapp   : STEP 3. Starting [001] EXCs control threads...
17:57:13,596 - INFO   rtlib_testapp   : STEP 4. Running [001] control threads...
17:57:17,100 - WARN   exc             : Application termination due to DURATION ENFORCING
17:57:17,102 - INFO   rtlib_testapp   : STEP 5. Disabling [001] EXCs...
17:57:17,102 - NOTICE rpc             : Execution statistics:
 
Cumulative execution stats for 'exc_00':
  TotCycles    :       3
  StartLatency :     506 [ms]
  AwmWait      :     506 [ms]
  Configure    :       0 [ms]
  Process      :    2247 [ms]
[BOSPShell BOSP] \> export BBQUE_RTLIB_OPTS='Dc5:Ds2'
[BOSPShell BOSP] \> bbque-testapp -c 8 -w 2
18:05:38,988 - INFO   rtlib_testapp   : Barbeque RTLib TestApp (ver. HEAD-HASH-NOTFOUND) ::
18:05:38,988 - INFO   rtlib_testapp   : Built: Jun 12 2014 13:30:57
18:05:38,989 - INFO   rtlib_testapp   : STEP 0. Initializing RTLib library, application [bbque-testapp]...
18:05:38,989 - WARN   rpc             : Enabling DURATION timeout 5 [cycles]
18:05:38,989 - WARN   rpc             : Enabling DURATION timeout 2 [s]
18:05:38,991 - INFO   rtlib_testapp   : STEP 1. Registering [001] EXCs, using recipe [BbqRTLibTestApp]...
18:05:38,991 - INFO   rtlib_testapp   : STEP 3. Starting [001] EXCs control threads...
18:05:38,992 - INFO   rtlib_testapp   : STEP 4. Running [001] control threads...
18:05:42,501 - WARN   exc             : Application termination due to DURATION ENFORCING
18:05:42,502 - INFO   rtlib_testapp   : STEP 5. Disabling [001] EXCs...
18:05:42,503 - NOTICE rpc             : Execution statistics:
 
Cumulative execution stats for 'exc_00':
  TotCycles    :       3
  StartLatency :     511 [ms]
  AwmWait      :     511 [ms]
  Configure    :       0 [ms]
  Process      :    2247 [ms]
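
The TotCycles values in the three runs above can be cross-checked with a little arithmetic. The sketch below is a simplified model, not the RTLib implementation: it assumes a constant cycle time (about 749 ms here, taken from the Process time divided by TotCycles), ignores the start latency, and treats 8 as the number of cycles the testapp would run without any D flag. expected_cycles is a hypothetical name.

```shell
# expected_cycles CYCLE_MS MAX_CYCLES DC DS
# Simplified model of the D flag (an assumption, not the RTLib code).
# DC and DS are the Dc and Ds values; pass 0 for a flag that is not set.
expected_cycles() {
    cycle_ms=$1
    max_cycles=$2
    dc=$3
    ds=$4
    n=$max_cycles
    # Dc stops at the beginning of cycle DC, so DC - 1 full cycles are run
    if [ "$dc" -gt 0 ] && [ $((dc - 1)) -lt "$n" ]; then
        n=$((dc - 1))
    fi
    # Ds stops at the end of the first cycle that ends after DS seconds
    if [ "$ds" -gt 0 ]; then
        by_time=$((ds * 1000 / cycle_ms + 1))
        if [ "$by_time" -lt "$n" ]; then
            n=$by_time
        fi
    fi
    echo "$n"
}
```

Under these assumptions, expected_cycles 749 8 5 0 gives 4, expected_cycles 749 8 0 2 gives 3, and expected_cycles 749 8 5 2 gives 3, in line with the TotCycles reported for the Dc5, Ds2 and Dc5:Ds2 runs.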

Application statistics

A set of statistics is dumped at the end of each application run, with no need to export the BBQUE_RTLIB_OPTS variable. If you ran the previous examples, you have surely noticed the Execution statistics.

Cumulative execution stats for 'exc_00':
  TotCycles    :       3
  StartLatency :     511 [ms]
  AwmWait      :     511 [ms]
  Configure    :       0 [ms]
  Process      :    2247 [ms]

# EXC    AWM   Uses Cycles   Total |      Min      Max |      Avg      Var
#==================================+===================+==================
  exc_00 002      1      3    2247 |  749.303  749.360 |  749.333    0.001
#-------------------------+        +-------------------+------------------
  exc_00 002         onRun    2247 |  749.264  749.296 |  749.287    0.000
  exc_00 002     onMonitor       0 |    0.038    0.064 |    0.045    0.000
#-------------------------+--------+-------------------+------------------
  exc_00 002   onConfigure       0 |    0.466    0.466 |    0.466    0.000

  • TotCycles is the total number of executed cycles
  • StartLatency / AwmWait is the time elapsed from the application invocation to the execution of its first cycle. It mainly comprises cgroup creation, recipe parsing and the scheduling decision. Cgroup creation is by far the largest contribution in terms of elapsed time (more than 200 ms). Real applications usually have a start latency of around 210-250 ms.
  • Configure is the time spent in the onConfigure() method
  • Process is the time spent in the onRun() method
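
As a quick sanity check on these figures, the average time per cycle can be derived from the cumulative block as Process divided by TotCycles. The sketch below does this with awk. It assumes the statistics are fed on standard input in exactly the layout shown above, and avg_cycle_ms is a hypothetical helper name.

```shell
# avg_cycle_ms: reads a "Cumulative execution stats" block on stdin and
# prints Process / TotCycles in milliseconds (hypothetical helper).
avg_cycle_ms() {
    awk '$1 == "TotCycles" { cycles = $3 }
         $1 == "Process"   { process_ms = $3 }
         END { if (cycles > 0) printf "%.1f\n", process_ms / cycles }'
}
```

For the block above (TotCycles 3, Process 2247 ms) this prints 749.0, close to the Avg column of the onRun row. A possible invocation is bbque-testapp -c 8 -w 2 2>&1 | avg_cycle_ms, where 2>&1 is used because this sketch does not assume which stream the statistics are written to.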

Performance counters support

Let's take a look at some advanced profiling features. The p flag enables performance counter sampling for your applications. There are four incremental levels of detail, each adding more counters to the sampling list, so try exporting the flags p, p1, p2 and p3.

[BOSPShell barbeque] \> export BBQUE_RTLIB_OPTS="p2"
[BOSPShell barbeque] \> bbque-testapp -c 8 -w 2
11:21:57,766 - INFO   rtlib_testapp   : Barbeque RTLib TestApp (ver. HEAD-HASH-NOTFOUND) ::
11:21:57,766 - INFO   rtlib_testapp   : Built: Jun 23 2014 11:03:25
11:21:57,766 - INFO   rtlib_testapp   : STEP 0. Initializing RTLib library, application [bbque-testapp]...
11:21:57,766 - NOTICE rpc             : Enabling Perf Counters [verbosity: 2]
11:21:57,768 - INFO   rtlib_testapp   : STEP 1. Registering [001] EXCs, using recipe [BbqRTLibTestApp]...
11:21:57,768 - INFO   rtlib_testapp   : STEP 3. Starting [001] EXCs control threads...
11:21:57,768 - INFO   rtlib_testapp   : STEP 4. Running [001] control threads...
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [05:1:01]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [06:1:03]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [07:1:04]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [08:1:02]
 
[...]
 
11:22:04,279 - NOTICE rpc             : Execution statistics:
 
 
Cumulative execution stats for 'exc_00':
  TotCycles    :       8
  StartLatency :     511 [ms]
  AwmWait      :     511 [ms]
  Configure    :       0 [ms]
  Process      :    5992 [ms]
 
# EXC    AWM   Uses Cycles   Total |      Min      Max |      Avg      Var
#==================================+===================+==================
  exc_00 002      1      8    5992 |  749.318  749.621 |  749.511    0.008
#-------------------------+        +-------------------+------------------
  exc_00 002         onRun    5992 |  749.304  749.568 |  749.466    0.007
  exc_00 002     onMonitor       0 |    0.014    0.053 |    0.044    0.000
#-------------------------+--------+-------------------+------------------
  exc_00 002   onConfigure       0 |    0.262    0.262 |    0.262    0.000
 
Perf counters stats for 'exc_00-2' (8 cycles):
 
                  0 L1-icache-loads           #    0.000 M/sec                    ( +-  0.00% )
           0.199142 task-clock                #    0.000 CPUs utilized            ( +- 25.68% )
                  0 context-switches          #    0.000 M/sec                    ( +-  0.00% )
                  0 CPU-migrations            #    0.000 M/sec                    ( +-  0.00% )
                  0 page-faults               #    0.000 M/sec                    ( +-  0.00% )
             136706 cycles                    #    0.686 GHz                      ( +- 20.94% )
             112249 stalled-cycles-frontend   #  563.664 M/sec                    ( +- 24.61% )
             100733 stalled-cycles-backend    #  505.835 M/sec                    ( +- 26.31% )
              41892 instructions              #    0.31  insns per cycle        
                                             #    2.68  stalled cycles per insn  ( +-  1.56% )
               9038 branches                  #   45.387 M/sec                    ( +-  1.23% )
                  0 branch-misses             #    0.00% of all branches          ( +-  0.00% ) [ 0.00%]
                  0 L1-dcache-loads           #    0.000 M/sec                    ( +-  0.00% ) [ 0.00%]
                  0 L1-dcache-load-misses      ( +-  0.00% ) [ 0.00%]
                  0 LLC-loads                 #    0.000 M/sec                    ( +-  0.00% ) [ 0.00%]
                  0 L1-icache-load-misses      ( +-  0.00% ) [ 0.00%]
                  0 dTLB-loads                #    0.000 M/sec                    ( +-  0.00% ) [ 0.00%]
                  0 dTLB-load-misses           ( +-  0.00% ) [ 0.00%]
                  0 iTLB-loads                #    0.000 M/sec                    ( +-  0.00% ) [ 0.00%]
                  0 iTLB-load-misses           ( +-  0.00% ) [ 0.00%]
 
         749.510718 cycle time [ms]                                          ( +-  0.01% )
 
11:22:04,281 - INFO   rtlib_testapp   : ===== RTLibTestApp DONE! =====
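
The derived figures in the right-hand column of the report above can be reproduced from the raw counts. The sketch below recomputes three of them with awk. perf_ratios is a hypothetical helper. It assumes non-zero counts and, for the stall ratio, that the larger of the two stalled-cycles counts is the one divided by the instruction count, which is consistent with the numbers shown here.

```shell
# perf_ratios INSNS CYCLES STALLED_FRONT STALLED_BACK TASK_CLOCK_MS
# Hypothetical helper: recomputes derived metrics from raw counter values.
perf_ratios() {
    awk -v insns="$1" -v cycles="$2" -v front="$3" -v back="$4" \
        -v clock_ms="$5" 'BEGIN {
        stalled = (front > back) ? front : back
        # cycles per millisecond divided by 1e6 gives GHz
        printf "ipc=%.2f stalls_per_insn=%.2f ghz=%.3f\n",
               insns / cycles, stalled / insns, cycles / (clock_ms * 1e6)
    }'
}
```

With the values from the report, perf_ratios 41892 136706 112249 100733 0.199142 prints ipc=0.31 stalls_per_insn=2.68 ghz=0.686, matching the instructions, stalled cycles and cycles rows.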

Formatted perf output

To study the results more easily, you can dump all the counters in a more convenient format. To do this, use the M flag.

Historical curiosities
M stands for MOST, a tool we have used several times to perform Design Space Exploration and that drove us to develop this feature. However, any kind of analysis script would benefit from this output format.

[BOSPShell barbeque] \> export BBQUE_RTLIB_OPTS="p2:M"
[BOSPShell barbeque] \> bbque-testapp -c 8 -w 2
 
[...]
 
11:28:12,824 - INFO   rtlib_testapp   : STEP 5. Disabling [001] EXCs...
11:28:12,824 - NOTICE rpc             : Execution statistics:
 
 
.:: MOST statistics for AWM [exc_00:02]:
@exc_00:02:perf:cycles_cnt=8@
@exc_00:02:perf:cycles_min_ms=749.463@
@exc_00:02:perf:cycles_max_ms=749.570@
@exc_00:02:perf:cycles_avg_ms=749.521@
@exc_00:02:perf:cycles_std_ms=0.039@
@exc_00:02:perf:monitor_cnt=8@
@exc_00:02:perf:monitor_min_ms=0.040@
@exc_00:02:perf:monitor_max_ms=0.047@
@exc_00:02:perf:monitor_avg_ms=0.043@
@exc_00:02:perf:monitor_std_ms=0.003@
@exc_00:02:perf:configure_cnt=1@
@exc_00:02:perf:configure_min_ms=0.310@
@exc_00:02:perf:configure_max_ms=0.310@
@exc_00:02:perf:configure_avg_ms=0.310@
@exc_00:02:perf:configure_std_ms=0.000@
@exc_00:02:perf:L1-icache-loads=0@
@exc_00:02:perf:L1-icache-loads_pct=0.00@
@exc_00:02:perf:L1-icache-loads_pcu=-nan@
@exc_00:02:perf:task-clock=0.212880@
@exc_00:02:perf:cpu_utiliz=0.000@
@exc_00:02:perf:task-clock_pct=15.64@
@exc_00:02:perf:task-clock_pcu=100.00@
@exc_00:02:perf:context-switches=0@
@exc_00:02:perf:context-switches_pct=264.58@
@exc_00:02:perf:context-switches_pcu=100.00@
@exc_00:02:perf:CPU-migrations=0@
@exc_00:02:perf:CPU-migrations_pct=0.00@
@exc_00:02:perf:CPU-migrations_pcu=100.00@
@exc_00:02:perf:page-faults=0@
@exc_00:02:perf:page-faults_pct=0.00@
@exc_00:02:perf:page-faults_pcu=100.00@
@exc_00:02:perf:cycles=130843@
@exc_00:02:perf:ghz=0.615@
@exc_00:02:perf:cycles_pct=20.85@
@exc_00:02:perf:cycles_pcu=100.00@
@exc_00:02:perf:stalled-cycles-frontend=105230@
@exc_00:02:perf:stalled-cycles-frontend_pct=24.31@
@exc_00:02:perf:stalled-cycles-frontend_pcu=100.00@
@exc_00:02:perf:stalled-cycles-backend=93469@
@exc_00:02:perf:stalled-cycles-backend_pct=26.06@
@exc_00:02:perf:stalled-cycles-backend_pcu=100.00@
@exc_00:02:perf:instructions=43185@
@exc_00:02:perf:ipc=0.33@
@exc_00:02:perf:stall_cycles_per_inst=43185@
@exc_00:02:perf:instructions_pct=7.20@
@exc_00:02:perf:instructions_pcu=100.00@
@exc_00:02:perf:branches=9126@
@exc_00:02:perf:branches_pct=2.07@
@exc_00:02:perf:branches_pcu=100.00@
@exc_00:02:perf:branch-misses=0@
@exc_00:02:perf:branch-misses_pct=0.00@
@exc_00:02:perf:branch-misses_pcu=0.00@
@exc_00:02:perf:L1-dcache-loads=0@
@exc_00:02:perf:L1-dcache-loads_pct=0.00@
@exc_00:02:perf:L1-dcache-loads_pcu=0.00@
@exc_00:02:perf:L1-dcache-load-misses=0@
@exc_00:02:perf:L1-dcache-load-misses_pct=0.00@
@exc_00:02:perf:L1-dcache-load-misses_pcu=0.00@
@exc_00:02:perf:LLC-loads=0@
@exc_00:02:perf:LLC-loads_pct=0.00@
@exc_00:02:perf:LLC-loads_pcu=0.00@
@exc_00:02:perf:L1-icache-load-misses=0@
@exc_00:02:perf:L1-icache-load-misses_pct=0.00@
@exc_00:02:perf:L1-icache-load-misses_pcu=0.00@
@exc_00:02:perf:dTLB-loads=0@
@exc_00:02:perf:dTLB-loads_pct=0.00@
@exc_00:02:perf:dTLB-loads_pcu=0.00@
@exc_00:02:perf:dTLB-load-misses=0@
@exc_00:02:perf:dTLB-load-misses_pct=0.00@
@exc_00:02:perf:dTLB-load-misses_pcu=0.00@
@exc_00:02:perf:iTLB-loads=0@
@exc_00:02:perf:iTLB-loads_pct=0.00@
@exc_00:02:perf:iTLB-loads_pcu=0.00@
@exc_00:02:perf:iTLB-load-misses=0@
@exc_00:02:perf:iTLB-load-misses_pct=0.00@
@exc_00:02:perf:iTLB-load-misses_pcu=0.00@
@exc_00:02:memory:cache=0@
@exc_00:02:memory:rss=16384@
@exc_00:02:memory:rss_huge=0@
@exc_00:02:memory:mapped_file=0@
@exc_00:02:memory:writeback=0@
@exc_00:02:memory:pgpgin=4@
@exc_00:02:memory:pgpgout=0@
@exc_00:02:memory:pgfault=12@
@exc_00:02:memory:pgmajfault=0@
@exc_00:02:memory:inactive_anon=0@
@exc_00:02:memory:active_anon=0@
@exc_00:02:memory:inactive_file=0@
@exc_00:02:memory:active_file=0@
@exc_00:02:memory:unevictable=0@
@exc_00:02:memory:hierarchical_memory_limit=10485760@
@exc_00:02:memory:total_cache=0@
@exc_00:02:memory:total_rss=16384@
@exc_00:02:memory:total_rss_huge=0@
@exc_00:02:memory:total_mapped_file=0@
@exc_00:02:memory:total_writeback=0@
@exc_00:02:memory:total_pgpgin=4@
@exc_00:02:memory:total_pgpgout=0@
@exc_00:02:memory:total_pgfault=12@
@exc_00:02:memory:total_pgmajfault=0@
@exc_00:02:memory:total_inactive_anon=0@
@exc_00:02:memory:total_active_anon=0@
@exc_00:02:memory:total_inactive_file=0@
@exc_00:02:memory:total_active_file=0@
@exc_00:02:memory:total_unevictable=0@
11:28:12,826 - INFO   rtlib_testapp   : ===== RTLibTestApp DONE! =====
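
Each line of the dump above has the shape @EXC:AWM:GROUP:KEY=VALUE@, which makes it easy to post-process. As a minimal sketch, the filter below turns those lines into CSV rows and drops everything else (log lines, blank lines). most_to_csv is a hypothetical helper, and the field layout is inferred from the output above rather than from a format specification.

```shell
# most_to_csv: reads application output on stdin and prints one
# "exc,awm,group,key,value" row per @...@ statistics line.
# Hypothetical helper; lines that do not match the pattern are discarded.
most_to_csv() {
    sed -n 's/^@\([^:]*\):\([^:]*\):\([^:]*\):\([^=]*\)=\(.*\)@$/\1,\2,\3,\4,\5/p'
}
```

For example, the line @exc_00:02:perf:cycles_cnt=8@ becomes exc_00,02,perf,cycles_cnt,8. The filter can be placed at the end of a pipeline that carries the application output.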

Raw event counters support

Monitoring of raw performance counters is also supported, and is activated with the r flag. The syntax is as follows:

rN, label_1-counter_1, label_2-counter_2, ..., label_N-counter_N


To sample a raw counter, you have to provide its hexadecimal code. Note that you can also use a unit mask (UMASK) to select sub-events.

Let's sample, for example, the number of L2 accesses (EV_COUNTER F0H). In this case, we also want a breakdown specifying:

  • Demand Data Read requests that access L2 cache (UMASK 01H)
  • RFO requests that access L2 cache (UMASK 02H)
  • L2 cache accesses when fetching instructions (UMASK 04H)
  • L2 or LLC HW prefetches that access L2 cache (UMASK 08H)
  • L1D writebacks that access L2 cache (UMASK 10H)
  • L2 fill requests that access L2 cache (UMASK 20H)
  • L2 writebacks that access L2 cache (UMASK 40H)
  • Transactions accessing L2 pipe (UMASK 80H)

Here we use a real application, just to obtain significant numbers: Bodytrack, from the PARSEC benchmark suite v2.1. You can run this application too; just make sure you have selected it in the menuconfig, along with the native data-set installation.

[BOSPShell barbeque] \> export BBQUE_RTLIB_OPTS="r8, l2ddr-01f0, l2rfo-02f0, l2if-04f0, l2pref-08f0, l1dwb-10f0, l2fr-20f0, l2wb-40f0, l2p-80f0"
[BOSPShell barbeque] \> bosp-parsec21-bodytrack 
PARSEC Benchmark Suite Version 2.1
12:12:21,095 - NOTICE rpc             : CGroup controller [cpuset] available at [/home/slibutti/opt/BOSP/out/mnt/cgroup]
12:12:21,095 - NOTICE rpc             : CGroup controller [cpu] available at [/home/slibutti/opt/BOSP/out/mnt/cgroup]
12:12:21,095 - NOTICE rpc             : CGroup controller [cpuacct] available at [/home/slibutti/opt/BOSP/out/mnt/cgroup]
12:12:21,095 - NOTICE rpc             : CGroup controller [memory] available at [/home/slibutti/opt/BOSP/out/mnt/cgroup]
 
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [06:4:496]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [07:4:752]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [08:4:1264]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [09:4:2288]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [10:4:4336]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [11:4:8432]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [12:4:16624]
*****                   - INFO   bq.rtlib.perf   : Added new PERF counter [13:4:33008]
 
12:12:21,470 - WARN   exc             : Using 2 threads
 
Set AWG Size to 2
12:13:03,320 - NOTICE rpc             : Execution statistics:
 
 
Cumulative execution stats for 'ps21_btrack':
  TotCycles    :     260
  StartLatency :     211 [ms]
  AwmWait      :     211 [ms]
  Configure    :      17 [ms]
  Process      :   41703 [ms]
 
# EXC    AWM   Uses Cycles   Total    |      Min      Max |      Avg      Var
#=====================================+===================+==================
ps21_btrack 001      1    260   41703 |   68.358  716.712 |  160.892 7369.765
#-------------------------+           +-------------------+------------------
ps21_btrack 001         onRun   41703 |   68.357  716.710 |  160.891 7369.765
ps21_btrack 001     onMonitor       0 |    0.001    0.002 |    0.001    0.000
#-------------------------+-----------+-------------------+------------------
ps21_btrack 001   onConfigure      17 |   17.528   17.528 |   17.528    0.000
 
Perf counters stats for 'ps21_btrack-1' (260 cycles):
 
            1473205 raw 0x1f0                  ( +- 18.30% ) [53.86%]
             225353 raw 0x2f0                  ( +- 25.71% ) [54.01%]
             421636 raw 0x4f0                  ( +- 25.95% ) [54.57%]
             975746 raw 0x8f0                  ( +- 18.75% ) [54.27%]
             313085 raw 0x10f0                 ( +- 21.47% ) [54.35%]
             886052 raw 0x20f0                 ( +- 20.47% ) [53.96%]
             160015 raw 0x40f0                 ( +- 27.02% ) [53.78%]
            4521546 raw 0x80f0                 ( +- 17.30% ) [54.03%]
 
         160.892158 cycle time [ms]                                          ( +- 53.36% )
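
Judging from the export line and the results above, each counter code appears to be the unit mask followed by the event code, two hexadecimal digits each: UMASK 01H and event F0H give 01f0, reported as raw 0x1f0. The last field of the "Added new PERF counter" log lines looks like the same value in decimal (496 for 0x1f0). The sketch below builds the option string under that assumption. The function names are hypothetical, and the LABEL:UMASK argument format is a convention of this sketch only.

```shell
# raw_code UMASK EVENT: prints the 4-digit hexadecimal code (e.g. 01f0)
raw_code() {
    printf '%02x%02x\n' "0x$1" "0x$2"
}

# raw_decimal UMASK EVENT: prints the same code as a decimal number
raw_decimal() {
    echo $(( (0x$1 << 8) | 0x$2 ))
}

# build_raw_opts EVENT LABEL:UMASK [LABEL:UMASK ...]
# Prints an "rN, label-code, ..." string for N sub-events of one event.
build_raw_opts() {
    event=$1
    shift
    opts="r$#"
    for pair in "$@"; do
        label=${pair%%:*}   # text before the colon
        mask=${pair#*:}     # text after the colon
        opts="$opts, $label-$(raw_code "$mask" "$event")"
    done
    printf '%s\n' "$opts"
}
```

For instance, build_raw_opts f0 l2ddr:01 l2rfo:02 prints r2, l2ddr-01f0, l2rfo-02f0, and raw_decimal 80 f0 prints 33008, the value logged for the last counter above.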


docs/rtlib/rtlibopts.txt · Last modified: 2015/11/17 13:47 by slibutti
