Realtime and Non-Realtime Integration of MATLAB and Simulink into aRDx, the New Robotics Software Framework of DLR
Berthold Bäuml, Autonomous Learning Robots Lab, Institute of Robotics and Mechatronics, German Aerospace Center (DLR)

  • “Agile Justin”: An Advanced Robotic System
    sensing
    • stereo cameras (2MPixel/25Hz)
    • RGB-D sensor (0.5MPixel/33Hz)
    • torque sensor (all DOF, 1kHz)
    • tactile skin on hands (3000taxel/750Hz)
    • IMU (6D, 500Hz)
    acting
    • 53 DOF = 8 (platform) + 19 (torso) + 26 (hands)
    • torque control over all DOF
    • 1kHz,

  • “Playing Balls”: Perception and Action at the Limits

  • “Mars Habitat”: Dexterous Mobile Manipulation

  • Planning and Object Recognition

  • Tactile Skin

  • Software Architecture for Advanced Robotic Systems

    [Diagram: components (camera, IMU, skin, kinect, pose filter, combine/decombine, skin recognition, object recognition, planner, task scheduler, motion coordinator, fused 3d modeller, balltracker, simulated torso, mobile platform and hands, drive, iPad) distributed over Onboard Linux, Onboard QNX, the Planning Cluster and the Object Recognition GPU Cluster]

  • Software Architecture for Advanced Robotic Systems
    • communicating components: the central abstraction in robotic frameworks
      • packet/message based communication
      • dynamic connecting and disconnecting of components
      • concurrent, parallel execution
      • distributed over network of computing resources
    • components decoupled
      • as modules for team development
      • as processes for robustness at runtime
    • popular examples: ROS, YARP, OROCOS
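The component abstraction listed on this slide can be sketched in a few lines. The following is a minimal, single-process illustration in Python, not code from ROS, YARP, OROCOS or aRDx; the `Channel` class and its method names are hypothetical and only show packet-based sending with receivers that connect and disconnect at runtime.

```python
class Channel:
    """Hypothetical named channel that forwards packets to connected receivers."""

    def __init__(self, name):
        self.name = name
        self._receivers = []

    def connect(self, receiver):
        """Attach a receiver callback at runtime."""
        self._receivers.append(receiver)

    def disconnect(self, receiver):
        """Detach a receiver callback at runtime."""
        self._receivers.remove(receiver)

    def send(self, packet):
        """Deliver one packet to every connected receiver; return the delivery count."""
        # Iterate over a copy so that a receiver may disconnect itself while being called.
        receivers = list(self._receivers)
        for receiver in receivers:
            receiver(packet)
        return len(receivers)


received = []


def logger(packet):
    """Example receiver that simply records every packet it sees."""
    received.append(packet)


joint_state = Channel("joint_state")
joint_state.connect(logger)
delivered_first = joint_state.send({"angles": [0.0, 0.1]})
joint_state.disconnect(logger)
delivered_second = joint_state.send({"angles": [0.2, 0.3]})
```

The sender never refers to a particular receiver, which is the decoupling the slide describes; real frameworks add concurrency, process boundaries and network transport on top of this idea.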

  • domain: low level driver, joint controller | robot controller, sensor preprocessing | world modeling, path planning | “AI” (logic planner, cognitive model)
    computer: microcontroller/FPGA | realtime PC (QNX) | CPU/GPGPU cluster (Linux) | cluster, internet server cloud
    communication: hardware bus (SPI, I2C, …) | hard realtime, distributed, QoS, up to 1kHz, 5MB | “optimal” transport, distributed, QoS, ~10Hz, up to 1GB | “fast” transport, distributed

  • ROS (Robot Operating System, www.ros.org)
    • quasi-standard in robotics
    • communication
      • no hard realtime
      • no QoS
      • not optimal (e.g., copy-once for inter-process)
      • nested structs and dynamic 1D arrays
      • but no multidimensional arrays and beyond (e.g., recursive types)
    • languages
      • C/C++ for low level, planning/modelling
      • Python as “high level” language
        • not compiled, no parallel execution
        • -> bad performance (50x slower than C)
        • -> a lot of higher level stuff has to be implemented in C/C++
      • bindings to high level languages (diverse Lisps and Prolog, …)

  • aRDx (Agile Robot Development “Next Generation”)
    • communication
      • no single stack can fulfill all demands -> two stacks!
      • highly performant hard realtime stack
        • minimal latency and jitter, detailed QoS
        • optimal transport (zero-copy for IPC, copy-once to each host)
        • nested structs and static multidimensional arrays
      • high level stack, directly in a modern programming language
        • automatic serialization of
          • arbitrarily complex data (recursive data types)
          • code (sending closures)
        • (almost) optimal transport (copy-once to each VM on each host)
    • language
      • C/C++ for low level
      • Racket (http://racket-lang.org): modern language from the Lisp family
        • functional, compiled, parallel (to some extent)
        • only 5x slower than C (compared to Python’s 50x slower than C)
        • “programmable programming language” -> OOP, DSLs, Prolog, …
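The difference between a copying and a zero-copy transport can be illustrated inside a single Python process. This is only a sketch of the concept, not of the aRDx implementation (which shares memory between processes); the function names `send_by_copy` and `send_zero_copy` are made up for the illustration.

```python
def send_by_copy(buffer: bytearray) -> bytes:
    """Copying transport: the receiver gets its own duplicate of the payload."""
    return bytes(buffer)


def send_zero_copy(buffer: bytearray) -> memoryview:
    """Zero-copy transport: the receiver gets a view onto the sender's memory."""
    return memoryview(buffer)


packet = bytearray(8)            # stands in for a large packet buffer, all zeros
copied = send_by_copy(packet)
shared = send_zero_copy(packet)

packet[0] = 0xFF                 # the sender touches the buffer after "sending"

print(copied[0], shared[0])      # prints "0 255"
```

The copy costs time proportional to the packet size, while handing out the view costs the same for any size. The price of zero-copy is that sender and receiver must agree on who may write to the buffer and when.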

  • A Comparison of the Raw Communication Performance of Robotic Frameworks

    [Figure: six log-log plots of the round-trip time [s] over the packet size [byte], one column each for the process, host and distributed domains; legend: aRDx, aRD, Orocos, ROS, ROS (fixed), YARP; inset: round-trip time over pause [s]]

    Fig. 5. Results of the stress test benchmark for 1 (top) and 20 (bottom) clients and for the three domains (columns). Each plot shows the mean round-trip time (averaged over some 100 runs) over the packet size for the various frameworks. Note the log-log scaling of the plots.

    The performance of aRDx is almost always the best, most dramatically for the host domain, where no other framework can provide zero-copy semantics. aRDx is beaten in only two cases: for small packet sizes (up to 1KB), where the transfer time is dominated by the constant overhead of a framework, aRD’s minimalistic implementation is faster; and in the 1-client case with large packets, YARP is about 10% faster, presumably due to a slightly more clever configuration of the TCP sockets. In the 20-client case aRDx beats all other frameworks in the distributed domain by a factor of 2 because it has to transfer the packets sent from the master to the remote client only once and, hence, in each round of the test only 2+20 instead of 20+20 packets have to be transmitted over the GigE network. The increased constant overhead of aRDx for the host compared to the process domain is about 5x and, hence, close to the theoretically expected 6x due to the indirect communication through the daemon. Interestingly, although aRDx needs quite complex logic to provide zero-copy semantics in the host domain, its constant overhead is still 4x smaller than that of all other frameworks (except aRD).

    In what follows we discuss some features and quirks of the other frameworks that we came across. All these frameworks scale very well and roughly linearly with the number of clients. For the process domain YARP can provide zero-copy semantics. In this domain ROS with its nodelets was also expected to show constant transfer times but could do so only after we fixed the implementation (labeled ROS fixed); standard ROS (labeled ROS) completely initializes the memory of newly constructed packets and, hence, the transfer time has to scale with the packet size. For the host and the distributed domain YARP and ROS perform very similarly, as both communicate over TCP sockets (side note: for YARP, because of instabilities, we could not use the potentially more efficient multicast and shared memory modes). In case of the host domain and large packets (> 1MB) they even almost reach the performance of the shared memory based transport of aRD, showing that the Linux loopback sockets are very efficient. In all tests the performance of Orocos was worst, although we always tried the optimal parameters. We suspect that this is due to the additional abstraction layer with ACE/TAO in its communication stack. For ROS we found another severe quirk in the host and distributed domain for packet sizes of 10KB to 100KB: there the round-trip time increases dramatically, by 100x. A further analysis showed that this effect disappears completely when adding a pause of at least 100ms between each round of the test (see the inset in the 1-client plot depicting the round-trip time over the pause time for 1KB packets). This means ROS is not really stress resistant.
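The packet counts quoted in the caption for the 20-client distributed case can be checked with a few lines of arithmetic. The sketch below only restates the caption's numbers (20+20 versus 2+20 packets per round); the variable names are chosen for illustration.

```python
clients = 20

# Packets crossing the GigE network per test round, as stated in the caption.
other_frameworks = clients + clients   # 20 + 20
ardx = 2 + clients                     # 2 + 20

ratio = other_frameworks / ardx
print(f"{other_frameworks} vs {ardx} packets per round, ratio {ratio:.2f}")
# prints "40 vs 22 packets per round, ratio 1.82"
```

A ratio of about 1.8 in transmitted packets is consistent with the reported speedup of roughly a factor of 2.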

    with 100 clients running in the kHz range. Even for the distributed domain the worst-case round-trip latencies are no longer than 500µs.

    IV. CONCLUSIONS

    We presented the design considerations and implementation details of the new highly performant, realtime capable, minimalistic and simple communication layer of our aRDx software framework. In an in-depth benchmarking on Linux of the raw communication performance of aRDx and the popular robotic software frameworks ROS, YARP, Orocos and aRD, it was shown that aRDx performs excellently in both extreme performance aspects, namely latency and bandwidth, and in part dramatically outperforms the other frameworks. In addition, due to the “stress” character of our tests, we could uncover a number of severe quirks in all other frameworks.

    Running on QNX, aRDx provides hard realtime performance even for distributed applications.

    aRDx is already successfully in use on our advanced and complex humanoid robot Agile Justin. In future publications we will describe its other, high level parts, like the dynamic and flexible but less performant communication layer or the advanced mechanisms for startup and shutdown of large distributed applications.

  • MathWorks Toolchain has much to offer for a wide range of domains!
    tools: Simulink | Simulink Coder -> HIL (Hardware in the Loop) | HDL Coder | Stateflow | Control System Toolbox | Signal Processing Toolbox | Image Processing Toolbox | Optimization Toolbox | Computer Vision System Toolbox | Symbolic Math Toolbox | SimMechanics, Sim…
    domain: low level driver, joint controller | robot controller, sensor preprocessing | system modeling, data analysis | world modeling, path planning | “AI” (logic planner, cognitive model)
    • model-based design of complex controllers
    • automatic realtime (!) code generation -> HIL (Hardware-In-the-Loop)
    • signal and image processing
    • thorough numerics
    • data analysis & visualization
    • simulation of complex mechatronic systems
    • …

  • domain: low level driver, joint controller | robot controller, sensor preprocessing | system modeling, data analysis | world modeling, path planning | “AI” (logic planner, cognitive model)
    computer: microcontroller/FPGA | realtime PC (QNX) | workstation (Linux) | CPU/GPGPU cluster (Linux) | cluster, internet server cloud
    communication: hardware bus (SPI, I2C, …) | hard realtime, distributed, QoS, up to 1kHz, 5MB | data logging | “optimal” transport, distributed, QoS, ~10Hz, up to 1GB | “fast” transport, distributed
    binding to aRDx

  • Simulink - aRDx Binding
    features
    • automatic S-Function block generation
    • packets with 1D-arrays of basic types (future: multi-arrays and nested structs)
    • supports interpreted Simulink and generated code on realtime target
    • channel specification in S-Function parameter

    robot-sfun.rkt
        #lang racket/base
        (require generator/simulink)
        (require robot-packet)

        ;; generate an S-Function that bridges the two packet types
        (ardx-simulink-bridge
          #:ardx->simulink robot_state_packet     ; packets flowing from aRDx into Simulink
          #:simulink->ardx robot_control_packet)  ; packets flowing from Simulink to aRDx

    robot-packets.rkt
        #lang generator/idesc

        ;; robot state: one status word plus 19 joint angles and 19 joint torques
        (define-gstruct robot_state_packet
          ([status gint]
           [angles (garray gdouble 19)]
           [torques (garray gdouble 19)]))

        ;; control command: 19 joint torques
        (define-gstruct robot_control_packet
          ([torques (garray gdouble 19)]))

    [Diagram: racket turns robot-sfun.rkt and robot-packets.rkt into robot-sfun.c and robot-packets.h; mex builds the robot-sfun block with ports status (1), angles (19), torques (19) and torques (19); Simulink Coder (+model) produces the realtime executable]
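To make the packet definitions concrete, the sketch below packs and unpacks a `robot_state_packet` with Python's `struct` module. The wire layout is an assumption made for the illustration (little-endian, no padding, `gint` as 32-bit integer, `gdouble` as 64-bit float); the slides do not specify how aRDx actually lays out packets, and the helper names are hypothetical.

```python
import struct

N_JOINTS = 19  # array length taken from the packet definition on the slide

# ASSUMED layout: little-endian, unpadded, gint = int32, gdouble = float64.
STATE_FORMAT = f"<i{N_JOINTS}d{N_JOINTS}d"
STATE_SIZE = struct.calcsize(STATE_FORMAT)  # 4 + 19*8 + 19*8 bytes


def pack_state(status, angles, torques):
    """Serialize status, angles and torques into one fixed-size byte string."""
    if len(angles) != N_JOINTS or len(torques) != N_JOINTS:
        raise ValueError(f"expected {N_JOINTS} angles and {N_JOINTS} torques")
    return struct.pack(STATE_FORMAT, status, *angles, *torques)


def unpack_state(data):
    """Inverse of pack_state: return (status, angles, torques)."""
    values = struct.unpack(STATE_FORMAT, data)
    status = values[0]
    angles = list(values[1:1 + N_JOINTS])
    torques = list(values[1 + N_JOINTS:])
    return status, angles, torques


angles_in = [0.5 * i for i in range(N_JOINTS)]
torques_in = [-0.25 * i for i in range(N_JOINTS)]
raw = pack_state(1, angles_in, torques_in)
status_out, angles_out, torques_out = unpack_state(raw)
```

Because every field has a static size, the packet length is known at generation time, which is what allows fixed-width Simulink ports (1, 19 and 19 elements) to be derived from the definition.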

  • [Diagram: distributed application with vision, world model, path planning, state machine, GUI, 3D viewer, monitoring and configuration components and camera, input, torso, arm and hand devices, running on Linux, Linux/Windows, VxWorks and QNX hosts; realtime part on CPU 1 and CPU 2; Stateflow and interpreted Simulink; edit, compile, debug]

  • Matlab - Racket Binding

    Matlab
        import racket.*
        require('my-module')

        v = to_racket([1 2 3 4], 'vector')
        sv = racket('scale', 0.2, v)
        robots_id = racket('make-robots', 5, sv)
        robots = from_racket(robots_id)
        q = robots(3).arm.joint_angle(2)

    my-module.rkt
        ;; multiply every element of vector v by s
        (define (scale s v)
          (vector-map (lambda (e) (* e s)) v))

        ;; create num robots whose arm joint angles are set to v
        (define (make-robots num v)
          (for/vector ([i num])
            (dot (Robot) 'arm 'joint-angle v)))

    [Diagram: Matlab and the Racket VM exchange command and result over aRDx channels; MAT-Files on a RAM-Disk carry complex data]

    features
    • require (load) any Racket module
    • obeys Matlab’s lexical scope
    • call any Racket function or macro
    • transparent conversion of basic types
    • references to any Racket objects
    • explicit conversion to/from Racket for arbitrarily nested arrays, structs

    implementation
    • Racket process for each Matlab process
    • command/result (basic) by aRDx channels
    • complex Matlab data by MAT-Files
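The combination of "command/result" messages and "references to any Racket objects" can be sketched as a small handle table. The Python class below is a hypothetical stand-in for the Racket side, written only to illustrate the pattern; the message format, the class name `ObjectServer` and its methods are assumptions, not the actual binding protocol.

```python
import itertools


class ObjectServer:
    """Hypothetical stand-in for the remote side: runs commands, hands out references."""

    def __init__(self, functions):
        self._functions = dict(functions)
        self._objects = {}                     # reference id -> live object
        self._next_id = itertools.count(1)

    def handle(self, command):
        """Execute one command dict and return a result dict."""
        # Replace references in the arguments by the objects they stand for.
        args = [
            self._objects[a["ref"]] if isinstance(a, dict) and "ref" in a else a
            for a in command["args"]
        ]
        value = self._functions[command["call"]](*args)
        if isinstance(value, (int, float, str, bool, type(None))):
            return {"value": value}            # basic types are returned directly
        ref = next(self._next_id)
        self._objects[ref] = value             # complex objects stay on the server
        return {"ref": ref}

    def fetch(self, ref):
        """Explicit conversion: return the object behind a reference."""
        return self._objects[ref]


server = ObjectServer({
    "scale": lambda s, v: [e * s for e in v],
    "total": sum,
})

scaled = server.handle({"call": "scale", "args": [2, [1, 2, 3, 4]]})
total = server.handle({"call": "total", "args": [scaled]})
fetched = server.fetch(scaled["ref"])
```

Here `scaled` is only a reference, so the list never leaves the server until `fetch` is called, which mirrors the split between `robots_id` and `from_racket(robots_id)` in the Matlab snippet.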

  • Example: Automatic Calibration of a Multisensorial Upper Body of a Humanoid Robot
    • Simulink model for HIL control of full humanoid
    • Matlab script for flow control of the calibration
      • during recording (commanding robot movements, start/stop of recorders, ..)
      • batch processing of recorded data
    • Matlab for processing data and optimization
      • Image Processing Toolbox (marker detection)
      • Signal Processing Toolbox (noise filtering)
      • Matlab based MTK - Manifold ToolKit (openslam.org/MTK.html) (model fit)
      • Optimization Toolbox (finding optimal robot configurations)
    -> whole application can be implemented in Matlab/Simulink universe
      • fast development
      • dramatically less error-prone than, e.g., C/C++
      • numerically rock-solid

  • Conclusions
    • advanced robotic systems span a wide range of software domains: “from hardware driver to artificial intelligence”
    • DLR’s new aRDx software framework tackles this with
      • two communication stacks: static & realtime vs. flexible & fast
      • Racket as language
    • MathWorks Toolchain could cover a large part of the domains
    • tight binding of MathWorks Toolchain to aRDx allows use on advanced robotic systems (>50DOF, distributed computing, …)
      • Simulink & Coder to aRDx: hard realtime
      • Matlab to Racket: flexible, all functionality accessible
    • MathWorks Toolchain and aRDx are a perfect fit, esp. in research:
      • rapid prototyping
      • even students can work with complex robots because they can often stay completely in the MathWorks universe (known from their studies)

  • “It shows off … capabilities of what's arguably … the most, capable dual-armed mobile humanoid robots in existence.” (IEEE Spectrum 2014)

    Best Video Award ICRA 2014

    “aRDx/Racket & MathWorks Tools inside”