README-DEVELOPERS

# vim: tw=65

General help and instructions on writing code for Rubinius.


0. Further Reading
==================
At some point, you should read everything in doc/. It is not
necessary to understand or memorise everything but it will
help with the big picture at least!


1. Files and Directories
========================
Get to know your way around the place!

* .load_order.txt
  Explains the dependencies between files so the VM can load them
  in the correct order.

* kernel/
  The Ruby half of the implementation. The classes, methods etc.
  that make up the Ruby language environment are defined here.
  Further divided into:

* kernel/platform.conf
  kernel/platform/
  Platform-dependent code wrappers that can then be used in other
  kernel code. platform.conf is an autogenerated file that defines
  various platform-dependent constants, offsets etc.

* kernel/bootstrap/
  Minimal set of incomplete core classes that is used to load up
  the rest of the system. Any code that requires Rubinius' special
  abilities needs to be here too.

* kernel/core/
  Complete implementation of the core classes. Builds on and/or
  overrides bootstrap/. Theoretically this code should be portable,
  so all Rubinius-dependent stuff such as primitives goes in
  bootstrap/ as well.

* runtime/
  Contains run-time compiled files for Rubinius. You'll use these
  files when running shotgun/rubinius.

* runtime/stable/*
  Known-good versions of the Ruby libraries that are used by the
  compiler to make sure you can recompile in case you break one
  of the core classes.

* shotgun/
  The C parts. This top-level directory contains most of the build
  process configuration as well as the very short main.c.

* shotgun/lib/
  All of the C code that implements the VM as well as the extremely
  bare-bones versions of some Ruby constructs.

* shotgun/external_libs/
  Libraries required by Rubinius, bundled for convenience.

* lib/
  All Ruby Stdlib libraries that are verified to work as well as
  any Rubinius-specific standard libraries. Of special interest
  here are three subdirectories:

* lib/bin/
  Some utility programs such as lib/bin/compile.rb which is used
  to compile files during the build process.

* lib/ext/
  C extensions that use Subtend.

* lib/compiler/
  This is the compiler (implemented completely in Ruby.)

* stdlib/
  This is the Ruby Stdlib, copied straight from the distribution.
  These libraries do not yet work on Rubinius (or have not been
  tried.) When a library is verified to work, it is copied to
  lib/ instead.

* bin/
  Various utility programs like bin/mspec and bin/ci.

* benchmark/
  All benchmarks live here. The rubinius/ subdirectory is not in
  any way Rubinius-only; those benchmarks were simply written
  as part of this project (the rest are from somewhere else.)

* spec/ and test/
  These contain the behaviour specification and verification files.
  See section 3 for information about specs. The test/ directory is
  deprecated but some old test code lives here.


Notes: Occasionally, working with kernel/, you may see classes that
       are not completely defined or look strange. Remember that
       some classes are set up in the VM and we are basically just
       reopening those classes.


2. Working with Kernel classes
==============================

Any time you make a change here -- or anywhere else for that
matter -- make sure you do a full rebuild to pick up the changes,
then run the related specs, and then run bin/ci to make sure
that the *unrelated* specs still pass as well (minimal-seeming
changes may have broad consequences.)

There are a few special forms that are used in bootstrap/ as well
as core/, such as @ivar_as_index@ (see 2.2), which maps instance
variable names to internal fields. These impose special restrictions
on their usage, so it is best to follow the example of existing
code when dealing with them. Broadly speaking, if something looks
"unrubyish", there is probably a good reason for it, so make sure
to ask before making any "cosmetic" changes -- and to run CI after.

If you modify a kernel class, you need to run `rake build` afterwards
to have the changes picked up. With some exceptions, you should not
regenerate the stable files; in most cases they will work just fine
even without the newest code. If you do need to regenerate them,
the command is `rake build:stable`.

If you create a new file in one of the kernel subdirectories, it
will be necessary to regenerate the .load_order.txt file in the
equivalent runtime subdirectory in order to get your class loaded
when Rubinius starts up. Use the rake task build:load_order to
regenerate the .load_order.txt files.

Due to the dependencies inherent in writing the Core in Ruby, there
is one idiom that may be confusing at first sight. Many methods are
called #some_method_cv, where the _cv stands for 'core version,' not
one of the other things you thought it might be. The idea is that
a simple version of a given method is used until everything is
safely loaded, at which point it is replaced by the real version.
This happens in WhateverClass.after_loaded (and it is NOT automated.)

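As a rough illustration of the @_cv@ idiom described above, here
is a minimal sketch in plain Ruby. The class @Greeter@, its methods
and the explicit @alias_method@ swap are hypothetical stand-ins
chosen for this example; the real kernel classes may perform the
replacement differently.

```ruby
# Hypothetical class, for illustration only; not taken from kernel/.
class Greeter
  # Simple version, usable while the rest of the system is loading.
  def greeting
    "hello"
  end

  # 'Core version' of the same method; it may rely on code that is
  # only available once everything has been loaded.
  def greeting_cv
    "hello".capitalize + ", world"
  end

  # Must be called explicitly once loading is done; nothing
  # invokes this hook automatically.
  def self.after_loaded
    alias_method :greeting, :greeting_cv
  end
end

Greeter.new.greeting   # simple version is still in place
Greeter.after_loaded   # swap in the core version
Greeter.new.greeting   # now answered by the core version
```

Until @after_loaded@ is called, callers keep getting the simple
version, which is why forgetting to wire up the hook is easy to miss.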

2.1 Safe Math Compiler Plugin
-----------------------------

Since the core libraries are built of the same blocks as any other
Ruby code and since Ruby is a dynamic language with open classes and
late binding, it is possible to change fundamental classes like
Fixnum in ways that violate the semantics that other classes depend
on. For example, imagine we did the following:

    class Fixnum
      alias_method :original_plus, :+

      def +(other)
        original_plus(other) % 5
      end
    end

While it is certainly possible to redefine Fixnum addition to be
modulo 5, doing so will almost certainly leave some class like Array
unable to calculate the correct length when it needs to. The dynamic
nature of Ruby is one of its cherished features but it is also truly a
double-edged sword in some respects.

In Stdlib, the 'mathn' library redefines Fixnum#/ in an unsafe and
incompatible manner. The library aliases Fixnum#/ to Fixnum#quo,
which returns a Float by default.

Because of this, there is a special compiler plugin that emits a
different method name when it encounters the #/ method: the compiler
emits #divide instead of #/. The numeric classes Fixnum, Bignum,
Float, and Numeric all define this method.

The `-frbx-safe-math` switch is used during the compilation of the Core
libraries to enable the plugin. During regular 'user code' compilation,
the plugin is not enabled. This enables us to support mathn without
breaking the core libraries or forcing inconvenient practices.

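To make the effect of the plugin more concrete, the following
plain-Ruby sketch mimics it by hand. The @Amount@ class and its
@half@ method are hypothetical, and the explicit call to @divide@
stands in for what the compiler plugin would emit for @/@ in core
code; no actual compiler switch is involved here.

```ruby
# Hypothetical numeric wrapper, standing in for a core numeric class.
class Amount
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Ordinary operator, which user code is free to redefine.
  def /(other)
    Amount.new(value / other)
  end

  # Separately named method that keeps the original semantics.
  def divide(other)
    Amount.new(value / other)
  end

  # "Core" code, written as if the plugin had already turned the
  # `/` call into a `divide` call.
  def half
    divide(2)
  end
end

# "User" code later redefines `/`, much as mathn does for Fixnum.
class Amount
  def /(other)
    Amount.new(value.quo(other))
  end
end

Amount.new(7).half.value    # still integer division: 3
(Amount.new(7) / 2).value   # the user-visible `/` now goes through quo
```

Because @half@ never dispatches through @/@, the later redefinition
cannot change its result.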

2.2 ivar_as_index
-----------------

As described above, you'll see calls to @ivar_as_index@ in kernel
code. This maps the class's numbered fields to ivar names, but ONLY
for that file.

You can NOT access those names using the @name syntax outside of that
file. (Doing so will cause maddeningly odd behavior and errors.)

For instance, if you make a subclass of IO, you can NOT access @descriptor
directly in your subclass. You must access it only through methods.
Notably, you can NOT just use the @#attr_*@ methods for this. The methods
must be completely written out so that the instance variable label can
be picked up to be translated.

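The accessor rule above might look roughly like the following
sketch. It is plain Ruby with hypothetical class names (@Resource@
and @LoggedResource@ rather than the real IO classes), and it leaves
out the @ivar_as_index@ call itself, so it shows only the shape of
the code and not the field mapping.

```ruby
# Hypothetical base class; imagine its file maps @descriptor to a
# numbered field, so only code in that file may name the ivar.
class Resource
  def initialize(descriptor)
    @descriptor = descriptor
  end

  # Written out in full rather than generated with attr_reader,
  # so that the instance variable name appears in this file.
  def descriptor
    @descriptor
  end
end

# Hypothetical subclass, imagined to live in another file: it goes
# through the method and never touches @descriptor directly.
class LoggedResource < Resource
  def describe
    "descriptor #{descriptor}"
  end
end

LoggedResource.new(3).describe   # => "descriptor 3"
```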

2.3 Kernel- and user-land
-------------------------

Rubinius is in many ways architected like an operating system, so
OS terminology is the easiest way to describe the two modes that
Rubinius operates under:

'Kernel-land' describes how code in kernel/ is executed. Everything else
is 'user-land.'

Kernel-land has a number of restrictions to keep things sane and simple:

* #public, #private, #protected, #module_function require method names
  as arguments. The 0-argument version that allows toggling visibility
  in a class or module body is not available.

* Restricted use of executable code in class, module and script (file)
  bodies. @SOME_CONSTANT = :foo@ is perfectly fine, of course, but
  things like 'memoizations' or other calculations should not be
  present. Code inside methods has no restrictions, broadly speaking,
  but keep dependency issues in mind for methods that may get called
  during the instantiation of the rest of the kernel code.

* @#after_loaded@ hooks can be used to perform more complex/extended
  setup or calculations for kernel classes. The @_cv@ methods mentioned
  above, for example, replace the simpler bootstrap versions in the
  @#after_loaded@ hooks of the respective classes. @#after_loaded@
  is not magic, and will not be automatically called. If adding a new
  one, have kernel/loader.rb call it (at this point the system is
  fully up.)

* Kernel-land code does not handle defining methods through
  @Module#__add_method__@ or @MetaClass#attach_method@. It adds
  and attaches methods directly in the VM. This is necessary for
  bootstrapping.

* Any use of string-based eval in the kernel must go through discussion.

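As an example of the first restriction in the list above, the
sketch below uses the argument form of @private@ and
@module_function@. The names @Stack@ and @Util@ are hypothetical,
and the code is ordinary Ruby that behaves the same way in
user-land.

```ruby
# Hypothetical class written in the style kernel-land requires.
class Stack
  def initialize
    @items = []
  end

  def push(item)
    check(item)
    @items << item
    self
  end

  def check(item)
    raise ArgumentError, "nil is not allowed" if item.nil?
  end

  # Name the method explicitly; a bare `private` that toggles
  # visibility for everything below it is not available.
  private :check
end

# Hypothetical module using the argument form of module_function.
module Util
  def double(n)
    n * 2
  end
  module_function :double
end

Stack.new.push(1)                       # check is called internally
Stack.private_instance_methods(false)   # includes :check
Util.double(21)                         # => 42
```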

3. Specs (Specifications)
=========================

Probably the first or second thing you hear about Rubinius when speaking to
any of the developers is a mention of The Specs. They are a crucial part of
Rubinius.

Rubinius itself is being developed using the Behaviour-Driven Development
approach (a refinement of Test-Driven Development) where each aspect of the
behaviour of the code is first specified using the spec format and only then
implemented to pass those specs.

In addition to this, we have undertaken the ambitious task of specifying the
entirety of the Ruby language as well as its Core and Stdlib libraries in
this format. This both allows us to ensure our implementation conforms
to the Ruby standard and, more importantly, lets us actually *define* that
standard, since there currently is no formal specification of Ruby.

The de facto standard of BDD is set by "RSpec":http://rspec.info, the
project conceived to implement the then-new way of coding. Their website is
fairly useful as a tutorial as well, although the spec syntax (particularly
as used in Rubinius) is not very complex at all.

Currently we actually use a compatible but vastly simpler implementation
called MSpec (for mini-RSpec), developed specifically as a part of
Rubinius. It was originally needed because the code in RSpec was too
complex to be run on our not-yet-complete Ruby implementation.

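For readers who have not seen the spec format, the sketch below
shows the general describe/it/should shape that RSpec popularised.
It is a self-contained toy: the @ToySpec@ module and everything in
it are hypothetical and are not how MSpec is implemented; real spec
files rely on the helpers that MSpec provides.

```ruby
# Toy stand-in for the describe/it/should shape. This is NOT MSpec
# or RSpec; every name under ToySpec is made up for this sketch.
module ToySpec
  Failure = Class.new(StandardError)
  RESULTS = []

  # Returned by Object#should so that `actual.should == expected`
  # raises when the two values differ.
  class Expectation
    def initialize(actual)
      @actual = actual
    end

    def ==(expected)
      return true if @actual == expected
      raise Failure, "expected #{expected.inspect}, got #{@actual.inspect}"
    end
  end

  # Groups the examples that describe one subject.
  def self.describe(subject)
    @subject = subject
    yield
  end

  # Runs one example and records whether it passed or failed.
  def self.it(behaviour)
    yield
    RESULTS << [@subject, behaviour, :pass]
  rescue Failure
    RESULTS << [@subject, behaviour, :fail]
  end
end

class Object
  def should
    ToySpec::Expectation.new(self)
  end
end

ToySpec.describe "Array#length" do
  ToySpec.it "returns the number of elements" do
    [1, 2, 3].length.should == 3
  end

  ToySpec.it "returns 1 for an empty Array (deliberately wrong)" do
    [].length.should == 1
  end
end

ToySpec::RESULTS.each do |subject, behaviour, status|
  puts "#{status}: #{subject} #{behaviour}"
end
```

The first example is recorded as a pass and the second, which states
the wrong behaviour on purpose, as a failure.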
Specs live in the spec/ directory. The spec/ directory contains two copies
of the RubySpecs (http://rubyspec.org). The spec/frozen directory is a git
submodule and is synchronized with the tag files in spec/tags/frozen. The
tag files are used to exclude known failures. The spec/frozen specs are
used as a CI (continuous integration) process to ensure that new code does
not cause regressions. The spec/ruby directory is a git clone of the
RubySpecs and is included for the convenience of adding to the RubySpecs
from within the Rubinius repository.

All the other directories under the spec directory, except for spec/ruby
and spec/frozen, are for specs specific to Rubinius. The directories are
mostly self-explanatory. For example, the spec/compiler directory is for
specs for the Rubinius compiler. Three directories could be confusing, so
those are described in more detail here.

The spec/core directory is for Rubinius-specific extensions to the MatzRuby
core library. For example, Rubinius has a Tuple and a ByteArray class. The
specs for these are under the spec/core directory. This directory is also
for Rubinius-specific extensions to methods of the MatzRuby core library
classes. For example, Rubinius handles coercion of Bignum and other
numerics differently. The spec/core directory parallels the purpose of the
spec/ruby/1.8/core directory from the RubySpecs.

The spec/language directory contains Rubinius-specific specs related to the
Ruby language. The directory parallels the spec/ruby/1.8/language directory.

The spec/library directory contains specs for Rubinius-specific standard
library classes (e.g. Actor, VMActor) and for Rubinius-specific behavior of
methods of MatzRuby standard library classes.

The specs are run with the bin/mspec command. See the help output from
bin/mspec -h and refer to the links below for more details. In general, to
run the specs for a specific spec file, use:

  bin/mspec spec/some/spec_file.rb

The CI specs are run with rake spec or bin/mspec ci. CI is very important
for any Rubinius developer: before each commit, bin/mspec ci should be run
and should finish without error. It makes it very easy to ensure that your
change did not break other, seemingly unrelated things because it exercises
all areas of specs. A clean bin/mspec ci run gives confidence that your code
is correct. Since there is a very large body of specs, the CI specs run by
bin/mspec ci do not include the Ruby standard library specs. To run these
specs as well, use rake spec:full or bin/mspec ci -B full.mspec.

For a deeper overview, tutorials, help and other information
about Rubinius' specs, start here:

http://rubyspec.org
http://rubinius.lighthouseapp.com/projects/5089/the-rubinius-specs


4. Libraries and C: Primitives vs. FFI
======================================

There are two ways to "drop to C" in Rubinius. Firstly, primitives
are special instructions that are specifically defined in the VM.
In general they are operations that are impossible to do in the
Ruby layer, such as opening a file. Primitives should be used to
access the functionality of the VM from inside Ruby.

FFI, or Foreign Function Interface, on the other hand, is meant as
a generalised method of accessing system libraries. FFI is able to
automatically generate the bridge code needed to call out to some
library and get the result back into Ruby. FFI works at runtime by
generating real machine code, so it is not necessary to have
anything compiled beforehand. FFI should be used to access code
outside of Rubinius, whether it is system libraries or some type of
extension code, for example.

There is also a specific Rubinius extension layer called Subtend.
It emulates the extension interface of Ruby to allow old Ruby
extensions to work with Rubinius.


4.1 Primitives
--------------
Using the above rationale, if you need to implement a primitive:

* Give the primitive a sane name
* Implement the primitive in shotgun/lib/primitives.rb using the
  name you chose as the method name.
* Enter the primitive name as a symbol at the BOTTOM of the Array
  in shotgun/lib/primitive_names.rb.
* `rake build`

This makes your primitive available in the Ruby layer using the
special form @Ruby.primitive :primitive_name@. Primitives have a
few rules and chief among them is that a primitive must be the
first instruction in the method that it appears in. Partially for
this reason all primitives should reside in a wrapper method in
bootstrap/ (the other part is that core/ should be implementation
independent and primitives are not.)

In addition to this, primitives have another property that may
seem unintuitive: anything that appears below the primitive form
in the wrapper method is executed if the primitive FAILS and only
if it fails. There is no exception handling syntax involved. So
this is a typical pattern:

    # kernel/bootstrap/whatever.rb
    def self.prim_primitive_name()
      Ruby.primitive :primitive_name
      raise SomeError, "Whatever I was doing just failed."
    end

    # kernel/core/whatever.rb
    def self.primitive_name()
      self.prim_primitive_name
      ...
    end

To have a primitive fail, the primitive body (in primitives.rb)
should return FALSE; this will cause the code following the
Ruby.primitive line to be run. This provides a fallback so that
the operation can be retried in Ruby.

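Plain Ruby has no @Ruby.primitive@, so the fail-then-fall-back flow
cannot be run outside Rubinius. The sketch below only emulates it:
@fast_add@ is a hypothetical stand-in for a primitive body that
returns false on failure, and @add@ plays the role of the wrapper
whose remaining lines run only in that case.

```ruby
# Hypothetical stand-in for a primitive body: succeeds only for
# two Integers and returns false to signal failure otherwise.
def fast_add(a, b)
  return false unless a.is_a?(Integer) && b.is_a?(Integer)
  a + b
end

# Emulated wrapper: the code after the fast path runs only when
# the fast path failed, and retries the operation in Ruby.
def add(a, b)
  result = fast_add(a, b)
  return result unless result == false

  # Fallback: coerce the arguments, then try again.
  add(Integer(a), Integer(b))
end

add(2, 3)     # fast path succeeds: 5
add("2", 3)   # fast path fails, fallback coerces and retries: 5
```

In Rubinius itself no explicit check of the result is written; the
VM simply continues with the lines below the primitive form.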
If a primitive cannot be retried in Ruby or if there is some
additional information that needs to be passed along to create
the exception, it may raise an exception using a couple of macros:

* RAISE(exc_class, msg) will raise an exception of type exc_class
  and with a message of msg, e.g.

    RAISE("ArgumentError", "Invalid argument");

* RAISE_FROM_ERRNO(msg) will raise an Errno exception with the
  specified msg.

If you need to change the signature of a primitive, follow this
procedure:
  1. change the signature of the kernel method that calls the
     VM primitive
  2. change any calls to the kernel method in the kernel/**
     code to use the new signature, then recompile
  3. run rake build:stable
  4. change the actual primitive in the VM and recompile again
  5. run bin/ci


4.2 FFI
-------

Module#attach_function allows a C function to be called from Ruby
code using FFI.

Module#attach_function takes the C function name, the Ruby module
function to bind it to, the C argument types, and the C return type.
For a list of C argument types, see kernel/platform/ffi.rb.

Currently, FFI does not support C functions with more than 6
arguments.

When the C function will be filling in a String, be sure the Ruby
String is large enough. For the C function rbx_Digest_MD5_Finish,
the digest string is allocated with a length of 16 characters. The
string is passed to md5_finish, which calls rbx_Digest_MD5_Finish,
which fills in the string with the digest.

  class Digest::MD5
    attach_function nil, 'rbx_Digest_MD5_Finish', :md5_finish,
                    [:pointer, :string], :void

    def finish
      # Preallocate 16 characters for the C function to fill in.
      digest = ' ' * 16
      self.class.md5_finish @context, digest
      digest
    end
  end

For a complete additional example, see digest/md5.rb.


5. Debugging: debugger, GDB, valgrind
=====================================

With Rubinius, there are two distinct things that may need
debugging (sometimes at the same time.) There is the Ruby
code, for which 'debugger' exists. debugger is a full-speed
debugger, which means that no extra compilation or flags are
needed to enable it and that, at the same time, code normally
does not suffer a performance penalty from the infrastructure.
This is achieved using a combination of bytecode substitution
and Rubinius' Channel IO interface. Multithreaded debugging
is supported (credit for the debugger goes to Adam Gardiner.)

On the C side, the trusty workhorse is the GNU Debugger or
GDB. In addition there is support built in for Valgrind, a
memory checker/lint/debugger/analyzer hybrid.


5.1 debugger
------------
The nonchalantly named debugger is specifically the debugger
for Ruby code, although it does also allow examining the VM
as it runs. The easiest way to start it is to insert either
a @breakpoint@ or @debugger@ method call anywhere in your
source code. Upon running this method, the debugger starts
up and awaits your command at the instruction where the
@breakpoint@ or @debugger@ method call used to be. For a full
explanation of the debugger, refer to [currently the source
but hopefully docs shortly.] You will see this prompt and
there is a trusty command you can try to get started:

    rbx:debug> help


5.2 GDB
-------
To really be able to use GDB, make sure that you build Rubinius
with DEV=1 set. This disables optimisations and adds debugging
symbols.

There are two ways to access GDB for Rubinius. You can simply
run shotgun/rubinius with gdb (use the builtin support so you
do not need to worry about linking etc.):

* Run `shotgun/rubinius --gdb`, place a breakpoint (break main,
  for example) and then r(un.)
* Alternatively, you can run and then hit ^C to interrupt.

You can also drop into GDB from Ruby code with @Kernel#yield_gdb@
which uses a rather rude but very effective method of stopping
execution to start up GDB. To continue past the @yield_gdb@,
j(ump) to one line after the line that you have stopped on.

Useful GDB commands and functions (remember, using the p(rint)
command in GDB you can access pretty much any C function in
Rubinius):

* rbt
  Prints the backtrace of the Ruby side of things. Use this in
  conjunction with gdb's own bt which shows the C backtrace.

* p _inspect(OBJECT)
  Useful information about a given Ruby object.


5.3 Valgrind
------------
Valgrind is a program for debugging, profiling and memory-checking
programs. The invocation is just `shotgun/rubinius --valgrind`.
See http://valgrind.org for usage information.


5.4 Tracing
-----------

Excessive tracing can rapidly fill your screen up with crap. To enable it:

  RBX=rbx.debug.trace shotgun/rubinius ...

=== END ===
 487
 488 === END ===