Documentation/security/credentials.rst

   1 ====================
   2 Credentials in Linux
   3 ====================
   4
   5 By: David Howells <dhowells@redhat.com>
   6
   7 .. contents:: :local:
   8
   9 Overview
  10 ========
  11
  12 There are several parts to the security check performed by Linux when one
  13 object acts upon another:
  14
  15  1. Objects.
  16
  17      Objects are things in the system that may be acted upon directly by
  18      userspace programs.  Linux has a variety of actionable objects, including:
  19
  20         - Tasks
  21         - Files/inodes
  22         - Sockets
  23         - Message queues
  24         - Shared memory segments
  25         - Semaphores
  26         - Keys
  27
  28      As a part of the description of all these objects there is a set of
  29      credentials.  What's in the set depends on the type of object.
  30
  31  2. Object ownership.
  32
  33      Amongst the credentials of most objects, there will be a subset that
  34      indicates the ownership of that object.  This is used for resource
  35      accounting and limitation (disk quotas and task rlimits for example).
  36
  37      In a standard UNIX filesystem, for instance, this will be defined by the
  38      UID marked on the inode.
  39
  40  3. The objective context.
  41
  42      Also amongst the credentials of those objects, there will be a subset that
  43      indicates the 'objective context' of that object.  This may or may not be
  44      the same set as in (2) - in standard UNIX files, for instance, this is
  45      defined by the UID and the GID marked on the inode.
  46
  47      The objective context is used as part of the security calculation that is
  48      carried out when an object is acted upon.
  49
  50  4. Subjects.
  51
  52      A subject is an object that is acting upon another object.
  53
  54      Most of the objects in the system are inactive: they don't act on other
  55      objects within the system.  Processes/tasks are the obvious exception:
  56      they do stuff; they access and manipulate things.
  57
  58      Objects other than tasks may under some circumstances also be subjects.
  59      For instance an open file may send SIGIO to a task using the UID and EUID
  60      given to it by a task that called ``fcntl(F_SETOWN)`` upon it.  In this case,
  61      the file struct will have a subjective context too.
  62
  63  5. The subjective context.
  64
  65      A subject has an additional interpretation of its credentials.  A subset
  66      of its credentials forms the 'subjective context'.  The subjective context
  67      is used as part of the security calculation that is carried out when a
  68      subject acts.
  69
  70      A Linux task, for example, has the FSUID, FSGID and the supplementary
  71      group list for when it is acting upon a file - which are quite separate
  72      from the real UID and GID that normally form the objective context of the
  73      task.
  74
  75  6. Actions.
  76
  77      Linux has a number of actions available that a subject may perform upon an
  78      object.  The set of actions available depends on the nature of the subject
  79      and the object.
  80
  81      Actions include reading, writing, creating and deleting files; forking or
  82      signalling and tracing tasks.
  83
  84  7. Rules, access control lists and security calculations.
  85
  86      When a subject acts upon an object, a security calculation is made.  This
  87      involves taking the subjective context, the objective context and the
  88      action, and searching one or more sets of rules to see whether the subject
  89      is granted or denied permission to act in the desired manner on the
  90      object, given those contexts.
  91
  92      There are two main sources of rules:
  93
  94      a. Discretionary access control (DAC):
  95
  96          Sometimes the object will include sets of rules as part of its
  97          description.  This is an 'Access Control List' or 'ACL'.  A Linux
  98          file may supply more than one ACL.
  99
 100          A traditional UNIX file, for example, includes a permissions mask that
 101          is an abbreviated ACL with three fixed classes of subject ('user',
 102          'group' and 'other'), each of which may be granted certain privileges
 103          ('read', 'write' and 'execute' - whatever those map to for the object
 104          in question).  UNIX file permissions do not allow the arbitrary
 105          specification of subjects, however, and so are of limited use.
 106
 107          A Linux file might also sport a POSIX ACL.  This is a list of rules
 108          that grants various permissions to arbitrary subjects.
 109
 110      b. Mandatory access control (MAC):
 111
 112          The system as a whole may have one or more sets of rules that get
 113          applied to all subjects and objects, regardless of their source.
 114          SELinux and Smack are examples of this.
 115
 116          In the case of SELinux and Smack, each object is given a label as part
 117          of its credentials.  When an action is requested, they take the
 118          subject label, the object label and the action and look for a rule
 119          that says that this action is either granted or denied.
 120
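The DAC part of the security calculation described in (7) can be sketched in
userspace C.  This is only a model of the traditional permissions mask check:
the structures and the names ``subj_ctx``, ``obj_ctx`` and ``dac_permits()``
are hypothetical and are not kernel interfaces, and POSIX ACLs, capabilities
and MAC are ignored.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model; none of these are kernel types. */
struct subj_ctx {                       /* subjective context of a task */
	unsigned int fsuid;
	unsigned int fsgid;
	const unsigned int *groups;     /* supplementary group list */
	size_t ngroups;
};

struct obj_ctx {                        /* objective context of a file */
	unsigned int uid;
	unsigned int gid;
	unsigned int mode;              /* rwxrwxrwx permission bits */
};

enum { MAY_EXEC = 1, MAY_WRITE = 2, MAY_READ = 4 };

/* Is the subject a member of the given group? */
bool in_group(const struct subj_ctx *s, unsigned int gid)
{
	if (s->fsgid == gid)
		return true;
	for (size_t i = 0; i < s->ngroups; i++)
		if (s->groups[i] == gid)
			return true;
	return false;
}

/*
 * Select exactly one class ('user', 'group' or 'other') from the contexts,
 * then test whether that class grants every requested action bit.
 */
bool dac_permits(const struct subj_ctx *s, const struct obj_ctx *o,
		 unsigned int action)
{
	unsigned int bits;

	assert(action != 0 && (action & ~7u) == 0);

	if (s->fsuid == o->uid)
		bits = (o->mode >> 6) & 7;
	else if (in_group(s, o->gid))
		bits = (o->mode >> 3) & 7;
	else
		bits = o->mode & 7;

	return (bits & action) == action;
}
```

Note that only one class is ever consulted: in this model an owner whose
'user' bits deny an action is refused even if the 'group' or 'other' bits
would have granted it.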
 121
 122 Types of Credentials
 123 ====================
 124
 125 The Linux kernel supports the following types of credentials:
 126
 127  1. Traditional UNIX credentials.
 128
 129         - Real User ID
 130         - Real Group ID
 131
 132      The UID and GID are carried by most, if not all, Linux objects, even if in
 133      some cases they have to be invented (FAT or CIFS files for example, which are
 134      derived from Windows).  These (mostly) define the objective context of
 135      that object, with tasks being slightly different in some cases.
 136
 137         - Effective, Saved and FS User ID
 138         - Effective, Saved and FS Group ID
 139         - Supplementary groups
 140
 141      These are additional credentials used by tasks only.  Usually, the
 142      EUID/EGID/GROUPS will be used as the subjective context, and the real
 143      UID/GID will be used as the objective.  For tasks, it should be noted that
 144      this is not always true.
 145
 146  2. Capabilities.
 147
 148         - Set of permitted capabilities
 149         - Set of inheritable capabilities
 150         - Set of effective capabilities
 151         - Capability bounding set
 152
 153      These are only carried by tasks.  They indicate superior capabilities
 154      granted piecemeal to a task that an ordinary task wouldn't otherwise have.
 155      These are manipulated implicitly by changes to the traditional UNIX
 156      credentials, but can also be manipulated directly by the ``capset()``
 157      system call.
 158
 159      The permitted capabilities are those caps that the process might grant
 160      itself to its effective or permitted sets through ``capset()``.  The
 161      inheritable set might also be so constrained.
 162
 163      The effective capabilities are the ones that a task is actually allowed to
 164      make use of itself.
 165
 166      The inheritable capabilities are the ones that may get passed across
 167      ``execve()``.
 168
 169      The bounding set limits the capabilities that may be inherited across
 170      ``execve()``, especially when a binary is executed that will execute as
 171      UID 0.
 172
 173  3. Secure management flags (securebits).
 174
 175      These are only carried by tasks.  These govern the way the above
 176      credentials are manipulated and inherited over certain operations such as
 177      ``execve()``.  They aren't used directly as objective or subjective
 178      credentials.
 179
 180  4. Keys and keyrings.
 181
 182      These are only carried by tasks.  They carry and cache security tokens
 183      that don't fit into the other standard UNIX credentials.  They are for
 184      making such things as network filesystem keys available to the file
 185      accesses performed by processes, without the necessity of ordinary
 186      programs having to know about security details involved.
 187
 188      Keyrings are a special type of key.  They carry sets of other keys and can
 189      be searched for the desired key.  Each process may subscribe to a number
 190      of keyrings:
 191
 192         Per-thread keyring
 193         Per-process keyring
 194         Per-session keyring
 195
 196      When a process accesses a key, if not already present, it will normally be
 197      cached on one of these keyrings for future accesses to find.
 198
 199      For more information on using keys, see ``Documentation/security/keys/*``.
 200
 201  5. LSM
 202
 203      The Linux Security Module (LSM) framework allows extra controls to be placed
 204      over the operations that a task may do.  Currently Linux supports several
 205      LSM options.
 206
 207      Some work by labelling the objects in a system and then applying sets of
 208      rules (policies) that say what operations a task with one label may do to
 209      an object with another label.
 210
 211  6. AF_KEY
 212
 213      This is a socket-based approach to credential management for networking
 214      stacks [RFC 2367].  It isn't discussed by this document as it doesn't
 215      interact directly with task and file credentials; rather it keeps system
 216      level credentials.
 217
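The relationship between the permitted and effective capability sets in (2)
can be illustrated with a small userspace model.  This is a deliberately
reduced sketch: the names ``cap_model``, ``cap_model_set()`` and
``cap_model_capable()`` are hypothetical, the bit numbers carry no meaning,
and the inheritable set, the bounding set and ``execve()`` transitions are
left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of two of a task's capability sets. */
struct cap_model {
	uint64_t permitted;
	uint64_t effective;
};

#define CAP_BIT(n) (UINT64_C(1) << (n))

/*
 * Model of a capset()-style update: the permitted set may only shrink, and
 * the effective set must stay within the new permitted set.
 */
bool cap_model_set(struct cap_model *c, uint64_t new_permitted,
		   uint64_t new_effective)
{
	if (new_permitted & ~c->permitted)
		return false;   /* would gain a capability not permitted */
	if (new_effective & ~new_permitted)
		return false;   /* effective must be a subset of permitted */

	c->permitted = new_permitted;
	c->effective = new_effective;
	assert((c->effective & ~c->permitted) == 0);
	return true;
}

/* Only the effective set is consulted when a privilege is exercised. */
bool cap_model_capable(const struct cap_model *c, unsigned int cap)
{
	assert(cap < 64);
	return (c->effective & CAP_BIT(cap)) != 0;
}
```

In this model a capability that is permitted but not effective cannot be
used until the task raises it, and a capability dropped from the permitted
set cannot be regained.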
 218
 219 When a file is opened, part of the opening task's subjective context is
 220 recorded in the file struct created.  This allows operations using that file
 221 struct to use those credentials instead of the subjective context of the task
 222 that issued the operation.  An example of this would be a file opened on a
 223 network filesystem where the credentials of the opened file should be presented
 224 to the server, regardless of who is actually doing a read or a write upon it.
 225
 226
 227 File Markings
 228 =============
 229
 230 Files on disk or obtained over the network may have annotations that form the
 231 objective security context of that file.  Depending on the type of filesystem,
 232 this may include one or more of the following:
 233
 234  * UNIX UID, GID, mode;
 235  * Windows user ID;
 236  * Access control list;
 237  * LSM security label;
 238  * UNIX exec privilege escalation bits (SUID/SGID);
 239  * File capabilities exec privilege escalation bits.
 240
 241 These are compared to the task's subjective security context, and certain
 242 operations allowed or disallowed as a result.  In the case of execve(), the
 243 privilege escalation bits come into play, and may allow the resulting process
 244 extra privileges, based on the annotations on the executable file.
 245
 246
 247 Task Credentials
 248 ================
 249
 250 In Linux, all of a task's credentials are held either directly in (uid, gid)
 251 or by reference from (groups, keys, LSM security) a refcounted structure of
 252 type 'struct cred'.  Each task points to its credentials by a pointer called
 253 'cred' in its task_struct.
 254
 255 Once a set of credentials has been prepared and committed, it may not be
 256 changed, barring the following exceptions:
 257
 258  1. its reference count may be changed;
 259
 260  2. the reference count on the group_info struct it points to may be changed;
 261
 262  3. the reference count on the security data it points to may be changed;
 263
 264  4. the reference count on any keyrings it points to may be changed;
 265
 266  5. any keyrings it points to may be revoked, expired or have their security
 267     attributes changed; and
 268
 269  6. the contents of any keyrings to which it points may be changed (the whole
 270     point of keyrings being a shared set of credentials, modifiable by anyone
 271     with appropriate access).
 272
 273 To alter anything in the cred struct, the copy-and-replace principle must be
 274 adhered to.  First take a copy, then alter the copy and then use RCU to change
 275 the task pointer to make it point to the new copy.  There are wrappers to aid
 276 with this (see below).
 277
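The copy-and-replace principle can be modelled in userspace C.  In the sketch
below a C11 atomic pointer stands in for the RCU-protected task pointer, which
is an assumption made for illustration only; the ``demo_*`` names are
hypothetical and are not the kernel's wrappers, and no locking or reference
counting is modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-ins for struct cred and task_struct. */
struct demo_cred {
	unsigned int uid, gid, suid;
};

struct demo_task {
	_Atomic(const struct demo_cred *) cred;   /* published, read-only */
};

/* Read the currently published set; callers must not modify it. */
const struct demo_cred *demo_current_cred(struct demo_task *t)
{
	return atomic_load_explicit(&t->cred, memory_order_acquire);
}

/* Step 1: take a private, mutable copy of the published set. */
struct demo_cred *demo_prepare_creds(struct demo_task *t)
{
	struct demo_cred *new = malloc(sizeof(*new));

	if (new)
		*new = *demo_current_cred(t);
	return new;
}

/*
 * Step 3: publish the altered copy by swapping the pointer.  The old set is
 * returned so that this model's caller can free it; the kernel instead
 * defers destruction until RCU readers have finished.
 */
const struct demo_cred *demo_commit_creds(struct demo_task *t,
					  struct demo_cred *new)
{
	assert(new != NULL);
	return atomic_exchange_explicit(&t->cred, new, memory_order_acq_rel);
}

/* Discard a copy that was prepared but will not be published. */
void demo_abort_creds(struct demo_cred *new)
{
	free(new);
}
```

Step 2, altering the copy, happens between the prepare and commit calls.
Until the commit, readers continue to see the old, unmodified set.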
 278 A task may only alter its _own_ credentials; it is no longer permitted for a
 279 task to alter another's credentials.  This means the ``capset()`` system call
 280 is no longer permitted to take any PID other than the one of the current
 281 process.  Also, the ``keyctl_instantiate()`` and ``keyctl_negate()`` functions
 282 no longer permit attachment to process-specific keyrings in the requesting
 283 process, as the instantiating process may need to create them.
 284
 285
 286 Immutable Credentials
 287 ---------------------
 288
 289 Once a set of credentials has been made public (by calling ``commit_creds()``
 290 for example), it must be considered immutable, barring two exceptions:
 291
 292  1. The reference count may be altered.
 293
 294  2. Whilst the keyring subscriptions of a set of credentials may not be
 295     changed, the keyrings subscribed to may have their contents altered.
 296
 297 To catch accidental credential alteration at compile time, struct task_struct
 298 has _const_ pointers to its credential sets, as does struct file.  Furthermore,
 299 certain functions such as ``get_cred()`` and ``put_cred()`` operate on const
 300 pointers, thus rendering casts unnecessary for callers, but must temporarily
 301 drop the const qualification internally to alter the reference count.
 302
 303
 304 Accessing Task Credentials
 305 --------------------------
 306
 307 A task being able to alter only its own credentials permits the current process
 308 to read or replace its own credentials without the need for any form of locking
 309 -- which simplifies things greatly.  It can just call::
 310
 311         const struct cred *current_cred()
 312
 313 to get a pointer to its credentials structure, and it doesn't have to release
 314 it afterwards.
 315
 316 There are convenience wrappers for retrieving specific aspects of a task's
 317 credentials (the value is simply returned in each case)::
 318
 319         uid_t current_uid(void)         Current's real UID
 320         gid_t current_gid(void)         Current's real GID
 321         uid_t current_euid(void)        Current's effective UID
 322         gid_t current_egid(void)        Current's effective GID
 323         uid_t current_fsuid(void)       Current's file access UID
 324         gid_t current_fsgid(void)       Current's file access GID
 325         kernel_cap_t current_cap(void)  Current's effective capabilities
 326         void *current_security(void)    Current's LSM security pointer
 327         struct user_struct *current_user(void)  Current's user account
 328
 329 There are also convenience wrappers for retrieving specific associated pairs of
 330 a task's credentials::
 331
 332         void current_uid_gid(uid_t *, gid_t *);
 333         void current_euid_egid(uid_t *, gid_t *);
 334         void current_fsuid_fsgid(uid_t *, gid_t *);
 335
 336 which return these pairs of values through their arguments after retrieving
 337 them from the current task's credentials.
 338
 339
 340 In addition, there is a function for obtaining a reference on the current
 341 process's current set of credentials::
 342
 343         const struct cred *get_current_cred(void);
 344
 345 and functions for getting references to one of the credentials that don't
 346 actually live in struct cred::
 347
 348         struct user_struct *get_current_user(void);
 349         struct group_info *get_current_groups(void);
 350
 351 which get references to the current process's user accounting structure and
 352 supplementary groups list respectively.
 353
 354 Once a reference has been obtained, it must be released with ``put_cred()``,
 355 ``free_uid()`` or ``put_group_info()`` as appropriate.
 356
 357
 358 Accessing Another Task's Credentials
 359 ------------------------------------
 360
 361 Whilst a task may access its own credentials without the need for locking, the
 362 same is not true of a task wanting to access another task's credentials.  It
 363 must use the RCU read lock and ``rcu_dereference()``.
 364
 365 The ``rcu_dereference()`` is wrapped by::
 366
 367         const struct cred *__task_cred(struct task_struct *task);
 368
 369 This should be used inside the RCU read lock, as in the following example::
 370
 371         void foo(struct task_struct *t, struct foo_data *f)
 372         {
 373                 const struct cred *tcred;
 374                 ...
 375                 rcu_read_lock();
 376                 tcred = __task_cred(t);
 377                 f->uid = tcred->uid;
 378                 f->gid = tcred->gid;
 379                 f->groups = get_group_info(tcred->groups);
 380                 rcu_read_unlock();
 381                 ...
 382         }
 383
 384 Should it be necessary to hold another task's credentials for a long period of
 385 time, and possibly to sleep whilst doing so, then the caller should get a
 386 reference on them using::
 387
 388         const struct cred *get_task_cred(struct task_struct *task);
 389
 390 This does all the RCU magic inside of it.  The caller must call ``put_cred()``
 391 on the credentials so obtained once it has finished with them.
 392
 393 .. note::
 394    The result of ``__task_cred()`` should not be passed directly to
 395    ``get_cred()`` as this may race with ``commit_creds()``.
 396
 397 There are a couple of convenience functions to access bits of another task's
 398 credentials, hiding the RCU magic from the caller::
 399
 400         uid_t task_uid(task)            Task's real UID
 401         uid_t task_euid(task)           Task's effective UID
 402
 403 If the caller is holding the RCU read lock at the time anyway, then::
 404
 405         __task_cred(task)->uid
 406         __task_cred(task)->euid
 407
 408 should be used instead.  Similarly, if multiple aspects of a task's credentials
 409 need to be accessed, the RCU read lock should be held, ``__task_cred()`` called,
 410 the result stored in a temporary pointer and the credential aspects read from
 411 that pointer before dropping the lock.  This prevents the potentially expensive
 412 RCU magic from being invoked multiple times.
 413
 414 Should some other single aspect of another task's credentials need to be
 415 accessed, then this can be used::
 416
 417         task_cred_xxx(task, member)
 418
 419 where 'member' is a non-pointer member of the cred struct.  For instance::
 420
 421         uid_t task_cred_xxx(task, suid);
 422
 423 will retrieve 'struct cred::suid' from the task, doing the appropriate RCU
 424 magic.  This may not be used for pointer members as what they point to may
 425 disappear the moment the RCU read lock is dropped.
 426
 427
 428 Altering Credentials
 429 --------------------
 430
 431 As previously mentioned, a task may only alter its own credentials, and may not
 432 alter those of another task.  This means that it doesn't need to use any
 433 locking to alter its own credentials.
 434
 435 To alter the current process's credentials, a function should first prepare a
 436 new set of credentials by calling::
 437
 438         struct cred *prepare_creds(void);
 439
 440 This locks ``current->cred_replace_mutex`` and then allocates and constructs a
 441 duplicate of the current process's credentials, returning with the mutex still
 442 held if successful.  It returns NULL if not successful (out of memory).
 443
 444 The mutex prevents ``ptrace()`` from altering the ptrace state of a process
 445 whilst security checks on credentials construction and changing are taking place
 446 as the ptrace state may alter the outcome, particularly in the case of
 447 ``execve()``.
 448
 449 The new credentials set should be altered appropriately, and any security
 450 checks and hooks done.  Both the current and the proposed sets of credentials
 451 are available for this purpose as ``current_cred()`` will still return the
 452 current set at this point.
 453
 454 When replacing the group list, the new list must be sorted before it
 455 is added to the credential, as a binary search is used to test for
 456 membership.  In practice, this means :c:func:`groups_sort` should be
 457 called before :c:func:`set_groups` or :c:func:`set_current_groups`.
 458 :c:func:`groups_sort` must not be called on a ``struct group_info`` which
 459 is shared as it may permute elements as part of the sorting process
 460 even if the array is already sorted.
 461
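The reason for sorting can be shown with a userspace sketch.  The functions
``demo_groups_sort()`` and ``demo_groups_search()`` below are hypothetical
stand-ins operating on a plain array of numeric group IDs; they are not the
kernel's implementations.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Three-way comparison of two group IDs for qsort(). */
static int gid_cmp(const void *a, const void *b)
{
	unsigned int x = *(const unsigned int *)a;
	unsigned int y = *(const unsigned int *)b;

	return (x > y) - (x < y);
}

/* Sort the list in place; it must not be shared with anyone else. */
void demo_groups_sort(unsigned int *gids, size_t n)
{
	qsort(gids, n, sizeof(*gids), gid_cmp);
}

/* Binary search for membership; only correct if the list is sorted. */
bool demo_groups_search(const unsigned int *gids, size_t n, unsigned int gid)
{
	size_t left = 0, right = n;

	assert(gids != NULL || n == 0);

	while (left < right) {
		size_t mid = left + (right - left) / 2;

		if (gid > gids[mid])
			left = mid + 1;
		else if (gid < gids[mid])
			right = mid;
		else
			return true;
	}
	return false;
}
```

With the unsorted list ``{300, 5, 100, 42}``, a search for 300 in this model
fails even though 300 is a member; after sorting, the same search succeeds.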
 462 When the credential set is ready, it should be committed to the current process
 463 by calling::
 464
 465         int commit_creds(struct cred *new);
 466
 467 This will alter various aspects of the credentials and the process, giving the
 468 LSM a chance to do likewise.  It will then use ``rcu_assign_pointer()`` to
 469 actually commit the new credentials to ``current->cred``, release
 470 ``current->cred_replace_mutex`` to allow ``ptrace()`` to take place, and
 471 notify the scheduler and others of the changes.
 472
 473 This function is guaranteed to return 0, so that it can be tail-called at the
 474 end of such functions as ``sys_setresuid()``.
 475
 476 Note that this function consumes the caller's reference to the new credentials.
 477 The caller should _not_ call ``put_cred()`` on the new credentials afterwards.
 478
 479 Furthermore, once this function has been called on a new set of credentials,
 480 those credentials may _not_ be changed further.
 481
 482
 483 Should the security checks fail or some other error occur after
 484 ``prepare_creds()`` has been called, then the following function should be
 485 invoked::
 486
 487         void abort_creds(struct cred *new);
 488
 489 This releases the lock on ``current->cred_replace_mutex`` that
 490 ``prepare_creds()`` got and then releases the new credentials.
 491
 492
 493 A typical credentials alteration function would look something like this::
 494
 495         int alter_suid(uid_t suid)
 496         {
 497                 struct cred *new;
 498                 int ret;
 499
 500                 new = prepare_creds();
 501                 if (!new)
 502                         return -ENOMEM;
 503
 504                 new->suid = suid;
 505                 ret = security_alter_suid(new);
 506                 if (ret < 0) {
 507                         abort_creds(new);
 508                         return ret;
 509                 }
 510
 511                 return commit_creds(new);
 512         }
 513
 514
 515 Managing Credentials
 516 --------------------
 517
 518 There are some functions to help manage credentials:
 519
 520  - ``void put_cred(const struct cred *cred);``
 521
 522      This releases a reference to the given set of credentials.  If the
 523      reference count reaches zero, the credentials will be scheduled for
 524      destruction by the RCU system.
 525
 526  - ``const struct cred *get_cred(const struct cred *cred);``
 527
 528      This gets a reference on a live set of credentials, returning a pointer to
 529      that set of credentials.
 530
 531  - ``struct cred *get_new_cred(struct cred *cred);``
 532
 533      This gets a reference on a set of credentials that is under construction
 534      and is thus still mutable, returning a pointer to that set of credentials.
 535
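The reference counting behaviour of these helpers can be sketched in userspace
C.  The ``rc_*`` names are hypothetical, the counter is a plain integer rather
than an atomic one (so the model is single-threaded only), and destruction is
immediate rather than deferred to RCU.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical refcounted credential set. */
struct rc_cred {
	int usage;              /* reference count */
	unsigned int uid;
};

int rc_destroyed;               /* counts destructions, for demonstration */

/* Allocate a new set holding one reference for the caller. */
struct rc_cred *rc_alloc(unsigned int uid)
{
	struct rc_cred *c = malloc(sizeof(*c));

	if (c) {
		c->usage = 1;
		c->uid = uid;
	}
	return c;
}

/*
 * Take a reference through a const pointer.  The const qualification is
 * dropped only to touch the counter; nothing else is modified.
 */
const struct rc_cred *rc_get(const struct rc_cred *c)
{
	((struct rc_cred *)c)->usage++;
	return c;
}

/* Drop a reference; the set is destroyed when the last one goes away. */
void rc_put(const struct rc_cred *c)
{
	struct rc_cred *m = (struct rc_cred *)c;

	assert(m->usage > 0);
	if (--m->usage == 0) {
		rc_destroyed++;
		free(m);
	}
}
```

Each successful ``rc_get()`` in this model must be balanced by exactly one
``rc_put()``, and the pointer must not be used after the final put.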
 536
 537 Open File Credentials
 538 =====================
 539
 540 When a new file is opened, a reference is obtained on the opening task's
 541 credentials and this is attached to the file struct as ``f_cred`` in place of
 542 ``f_uid`` and ``f_gid``.  Code that used to access ``file->f_uid`` and
 543 ``file->f_gid`` should now access ``file->f_cred->fsuid`` and
 544 ``file->f_cred->fsgid``.
 545
 546 It is safe to access ``f_cred`` without the use of RCU or locking because the
 547 pointer will not change over the lifetime of the file struct, and nor will the
 548 contents of the cred struct pointed to, barring the exceptions listed above
 549 (see the Task Credentials section).
 550
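The effect of recording credentials at open time can be modelled as follows.
This is a userspace sketch with hypothetical ``of_*`` names; it omits the
reference that the kernel takes on the credentials when attaching them to the
file struct.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct cred, task_struct and struct file. */
struct of_cred {
	unsigned int fsuid, fsgid;
};

struct of_task {
	const struct of_cred *cred;
};

struct of_file {
	const struct of_cred *f_cred;   /* fixed for the file's lifetime */
};

/* Record the opener's credentials in the file at open time. */
void of_open(struct of_file *f, const struct of_task *opener)
{
	assert(opener->cred != NULL);
	f->f_cred = opener->cred;       /* the kernel would take a reference */
}

/* Later operations consult the file's credentials, not the caller's. */
unsigned int of_file_fsuid(const struct of_file *f)
{
	return f->f_cred->fsuid;
}
```

If the opening task later replaces its own credentials, the file in this
model still reports the credentials that were in force when it was opened.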
 551
 552 Overriding the VFS's Use of Credentials
 553 =======================================
 554
 555 Under some circumstances it is desirable to override the credentials used by
 556 the VFS, and that can be done by calling into functions such as ``vfs_mkdir()``
 557 with a different set of credentials.  This is done in the following places:
 558
 559  * ``sys_faccessat()``.
 560  * ``do_coredump()``.
 561  * nfs4recover.c.
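
A temporary override of this kind follows a save, replace and restore pattern,
which can be sketched in userspace C.  The ``ov_*`` names below are
hypothetical and do not correspond to kernel functions; ``ov_vfs_op()`` is
merely a placeholder for an operation that acts with the task's current
credentials.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct cred and task_struct. */
struct ov_cred {
	unsigned int fsuid;
};

struct ov_task {
	const struct ov_cred *cred;
};

/* Install a temporary set, returning the previous one for later restore. */
const struct ov_cred *ov_override(struct ov_task *t,
				  const struct ov_cred *tmp)
{
	const struct ov_cred *old = t->cred;

	assert(tmp != NULL);
	t->cred = tmp;
	return old;
}

/* Restore the set that was saved by ov_override(). */
void ov_revert(struct ov_task *t, const struct ov_cred *old)
{
	t->cred = old;
}

/* Placeholder operation: reports which fsuid it acted as. */
unsigned int ov_vfs_op(const struct ov_task *t)
{
	return t->cred->fsuid;
}
```

Every override in this model must be paired with a revert on all exit paths,
otherwise the task would continue to act with the temporary credentials.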