Newsgroups: sci.crypt From: schneier@chinet.chinet.com (Bruce Schneier) Subject: BLOWFISH - Paper from Dec 94 Conference Message-ID: Organization: Chinet - Public Access UNIX Date: Fri, 8 Apr 1994 16:50:28 GMT Lines: 686 Quite a few people have been asking me about this paper, so I have decided to post it here. It will appear in the conference proceedings, which will be published by Springer-Verlag sometime soon. The three figures are not here; sorry. Bruce ************************************************************************* Description of a New Variable-Length Key, 64-Bit Block Cipher (Blowfish) Bruce Schneier Counterpane Systems, 730 Fair Oaks Ave, Oak Park, IL 60302 schneier@chinet.com Abstract: Blowfish, a new secret-key block cipher, is proposed. It is a Feistel network, iterating a simple encryption function 16 times. The block size is 64 bits, and the key can be any length up to 448 bits. Although there is a complex initialization phase required before any encryption can take place, the actual encryption of data is very efficient on large microprocessors. The cryptographic community needs to provide the world with a new encryption standard. DES [16], the workhorse encryption algorithm for the past fifteen years, is nearing the end of its useful life. Its 56-bit key size is vulnerable to a brute-force attack [22], and recent advances in differential cryptanalysis [1] and linear cryptanalysis [10] indicate that DES is vulnerable to other attacks as well. Many of the other unbroken algorithms in the literaturebKhufu [11,12], REDOC II [2,23, 20], and IDEA [7,8,9]bare protected by patents. RC2 and RC4, approved for export with a small key size, are proprietary [18]. GOST [6], a Soviet government algorithm, is specified without the S-boxes. The U.S. government is moving towards secret algorithms, such as the Skipjack algorithm in the Clipper and Capstone chips [17]. If the world is to have a secure, unpatented, and freely- available encryption algorithm by the turn of the century, we need to develop several candidate encryption algorithms now. These algorithms can then be subjected to years of public scrutiny and cryptanalysis. Then, the hope is that one or more candidate algorithms will survive this process, and can eventually become a new standard. This paper discusses the requirements for a standard encryption algorithm. While it may not be possible to satisfy all requirements with a single algorithm, it may be possible to satisfy them with a family of algorithms based on the same cryptographic principles. AREAS OF APPLICATION A standard encryption algorithm must be suitable for many different applications: Bulk encryption. The algorithm should be efficient in encrypting data files or a continuous data stream. Random bit generation. The algorithm should be efficient in producing single random bits. Packet encryption. The algorithm should be efficient in encrypting packet-sized data. (An ATM packet has a 48- byte data field.) It should implementable in an application where successive packets may be encrypted or decrypted with different keys. Hashing. The algorithm should be efficient in being converted to a one-way hash function. PLATFORMS A standard encryption algorithm must be implementable on a variety of different platforms, each with their own requirements. These include: Special hardware. The algorithm should be efficiently implementable in custom VLSI hardware. Large processors. While dedicated hardware will always be used for the fastest applications, software implementations are more common. The algorithm should be efficient on 32-bit microprocessors with 4 kbyte program and data caches. Medium-size processors. The algorithm should run on microcontrollers and other medium-size processors, such as the 68HC11. Small processors. It should be possible to implement the algorithm on smart cards, even inefficiently. The requirements for small processors are the most difficult. RAM and ROM limitations are severe for this platform. Also, efficiency is more important on these small machines. Workstations double their capacity almost annually. Small embedded systems are the same year after year, and there is little capacity to spare. If there is a choice, the extra computation burden should be on large processors rather than small processors. ADDITIONAL REQUIREMENTS These additional requirements should, if possible, be levied on a standard encryption algorithm. The algorithm should be simple to code. Experiences with DES [19] show that programmers will often make implementation mistakes if the algorithm is complicated. If possible, the algorithm should be robust against these mistakes. The algorithm should have a flat keyspace, allowing any random bit string of the required length to be a possible key. There should be no weak keys. The algorithm should facilitate easy key-management for software implementations. Software implementations of DES generally use poor key management techniques. In particular, the password that the user types in becomes the key. This means that although DES has a theoretical keyspace of 2^56, the actual keyspace is limited to keys constructed with the 95 characters of printable ASCII. Additionally, keys corresponding to words and near words are much more likely. The algorithm should be easily modifiable for different levels of security, both minimum and maximum requirements. All operations should manipulate data in byte-sized blocks. Where possible, operations should manipulate data in 32-bit blocks. DESIGN DECISIONS Based on the above parameters, we have made these design decisions. The algorithm should: Manipulate data in large blocks, preferably 32 bits in size (and not in single bits, such as DES). Have either a 64-bit or a 128-bit block size. Have a scalable key, from 32 bits to at least 256 bits. Use simple operations that are efficient on microprocessors: e.g., exclusive-or, addition, table lookup, modular- multiplication. It should not use variable-length shifts or bit-wise permutations, or conditional jumps. Be implementable on an 8-bit processor with a minimum of 24 bytes of RAM (in addition to the RAM required to store the key) and 1 kilobyte of ROM. Employ precomputable subkeys. On large-memory systems, these subkeys can be precomputed for faster operation. Not precomputing the subkeys will result in slower operation, but it should still be possible to encrypt data without any precomputations. Consist of a variable number of iterations. For applications with a small key size, the trade-off between the complexity of a brute-force attack and a differential attack make a large number of iterations superfluous. Hence, it should be possible to reduce the number of iterations with no loss of security (beyond that of the reduced key size). If possible, have no weak keys. If not possible, the proportion of weak keys should be small enough to make it unlikely to choose one at random. Also, any weak keys should be explicitly known so they can be weeded out during the key generation process. Use subkeys that are a one-way hash of the key. This would of long passphrases for the key without compromising security. Have no linear structures (e.g., the complementation propertie of DES) that reduce the complexity of exhaustive search [4]. Use a design that is simple to understand. This will facilitate analysis and increase the confidence in the algorithm. In practice, this means that the algorithm will be a Feistel iterated block cipher [21]. Most of these design decisions are not new. Almost all block ciphers since Lucifer [5,21] are Feistel ciphers, and all have a flat keyspace (with the possible exception of a few weak keys). FEAL [13,14,15] and Khufu [11] use a variable number of iterations. Khufu [11] has a large number of subkeys that are a one-way function of the key. RC2 [18] has a variable-length key. GOST [6] uses a 32-bit word length and a 64-bit block size. MMB [2] uses a 32-bit word length and a 128-bit block size. BUILDING BLOCKS There are a number of building blocks that have been demonstrated to produce strong ciphers. Many of these can be efficiently implemented on 32-bit microprocessors. Large S-boxes. Larger S-boxes are more resistant to differential cryptanalysis. An algorithm with a 32-bit word length can use 32-bit S-boxes. Khufu and REDOC III both use a 256-entry, 32-bit wide S-box [11,20]. Key-dependent S-boxes. While fixed S-boxes must be designed to be resistant to differential and linear cryptanalysis, key-dependent S-boxes are much more resistant to these attacks. They are used in the Khufu algorithm [11]. Variable S-boxes, which could possibly be key dependent, are used in GOST [6]. Combining operations from different algebraic groups. The IDEA cipher introduced this concept, combining XOR mod 2^16, addition mod 2^16, and multiplication mod 2^16+1 [7]. The MMB cipher uses a 32-bit word, and combines XOR mod 232 with multiplication mod 2^32-1 [2]. Key-dependent permutations. The fixed initial and final permutations of DES have been long regarded as cryptographically worthless. Khufu XORs the text block with key material at the beginning and the end of the algorithm [11]. BLOWFISH Blowfish is a variable-length key block cipher. It does not meet all the requirements for a new cryptographic standard discussed above: it is only suitable for applications where the key does not change often, like a communications link or an automatic file encryptor. It is significantly faster than DES when implemented on 32-bit microprocessors with large data caches, such as the Pentium and the PowerPC. DESCRIPTION OF THE ALGORITHM Blowfish is a variable-length key, 64-bit block cipher. The algorithm consists of two parts: a key-expansion part and a data- encryption part. Key expansion converts a key of at most 448 bits into several subkey arrays totaling 4168 bytes. Data encryption occurs via a 16-round Feistel network. Each round consists of a key-dependent permutation, and a key- and data-dependent substitution. All operations are XORs and additions on 32-bit words. The only additional operations are four indexed array data lookups per round. Subkeys: Blowfish uses a large number of subkeys. These keys must be precomputed before any data encryption or decryption. 1. The P-array consists of 18 32-bit subkeys: P[1], P[2],..., P[18]. 2. There are four 32-bit S-boxes with 256 entries each: S[1,0], S[1,1],..., S[1,255]; S[2,0], S[2,1],..,, S[2,255]; S[3,0], S[3,1],..., S[3,255]; S[4,0], S[4,1],..,, S[4,255]. The exact method used to calculate these subkeys will be described later. Encryption: Blowfish is a Feistel network consisting of 16 rounds (see Figure 1). The input is a 64-bit data element, x. Divide x into two 32-bit halves: x[L], x[R] For i = 1 to 16: x[L] = x[L] XOR P[i] x[R] = F(x[L]) XOR x[R] Swap x[L] and x[R] Swap x[L] and x[R] (Undo the last swap.) x[R] = x[R] XOR P[17] x[L] = x[L] XOR P[18] Recombine x[L] and x[R] Function F (see Figure 2): Divide x[L] into four eight-bit quarters: a, b, c, and d F(x[L]) = ((S[1,a] + S[2,b] mod 2^32) XOR S[3,c]) + S[4,d] mod 2^32 Decryption is exactly the same as encryption, except that P[1], P[2],..., P[18] are used in the reverse order. Implementations of Blowfish that require the fastest speeds should unroll the loop and ensure that all subkeys are stored in cache. Generating the Subkeys: The subkeys are calculated using the Blowfish algorithm. The exact method is as follows: 1. Initialize first the P-array and then the four S-boxes, in order, with a fixed string. This string consists of the hexadecimal digits of pi (less the initial 3). For example: P[1] = 0x243f6a88 P[2] = 0x85a308d3 P[3] = 0x13198a2e P[4] = 0x03707344 2. XOR P[1] with the first 32 bits of the key, XOR P[2] with the second 32-bits of the key, and so on for all bits of the key (possibly up to P[14]). Repeatedly cycle through the key bits until the entire P-array has been XORed with key bits. (For every short key, there is at least one equivalent longer key; for example, if A is a 64-bit key, then AA, AAA, etc., are equivalent keys.) 3. Encrypt the all-zero string with the Blowfish algorithm, using the subkeys described in steps (1) and (2). 4. Replace P[1] and P[2] with the output of step (3). 5. Encrypt the output of step (3) using the Blowfish algorithm with the modified subkeys. 6. Replace P[3] and P[4] with the output of step (5). 7. Continue the process, replacing all entries of the P- array, and then all four S-boxes in order, with the output of the continuously-changing Blowfish algorithm. In total, 521 iterations are required to generate all required subkeys. Applications can store the subkeys rather than execute this derivation process multiple times. MINI-BLOWFISH The following mini versions of Blowfish are defined solely for cryptanalysis. They are not suggested for actual implementation. Blowfish-32 has a 32-bit block size and subkey arrays of 16-bit entries (each S-box has 16 entries). Blowfish-16 has a 16-bit block size and subkey arrays of 8-bit entries (each S-box has 4 entries). DESIGN DECISIONS The underlying philosophy behind Blowfish is that simplicity of design yields an algorithm that is both easier to understand and easier to implement. Through the use of a streamlined Feistel networkba simple S-box substitution and a simple P-box substitutionbI hope that the design will not contain any flaws. A 64-bit block size yields a 32-bit word size, and maintains block-size compatibility with existing algorithms. Blowfish is easy to scale up to a 128-bit block, and down to smaller block sizes. Cryptanalysis of the mini-Blowfish variants may be significantly easier than cryptanalysis of the full version. The fundamental operations were chosen with speed in mind. XOR, ADD, and MOV from a cache are efficient on both Intel and Motorola architectures. All subkeys fit in the cache of a 80486, 68040, Pentium, and PowerPC. The Feistel network that makes up the body of Blowfish is designed to be as simple as possible, while still retaining the desirable cryptographic properties of the structure. Figure 3 is round i of a general Feistel network: R[n,i] are reversible functions of text and key, and N[i] is a non-reversible function of text and key. For speed and simplicity, I chose XOR as my reversible function. This let me collapse the four XORs into a single XOR, since: 1] = R[1,i+1] XOR R[2,i-1] XOR R[3,i] XOR R[4,i] This is the P-array substitution in Blowfish. The XOR can also be considered to be part of the non-reversible function, N[i], occurring at the end of the function. (Although equivalent, I chose not to illustrate them in this way because it simplifies description of the subkey-generation process.) There are two XORs that remain after this reduction: R[1] in the first round and R[2] in the last round. I chose not to eliminate these in order to hide the input to the first non-reversible function. I considered a more complicated reversible function, one with modular multiplications and rotations. However, these operations would greatly increase the algorithm's execution time. Since function F is the primary source of the algorithm's security, I decided to save time-consuming complications for that function. Function F, the non-reversible function, gives Blowfish the best possible avalanche effect for a Feistel network: every text bit on the left half of the round affects every text bit on the right half. Additionally, since every subkey bit is affected by every key bit, the function also has a perfect avalanche effect between the key and the right half of the text after every round. Hence, the algorithm exhibits a perfect avalanche effect after three rounds and again every two rounds after that. I considered adding a reversible mixing function, more complicated than XOR, before the first and after the last round. This would further confuse the entry values into the Feistel network and ensure a complete avalanche effect after the first two rounds. I eventually discarded the addition as a time- consuming complication with no clear cryptographic benefits. The non-reversible function is designed for strength, speed, and simplicity. Ideally, I wanted a single S-box with 2^32 32-bit words, but that was impractical. My eventual choice of 256-entry S-boxes was a compromise between my three design goals. The small-number of bits to large-number of bits may have weaknesses with respect to linear cryptanalysis, but these weaknesses are hidden both by combining the output of four S-boxes and making them dependent on the key. I used four different S-boxes instead of one S-box primarily to avoid symmetries when different bytes of the input are equal, or when the 32-bit input to function F is a bytewise permutation of another 32-bit input. I could have used one S-box and made each of the four different outputs a non-trivial permutation of the single output, but the four S-box design is faster, easier to program, and seems more secure. The function that combines the four S-box outputs is as fast as possible. A simpler function would be to XOR the four values, but mixing addition mod 2^32 and XOR combines two different algebraic groups with no additional instructions. The alternation of addition and XOR ends with an addition operation because an XOR combines the final result with x[R]. If the four indexes chose values out of the same S-box, a more complex combining function would be required to eliminate symmetries. I considered using a more complex combining function in Blowfish (using modular multiplications, rotations, etc.), but chose not to because the added complication seemed unnecessary. The key-dependent S-boxes protect against differential and linear cryptanalysis. Since the structure of the S-boxes is completely hidden from the cryptanalyst, these attacks have a more difficult time exploiting that structure. While it would be possible to replace these variable S-boxes with four fixed S-boxes that were designed to be resistant to these attacks, key-dependent S-boxes are easier to implement and less susceptible to arguments of "hidden" properties. Additionally, these S-boxes can be created on demand, reducing the need for large data structures stored with the algorithm. Each bit of x[L] is only used as the input to one S-box. In DES many bits are used as inputs to two S-boxes, which strengthens the algorithm considerably against differential attacks. I feel that this added complication is not as necessary with key- dependent S-boxes. Additionally, larger S-boxes would take up considerably more memory space. Function F does not depend on the iteration. I considered adding this dependency, but did not feel that it had any cryptographic merit. The P-array substitution can be considered to be part of this function, and that is already iteration-dependent. The number of rounds is set at 16 primarily out of desire to be conservative. However, this number affects the size of the P- array and therefore the subkey-generation process; 16 iterations permits key lengths up to 448 bits. I expect to be able to reduce this number, and greatly speed up the algorithm in the process, as I accumulate more cryptanalysis data. In algorithm design, there are two basic ways to ensure that the key is long enough to ensure a particular security level. One is to carefully design the algorithm so that the entire entropy of the key is preserved, so there is no better way to cryptanalyze the algorithm other than brute force. The other is to design the algorithm with so many key bits that attacks that reduce the effective key length by several bits are irrelevant. Since Blowfish is designed for large microprocessors with large amounts of memory, I chose the latter. The subkey generation process is designed to preserve the entire entropy of the key and to distribute that entropy uniformly throughout the subkeys. It is also designed to distribute the set of allowed subkeys randomly throughout the domain of possible subkeys. I chose the digits of pi as the initial subkey table for two reasons: because it is a random sequence not related to the algorithm, and because it could either be stored as part of the algorithm or derived when needed. There is nothing sacred about pi; any string of random bitsbdigits of e, RAND tables, output of a random number generatorbwill suffice. However, if the initial string is non-random in any way (for example, ASCII text with the high bit of every byte a 0), this non-randomness will propagate throughout the algorithm. In the subkey generation process, the subkeys change slightly with every pair of subkeys generated. This is primarily to protect against any attacked of the subkey generation process that exploit the fixed and known subkeys. It also reduces storage requirements. The 448 limit on the key size ensures that the every bit of every subkey depends on every bit of the key. (Note that every bit of P[15], P[16], P[17], and P[18] does not affect every bit of the ciphertext, and that any S-box entry only has a .06 probability of affecting any single ciphertext block.) The key bits are repeatedly XORed with the digits of pi in the initial P-array to prevent the following potential attack: Assume that the key bits are not repeated, but instead padded with zeros to extend it to the length of the P-array. An attacker might find two keys that differ only in the 64-bit value XORed with P[1] and P[2] that, using the initial known subkeys, produce the same encrypted value. If so, he can find two keys that produce all the same subkeys. This is a highly tempting attack for a malicious key generator. To prevent this same type of attack, I fixed the initial plaintext value in the subkey-generation process. There is nothing special about the all-zeros string, but it is important that this value be fixed. The subkey-generation algorithm does not assume that the key bits are random. Even highly correlated key bits, such as an alphanumeric ASCII string with the bit of every byte set to 0, will produce random subkeys. However, to produce subkeys with the same entropy, a longer alphanumeric key is required. The time-consuming subkey-generation process adds considerable complexity for a brute-force attack. The subkeys are too long to be stored on a massive tape, so they would have to be generated by a brute-force cracking machine as required. A total of 522 iterations of the encryption algorithm are required to test a s9 steps to any brute-force attack. POSSIBLE SIMPLIFICATIONS I am exploring several possible simplifications, aimed at decreasing memory requirements and execution time. These are outlined below: Fewer and smaller S-boxes. It may be possible to reduce the number of S-boxes from four to one. Additionally, it may be possible to overlap entries in a single S-box: entry 0 would consist of bytes 0 through 3, entry 1 would consist of bytes 1 through 4, etc. The former simplification would reduce the memory requirements for the four S-boxes from 4096 bytes to 1024 bytes, the latter would reduce the requirements for a single S-box from 1024 bytes to 259 bytes. Additional steps may be required to eliminate the symmetries that these simplifications would introduce. Additionally, four different 10- or 12-bit indexes into a single large S-box could be used instead of the current series of S-boxes. Fewer iterations. It is probably safe to reduce the number of iterations from 16 to 8 without compromising security. The number of iterations required for security may be dependent on the length of the key. Note that with the current subkey generation procedure, an 8-iteration algorithm cannot accept a key longer than 192 bits. On-the-fly subkey calculation. The current method of subkey calculation requires all subkeys to be calculated advance of any data encryption. In fact, it is impossible to calculate the last subkey of the last S-box without calculating every subkey that comes before. An alternate method of subkey calculation would be preferable: one where every subkey can be calculated independently of any other. High-end implementations could still precompute the subkeys for increased speed, but low-end applications could only compute the required subkeys when needed. CONCLUSIONS I conjecture that the most efficient way to break Blowfish is through exhaustive search of the keyspace. I encourage all cryptanalytic attacks, modifications, and improvements to the algorithm. Attacks on mini versions of Blowfish, those with a 32- or even a 16-bit block size, are also encouraged. Source code in C and test data can be provided to anyone wishing to implement the algorithm, in accordance with U.S. export laws. The software magazine Dr. Dobbs Journal is sponsoring $1000 contest for the best cryptanalysis of Blowfish received before April 1995. Please contact me for details. Blowfish is unpatented, and will remain so in all countries. The algorithm is hereby placed in the public domain, and can be freely used by anyone. ACKNOWLEDGEMENTS Much of the motivation for this algorithm, as well as the design criteria, was developed with Niels Fergusen. I would also like to thank Eli Biham, Agnes Chan, Peter Gutmann, Angel Johnston, Lars Kundsen, and Matt Robshaw for their helpful suggestions. REFERENCES 1. E. Biham and A. Shamir, Differential Cryptanalysis of the Data Encryption Standard, Springer-Verlag, 1993. 2. T.W. Cusick and M.C. Wood, "The REDOC-II Cryptosystem," Advances in CryptologybCRYPTO '90 Proceedings, Springer- Verlag, 1991, pp. 545-563. 3. J. Deamen, R. Govaerts, and J. Vandewalle, "Block Ciphers Based on Modular Arithmetic," Proceedings of the 3rd Symposium on State and Progress of Research in Cryptography, Rome, Italy, 15-16 Feb 1993, pp. 80-89. 4. J.-H. Evertse, "Linear Structures in Blockciphers," Advances in CryptologybEUROCRPYT '87, Springer-Verlag, 1988, pp. 249- 266. 5. H. Feistel, "Cryptography and Computer Privacy," Scientific American, v. 228, n. 5, May 73, pp. 15-23. 6. GOST 28147-89, "Cryptographic Protection for Data Processing Systems," "Cryptographic Transformation Algorithm," Government Standard of the U.S.S.R., Inv. No. 3583, UDC 681.325.6:006.354. (in Russian) 7. X. Lai, J. Massey, and S. Murphy, "Markov Ciphers and Differential Cryptanalysis," Advances in CryptologybEUROCRYPT '91 Proceedings, Springer-Verlag, 1991, pp. 17-38. 8. J.L. Massey and X. Lai, "Device for Converting a Digital Block and the Use Thereof," International Patent PCT/CH91/00117, 16 May 1991. 9. J.L. Massey and X. Lai, "Device for the Conversion of a Digital Block and Use of Same," U.S. Patent 5,214,703, 25 May 1993. 10. M. Matsui, "Linear Cryptanalysis Method for DES Cipher," Advances in CryptologybCRYPTO '93 Proceedings, Springer- Verlag, 1994, in preparation. 11. R.C. Merkle, "Fast Software Encryption Functions," Advances in CryptologybCRYPTO '90 Proceedings, Springer-Verlag, 1991, pp. 476-501. 12. R.C. Merkle, "Method and Apparatus for Data Encryption," U.S. Patent 5,003,597, 26 Mar 1991. 13. S. Miyaguchi, "The FEAL-8 Cryptosystem and Call for Attack," Advances in CryptologybCRYPTO '89 Proceedings, Springer- Verlag, 1990, pp. 624-627. 14. S. Miyaguchi, "Expansion of the FEAL Cipher," NTT Review, v. 2, n. 6, Nov 1990. 15. S. Miyaguchi, "The FEAL Cipher Family," Advances in CryptologybCRYPTO '90 Proceedings, Springer-Verlag, 1991, pp. 627-638. 16. National Bureau of Standards, Data Encryption Standard, U.S. Department of Commerce, FIPS Publication 46, Jan 1977. 17. National Institute of Standards and Technology, "Clipper Chip Technology," 30 Apr 1993. 18. RSA Laboratories, Answers to Frequently Asked Questions About Today's Cryptography, Revision 2.0, RSA Data Security Inc., 5 Oct 1993. 19. B. Schneier, "Data Guardians," MacWorld, Feb 1993, 145-151. 20. B. Schneier, Applied Cryptography, John Wiley & Sons, New York, 1994. 21. J.L Smith, The Design of Lucifer, A Cryptographic Device for Data Communication, RC 3326, White Plains: IBM Research. 22. M.J. Weiner, "Efficient DES Key Search," Advances in CryptologybCRYPTO '93 Proceedings, Springer-Verlag, in preparation. 23. M.C. Wood, "Method of Cryptographically Transforming Electronic Digital Data from One Form to Another," U.S. Patent 5,003,596, 26 Mar 1991.