These notes are a modified version of the class lecture notes. Additions
(diagrams and additional notes) are indicated by "-".

+ Inference Controls
  + The goal: suppose you want people to be able to get statistical
    information (e.g. averages) out of a database, but not individual
    data. E.g. the average salary of all people living in zip 94720.
  + The system can be designed to answer only such statistical queries,
    but not individual ones.
    - Not personal ones, e.g. not the annual salary of the Dean of
      Students.
  + The problem: one can design sets of queries that will generate
    individual information. E.g. (a) average salary of all X; (b) average
    salary of X-delta, where delta describes only one individual;
    (c) size of X.
    + These three queries permit us to deduce delta's salary: with
      n = size of X, it is n*avg(X) - (n-1)*avg(X-delta).
  + There is no good solution to this problem.
    - While the example given is easy to spot, one can set up a series of
      linear queries (equations) and deduce almost anything.
  + Can do some things:
    + Randomize the data (slightly), i.e. introduce small errors.
      - Similar to a student-proposed solution of giving "averages" based
        on every other sample, which introduces an error.
      - Or just introduce noise directly, e.g. approximate large numbers
        to within +-1%. Errors add up and make it difficult to pull out
        accurate individual information using linear equations, because
        the errors are multiplied out as the equations are solved. (You
        could concoct an example like the one in class; it is easy to
        work through with a calculator.)
      - Still not that good: if you repeat the same query over and over,
        you can average the answers and reduce the error to a smaller
        and smaller degree.
    + Permit only queries on predefined groups, e.g. zip codes.
      - Limit the number of queries from a single person per topic or on
        specific groups. (Make the groups big enough that it is hard to
        separate out individual information.)

+ The Confinement Problem
  - An old problem with shared computing that is coming back because of
    cost and maintenance issues.
    E.g. Google web apps, etc.
  + The problem of a mutually suspicious customer and service. We want to
    ensure that the service can only reach information provided by the
    customer, and that the service is protected from the customer.
  + The idea is the concept of an information utility. The idea is
    currently resurfacing as server-based software.
  + Two problems remain: the service may not perform as advertised, and
    it may leak, i.e. transmit confidential data.
    - E.g. tax information between a user and a tax service.
  + List of possible leaks:
    + If the service has memory, it can collect data.
      + E.g. it can write into a permanent file.
    + It can write to a temporary file which can be read by the spy.
      - (Or by anyone with superuser privileges.)
    + The service can send a message to a process controlled by its
      owner.
    + The information can be encoded in the bill rendered for service.
    + If the file system has interlocks, the service can lock and unlock
      a file, and the spy can watch to see whether the file is locked.
      This can be used like Morse code.
    + The service can vary the paging rate (which affects performance).
      - Or even vary its CPU usage, e.g. by calculating pi.

+ Viruses
  + Really only appear in PCs. PCs transfer around executable files and
    code, e.g. in email.
    - Windows was coded without the "bad guys" in mind, so there is no
      firm distinction between code and data.
  + The user executes this code, and bad things happen.
  + The virus usually replicates itself elsewhere
  + and does something unpleasant to your machine.
  + The general technique is to search for known viruses by looking for
    their object code.
    - Akin to antibodies; the problem is that only known virus patterns
      are caught. The first few machines have to be infected first. If
      the virus propagates quickly, the time lag before the patch
      arrives may be too long.
  + The problem is that viruses encrypt themselves.
    + The solution is to search for the decryption code.
  + Viruses may change the decryption code.
    + The solution is to interpretively execute the suspected virus code
      for some period of time, to see whether the code decrypts itself
      into something that is recognized as a known virus.
  + There is no good defense against an unknown virus, since the code
    patterns can't be recognized.
    - "Honey pot" approach: one can attach several unprotected computers
      to the network; every time they get infected, you quickly see what
      did it and how, and issue a patch.
    - Windows: one of its other main problems is that it is popular, so
      it is a large target with many inexperienced users.
    - Buffer overflow: you call something and pass a parameter consisting
      of some data and a length. Pass in a huge string, and it will be
      copied into a fixed-size buffer and overwrite adjacent memory in
      the program. This is the favored way to break into code. (People
      just don't check for this when coding.)

Topic: Encryption

+ I recommend Kahn, "The Codebreakers". See also Whitfield Diffie and
  Martin Hellman, "Privacy and Authentication: An Introduction to
  Cryptography", Proc. IEEE, 67, 3, March 1979, pp. 397-427.
+ Popular approach to security in computer systems: encryption. Store
  and transmit information in an encoded form.
+ Cryptography: the use of transformations of data intended to make the
  data useless to one's opponents.
+ Note that encryption is not new. It has been used since the time of
  the Romans, e.g. the "Caesar Cipher".

                 Key1                           Key2
                  |                              |
                  V                              V
  clear text -> encrypt ---> cipher text ---> decrypt ---> clear text
                                  |
                                  V
                              listener

+ The basic mechanism:
  + Start with the text to be protected. The initial readable text is
    called clear text. (See the diagram above.)
  + Encrypt the clear text so that it doesn't make any sense at all. The
    nonsense text is called cipher text. The encryption is controlled by
    a secret password or number; this is called the encryption key.
  + The encrypted text can be stored in a readable file, or transmitted
    over unprotected channels.
  + To make sense of the cipher text, it must be decrypted back into
    clear text.
    This is done with some other algorithm that uses another secret
    password or number, called the decryption key.
+ All of this only works under three conditions:
  + The encryption function cannot easily be inverted (you cannot get
    back to the clear text unless you know the decryption key).
  + The encryption and decryption must be done in some safe place so the
    clear text can't be stolen.
  + The keys must be protected. In most systems, one key can be computed
    from the other (usually the encryption and decryption keys are
    identical), so we can't afford to let either key leak out.
    - An interesting problem is key distribution: how to safely give the
      key to only the correct people.

+ Types of Cryptographic Systems:
  + (Simple) Substitution: there is a function f(x) which maps each
    letter x of the plaintext (or group of letters) into f(x). f(x) must
    be one-to-one or one-to-many. If f(x) = x + k for a fixed shift k
    (e.g. f(x) = x+1), it is called a Caesar Cipher.
    - Example (f(x) = x+1): Smith ---> Tnjui
    + Solved by using tables of frequencies of letters, doubles, triples,
      etc.
      - e and t are the most common letters; compare the sample against
        the known frequency distribution.
      - Combinations of letters are also used: th is a common pair, and
        q almost never shows up without u.
    + Mapping "to many" disguises frequency.
      - E.g. use 8-bit characters, giving 256 possibilities; e can now
        be 12 different bit patterns.
  + Transposition: permute (or transpose) the input in blocks to obtain
    the output.
    - Example diagram: "The quick brown fox jumps" written row by row
      into a 5x5 block:

        -----------
        |T|h|e| |q|
        -----------
        |u|i|c|k| |
        -----------
        |b|r|o|w|n|
        -----------
        |f|o|x| |j|
        -----------
        |u|m|p|s| |
        -----------

    - Reading down the columns, it is now (with _ marking a blank):
        Tubfu hirom ecoxp _kw_s q_nj_
    - To solve: frequency analysis again, trying different block sizes
      and shapes together with known letter combinations (pairs,
      triples, quadruples, etc.).
    + Look for permutations that rejoin commonly used letter pairs, such
      as "th".
  + Polyalphabetic Ciphers: a substitution cipher where f(i,x) is a
    function of i, the sequence number of the letter in the text.
    Typically periodic in i.
    Can get long periods by using two functions with relatively prime
    periods.
    - Example: Caesar's cipher (x+1) on alternate letters and reverse
      Caesar's cipher (x-1) on the remaining letters, counting positions
      across the whole text with spaces included:

        QUICK BROWN FOX
        RTJBL CQPVO GNY
        ||
        |L__> x = x-1
        L___> x = x+1

    - Encryption and history: the side that can decrypt the other's
      messages has an enormous advantage in battle, as in WW2.
    + Solved in two steps. First look for repeated strings, and count
      the number of letters between them. The greatest common divisor of
      the distances between repeated strings is the period (or a
      multiple of it). (Or look at the frequencies of letters K apart
      for increasing K; when they look like those of a simple
      substitution, K is the period of the cipher.) Then, for period K,
      solve each of the K ciphers separately, using frequency methods.
      - Unfortunately, as the cipher gets more complex, you need a much
        bigger sample (this is what makes polyalphabetic ciphers much
        more difficult to solve).
      - Usually the more complicated, the better. However, there is
        usually a trade-off between confusing the spy and confusing the
        user. Sometimes introducing more complexity doesn't actually
        make the cipher harder to solve, just more tedious to decrypt.
    + Old-fashioned coding machines (e.g. Hagelin machines) worked as
      polyalphabetic ciphers: they had rotating wheels with relatively
      prime numbers of cogs. The code was the product of the path
      through the wheels.
  + Running Key Cipher: use a key as long as the message, e.g. the text
    of a book (but not random).
    - Old spy movies: all spies carry around a book (Pride and Prejudice
      perhaps?), which is used to decrypt the messages. They say "start
      on page 231, line 8" and use the letters from there as the key,
      XORing or otherwise combining them with the message.
    + Solve: use a probable word; substitute it everywhere (i.e. XOR it
      with the cipher text) and see if a recognizable word pops out. If
      so, work backward and forward by context. Or, use frequency
      methods, but the frequencies are now products of key and message
      frequencies, so this is quite hard.
      - You "guess" and see whether recognizable words come out of your
        guess, which tells you whether you've succeeded. Unfortunately
        you need very large samples, which they won't give you if they
        are smart.
  + Codes: take linguistic units of input (e.g. words) and use a code
    book (a large table) to map them into output, e.g. letter groups.
    (Can also encode phrases.)
    - Code books used to be sold commercially. Many words or letters map
      to one short group, which made messages cheaper to send back when
      communication was expensive (e.g. for "bales of cotton" prices).
    - So people with the same code book can encode a message of several
      sentences into a few letters.
    + Hard to solve. Try frequency counts. Also the known plaintext
      method.
      - The easiest way to break it is probably just to capture the
        physical code book.
      - Known plaintext method: if you have a copy of the encrypted
        message and a copy of the decrypted message, it gets you a long
        way toward cracking the code.
      - Figure out the key K from a sample message; from K, we can
        decrypt other messages.

+ Key U is the xor of U1 and U2. U1 and U2 are held by different federal
  agencies. Both U1 and U2 can be obtained only with a court-ordered
  wiretap.
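
- Sketch (added): the "U is the xor of U1 and U2" idea above can be
  illustrated with a few lines of Python. This is only a minimal sketch
  under assumptions that are not in the notes: the 16-byte key length and
  the helper names split_key and recombine are hypothetical, and the
  notes do not say how U1 and U2 are actually generated. Here U1 is
  chosen at random and U2 is derived from it, so neither share alone
  reveals anything about U.

```python
import secrets
from typing import Tuple


def split_key(u: bytes) -> Tuple[bytes, bytes]:
    """Split key U into two shares with U = U1 xor U2 (hypothetical helper)."""
    u1 = secrets.token_bytes(len(u))           # U1: uniformly random bytes
    u2 = bytes(a ^ b for a, b in zip(u, u1))   # U2 = U xor U1
    return u1, u2


def recombine(u1: bytes, u2: bytes) -> bytes:
    """Recover U from both shares; one share alone is just random bytes."""
    return bytes(a ^ b for a, b in zip(u1, u2))


if __name__ == "__main__":
    u = secrets.token_bytes(16)   # assumed 16-byte key, for illustration only
    u1, u2 = split_key(u)
    print(recombine(u1, u2) == u)
```

  Each holder stores only its own share; the key can be rebuilt only when
  both shares are brought together.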