These notes are a modified version of the class lecture notes.
Additions - diagrams and additional notes -  are indicated by "-"


     +   Inference Controls

         +   The goal - suppose you want people to be able to get sta-

             tistical  information  (e.g. averages) out of a database,

             but not get individual data.  E.g. the average salary  of

             all people living in zip 94720.

             +   System can be designed to answer only such  statisti-

                 cal queries, but not individual ones.

		 -   That is, not personal data, e.g. not the annual salary
			of the Dean of Students.

         +   The problem - can design sets of queries that  will  gen-

             erate individual info.  E.g. (a) average salary of all X.

             (b) average salary of X-delta, where delta describes only

             one individual.  (c) size of X.

             +   These three  queries  permit  us  to  deduce  delta's

                 salary.
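		- A minimal sketch of the three-query attack above. The
		names and salaries are made up for illustration; the only
		assumption is that the database answers averages and counts.

```python
# Hypothetical toy database: names and salaries are invented for illustration.
SALARIES = {"ann": 50_000, "bob": 60_000, "cat": 70_000, "delta": 120_000}

def avg_salary(names):
    """Statistical query: average salary over a group of people."""
    return sum(SALARIES[n] for n in names) / len(names)

def count(names):
    """Statistical query: size of a group."""
    return len(names)

def infer_individual(group, target):
    """Deduce one person's salary from three statistical queries."""
    others = [n for n in group if n != target]
    avg_all = avg_salary(group)       # (a) average salary of all X
    avg_others = avg_salary(others)   # (b) average salary of X-delta
    n = count(group)                  # (c) size of X
    # total(X) - total(X-delta) = delta's salary
    return n * avg_all - (n - 1) * avg_others

print(infer_individual(list(SALARIES), "delta"))  # 120000.0
```

		- No individual query was ever asked, yet delta's salary
		falls out of simple arithmetic on the answers.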

         +   No good solution to this problem.

		- While the example given is easy to spot, one can set up
		a series of linear queries/equations and solve them to
		deduce almost any individual value.

             +   Can do some things:

                 +   Randomize data (slightly) - i.e. introduce  small


                     errors.

		     - Similar to a student-proposed solution of giving
			"averages" based on every other sample, which
			introduces an error.

		     - Or just introduce straight-up noise, e.g. perturb
			large numbers by +-1%. The errors add up and make it
			difficult to pull out accurate individual information
			using linear equations, because the errors get
			multiplied through when the equations are solved. (You
			could concoct an example like the one in class; it is
			fairly easy to work one through yourself with a
			calculator.)

				- Still not that good: if you repeat the same
				 query over and over, you can average the answers
				 and reduce the error to a smaller and smaller
				 degree.
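		- A rough sketch of why repeating a noisy query defeats the
		noise. It assumes the system draws fresh, independent noise
		of up to +-1% on every answer; the true value and the
		function names are hypothetical.

```python
import random

TRUE_AVERAGE = 100_000.0  # hypothetical true answer the database protects

def noisy_query(rng, relative_noise=0.01):
    """Answer the query with fresh uniform noise of up to +-1%."""
    return TRUE_AVERAGE * (1 + rng.uniform(-relative_noise, relative_noise))

def estimate_by_repetition(rng, repeats):
    """Ask the same query many times and average the answers."""
    return sum(noisy_query(rng) for _ in range(repeats)) / repeats

rng = random.Random(0)  # fixed seed so the run is repeatable
single = noisy_query(rng)
many = estimate_by_repetition(rng, 10_000)
print("error of one answer:      ", abs(single - TRUE_AVERAGE))
print("error of averaged answers:", abs(many - TRUE_AVERAGE))
```

		- One way around this, not covered in the notes, would be to
		return the same perturbed answer every time a query is
		repeated.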

                 +   Permit only queries on predefined groups  -  e.g.

                     zip codes.
			
			- Limit number of queries from single person per topic
			or on specific groups. (make the groups big enough so
			that it is hard to separate individual information)


     +   The Confinement Problem

	   - Old problem from shared computing that is coming back because
		of the cost and maintenance advantages of hosted services
		(Google web apps, etc.).

         +   Problem of  mutually  suspicious  customer  and  service.

             Want  to  ensure that the service can only reach informa-

             tion provided by customer, and that the service  is  pro-

             tected from the customer.

             +   Idea  is  concept  of  information   utility.    Idea

                 currently resurfacing as server based software.

         +   Two problems remain:  service may not perform  as  adver-

             tised, and it may leak - i.e. transmit confidential data.

		-E.g. tax information between user and tax service


         +   List of possible leaks:

             +   If the service has memory, it can collect data.

                 +   E.g. it can write into a permanent file.

                 +   It can write to a temporary  file  which  can  be

                     read by the spy.

			- (anyone with superuser privileges)

             +   The service can send a  message  to  a  process  con-

                 trolled by its owner.

             +   The information can be encoded in the  bill  rendered

                 for service.

             +   If the file system has interlocks,  the  service  can

                 lock  and unlock a file, and the spy can watch to see

                 if the file is locked.  Can use like morse code.

             +   The service can vary the paging rate  (which  affects

                 performance).

		- Or even vary its CPU usage (e.g. by calculating pi), etc.
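		- A toy simulation of the file-lock leak from the list above.
		No real file locking is involved: each yielded value stands
		for "is the file locked during this time slot?", and the
		function names are hypothetical.

```python
def service_lock_states(secret: bytes):
    """Service side: for each bit of the secret, hold the lock (1) or not (0)."""
    for byte in secret:
        for shift in range(7, -1, -1):        # most significant bit first
            yield bool((byte >> shift) & 1)

def spy_decode(observed_states) -> bytes:
    """Spy side: probe the lock once per time slot and rebuild the bytes."""
    out = bytearray()
    value, nbits = 0, 0
    for locked in observed_states:
        value = (value << 1) | int(locked)
        nbits += 1
        if nbits == 8:                        # a full byte has been observed
            out.append(value)
            value, nbits = 0, 0
    return bytes(out)

leaked = spy_decode(service_lock_states(b"PIN 1234"))
print(leaked)  # b'PIN 1234'
```

		- The same encoding works for any state the spy can observe,
		such as the paging rate or CPU load, only more slowly and
		with more noise.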


     +   Viruses

         +   Really only appear in PCs.  PCs transfer  around  execut-

             able files and code - e.g. in email.

	- Windows was coded without the "bad guys" in mind, so there's no
	distinction between code and data.

         +   User executes this code, and bad things happen.

             +   Virus usually replicates itself elsewhere

             +   and does something unpleasant to your machine.

         +   General technique is to search for known viruses by look-

             ing for their object code.

		- Akin to antibodies; the problem is that only known virus
		patterns are caught, so the first few machines have to be
		infected first. If it's a quickly propagating virus, the time
		lag before the patch may be too long.


             +   Problem is that viruses encrypt themselves.

                 +   Solution is to search for decryption code

             +   Viruses may change the decryption code.

                 +   Solution  is  to   interpretively   execute   the

                     suspected virus code for some portion of time, to

                     see if the code decrypts  itself  into  something

                     that is recognized as a known virus.
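		- A simplified sketch of signature scanning and of why
		encryption defeats it. The signature bytes, the XOR
		"encryption", and the sample layout are all invented for
		illustration; a real scanner would emulate the suspect
		code's own decryption routine rather than be handed the key.

```python
# Hypothetical signature database: name -> byte pattern found in known viruses.
SIGNATURES = {"demo-virus": bytes.fromhex("deadbeef")}

def scan(data: bytes):
    """Return the names of all known signatures that appear in data."""
    return [name for name, sig in SIGNATURES.items() if sig in data]

def xor_bytes(data: bytes, key: int) -> bytes:
    """Toy self-encryption: XOR every byte with a one-byte key."""
    return bytes(b ^ key for b in data)

def emulate_then_scan(sample):
    """Stand-in for interpretive execution: let the sample decrypt itself, then scan."""
    decrypted = xor_bytes(sample["body"], sample["key"])
    return scan(decrypted)

plain_virus = b"hello " + SIGNATURES["demo-virus"]
encrypted_sample = {"key": 0x5A, "body": xor_bytes(plain_virus, 0x5A)}

print(scan(plain_virus))                    # found by a plain scan
print(scan(encrypted_sample["body"]))       # missed: the pattern is hidden
print(emulate_then_scan(encrypted_sample))  # found again after decryption
```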

         +   There is no good defense against an unknown virus,  since

             the code patterns can't be recognized.

		- "honey pot approach"
		one can have several unprotected computers attached
		to the network, and every time they get infected, you 
		quickly see what did it and how, and issue a patch.
		

	- Windows: one of its other main problems is that it's popular
	and used by many inexperienced users.


	- Buffer overflow: you call something and pass a parameter
	consisting of data and a length. Put in a huge string, and
	it will be copied into a fixed-size buffer, run past the end,
	and overwrite adjacent memory in the program (e.g. a return
	address). The favored way to break into code. (People
	just don't check for this when coding.)
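	- Python itself bounds-checks its buffers, so the following is only
	a simulation: a bytearray stands in for a stack frame holding an
	8-byte buffer followed by a saved return address. The layout and
	names are hypothetical.

```python
BUF_SIZE = 8  # size of the simulated fixed-size buffer

def new_frame() -> bytearray:
    """Simulated stack frame: 8-byte buffer, then an 8-byte saved return address."""
    return bytearray(b"\x00" * BUF_SIZE + b"RETADDR!")

def unchecked_copy(frame: bytearray, data: bytes) -> None:
    """Copy like an unchecked C copy: never compares len(data) with BUF_SIZE."""
    frame[0:len(data)] = data

def checked_copy(frame: bytearray, data: bytes) -> None:
    """The fix: refuse input that does not fit in the buffer."""
    if len(data) > BUF_SIZE:
        raise ValueError("input longer than buffer")
    frame[0:len(data)] = data

frame = new_frame()
unchecked_copy(frame, b"A" * BUF_SIZE + b"EVILADDR")
saved_return = bytes(frame[BUF_SIZE:BUF_SIZE + 8])
print(saved_return)  # b'EVILADDR': the attacker now controls the return address
```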


             Topic: Encryption


         +   I recommend Kahn, "The Codebreakers".  See also Whitfield

             Diffie  and  Martin Hellman, "Privacy and Authentication:

             An Introduction to  Cryptography",  Proc.  IEEE,  67,  3,

             March, 1979, pp. 397-427.


     +   Popular approach to security in computer systems: encryption.

         Store and transmit information in an encoded form.

         +   Cryptography - the use of transformation of data intended

             to make the data useless to one's opponents.

         +   Note that encryption is not new -  has  been  used  since

             times of the Romans - ``Caesar Cipher''.


                               Key1                            Key2

                                V                               V

   clear text ->  encrypt --->  cipher text --->   decrypt ---> clear text

                                                V

                                             listener


     +   The basic mechanism:

         +   Start with text to be protected.  Initial  readable  text

             is called clear text.

             +   diagram

         +   Encrypt the clear text so that it doesn't make any  sense

             at  all.   The  nonsense text is called cipher text.  The

             encryption is controlled by a secret password or  number;

             this is called the encryption key.

         +   The encrypted text can be stored in a readable  file,  or

             transmitted over unprotected channels.

         +   To make sense of the cipher text, it  must  be  decrypted

             back into clear text.  This is done with some other algo-

             rithm that uses another secret password or number, called

             the decryption key.


     +   All of this only works under three conditions:

         +   The encryption function cannot easily be inverted (cannot

             get  back  to  clear  text unless you know the decryption

             key).

         +   The encryption and decryption must be done in  some  safe

             place so the clear text can't be stolen.

         +   The keys must be protected.  In most systems, can compute

             one  key  from  the other (usually the encryption and de-

             cryption keys are identical), so can't afford to let  ei-

             ther key leak out.

		-Interesting problem is key distribution, how to safely
		give the key to only the correct people


     +   Types of Cryptographic Systems:


         +   (Simple) Substitution: There is  a  function  f(x)  which

             maps  each  letter of the plaintext (or group of letters)

             into f(x).   f(x)  must  be  1-1  or  one  to  many.   If

             f(x)=x+1 (mod 26), then called a Caesar Cipher.

		-example: Smith ---> Tnjui

             +   Solved by using tables  of  frequencies  of  letters,

                 doubles, triples, etc.

		- e and t are the most common letters; compare the
		sample against the known frequency distribution
		- combinations of letters are also used:
		th is a common combo. q almost never shows up without u

                 +   Mapping "to many" disguises frequency.

		- i.e. use 8-bit characters, 256 possibilities.
		e can now map to, say, 12 different bit patterns
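		- A minimal sketch of the shift cipher and of the frequency
		attack on it. The guess simply assumes the most common
		ciphertext letter stands for "e", which only works on
		samples that are long enough and ordinary enough.

```python
from collections import Counter

def caesar(text: str, k: int) -> str:
    """Shift every letter by k positions (mod 26); leave other characters alone."""
    out = []
    for ch in text:
        if ch.isalpha():
            base = ord("A") if ch.isupper() else ord("a")
            out.append(chr((ord(ch) - base + k) % 26 + base))
        else:
            out.append(ch)
    return "".join(out)

def guess_shift(ciphertext: str) -> int:
    """Assume the most frequent ciphertext letter is an encrypted 'e'."""
    counts = Counter(ch for ch in ciphertext.lower() if ch.isalpha())
    most_common = counts.most_common(1)[0][0]
    return (ord(most_common) - ord("e")) % 26

print(caesar("Smith", 1))                 # Tnjui
secret = caesar("meet me here, then see the tree", 1)
k = guess_shift(secret)
print(k, caesar(secret, -k))              # recovered shift and plaintext
```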

         +   Transposition: Permute (or transpose) the input in blocks

             to obtain the output.
		- example diagram: The quick brown fox jumps

		 ---------
		|T|h|e| |q|
		 ---------
		|u|i|c|k| |
		 ---------
		|b|r|o|w|n|
		 ---------
		| |f|o|x| |
		 ---------
		|j|u|m|p|s|
		 ---------
		
		-now, reading down the columns, it is:
		"Tub jhirfuecoom kwxpq n s"

		- To solve: frequency analysis again, using different
		block sizes and shapes together with known letter 
		combinations (pairs, triples, quadruples, etc)

             +   Look  for  permutations  that  rejoin  commonly  used

                 letter pairs, such as "th".
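		- A small sketch of one simple transposition: write the text
		into rows of a fixed width and read it out by columns. The
		function names and the space padding are assumptions; real
		systems also permute the column order.

```python
def columnar_encrypt(text: str, width: int, pad: str = " ") -> str:
    """Write text row by row into a grid of the given width, read it column by column."""
    rows = [text[i:i + width].ljust(width, pad) for i in range(0, len(text), width)]
    return "".join(row[c] for c in range(width) for row in rows)

def columnar_decrypt(cipher: str, width: int) -> str:
    """Inverse: split into columns, then read across the rows."""
    height = len(cipher) // width
    cols = [cipher[i:i + height] for i in range(0, len(cipher), height)]
    return "".join(col[r] for r in range(height) for col in cols)

message = "The quick brown fox jumps"
cipher = columnar_encrypt(message, 5)
print(repr(cipher))
print(columnar_decrypt(cipher, 5) == message)  # True: the round trip recovers the text
```

		- Every letter of the message is still present, so single
		letter frequencies are unchanged; only pairs such as "th"
		are broken up.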

         +   Polyalphabetic Ciphers - substitution cipher, where  f(i,x)

             is  a  function of i, which is the sequence number of the

             letter in the text.  Typically periodic in  i.   Can  get

             long periods by using two functions with relatively prime

             periods.

		- Example:

	Caesar's cipher (X=X+1) on alternate positions
	Reverse Caesar's cipher (X=X-1) on the remaining positions
	(the space counts as a position but is left unchanged)

	QUICK BROWN FOX	
	RTJBL CQPVO GNY
	||
	|L__> X=X-1
	L___> X=X+1

		- Encryption and history: the side that can decrypt the
		other's messages has an enormous advantage in battle,
		as in WW2


             +   Solved  in  two  steps.   First  look  for   repeated

                 strings,  and  count  the  number  of letters between

                 them.  Greatest common divisor  of  distances  between

                 strings  is (usually) the period.  (Or can look at

                 frequencies of letters K apart, until they look  ok,

                 then K is period  of  cipher.)  Then  solve  each  of

                 the K ciphers separately, using frequency methods.

		-Unfortunately as the cipher is more complex, you will
		need a much bigger sample (this is what makes 
		polyalphabetic much more difficult to solve)

		-Usually the more complicated, the better. However
		there is usually a trade off between confusing the
		spy and confusing the user. Sometimes introducing
		more complexity doesn't actually make solving it
		more difficult, but just makes decrypting it more tedious

             +   Old fashioned coding machines (e.g. Hagelin machines)

                 worked as polyalphabetic cipher - had rotating wheels

                 with relatively prime number of cogs.  Code was  pro-

                 duct of path through wheels.
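		- A sketch of a period-2 polyalphabetic cipher (shift +1 on
		even positions, -1 on odd positions) and of the first
		solving step: finding the period from repeated strings.
		Function names are hypothetical. In general the greatest
		common divisor of the distances is only a multiple of the
		period, so this is an estimate.

```python
from collections import defaultdict
from functools import reduce
from math import gcd

def shift_letter(ch: str, k: int) -> str:
    """Shift one letter by k (mod 26); non-letters pass through unchanged."""
    if not ch.isalpha():
        return ch
    base = ord("A") if ch.isupper() else ord("a")
    return chr((ord(ch) - base + k) % 26 + base)

def poly_cipher(text: str, shifts=(1, -1), decrypt: bool = False) -> str:
    """f(i, x): the shift applied to x depends on its position i, with period len(shifts)."""
    sign = -1 if decrypt else 1
    return "".join(shift_letter(ch, sign * shifts[i % len(shifts)])
                   for i, ch in enumerate(text))

def estimate_period(ciphertext: str, n: int = 3):
    """GCD of the distances between repeated n-letter strings, or None if no repeats."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - n + 1):
        positions[ciphertext[i:i + n]].append(i)
    distances = [b - a for p in positions.values() for a, b in zip(p, p[1:])]
    return reduce(gcd, distances) if distances else None

secret = poly_cipher("QUICK BROWN FOX")
print(poly_cipher(secret, decrypt=True))              # QUICK BROWN FOX
print(estimate_period(poly_cipher("THEABCTHEDTHE")))  # 2
```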


         +   Running Key Cipher - use key as long as  message  -  e.g.

             text of book.  (but not random)
		-Old spy movies: All spies carry around a book (Pride
		and Prejudice perhaps?), and these are used to decrypt
		the messages - they say, "start out on page 231, line 8"
		and use the letters as the key, xoring or what not with
		the message.

             +   Solve: use probable word;  substitute  it  everywhere

                 (i.e.  XOR  it  with  the  cipher  text) and see if a

                 recognizable word pops out.  If so, work backward and

                 forward  by context.  Or, use frequency methods - but

                 frequencies are now products of key and message  fre-

                 quencies, so quite hard.
		- You "guess" a word and see whether recognizable words
		come out, which tells you if the guess succeeded
		- unfortunately you'll need very large samples, which
		they won't give you if they are smart.
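		- A minimal sketch of the probable-word attack on an XOR
		running key. The message is made up, and the key is assumed
		to be the opening words of the novel mentioned above. XORing
		a guessed word with the cipher text at the right offset
		makes a readable piece of the key pop out.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, stopping at the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))

# Hypothetical running key: text from a book both parties own.
BOOK_TEXT = b"it is a truth universally acknowledged"
message = b"attack at dawn"
cipher = xor_bytes(message, BOOK_TEXT)

def crib_drag(cipher: bytes, crib: bytes) -> dict:
    """XOR the probable word at every offset; return offset -> resulting fragment."""
    return {offset: xor_bytes(cipher[offset:offset + len(crib)], crib)
            for offset in range(len(cipher) - len(crib) + 1)}

fragments = crib_drag(cipher, b"attack")
print(fragments[0])                  # b'it is ': readable, so the guess fits here
print(xor_bytes(cipher, BOOK_TEXT))  # the legitimate receiver just XORs with the book
```

		- At the wrong offsets the fragments come out as gibberish;
		a readable fragment of key text is the signal to extend the
		guess forward and backward by context.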

         +   Codes - take linguistic units of input (e.g.  words)  and

             use  a  code book (large table) to map them into output -

             e.g. letter groups.  (Can also encode phrases.)
		- Code books used to be sold commercially. Many words or
		whole phrases map to one short letter group, which made
		messages cheaper to send back when communication (of
		"bales of cotton" prices, for example) was expensive.
		- So people with the same code book can encrypt a
		message of several sentences into a few letters

             +   Hard to solve.  Try  frequency  counts.   Also  known

                 plaintext method.
		- The easiest way to break it is probably just to
		capture the physical code book

		- Known plaintext method: If you have a copy of the
		encrypted message and a copy of the decrypted message,
		it takes you a long way toward cracking the code.
			-figure out the key with a sample message
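		- A toy sketch of a word-level code book and of the known
		plaintext method against it. The book entries and letter
		groups are invented; a real code book would have thousands
		of entries, including whole phrases.

```python
# Hypothetical code book: plaintext word -> letter group.
CODE_BOOK = {"buy": "QXA", "sell": "ZBK", "cotton": "MMT",
             "bales": "RPL", "today": "VEO"}
DECODE_BOOK = {group: word for word, group in CODE_BOOK.items()}

def encode(text: str) -> str:
    """Replace each word by its letter group from the code book."""
    return " ".join(CODE_BOOK[word] for word in text.split())

def decode(groups: str) -> str:
    """Legitimate receiver: look each group up in the same book."""
    return " ".join(DECODE_BOOK[g] for g in groups.split())

def recover_entries(known_plain: str, known_cipher: str) -> dict:
    """Known plaintext: line up words and groups to rebuild part of the book."""
    return dict(zip(known_cipher.split(), known_plain.split()))

intercepted = encode("buy cotton today")
partial_book = recover_entries("buy cotton today", intercepted)

new_message = encode("sell cotton today")
guess = " ".join(partial_book.get(g, "?") for g in new_message.split())
print(guess)  # ? cotton today
```

		- Each matched message fills in more of the attacker's copy
		of the book, which is why capturing the book itself is the
		shortcut.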


                 then can get K.  From K, we can decrypt messages.


         +   Key U is xor of U1 and U2.  U1 and U2 held  by  different

             federal agencies.  Can get both U1 and U2 only with court

             ordered wiretap.
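		- A short sketch of splitting a key U into two halves U1 and
		U2 whose XOR is U. The function names are hypothetical, and
		this only illustrates the XOR split, not the full escrow
		scheme. Either half alone is a uniformly random string and
		reveals nothing about U.

```python
import secrets

def split_key(u: bytes):
    """Split u into two shares: u1 is random, u2 = u XOR u1."""
    u1 = secrets.token_bytes(len(u))
    u2 = bytes(a ^ b for a, b in zip(u, u1))
    return u1, u2

def combine_key(u1: bytes, u2: bytes) -> bytes:
    """Recover u by XORing the two shares together."""
    return bytes(a ^ b for a, b in zip(u1, u2))

unit_key = bytes(range(10))          # hypothetical 80-bit key
share1, share2 = split_key(unit_key)
print(combine_key(share1, share2) == unit_key)  # True
```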