1. Pr[F] = Pr[E].  The location doesn't matter: there are 52 possibilities
for the 3rd card, and 13 of them are a spade, so Pr[F] = 13/52 = 1/4.

Here's a more careful approach.  The experiment is that we choose a random
permutation of the 52 cards.  There are 52! possible outcomes, and each
outcome can be represented as a sequence (w_1,..,w_52) of cards, where w_i
is the ith card from the top.  All outcomes are equally likely, so we have
a uniform distribution.  The events E and F are sets of outcomes, namely:
  E = {(w_1,..,w_52) : w_1 is a spade}
and
  F = {(w_1,..,w_52) : w_3 is a spade}
Consider the function f:E->F given by
  f(w_1,..,w_52) = (w_2,w_3,w_1,w_4,w_5,w_6,..,w_52)
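This maps every outcome in E to an outcome in F, since the top card w_1 (a
spade) moves to position 3.  It is invertible, with inverse
(v_1,..,v_52) -> (v_3,v_1,v_2,v_4,..,v_52), which sends every outcome in F
back to an outcome in E, so f is a bijection.  That means that |E| = |F|, so
  Pr[F] = |F|/|Omega| = |E|/|Omega| = Pr[E] = 13/52 = 1/4.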
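The bijection argument can be checked by brute force, but only on a small deck, since 52! outcomes are far too many to enumerate. The sketch below assumes a hypothetical 6-card deck in which cards 0 and 1 play the role of spades; the names `is_spade`, `f`, and `f_inv` are illustrative. It enumerates every ordering and confirms that f sends E onto F one-to-one.

```python
from itertools import permutations

N_CARDS = 6       # hypothetical small deck (52 cards is too many to enumerate)
SPADES = {0, 1}   # hypothetical "spades" in the small deck

def is_spade(card):
    return card in SPADES

def f(w):
    # Move the top card to position 3: (w1, w2, w3, ...) -> (w2, w3, w1, ...)
    return (w[1], w[2], w[0]) + w[3:]

def f_inv(v):
    # Inverse map: (v1, v2, v3, ...) -> (v3, v1, v2, ...)
    return (v[2], v[0], v[1]) + v[3:]

omega = list(permutations(range(N_CARDS)))
E = {w for w in omega if is_spade(w[0])}  # top card is a spade
F = {w for w in omega if is_spade(w[2])}  # 3rd card is a spade

image = {f(w) for w in E}
assert image == F                          # f maps E onto F
assert len(image) == len(E)                # no two outcomes collide, so f is one-to-one
assert all(f_inv(f(w)) == w for w in E)    # f_inv undoes f
print(len(E), len(F), len(omega))          # prints 240 240 720
```

Here |E| = |F| = 2 * 5! = 240 out of 6! = 720 outcomes; the same reasoning with 13 spades among 52 cards gives the 1/4 above.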
Alternatively, we could count the number of outcomes (w_1,..,w_52) in F
directly.  We find that there are 13 choices for w_3 that make the 3rd card
a spade, and then there are 51 choices for w_1 (everything but what we
chose for w_3), 50 choices for w_2 (everything but what we chose for w_3
and w_1), 49 choices for w_4, 48 choices for w_5, etc., so in all we find
that |F| = 13 * 51!.  Hence
  Pr[F] = |F|/|Omega| = (13 * 51!)/52! = 13/52 = 1/4.
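The direct count can be double-checked with exact rational arithmetic, and the general formula |F| = (number of spades) * (n-1)! can be tested by brute force on a small deck. The helper `count_third_is_spade` and the 7-card, 2-spade deck are hypothetical, chosen only so that enumeration is fast.

```python
from fractions import Fraction
from itertools import permutations
from math import factorial

# Exact value from the counting argument: |F| / |Omega| = (13 * 51!) / 52!
pr_F = Fraction(13 * factorial(51), factorial(52))
print(pr_F)  # 1/4

def count_third_is_spade(n_cards, n_spades):
    """Brute-force |F| on a small deck whose cards 0..n_spades-1 count as spades."""
    return sum(1 for w in permutations(range(n_cards)) if w[2] < n_spades)

# Same formula on a 7-card deck with 2 spades: |F| = 2 * 6! = 1440
n, s = 7, 2
assert count_third_is_spade(n, s) == s * factorial(n - 1)
```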

Comment: This illustrates a principle known as the "principle of
deferred decisions".  Often, it is useful to think of a probabilistic
experiment as a sequence of random choices.  One way to think of the
experiment here is that first the top card is chosen randomly from all
52 cards, then the next card is chosen randomly from all remaining 51
cards, etc.  The principle of deferred decisions says that we can
instead choose these cards in any other order that is convenient to us,
so another way to think of the experiment is that first we choose the
3rd card, then the top card, then the 2nd card, and so on.  Both
descriptions produce the same uniform distribution over orderings, and
the second one is more convenient for this problem, so we use it.  If you
like, you can think of the principle of deferred decisions as a "lazy"
version of the experiment, where we make a choice only when the analysis
needs it.  (This is "lazy" in the computer science sense, as in lazy
evaluation.)
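The "lazy" view can be made concrete with a deck whose positions are filled only when inspected. In this sketch, `LazyDeck`, `card_at`, the (suit, rank) card encoding, and the 1-based position numbering are all hypothetical choices made for illustration. Each inspected position receives a card chosen uniformly from those not yet placed, so filling positions in any order yields a uniformly random ordering; to estimate Pr[F], only the 3rd card is ever drawn.

```python
import random

class LazyDeck:
    """Hypothetical deck whose positions (1-based) are decided only on demand."""

    def __init__(self, cards, rng=random):
        self.remaining = list(cards)   # cards not yet assigned to any position
        self.placed = {}               # position -> card
        self.rng = rng

    def card_at(self, position):
        if position not in self.placed:
            # Choose uniformly among the unplaced cards, then remove it (swap with last, pop).
            i = self.rng.randrange(len(self.remaining))
            self.remaining[i], self.remaining[-1] = self.remaining[-1], self.remaining[i]
            self.placed[position] = self.remaining.pop()
        return self.placed[position]

# Hypothetical encoding of the 52 cards as (suit, rank) pairs; "S" marks spades.
DECK = [(suit, rank) for suit in "SHDC" for rank in range(13)]

def estimate_third_is_spade(trials, seed=0):
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        deck = LazyDeck(DECK, rng)
        # Only position 3 is ever decided; the other 51 cards stay undrawn.
        if deck.card_at(3)[0] == "S":
            hits += 1
    return hits / trials

print(estimate_third_is_spade(100_000))  # should be close to 0.25
```

Drawing every position in the order 3, 1, 2, 4, ..., 52 would give the same distribution over full orderings as drawing them top to bottom, which is exactly the freedom the principle describes.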


2. E[f(X)] is larger:
  E[f(X)] = 1/2 * (0-2)^2 + 1/2 * (1-2)^2 = 2.5.
  f(E[X]) = f(1/2) = (1/2 - 2)^2 = 2.25.
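Both quantities can be recomputed with exact fractions. The sketch assumes, as the computation above indicates, that X takes the values 0 and 1 with probability 1/2 each and that f(x) = (x - 2)^2.

```python
from fractions import Fraction

def f(x):
    return (x - 2) ** 2

# Assumed distribution of X: values 0 and 1, each with probability 1/2
dist = {0: Fraction(1, 2), 1: Fraction(1, 2)}

E_X = sum(p * x for x, p in dist.items())      # 1/2
E_fX = sum(p * f(x) for x, p in dist.items())  # 1/2 * 4 + 1/2 * 1

print(E_fX, f(E_X))  # 5/2 9/4
assert E_fX > f(E_X)
```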