<p>Kevin Chen: Undergraduate student at the University of Maryland studying computer science. (Feed: https://kevchn.com/feed.xml, updated 2018-10-17)</p>
<p><strong>Introduction and Exercises for Bayes Decision Theory and Maximum Likelihood Estimation</strong> (2018-10-15, https://kevchn.com/machine%20learning/2018/10/15/bayes-mle)</p>
<p>We provide an exercise-oriented introduction to Bayes decision theory, maximum likelihood estimation, and parameter estimation.</p>
<h2 id="bayes-decision-theory">Bayes Decision Theory</h2>
<p>Bayesian decision theory revolves around making classification decisions based on probabilities and the cost of making such decisions. This makes the assumption that we know (or can derive) all the relevant probabilities.</p>
<p>We denote the state of nature as $\omega$, which can take on different classes such as $\omega_1$ for sea bass or $\omega_2$ for salmon. We may also have a prior probability $P(\omega_i)$ for each class. If we don’t know any prior information about the classes, we can treat each prior the same: $P(\omega_i) = 1/c$, where $c$ is the number of classes.</p>
<p>The class-conditional density, or likelihood, $p(x \vert \omega_i)$ is the probability density of a feature $x$ given some class $\omega_i$. For example, one such class-conditional density could describe the height of a salmon (e.g., 25% are 5 ft and 75% are 6 ft).</p>
<p>Bayes’ rule tells us that the posterior probability is $P(\omega_i \vert x) = p(x \vert \omega_i) P(\omega_i) / p(x)$. This gives us a way to derive the probabilities we need to make our decisions. It also tells us that the probability of a class given some observed feature is proportional to the likelihood multiplied by the prior. We’ll later see that choosing the class with the highest posterior is what gives us a good decision, so finding the class with the maximum likelihood × prior is how we make a good decision.</p>
<p>Note that the evidence $p(x)$ is the sum over all classes of likelihood times prior: $p(x) = \sum_j p(x \vert \omega_j) P(\omega_j)$. This normalization makes the posteriors of all classes sum to 1 for any given $x$. When we only compare classes, this potentially expensive normalization can be ignored, since every class posterior for the same feature $x$ shares the same $p(x)$.</p>
<p>We also know that the probability of two events both happening, $P(a,b)$, equals the probability of the first event multiplied by the conditional probability of the second given the first: $P(a,b) = P(a) P(b \vert a)$.</p>
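<p>As a minimal sketch of how the posterior follows from likelihoods and priors, the Java snippet below normalizes the likelihood × prior products by the evidence and also picks the largest product without normalizing. The class name, method names, and the numeric values are made up for illustration; they are not part of the original exercises.</p>

```java
import java.util.Arrays;

/** Sketch: posteriors from class-conditional likelihoods and priors via Bayes' rule. */
final class PosteriorSketch {

    /** Returns P(w_i | x) for each class, given p(x | w_i) and P(w_i). */
    static double[] posteriors(double[] likelihoods, double[] priors) {
        double[] joint = new double[likelihoods.length];
        double evidence = 0.0;
        for (int i = 0; i < likelihoods.length; i++) {
            joint[i] = likelihoods[i] * priors[i];  // likelihood * prior
            evidence += joint[i];                   // p(x) is the sum of these products
        }
        for (int i = 0; i < joint.length; i++) {
            joint[i] /= evidence;                   // normalize so the posteriors sum to 1
        }
        return joint;
    }

    /** Index of the largest likelihood * prior product; p(x) is not needed to decide. */
    static int decide(double[] likelihoods, double[] priors) {
        int best = 0;
        for (int i = 1; i < likelihoods.length; i++) {
            if (likelihoods[i] * priors[i] > likelihoods[best] * priors[best]) {
                best = i;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        double[] likelihoods = {0.25, 0.75};  // hypothetical p(x | w_1), p(x | w_2)
        double[] priors = {0.5, 0.5};         // equal priors, P(w_i) = 1/c
        System.out.println(Arrays.toString(posteriors(likelihoods, priors)));
        System.out.println("decide class index " + decide(likelihoods, priors));
    }
}
```

<p>Both methods agree on the chosen class, which illustrates why the evidence can be skipped when only the decision matters.</p>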
<p><strong>Theorem: Maximization of Posterior Decision Rule Minimizes Error Probability</strong></p>
<p>Suppose we have the following decision rule for binary classification: given some observed feature, if the posterior probability of class 1 is greater than the posterior probability of class 2, then decide on class 1, else decide on class 2. The probability of making an error given some evidence in this case would be 1 - posterior of the class we decided on = posterior of the class we didn’t decide on.</p>
<p>In order to find the total probability of making an error using this decision rule, we can take the integral of the probability of making an error and observing feature x. This is equivalent to the integral of the probability of making an error given the observation of a feature x, multiplied by the total probability density of the feature. So now we have the probability of error of our decision rule given that we know or can derive the relevant probabilities (posterior and evidence).</p>
<p>To make a good decision rule, we can try to decrease our probability of error by minimizing the probability of making an error given the observation of any feature $x$. Note that we can’t modify the evidence since that’s an aspect of the dataset, not our decision rule. Our decision rule does in fact minimize $P(\text{error} \vert x)$ for every $x$, so it is optimal for binary classification in terms of making minimal error. Thus we have justified that for binary classification, we should decide on the class with the highest $P(\omega_i \vert x)$, as this will minimize our error.</p>
<p>Note that we can also rewrite our decision rule using Bayes’ rule: if $p(x \vert \omega_1) P(\omega_1) > p(x \vert \omega_2) P(\omega_2)$, then choose $\omega_1$, else choose $\omega_2$.</p>
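<p>Written out, the argument above can be summarized as follows (a sketch in standard notation; the use of $\min$ to express the error of the maximum-posterior rule is our shorthand, not the original text’s):</p>

```latex
P(\text{error})
  = \int_{-\infty}^{\infty} P(\text{error}, x)\, dx
  = \int_{-\infty}^{\infty} P(\text{error} \vert x)\, p(x)\, dx,
\qquad
P(\text{error} \vert x) = \min\big[ P(\omega_1 \vert x),\; P(\omega_2 \vert x) \big].
```

<p>Since $p(x)$ is non-negative and fixed, making $P(\text{error} \vert x)$ as small as possible at every $x$ makes the whole integral as small as possible.</p>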
<h1 id="generalization-of-bayes-decision-theory-with-loss-functions">Generalization of Bayes Decision Theory with Loss Functions</h1>
<p>We can generalize $x$ from a scalar feature to a feature vector without changing any proofs. Also, to cover problems beyond classification, instead of looking at error probabilities we can choose something more general to minimize: a loss function (where a loss is defined per action, assuming some class).</p>
<p>If we take some action, such as deciding on a class, then the expected loss of taking that action, called the conditional risk, is the sum over all classes of the loss incurred by that action when that class is the true one, multiplied by the posterior probability of that class given the observed feature. We can then calculate the total risk of our decision rule, its expected loss over all observations, by integrating the conditional risk of the action the rule takes at each $x$, multiplied by the probability density of the feature.</p>
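<p>In symbols, one way to write the two quantities just described is shown below. The notation is an assumption for illustration: $\alpha_i$ denotes an action, $L_{ij}$ the loss of taking action $\alpha_i$ when the true class is $\omega_j$, and $\alpha(x)$ the action the decision rule picks at $x$.</p>

```latex
R(\alpha_i \vert x) = \sum_{j=1}^{c} L_{ij}\, P(\omega_j \vert x),
\qquad
R = \int R\big(\alpha(x) \vert x\big)\, p(x)\, dx .
```

<p>Choosing, at every $x$, the action with the smallest conditional risk minimizes the total risk $R$.</p>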
<p><strong>Likelihood Ratio:</strong>
By writing out the conditional risk in terms of the loss and the posteriors, we can obtain an alternative representation of the minimum-risk decision rule algebraically: decide $\omega_1$ if the likelihood ratio $p(x \vert \omega_1) / p(x \vert \omega_2)$ is greater than the misclassification ratio $(L_{12} - L_{22}) / (L_{21} - L_{11})$ multiplied by the prior ratio $P(\omega_2) / P(\omega_1)$, where $L_{ij}$ is the loss for deciding $\omega_i$ when the true class is $\omega_j$. This allows for an interpretation of the Bayes decision rule as deciding $\omega_1$ if the likelihood ratio exceeds a threshold determined by the losses and the priors.</p>
<h1 id="minimum-error-classification">Minimum Error Classification</h1>
<p>To get a concrete loss function for generalization, we can define a loss function, called the zero-one loss function, as 0 if the action classifies to the correct state of nature, and 1 if not. This loss function assigns 0 loss to the correct decision, and 1 for any error. The risk would be defined as the summation over the loss multiplied by the posterior, which would then just be the summation over the posterior for classes that do not correspond to the chosen classification, which is
then just 1 - the posterior of the classified class.</p>
<h2 id="likelihood-normalization-and-likelihood-ratio-practice-problems">Likelihood Normalization and Likelihood Ratio Practice Problems</h2>
<p>Suppose two equally probable one-dimensional densities are of the form $p(x \vert \omega_i) \propto e^{-\vert x - a_i \vert / b_i}$ for $i = 1, 2$ and $b_i > 0$.</p>
<p><strong>(Problem)</strong> Write an analytic expression for each density; that is, normalize each function for arbitrary $a_i$ and positive $b_i$.</p>
<p>To normalize the likelihood, we need a constant $c$ such that the integral of $c\,e^{-\vert x - a \vert / b}$ from negative infinity to positive infinity equals 1. We remove the absolute value by splitting the integral at $x = a$ into two pieces, from $-\infty$ to $a$ and from $a$ to $\infty$; by symmetry the two pieces are equal. To integrate $c\,e^{-(x-a)/b}\,dx$ over $[a, \infty)$, we use the u-substitution $u = -(x-a)/b$, so $dx = -b\,du$ and the antiderivative is $-bc\,e^{-(x-a)/b}$, which evaluates to $bc$. The two pieces together give $2bc = 1$, so $c = 1/(2b_i)$ and $p(x \vert \omega_i) = \frac{1}{2b_i} e^{-\vert x - a_i \vert / b_i}$.</p>
<p>categories: likelihood, absolute value splitting, u-substitution</p>
<p><strong>(Problem)</strong> Calculate the likelihood ratio as a function of your four variables.</p>
<p>We simply multiply our likelihoods by the constants obtained above and substitute in $i = 1$ and $i = 2$. Then we do simple algebraic manipulation of exponentials and fractions and obtain $\frac{p(x \vert \omega_1)}{p(x \vert \omega_2)} = \frac{b_2}{b_1} \exp\left( \frac{-b_2 \vert x - a_1 \vert + b_1 \vert x - a_2 \vert}{b_1 b_2} \right)$.</p>
<p>categories: likelihood ratio, algebra</p>
<p><strong>(Problem)</strong> Sketch a graph of the likelihood ratio $p(x \vert \omega_1)/p(x \vert \omega_2)$ for the case $a_1 = 0, b_1 = 1, a_2 = 1$ and $b_2 = 2$.</p>
<p>First we can substitute the given values into our equation above. To sketch the graph, we can plug in every point manually or split the function into a piecewise function (because of the absolute values). The splitting points are at $x = 0$ and $x = 1$, so we have 3 partitions: $-\infty$ to 0, 0 to 1, and 1 to $\infty$. Then we evaluate the points at 0 and 1 to get $2e^{1/2}$ and $2/e$. Then we can roughly sketch the pieces, which are all variants of $e^x$ with some transformations (the main thing to get right is the slope, which depends on which partition $x$ falls in).</p>
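<p>To check a sketch numerically, the snippet below evaluates the likelihood ratio of the two normalized densities at a few points for the case $a_1 = 0, b_1 = 1, a_2 = 1, b_2 = 2$. It is only an illustrative sketch; the class and method names are hypothetical.</p>

```java
/** Sketch: likelihood ratio of two normalized densities (1 / 2b) * exp(-|x - a| / b). */
final class LikelihoodRatioSketch {

    /** Normalized density with location a and positive scale b. */
    static double density(double x, double a, double b) {
        return Math.exp(-Math.abs(x - a) / b) / (2.0 * b);
    }

    /** p(x | w_1) / p(x | w_2) as a function of the four parameters. */
    static double ratio(double x, double a1, double b1, double a2, double b2) {
        return density(x, a1, b1) / density(x, a2, b2);
    }

    public static void main(String[] args) {
        // Case from the problem: a1 = 0, b1 = 1, a2 = 1, b2 = 2.
        for (double x = -2.0; x <= 3.0; x += 0.5) {
            System.out.printf("x = %5.2f  ratio = %.4f%n", x, ratio(x, 0, 1, 1, 2));
        }
    }
}
```

<p>The printed values rise up to $x = 0$ and fall afterwards, with a change of slope at $x = 1$, matching the three partitions described above.</p>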
<p>categories: piecewise function, absolute value splitting</p>
<h1 id="position-of-minimum-error-decision-boundary-and-posterior-behavior">Position of Minimum Error Decision Boundary and Posterior Behavior</h1>
<p>Let the conditional densities for a two-category one-dimensional problem be given by the Cauchy distribution described as: $p(x \vert \omega_i) = \frac{1}{\pi b} \cdot \frac{1}{1 + \left( \frac{x - a_i}{b} \right)^2}$ where $i = 1, 2$.</p>
<p><strong>(Problem)</strong> Check that the likelihoods are indeed normalized</p>
<p>We must check that each density integrates to one. Substituting $u = (x - a_i)/b$ gives $dx = b\,du$, which cancels the $1/b$ factor and leaves $\frac{1}{\pi} \int_{-\infty}^{\infty} \frac{du}{1 + u^2}$. Since the integrand is symmetric, the integral equals 2 times the integral from 0 to infinity. We then substitute $u = \tan(z)$, so $du = \sec^2(z)\,dz$, and the limits change from 0 to $\infty$ into 0 to $\pi/2$ (since $\arctan(\infty) = \pi/2$). The integrand becomes $\frac{\sec^2(z)}{1 + \tan^2(z)}$, which simplifies to 1 because $1 + \tan^2(z) = \sec^2(z)$. Then $2 \int_0^{\pi/2} 1\,dz = \pi$, and $\frac{1}{\pi} \cdot \pi = 1$, so we have confirmed the likelihoods are indeed normalized.</p>
<p>categories: u-substitution</p>
<p><strong>(Problem)</strong> Assuming $P(\omega_1) = P(\omega_2)$, show that $P(\omega_1 \vert x) = P(\omega_2 \vert x)$ if $x = (a_1 + a_2)/2$, that is, the minimum error decision boundary is a point midway between the peaks of the two distributions, regardless of $b$.</p>
<p>Because the priors are equal (and the evidence is shared), Bayes’ rule shows that the posteriors are equal exactly when the likelihoods are equal. Equating the two Cauchy likelihoods gives $(x - a_1)^2 = (x - a_2)^2$, so either $a_1 = a_2$ (the distributions coincide) or $x = (a_1 + a_2)/2$. The width $b$ cancels, so the boundary does not depend on it.</p>
<p>categories: bayes rule, likelihood, decision boundary</p>
<p><strong>(Problem)</strong> How do $P(\omega_1 \vert x)$ and $P(\omega_2 \vert x)$ behave as $x \to -\infty$ and $x \to +\infty$?</p>
<p>As $x$ goes to infinity or negative infinity, both likelihoods go to 0, but their ratio $\frac{1 + ((x - a_2)/b)^2}{1 + ((x - a_1)/b)^2}$ goes to 1. When comparing the two posteriors, we can ignore the evidence since it is the same for both. With the likelihoods equal in the limit, the ratio of the posteriors approaches the ratio of the priors. If the priors are equal, as in this problem, the posteriors become equal and thus each approaches ½. We can also solve this problem by explicitly writing out the posterior formulation.</p>
<p>categories: likelihood, posterior formulation</p>
<h1 id="bayes-decision-boundary-problem-for-gaussian-discriminant">Bayes Decision Boundary Problem for Gaussian Discriminant</h1>
<p>Consider a two-category classification problem in two dimensions with $p(x \vert \omega_1) \sim N(0, I)$, $p(x \vert \omega_2) \sim N\left((1, 1)^t, I\right)$ and $P(\omega_1) = P(\omega_2) = 1/2$.</p>
<p><strong>(Problem)</strong> Calculate the Bayes decision boundary.</p>
<p>Gaussian Discriminant Function: The decision boundary is the set of points where the discriminant functions are equal ($g_1(x) = g_2(x)$). To classify variables with Gaussian likelihoods, we use the discriminant function for the Gaussian case.</p>
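<p>As a worked sketch of that step, take the mean of $\omega_2$ to be $\mu_2 = (1, 1)^t$. With identity covariances and equal priors, the Gaussian discriminant reduces (after dropping terms common to both classes) to $g_i(x) = -\frac{1}{2} \lVert x - \mu_i \rVert^2$, so:</p>

```latex
g_1(x) = g_2(x)
\;\Longleftrightarrow\; \lVert x \rVert^2 = \lVert x - \mu_2 \rVert^2
\;\Longleftrightarrow\; 2\, \mu_2^t x = \lVert \mu_2 \rVert^2
\;\Longleftrightarrow\; x_1 + x_2 = 1 .
```

<p>Under these assumptions the boundary is the line halfway between the two means and perpendicular to the segment joining them.</p>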
<p>categories: gaussian discriminant</p>
<h1 id="decision-rule-for-classes-with-opposite-likelihoods-and-independent-binary-features">Decision Rule for Classes with Opposite Likelihoods and Independent Binary Features</h1>
<p>Let the components of the vector $x = (x_1, \ldots, x_d)^t$ be independent and binary-valued (0 or 1), with $d$ odd, and $p_{i1} = p(x_i = 1 \vert \omega_1) = p > 1/2$ for $i = 1, \ldots, d$, $p_{i2} = p(x_i = 1 \vert \omega_2) = 1 - p$ for $i = 1, \ldots, d$, and $P(\omega_1) = P(\omega_2) = 1/2$.</p>
<p><strong>(Problem)</strong> Show that the minimum-error-rate decision rule becomes: decide $\omega_1$ if $\sum_{i=1}^{d} x_i > d/2$, and $\omega_2$ otherwise.</p>
<p>Simply plug in discriminant function for classes with independent binary features and known likelihoods.</p>
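<p>One way to carry out that substitution, sketched here with the likelihood ratio rather than the log-discriminant, is to use independence to write each class likelihood as a product over features:</p>

```latex
\frac{P(x \vert \omega_1)}{P(x \vert \omega_2)}
  = \prod_{i=1}^{d} \frac{p^{x_i} (1 - p)^{1 - x_i}}{(1 - p)^{x_i}\, p^{1 - x_i}}
  = \left( \frac{p}{1 - p} \right)^{2 \sum_{i=1}^{d} x_i - d} .
```

<p>Since $p > 1/2$, the base $p/(1-p)$ is greater than 1, so with equal priors the ratio exceeds 1 exactly when $\sum_{i=1}^{d} x_i > d/2$. Because $d$ is odd, the sum can never equal $d/2$, so there are no ties.</p>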
<p>categories: gaussian discriminant</p>
<h1 id="explicit-posterior-calculation-for-classification-on-gaussian-classes">Explicit Posterior Calculation for Classification on Gaussian Classes</h1>
<p>Suppose we have three categories in two dimensions with the following underlying distributions:
<script type="math/tex">p(x \vert \omega_1) \sim N(0, I)</script></p>
<script type="math/tex; mode=display">p(x \vert \omega_2) \sim N\left((1, 1)^t, I\right)</script>
<p><script type="math/tex">p(x \vert \omega_3) \sim \tfrac{1}{2} N\left((0.5, 0.5)^t, I\right) + \tfrac{1}{2} N\left((-0.5, 0.5)^t, I\right) \text{ with } P(\omega_i) = 1/3, \; i = 1, 2, 3</script>.</p>
<p><strong>(Problem)</strong> By explicit calculation of posteriors, classify the point $x = (0.3, 0.3)^t$ for minimum probability of error.</p>
<p>Gaussian Likelihood/Posterior: Simply plug the distribution values into the Gaussian likelihood and multiply by the prior (then normalize by the evidence) to get the posterior. Pick the class with the highest posterior to minimize error, which is $\omega_1$.</p>
<p><strong>(Problem)</strong> Suppose that for a particular test point the first feature is missing. That is, classify $x = (*, 0.3)^t$.</p>
<p>Recalculate the posterior except integrate over the missing value x1 from -inf to inf.</p>
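<p>The two calculations above can be sketched in Java as follows. This is an illustrative sketch, not the original solution: it assumes the means are $(0,0)^t$, $(1,1)^t$, and the mixture of $(0.5,0.5)^t$ and $(-0.5,0.5)^t$, all with identity covariance, and the class and method names are hypothetical. With identity covariance, integrating out the missing $x_1$ simply leaves the one-dimensional Gaussian in $x_2$.</p>

```java
/** Sketch: explicit posteriors for three classes with identity-covariance Gaussians. */
final class GaussianPosteriorSketch {

    /** Density of N(mean, 1) at x (one dimension). */
    static double normal1d(double x, double mean) {
        double d = x - mean;
        return Math.exp(-0.5 * d * d) / Math.sqrt(2.0 * Math.PI);
    }

    /** Density of N((m1, m2), I) at (x1, x2); with identity covariance it factors. */
    static double normal2d(double x1, double x2, double m1, double m2) {
        return normal1d(x1, m1) * normal1d(x2, m2);
    }

    /** Divides by the sum; the equal priors of 1/3 cancel in this normalization. */
    static double[] normalize(double[] scores) {
        double sum = 0.0;
        for (double s : scores) sum += s;
        double[] out = new double[scores.length];
        for (int i = 0; i < scores.length; i++) out[i] = scores[i] / sum;
        return out;
    }

    /** Posteriors for a fully observed point. */
    static double[] posteriorsFull(double x1, double x2) {
        double l1 = normal2d(x1, x2, 0.0, 0.0);
        double l2 = normal2d(x1, x2, 1.0, 1.0);
        double l3 = 0.5 * normal2d(x1, x2, 0.5, 0.5) + 0.5 * normal2d(x1, x2, -0.5, 0.5);
        return normalize(new double[] {l1, l2, l3});
    }

    /** Posteriors when x1 is missing: x1 is integrated out, leaving the marginal in x2. */
    static double[] posteriorsMissingFirst(double x2) {
        double l1 = normal1d(x2, 0.0);
        double l2 = normal1d(x2, 1.0);
        double l3 = 0.5 * normal1d(x2, 0.5) + 0.5 * normal1d(x2, 0.5);
        return normalize(new double[] {l1, l2, l3});
    }

    /** Index of the largest entry. */
    static int argmax(double[] v) {
        int best = 0;
        for (int i = 1; i < v.length; i++) if (v[i] > v[best]) best = i;
        return best;
    }

    public static void main(String[] args) {
        System.out.println("full point: class index " + argmax(posteriorsFull(0.3, 0.3)));
        System.out.println("x1 missing: class index " + argmax(posteriorsMissingFirst(0.3)));
    }
}
```

<p>Class indices are zero-based, so index 0 corresponds to $\omega_1$. Under the stated assumptions, the missing-feature case favors $\omega_3$, because both mixture components have their $x_2$ mean at 0.5, which is closest to 0.3.</p>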
<h2 id="maximum-likelihood-estimation-practice-problems">Maximum Likelihood Estimation Practice Problems</h2>
<p>Let $x$ have a uniform density $p(x \vert \theta) \sim U(0, \theta)$, that is, $p(x \vert \theta) = 1/\theta$ for $0 \le x \le \theta$ and 0 otherwise.</p>
<p><strong>(Problem)</strong> Suppose that $n$ samples $D = \{x_1, \ldots, x_n\}$ are drawn independently according to $p(x \vert \theta)$. Show that the maximum-likelihood estimate for $\theta$ is $\max[D]$, that is, the value of the maximum element in $D$.</p>
<p>Maximum-Likelihood:
$P(D \vert \theta) = \prod_{k=1}^n p(x_k \vert \theta)$</p>
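<p>A sketch of how the argument can be completed from this product: each factor is $1/\theta$ when the sample lies in $[0, \theta]$ and 0 otherwise, so the likelihood is nonzero only when $\theta$ is at least as large as every sample.</p>

```latex
P(D \vert \theta) = \prod_{k=1}^{n} p(x_k \vert \theta) =
\begin{cases}
\theta^{-n} & \text{if } \theta \ge \max_k x_k, \\
0 & \text{otherwise,}
\end{cases}
\qquad\Longrightarrow\qquad
\hat{\theta} = \max_k x_k = \max[D].
```

<p>On the allowed range, $\theta^{-n}$ is strictly decreasing in $\theta$, so the likelihood is maximized by the smallest allowed value, which is the largest sample.</p>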
<h1 id="mle-with-poor-models">MLE with Poor Models</h1>
<p><strong>(Problem)</strong> Show that if our model is poor, the maximum likelihood classifier we derive is not the best, even among our (poor) model set, by exploring the following example. Suppose we have two equally probable categories (i.e., $P(\omega_1) = P(\omega_2) = 0.5$). Furthermore, we know that $p(x \vert \omega_1) \sim N(0, 1)$ but assume that $p(x \vert \omega_2) \sim N(\mu, 1)$. (That is, the parameter $\theta$ we seek by maximum-likelihood is the mean of the second distribution.) Imagine, however, that the true underlying distribution is $p(x \vert \omega_2) \sim N(1, 10^6)$.</p>
<p><strong>(Problem)</strong> What is the value of our maximum-likelihood estimate $\hat{\mu}$ in our poor model, given a large amount of data?</p>
<p>Our model is a Gaussian with variance 1. Recall that the maximum-likelihood estimate of a Gaussian mean is the sample mean, $\hat{\mu} = \frac{1}{n} \sum\limits_{k=1}^n x_k$. With a large amount of data, the sample mean converges to the true mean, so $\hat{\mu} = 1$.</p>
<p><strong>(Problem)</strong> What is the decision boundary arising from this maximum likelihood estimate in the poor model?</p>
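<p>Gaussian Probability Density: with equal priors, equate the two model Gaussians $N(0, 1)$ and $N(\hat{\mu}, 1) = N(1, 1)$. Because the variances are equal, the boundary is the midpoint of the means, $x = 0.5$.</p>
<p><strong>(Problem)</strong> Ignore for the moment the maximum likelihood approach, and use the method from Chapter 2 to derive the Bayes optimal decision boundary given the true underlying distributions: $p(x \vert \omega_1) \sim N(0, 1)$ and $p(x \vert \omega_2) \sim N(1, 10^6)$. Be careful to include all portions of the decision boundary.</p>
<p>Equate the two Gaussian densities again, this time using the true variance $10^6$ for $\omega_2$. Because the variances differ, the resulting equation is quadratic in $x$, so the decision boundary consists of two points.</p>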
<p><strong>(Problem)</strong> Now consider again classifiers based on the (poor) model assumption $p(x \vert ω2) ∼ N (µ, 1)$. Using your result immediately above, find a new value of µ that will give lower error than the maximum-likelihood classifier.</p>
<p>We know our model is bad (the variance is wrong, so MLE gave a value that maximizes likelihood but doesn’t minimize the real-world error), but now we know the decision boundary we want, so we can give this bad model a mean that reaches that decision boundary. In the poor model the decision boundary is the average of the two means, $\mu/2$, so we can set the mean to $2 * \text{ideal decision boundary}$ found above.</p>
<h2 id="resources-and-acknowledgments">Resources and Acknowledgments</h2>
<p>Several exercises provided or inspired by <a href="https://www.amazon.com/Pattern-Classification-Pt-1-Richard-Duda/dp/0471056693">Pattern Classification</a>, Duda & Hart.</p>
<h1 id="further-readings">Further Readings</h1>
<p><a href="https://ocw.mit.edu/courses/mathematics/18-05-introduction-to-probability-and-statistics-spring-2014/readings/MIT18_05S14_Reading10b.pdf">MIT MLE Estimates</a></p>
<p>This short reading is a well-written tutorial from an undergraduate MIT statistics class on using MLE for a variety of simple statistical problems. It is useful for practicing recognizing where MLE can be, or is, applied.</p>
<p><strong>The Ultimate Guide to Preparing for CMSC132</strong> (2017-01-04, https://kevchn.com/guide/umd/2017/01/04/cmsc132)</p>
<p>CMSC132, or Object-Oriented Programming II, is one of the most fundamental courses for a computer science student here at Maryland.</p>
<p>This essential course continues with more advanced Java constructs like abstract classes, enums, and inner classes, but also addresses canonical topics like asymptotic complexity, recursion, graphs, and threads. The course’s instructors have noted that the course “covers a ton of material”, but you don’t have to stress because, aside from a few OOP concepts in the first few weeks of classes, nothing is covered in overwhelming depth.</p>
<p><img src="/assets/csic_room.png" alt="CSIC" /></p>
<h1 id="background">Background</h1>
<p>Unlike its second-year counterparts (CMSC216, CMSC330, and CMSC351), CMSC132 isn’t meant to be a weed-out course. Instead, the class continues with the same format established in CMSC131, at a slightly faster pace. Everything is still taught in Java, there are still weekly projects that need to be submitted through <a href="http://www.cs.umd.edu/eclipse/">Eclipse to the UMD CS submit server</a>, and traditional homework is scarce (might vary by instructor).</p>
<p>If you performed decently well in CMSC131 or received an exemption, you should have no problem with the slightly faster pace of CMSC132, as long as you prepare and know what to expect.</p>
<h1 id="preparation">Preparation</h1>
<p>If you want to get ahead before the semester starts, read up on the lecture slides UMD has available from previous offerings of the course, such as the <a href="http://cs.umd.edu/class/summer2014/cmsc132/schedule.shtml">Summer 2014 Slides</a>. It’s also good to look over the lecture material a few minutes before class each day.</p>
<p>Some of the slides can be dry, so for a quick introduction to hashing, traversals, shortest-path algorithms, and sorting algorithms, which together make up a huge chunk of the course, check out these <a href="https://visualgo.net">algorithm visualizations</a>.</p>
<p>On exams, some topics are weighted more heavily than others. Here is a listing of topics that tend to require more focus, along with a set of mini-projects if you have the free time.</p>
<ul>
<li><strong>Generics</strong>: <a href="https://gist.github.com/kevchn/6671fc6fcc054c0e5e967aefeee2ec6a">Implement a box class that can be instantiated with any type of object, and has a method that returns the textual representation of the object in the box with toString().</a></li>
<li><strong>Iterator</strong>: <a href="https://gist.github.com/kevchn/a35cfcdb331e8f103551fdcc2f0547de">Create a class that contains a hardcoded int array and implements Iterable. No generics. Test it with an enhanced for loop.</a></li>
<li><strong>Inheritance</strong>: Create a grandparent, parent, and child class. Give them different methods and play around with inheritance.</li>
<li><strong>Tree</strong>: Implement a binary tree using nodes, with just an addNode method and an in-order traversal method.</li>
<li><strong>Graph</strong>: <a href="https://gist.github.com/kevchn/f1416aa5c3443ca0ee7546a48cb83fec">Create a very simple graph class using an adjacency list. An addNode, an addEdge, a getNeighbors, and a getVertices method.</a></li>
</ul>
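<p>As one possible starting point for the Tree mini-project, here is a minimal sketch. It is independent of the linked gists, and it makes an assumption the exercise does not state: addNode places smaller values to the left and all other values to the right (search-tree order), so the in-order traversal comes out sorted. The class name is hypothetical.</p>

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of the Tree exercise: a binary tree with addNode and an in-order traversal. */
final class IntTreeSketch {

    /** One node of the tree, holding a value and links to its two children. */
    private static final class Node {
        final int value;
        Node left;
        Node right;

        Node(int value) {
            this.value = value;
        }
    }

    private Node root;

    /** Inserts a value. ASSUMPTION: smaller values go left, all others go right. */
    void addNode(int value) {
        root = insert(root, value);
    }

    private static Node insert(Node node, int value) {
        if (node == null) {
            return new Node(value);  // base case: found an empty spot
        }
        if (value < node.value) {
            node.left = insert(node.left, value);
        } else {
            node.right = insert(node.right, value);
        }
        return node;
    }

    /** Returns the values in left subtree, node, right subtree order. */
    List<Integer> inOrder() {
        List<Integer> out = new ArrayList<>();
        collect(root, out);
        return out;
    }

    private static void collect(Node node, List<Integer> out) {
        if (node == null) {
            return;
        }
        collect(node.left, out);
        out.add(node.value);
        collect(node.right, out);
    }

    public static void main(String[] args) {
        IntTreeSketch tree = new IntTreeSketch();
        for (int v : new int[] {5, 2, 8, 1, 9}) {
            tree.addNode(v);
        }
        System.out.println(tree.inOrder());
    }
}
```

<p>Both insert and collect are recursive, so this exercise doubles as recursion practice. Try rewriting the traversal as pre-order or post-order to see how the output order changes.</p>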
<p>It can also be useful to know the similarities and differences between sorting algorithms, which is why I compiled this table back when I was studying for finals: <a href="https://docs.google.com/spreadsheets/d/1bHWBKUhnkbNz8_7JgCg7rxTdDJWyBCPTzXb26OH--NE/pubhtml">Table of Sorting Algorithms</a>.
<em>Please note that the in-place and stable columns depend on how you implement the sorting algorithm, and depending on what your professor taught, it may differ for you. In the world of exam scores, the professor’s word is king.</em></p>
<p>Studying all of this obsessively before the class even starts is going to be overkill for most students. If you’re looking for an A, you’ll be fine spending a week or so getting an intuition for the material, and spending a few weekends during the semester reviewing/getting ahead on any tougher topics.</p>
<h1 id="studying-tips">Studying tips</h1>
<ul>
<li>Don’t buy the textbook unless your professor tells you to during the first day of classes. Your notes should be all you need.</li>
<li>When you’re studying: Class Notes > Practice Exams > Labs > Practice Questions > Projects. Practice exams are secondary to notes because the content of this course is so frequently reorganized.</li>
<li>If you’re struggling with a topic, you can talk with the professor after class or during office hours, talk with your TA, or search on StackOverflow. The content of this course is pretty universal, and the answer to nearly any question you could have is only a few clicks away.</li>
</ul>
<h1 id="other-tips">Other tips</h1>
<ul>
<li>Don’t share any code (unless instructions say otherwise). <a href="https://www.reddit.com/r/UMD/comments/3wyy75/does_anyone_have_experience_with_comp_sci/">The UMD CS department takes cheating seriously</a>.</li>
<li>Start the projects early! Can you finish that project in one night? Probably. But if you have an error, you won’t have time to ask your TA for help. Don’t risk it.</li>
<li>Pick a lecturer that fits your learning style. For example, while Fawzi and Nelson are both widely regarded as great lecturers, Fawzi is more straight-to-business whereas Nelson prefers to incorporate more jokes in his lecture. Check out <a href="http://www.ratemyprofessors.com">Rate My Professors</a>, <a href="http://www.ourumd.com/class/CMSC132">OurUMD</a>, and the <a href="https://www.reddit.com/r/UMD/search?q=cmsc132+professor&restrict_sr=on">UMD subreddit</a> for more reviews.</li>
</ul>
<h1 id="a-final-word">A final word</h1>
<p>For some of you, this class will be a cakewalk, but for others, it might not be. Please don’t talk about how “easy” the last midterm or project was. We’re all in the same boat here: if you understand the material well, help a classmate out by pointing them in the right direction.</p>
<p><strong>Short introduction to using recursion</strong> (2015-12-26, https://kevchn.com/2015/12/26/recursion)</p>
<p>Coding up a recursive solution to a problem you’ve never seen before might seem tricky, but there’s a step-by-step process you can follow to figure out exactly when and how to code recursive solutions.</p>
<h3 id="first-when">First, when?</h3>
<p>You can use recursion in any problem where <strong><em>the answer can be obtained in an easy way, assuming you know the answer to a simpler problem</em></strong>.</p>
<p>For example, if you want to calculate the weight of the 50th floor of the Empire State Building, and you know that each floor is half as heavy as the one below it, then you could use recursion to calculate the 50th floor based on the 49th floor (which is based on the 48th … and so on), until you reach the base floor, whose weight you would need to know.</p>
<p><em>Note: If you are familiar with proof by induction (see CMSC250), the reasoning behind recursion is virtually identical.</em></p>
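<p>The floor example can be sketched in Java like this. The base floor weight of 1000.0 is a made-up number, and the class and method names are hypothetical; the point is only the shape of the recursion.</p>

```java
/** Sketch: weight of a floor when each floor weighs half as much as the one below it. */
final class FloorWeightSketch {

    /** Floors are numbered from 1 (the base floor, whose weight must be known). */
    static double floorWeight(int floor, double baseFloorWeight) {
        if (floor == 1) {
            return baseFloorWeight;  // base case: the known weight of the base floor
        }
        // Recursive case: this floor is half as heavy as the floor below it.
        return floorWeight(floor - 1, baseFloorWeight) / 2.0;
    }

    public static void main(String[] args) {
        System.out.println(floorWeight(50, 1000.0));  // 1000.0 is a hypothetical base weight
    }
}
```

<p>Notice that the method never reasons about all 50 floors at once. It only states how one floor relates to the floor below it, plus the one weight that is known.</p>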
<h3 id="secondly-how">Secondly, how?</h3>
<p>Lots of students get frustrated mistakenly trying to think through a recursive algorithm <em>the whole way through</em> (e.g. through all 50 floors in the previous example), but that’s not the correct approach to construct a recursive solution. Instead, try this:</p>
<ol>
<li>Write down the simple base case</li>
<li>Pretend that you know the answer to the case(s) that are slightly simpler than the problem</li>
<li>Calculate the answer from the answer to the slightly simpler case(s)</li>
</ol>
<p>Let’s try this with the canonical university example: calculating the <em>n</em>th Fibonacci number.</p>
<ol>
<li>We know that the Fibonacci number for 0 is 0, and for 1 is 1.</li>
<li>We pretend we know the answers for n - 1 and n - 2.</li>
<li>We can now easily calculate the Fibonacci number for n by adding the Fibonacci numbers for n - 1 and n - 2.</li>
</ol>
<p>The recursive solution is then as follows:</p>
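<div class="highlighter-rouge"><div class="highlight"><pre class="highlight"><code>// Returns the n-th Fibonacci number; n must be non-negative.
int fib(int n) {
    if (n == 0) {
        return 0;  // base case
    }
    if (n == 1) {
        return 1;  // base case
    }
    // Combine the answers to the two slightly simpler cases.
    return fib(n - 1) + fib(n - 2);
}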
</code></pre></div></div>
<p><strong>How to get started training at the USACO Gateway</strong> (2013-11-14, https://kevchn.com/guide/2013/11/14/usaco)</p>
<p>This post assumes that you already have a basic grasp of what algorithms are and that you are proficient in the Java programming language.</p>
<h2 id="what-is-usaco">What is USACO?</h2>
<p>What is USACO, and why should you use it to start to learn how to program competitively?</p>
<blockquote>
<p>The USACO (United States of America Computing Olympiad) is a programming competition for high schoolers in the United States. The top scorers in the USACO US Open Competition are selected to be part of the National U.S Programming Team, who will then compete in the IOI (International Olympiad in Informatics).</p>
</blockquote>
<p>The USACO organization has created an online training website for students to develop their programming skills on a variety of problems, to be completed at the students’ own pace. The training problems are so well made that the majority of the users of the training pages are from the national IOI teams of other countries (e.g., China, Russia, India).</p>
<p>So how do you get started using this resource? Just visit http://train.usaco.com/usacogate, and sign up for a free account.</p>
<p>Keep in mind that the problems are meant to be difficult. Try tackling just 1 problem a day, and rely on online help (Stack Overflow) if you get stuck. If you know anyone around your programming level, I’d recommend setting up an online group with them to keep track of your progress and ask for help if needed.</p>
<p>Now that you’ve got an account, sign in and start reading the texts and doing the problems!</p>
<h2 id="doing-the-first-few-usaco-problems">Doing the first few USACO problems</h2>
<p>The site is presented as a list of links, each designated as either a TEXT or a PROB. TEXTs explain what you should know for the upcoming problems, and PROBs are the actual programming problems. The first two TEXTs are an introduction to the format of the website.</p>
<h3 id="setup">Setup</h3>
<p>The second TEXT has snippets of code for each programming language. Scroll down to the JAVA one and copy that. Then open up the programming IDE of your choice (ex: Eclipse), and create a new Java Project named USACO.</p>
<p>Then create a new class with the name of the PROB (in this case, it would just be “test”). Paste the “test” code into it, and save (ctrl+s).</p>
<p>Now, open up Windows Explorer (the program that you use to navigate files/folders), and search for your Eclipse Workspace (again, if you don’t know how, look it up). Once there, look for your project (it should be a folder named ‘USACO’), right click it, and create a shortcut. Drag that shortcut onto your desktop. Now you have easy access to your project folder!</p>
<h3 id="file-inputoutput">File input/output</h3>
<p>Double click on USACO again, and right click the window when it comes up. Click on create new file, and create “test.in” and “test.out”. You can then close the folder and go back to Eclipse. You should see two new files named “test.in” and “test.out” in the workspace to the left. If a file is named “test.in.txt” or something similar, right click the file, and refactor/rename it to test.in. Do the same with test.out.</p>
<p>If you use Eclipse or any other IDE that has a console for input and output, note that for USACO, you do NOT use the console. Instead, you rely on two input/output FILES. Your program needs to open up an input file, which has a bunch of text or numbers in it. The program then does whatever you ask it to, and then writes the result to the output file.</p>
<p>Lines of code above “Get line, break into tokens” are the input/output part of the program. The lines from there down to out.close() are the actual meat of the program. If you have trouble understanding the input/output part, try looking up the code online (be independent).</p>
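<p>To make the file-based input/output concrete, here is a minimal sketch of a program that reads from test.in and writes to test.out. It is not the official snippet from the second TEXT, which remains the one to copy for submissions. The class name is hypothetical, and the sketch assumes the input file holds one line with two integers to add.</p>

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.PrintWriter;
import java.util.StringTokenizer;

/** Sketch: read one line from test.in, process it, and write the result to test.out. */
final class UsacoIoSketch {

    /** The "meat": break the line into tokens and add the two integers on it. */
    static int sumLine(String line) {
        StringTokenizer tokens = new StringTokenizer(line);
        int first = Integer.parseInt(tokens.nextToken());
        int second = Integer.parseInt(tokens.nextToken());
        return first + second;
    }

    public static void main(String[] args) throws IOException {
        // Input/output setup: files instead of the console.
        try (BufferedReader in = new BufferedReader(new FileReader("test.in"));
             PrintWriter out = new PrintWriter(new FileWriter("test.out"))) {
            out.println(sumLine(in.readLine()));
        }  // try-with-resources closes both files, which also flushes the output
    }
}
```

<p>Keeping the problem logic in its own method, separate from the file handling, makes it easy to reuse the same input/output setup for later problems and only swap out the middle.</p>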
<p>Put some numbers in the test.in file, run the program, and then check the test.out!</p>
<h3 id="submission">Submission</h3>
<p>Once you’ve finished playing with the test program, go back to the USACO Text, and upload the file.</p>
<p>Once it says your program has passed, you can go to your first PROB, RIDE. Create a new class named RIDE, and set up your input output. You can use the same files, “test.in” and “test.out”, since your RIDE class is in the same project as the files.</p>
<p>However, before you submit the official program, you will need to change the code of the program to say “ride.in” and “ride.out” for the input and output files, because those are the names of the files that USACO uses on their end.</p>
<p>Note that for the RIDE program, the problem is asking for input from multiple lines, rather than the first program, which asked for input from a single line.</p>
<h2 id="an-aside-to-future-frustrated-students">An aside to future frustrated students</h2>
<p>There are a lot of people out there who think that they’re smart because learning comes easy to them, and that they don’t have to try. But you have to remember that feeling stupid and feeling annoyed is a part of becoming better.</p>
<p>Everyone struggled through the same infuriatingly simple bugs that you’re going to be encountering in the next few weeks. Embrace the battle, feel stupid, and push through to understand whatever you’re stuck on.</p>