Help:Contents/Finding Content/Finding text

There are a couple of major ways to find text: with a hex editor or the strings utility.

Regular ASCII
Some games have their strings stored in ASCII. This means that a byte with the value 0x4E (78 in decimal) will be an N. If this is the case, you'll be able to spot strings of ordinary text amidst the rest of the data bytes.

The strings utility makes finding ASCII text easy: it searches through a file for any run of printable characters that's at least a few characters long. The results can also be written to a text file for later viewing or searching.
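For a rough idea of what such a tool does internally, here is a minimal Python sketch. It is not the actual strings implementation; the function name extract_strings, the default minimum length of 4 and the sample bytes are all assumptions made for illustration.

```python
import re

def extract_strings(data: bytes, min_length: int = 4):
    """Yield (offset, text) for each run of printable ASCII of at least min_length bytes."""
    # 0x20-0x7E is the printable ASCII range (space through tilde).
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_length)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")

# Hypothetical bytes standing in for a ROM image.
sample = b"\x00\x01\xffPRESS START\x00\x12\x9aGAME OVER\x00ab\x00"

for offset, text in extract_strings(sample):
    print(f"{offset:#06x}  {text}")
```

To scan a real file, read it with open(path, "rb").read() and pass the result in place of sample. Printing the offset alongside each string makes it easier to jump to the same spot in a hex editor.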

When using a hex editor to find text, you'll have to navigate the whole file and pay attention to when you see something that resembles text. Of course, if you know of an existing word, search for it, and you'll likely find the rest of the strings used in the game nearby. Beware that some strings might be located in other parts of the ROM.

When you find a string, just check to see if it's interesting or not. Simple as that.


 * Examples of games with strings stored this way: Ecco the Dolphin, Light Crusader

Custom character table
This is where it hurts. If the strings aren't stored as ASCII, that means the game has its own value for every character. It's a lot harder to find these, but not impossible. Almost every time, the distances between characters are the same as in ASCII. For instance, the value for the letter "E" will always be 4 more than the value for the letter "A". As such, a simple "relative search" or "scan" tool should do the trick.

Many hex editors, like Translhextion, have these. If you type in a known string, the tool scans the file for byte sequences whose differences match that particular pattern of letters. If it's successful, you can generate a table file for all the letters, effectively "fixing" the game's strings.
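A relative search can be sketched in a few lines of Python. This is only an illustration of the idea, not how Translhextion or any other editor implements it. The names relative_search and build_table, the sample data and the assumption that the game's alphabet is contiguous (A to Z in order, no gaps) are all hypothetical.

```python
import string

def relative_search(data: bytes, word: str):
    """Return offsets where consecutive byte differences match those of `word`."""
    target = [ord(b) - ord(a) for a, b in zip(word, word[1:])]
    hits = []
    for start in range(len(data) - len(word) + 1):
        window = data[start:start + len(word)]
        diffs = [b - a for a, b in zip(window, window[1:])]
        if diffs == target:
            hits.append(start)
    return hits

def build_table(data: bytes, offset: int, word: str):
    """Guess a byte -> letter table from one hit, ASSUMING a contiguous A-Z alphabet."""
    shift = data[offset] - ord(word[0])
    return {
        ord(letter) + shift: letter
        for letter in string.ascii_uppercase
        if 0 <= ord(letter) + shift <= 0xFF
    }

# Hypothetical data: "SCORE" stored with A = 0x0A instead of ASCII's 0x41.
encoded = bytes(ord(c) - ord("A") + 0x0A for c in "SCORE")
data = b"\xff\x00" + encoded + b"\x00"

hits = relative_search(data, "SCORE")
table = build_table(data, hits[0], "SCORE")
print(hits, "".join(table.get(b, ".") for b in data))
```

Search for a word in a single case, since upper and lower case usually sit at different distances from each other. If the game skips letters, the contiguous-alphabet assumption in build_table breaks for everything after the gap, which is the failure mode described in the next paragraph.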

If you're unlucky, the game might actually skip a couple of characters, and as such, the generated table might be incomplete, wrong in some parts or completely useless. For instance, Sonic the Hedgehog can't display "Q", "V", "W" and "X", and "Y"s will not be recognized correctly in the automatically generated table. In these cases, you just need to try smaller strings, preferably strings that use letters closer to the beginning of the alphabet.

If the game has completely nonsensical values for each letter... Well, you're out of luck then. Try the other methods.


 * Examples of games with strings stored this way: Rocket Knight Adventures (normal ASCII, but with an offset), Sonic the Hedgehog (normal ASCII with offset, some characters wrong)

Compressed text
Games with large amounts of text such as RPGs may use a primitive compression method known as dictionary compression. The longest and/or most common words are stored in a list, and each word in the script is replaced with a hexadecimal value that corresponds to its position in this list.

A very basic example:

In a compressed script, this may be stored as:


 * List:


 * Script:

To decode the script, you will have to figure out which words correspond to which codes, then create a table file (see above).
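As an illustration of the decoding step, here is a minimal Python sketch. The format it assumes is invented for the example: bytes below 0x80 are plain ASCII, and bytes from 0x80 upward are indices into the word list. Real games use their own schemes, and the word list and script bytes below are hypothetical.

```python
def decode_script(script: bytes, dictionary, first_code: int = 0x80) -> str:
    """Expand dictionary codes in `script`; other bytes are treated as ASCII."""
    out = []
    for byte in script:
        index = byte - first_code
        if 0 <= index < len(dictionary):
            out.append(dictionary[index])   # code -> dictionary entry
        elif byte < first_code:
            out.append(chr(byte))           # literal character
        else:
            out.append(f"<{byte:02X}>")     # unknown code, shown as hex
    return "".join(out)

# Hypothetical word list and compressed script.
dictionary = ["the ", "sword", "dragon"]
script = b"Take \x80\x81 to \x80\x82."

print(decode_script(script, dictionary))  # Take the sword to the dragon.
```

Leaving unknown codes visible as hex makes it easier to work out the remaining dictionary entries one at a time.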

In memory
A last resort is to try finding them in the game's memory while it's running. Cheat Engine can search for strings in the RAM, but these are limited to normal ASCII strings, so the limitations described in the sections above still apply. If you try this method, make sure you only search for strings belonging to the area you're currently at in the game. For instance, the strings for the options menu probably aren't loaded when you're talking to someone in a temple.

Images
Maybe you tried searching every way imaginable, without realizing that the string is actually an image. If this is the case, you can't change it without editing the image, and none of the text-searching methods above will find it. Take a look at the Finding graphics guide instead.

One final note
Capitals. Remember that if the game has lower case and upper case characters, you should search for the strings with the appropriate case, because "A" (ASCII 0x41) is not the same as "a" (ASCII 0x61). If the characters are all the same case, you should try searching in all upper-case, and if you can't find anything, all lower-case. If that still turns up nothing, try "Normal Case", with upper-cased letters at the beginning of sentences and whatnot.

This is because a string in the game may look like "FOREST", but its bytes are actually for the lower-cased characters; it's just their graphics that are upper-cased.
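Trying each case by hand gets tedious, so a small script can do it in one pass. The sketch below assumes plain ASCII storage; the function name find_any_case and the sample bytes are hypothetical.

```python
def find_any_case(data: bytes, word: str):
    """Map each case variant of `word` that occurs in `data` to its offsets."""
    variants = {word.upper(), word.lower(), word.capitalize()}
    results = {}
    for variant in sorted(variants):
        needle = variant.encode("ascii")
        offsets = []
        pos = data.find(needle)
        while pos != -1:
            offsets.append(pos)
            pos = data.find(needle, pos + 1)
        if offsets:
            results[variant] = offsets
    return results

# Hypothetical data containing the same word in two different cases.
sample = b"\x00\x00forest\x00FOREST TEMPLE\x00"

print(find_any_case(sample, "Forest"))
```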

Various text tools

 * Standard on most, if not all, Unix systems (Windows equivalents are easy to find):
   * the strings command, which extracts ASCII strings from a file. Use man strings to find out how your variant works.
   * the grep command and its relatives, which search files for regular expressions (the others are egrep, which uses an extended regular expression syntax, and fgrep, which searches for fixed strings instead of regular expressions). Use man grep/egrep/fgrep to find out how your commands work.
 * bgrep
   * A small, open source tool that finds byte sequences in files or whole directory trees. It supports a wildcard byte (??) in the search pattern, which makes it useful for far more than text. You must build it yourself; no binary distribution is available.
 * BinText
   * Searches for ASCII and UTF-16 strings in files as well as Windows Resource strings. Mainly intended for Windows programs (it was developed by antivirus software maker McAfee).
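If none of these tools are at hand, a byte search with a wildcard is easy to approximate in Python. This sketch is not bgrep and does not reproduce its command-line syntax. The space-separated pattern format, the function names and the sample bytes are assumptions made for the example.

```python
import re

def compile_hex_pattern(pattern: str):
    """Turn a pattern like '12 ?? 56' into a compiled bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: any single byte
        else:
            parts.append(re.escape(bytes([int(token, 16)])))
    # DOTALL lets the wildcard match the newline byte (0x0A) as well.
    return re.compile(b"".join(parts), re.DOTALL)

def find_bytes(data: bytes, pattern: str):
    """Return the offsets of non-overlapping matches of `pattern` in `data`."""
    return [m.start() for m in compile_hex_pattern(pattern).finditer(data)]

# Hypothetical data with two matches for "12 ?? 56".
sample = b"\x12\x34\x56\x12\x99\x56"

print(find_bytes(sample, "12 ?? 56"))
```

Matches are non-overlapping because of how finditer works, which is usually fine for locating a known byte signature.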