
Help:Contents/Finding Content/Finding text

From The Cutting Room Floor

The two main tools for finding plain text are a hex editor and the strings utility.

Regular ASCII

Some games store their strings in ASCII. This means that a byte with the hexadecimal value 4E is the letter "N". If that's the case, you'll be able to spot runs of ordinary text among the surrounding data bytes.

The strings utility makes finding ASCII text easy: it searches a file for any run of printable characters that is at least a few characters long (four by default in GNU strings). Its output can also be redirected to a text file for later viewing or searching.
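The core of what strings does can be sketched in a few lines of Python. This is a rough approximation, not the real utility: it only treats bytes 20 through 7E (space through "~") as printable and ignores the other encodings some strings variants support. The function name find_ascii_strings is made up for this example.

```python
import re

def find_ascii_strings(data: bytes, min_len: int = 4):
    """Yield (offset, text) for each run of printable ASCII at least min_len long."""
    # Bytes 0x20 (space) through 0x7E ("~") count as printable here.
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    for match in pattern.finditer(data):
        yield match.start(), match.group().decode("ascii")
```

Printing each result as f"{offset:08X}: {text}" gives a listing similar to running strings with hex offsets (-t x in GNU strings).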

With a hex editor, you'll have to scroll through the whole file and watch for anything that resembles text. If you already know a word the game uses, search for it instead; the rest of the game's strings are likely to be nearby. Beware that some strings might be located in other parts of the ROM.
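Searching for a known word and then looking at the bytes around each hit can be sketched like this. Both helpers (find_all and show_context) are hypothetical names; show_context renders non-printable bytes as dots, the same way a hex editor's ASCII column does.

```python
def find_all(data: bytes, needle: bytes):
    """Return every offset where needle occurs in data."""
    offsets = []
    pos = data.find(needle)
    while pos != -1:
        offsets.append(pos)
        pos = data.find(needle, pos + 1)  # keep looking after this hit
    return offsets

def show_context(data: bytes, offset: int, radius: int = 64) -> str:
    """Return the bytes around offset as text, with non-printable bytes as '.'."""
    chunk = data[max(0, offset - radius):offset + radius]
    return "".join(chr(b) if 0x20 <= b <= 0x7E else "." for b in chunk)
```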

When you find a string, just check to see if it's interesting or not. Simple as that.

If you are searching for strings with a hex editor, don't copy text directly from the ASCII column, since it typically shows non-printable bytes such as newlines as dots. Instead, use something like the strings utility, or check whether your hex editor has a "Copy as Text" feature. That way, formatting (such as newlines) is copied over correctly.

Any non-textual data within the text must either be removed or replaced with text. For example, long runs of FF bytes between lines of text would be removed from the result you put on the wiki, while a byte value that the game substitutes with the player's name would be replaced with the appropriate default name.
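A cleanup pass like this might look like the following sketch. The control byte FC and the default name "Hero" are purely hypothetical placeholders; every game uses its own values, which you'll have to work out yourself. The sketch also assumes that each run of FF bytes marks a break between lines.

```python
import re

# Hypothetical values for illustration only: the byte a game might
# replace with the player's name, and that name's default.
NAME_CODE = 0xFC
DEFAULT_NAME = "Hero"

def clean_text(raw: bytes, name_code: int = NAME_CODE,
               default_name: str = DEFAULT_NAME) -> str:
    """Turn raw script bytes into wiki-ready text."""
    # Substitute the name placeholder with the default name.
    raw = raw.replace(bytes([name_code]), default_name.encode("ascii"))
    # Treat each run of FF padding as a line break and drop the padding.
    lines = [part for part in re.split(rb"\xff+", raw) if part]
    # Any other non-ASCII byte becomes U+FFFD so it is easy to spot.
    return "\n".join(line.decode("ascii", errors="replace") for line in lines)
```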

Examples of games with strings stored this way: Ecco the Dolphin, Light Crusader

Custom character table

This is where it gets harder. If the strings aren't stored as ASCII, the game uses its own value for each character. These strings are a lot harder to find, but not impossible. Almost always, the letters keep the same relative values; for instance, the value for "E" will still be 4 more than the value for "A". As such, a simple "relative search" or "scan" tool should do the trick.
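A relative search compares the differences between neighboring bytes rather than the bytes themselves, so it finds a word no matter which value the game assigns to "A", as long as the letters stay in order. Below is a minimal Python sketch, assuming single-byte characters and no wrap-around past FF; relative_search is a hypothetical name, not any hex editor's API.

```python
def relative_search(data: bytes, word: str):
    """Return (offset, shift) pairs where data matches word's letter spacing.

    shift is the constant added to each ASCII code, i.e.
    game_byte == ord(char) + shift at every matching position.
    """
    if len(word) < 2:
        raise ValueError("need at least two characters to compare differences")
    # Differences between consecutive characters, e.g. "ACE" -> [2, 2].
    diffs = [ord(b) - ord(a) for a, b in zip(word, word[1:])]
    n = len(word)
    hits = []
    for offset in range(len(data) - n + 1):
        window = data[offset:offset + n]
        if all(window[i + 1] - window[i] == diffs[i] for i in range(n - 1)):
            hits.append((offset, window[0] - ord(word[0])))
    return hits
```

Short words tend to produce false positives, so it's worth confirming a hit with a second known string.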

Many hex editors, such as Translhextion, have such a tool. If you type in a known string, it searches the file for byte sequences with the same pattern of differences between letters, and if it succeeds, you can generate a table file for all the letters, effectively "fixing" the game's strings.
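Once a relative search has found the shift, a table file can be generated from it. The sketch below writes lines in the common "hex=character" table-file format (for example 11=A) and includes a small decoder to check the result. It assumes every letter uses the same shift, which isn't always the case; make_table and decode_with_table are hypothetical names.

```python
import string

def make_table(shift: int, alphabet: str = string.ascii_uppercase) -> str:
    """Build "hex=char" table lines, assuming byte = ord(char) + shift."""
    lines = []
    for ch in alphabet:
        value = ord(ch) + shift
        if 0 <= value <= 0xFF:  # skip letters that fall outside a byte
            lines.append(f"{value:02X}={ch}")
    return "\n".join(lines)

def decode_with_table(data: bytes, table: str) -> str:
    """Decode bytes using a table; unknown bytes are shown as [XX]."""
    mapping = {}
    for line in table.splitlines():
        hex_value, _, ch = line.partition("=")
        mapping[int(hex_value, 16)] = ch
    return "".join(mapping.get(b, f"[{b:02X}]") for b in data)
```

For example, with a shift of -30 (hex), make_table produces lines from 11=A to 2A=Z.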

If you're unlucky, the game's character set might skip a few characters, so the generated table may be incomplete, partly wrong, or completely useless. For instance, Sonic the Hedgehog can't display "Q", "V", "W", or "X", so "Y" will not be recognized correctly in the automatically generated table. In these cases, try shorter strings, preferably ones that use letters near the beginning of the alphabet.

If the game has completely nonsensical values for each letter... Well, you're out of luck then. Try the other methods.

Examples of games with strings stored this way: Rocket Knight Adventures (normal ASCII, but with an offset), Sonic the Hedgehog (normal ASCII with offset, some characters wrong)

Compressed text

Games with large amounts of text, such as RPGs, may use a simple compression method known as dictionary compression. The longest and/or most common words are stored in a list, and each occurrence of those words in the script is replaced with a value (shown in hexadecimal below) that corresponds to the word's position in this list.

A very basic example:

The sword you have found in the stone has become weakened by the dark forces of evil. Travel back in time to save the world! Do not forget to save the princess, or you shall be cursed forever. Your destiny awaits you...

In a compressed script, this may be stored as:

  • List:

...
the
...

  • Script:

The sword you have found in 0A stone has become weakened by 0A dark forces of evil. Travel back in time to save 0A world! Do not forget to save 0A princess, or you shall be cursed forever. Your destiny awaits you...

To decode the script, you will have to figure out which words correspond to which codes, then create a table file (see above).
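Once you know which codes map to which words, expanding the script is straightforward. A minimal sketch, assuming single-byte codes and that entry i of the word list is stored as byte base + i; base is a hypothetical parameter, since real games differ (some use two-byte codes or reserve a special byte range). Every other byte is treated as plain ASCII here.

```python
from typing import Sequence

def expand_dictionary(script: bytes, words: Sequence[str], base: int = 0) -> str:
    """Replace dictionary codes in script with the words they stand for."""
    out = []
    for byte in script:
        index = byte - base
        if 0 <= index < len(words):
            out.append(words[index])  # dictionary code: expand to its word
        else:
            out.append(chr(byte))     # ordinary ASCII character
    return "".join(out)
```

With the example above, expand_dictionary(b"found in \x0a stone", ["the"], base=0x0A) gives "found in the stone".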

In memory

A last resort is to look for the text in the game's memory while it's running. Cheat Engine can search RAM for strings, but only in standard encodings such as ASCII, so text stored with a custom character table (see above) won't turn up this way. If you try this method, only search for strings from the part of the game you're currently in. For instance, the strings for the options menu probably aren't loaded while you're talking to someone in a temple.

Images

Maybe you've tried every search imaginable without realizing that the string is actually an image. In that case, you can't change it without editing the image, which probably isn't what you want to do. Instead, take a look at the Finding graphics guide.

One final note

Capitals. If the game has both upper- and lower-case characters, search for strings with the appropriate case, because "A" (ASCII 41 in hex) is not the same as "a" (ASCII 61). If all the characters appear in the same case, first try searching in all upper case, and if you can't find anything, in all lower case. If that still turns up nothing, try "Normal case", with capital letters at the beginning of sentences and so on.

This is because a string may appear in the game as "FOREST" even though its bytes are the lower-case characters; only the font's graphics are upper case.
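Trying each capitalization in turn is easy to automate for plain ASCII text. A small sketch; case_variants and search_any_case are hypothetical names, and "capitalized" here means Python's str.capitalize (first letter upper case, the rest lower case).

```python
def case_variants(word: str):
    """The original word plus its all-upper, all-lower, and capitalized forms."""
    variants = []
    for v in (word, word.upper(), word.lower(), word.capitalize()):
        if v not in variants:  # skip duplicates, keep order
            variants.append(v)
    return variants

def search_any_case(data: bytes, word: str):
    """Map each case variant found in data to its first offset."""
    hits = {}
    for variant in case_variants(word):
        offset = data.find(variant.encode("ascii"))
        if offset != -1:
            hits[variant] = offset
    return hits
```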

Various text tools

  • Standard on most, if not all, Unix systems; Windows equivalents can be found easily:
    • the strings command, which merely extracts ASCII strings from a file. Use man strings to find out how your variant works.
    • the grep command and its relatives, which search files for regular expressions (egrep uses an extended regular expression syntax, and fgrep searches for fixed strings instead of regular expressions). Use man grep/egrep/fgrep to find out how your commands work.
  • bgrep
A small, open-source tool that finds byte sequences in files or whole directory trees. It also supports a wildcard byte (??) in search patterns, which makes it far more useful than just for text. You must build it yourself; no binary distribution is available.
Searches for ASCII and UTF-16 strings in files as well as Windows Resource strings. Mainly intended for Windows programs (it was developed by antivirus software maker McAfee).
Provides a Right-Click context menu to easily scan for text strings.
Right Click > Analyze file with FileAlyzer2 > Hex tab > Right Click > Scan for Strings - See tabs on right.
Useful for games installed on Windows. Can open .exe and .dll files to view or edit strings and other information.
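bgrep's wildcard idea can be approximated in Python with a regular expression over bytes. This is only a sketch, not bgrep's code or its command-line syntax: patterns are written here as space-separated hex bytes, with ?? matching any single byte, and compile_pattern and find_pattern are hypothetical names.

```python
import re

def compile_pattern(pattern: str):
    """Turn a pattern like "4E ?? 41" into a compiled bytes regex."""
    parts = []
    for token in pattern.split():
        if token == "??":
            parts.append(b".")  # wildcard: any single byte
            continue
        value = int(token, 16)
        if not 0 <= value <= 0xFF:
            raise ValueError(f"not a byte: {token}")
        parts.append(b"\\x%02x" % value)  # exact byte, escaped for re
    # The lookahead lets overlapping matches be reported too;
    # DOTALL makes "." also match the newline byte 0A.
    return re.compile(b"(?=" + b"".join(parts) + b")", re.DOTALL)

def find_pattern(data: bytes, pattern: str):
    """Return every offset where pattern matches in data."""
    return [m.start() for m in compile_pattern(pattern).finditer(data)]
```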