With the introduction of strings as a Prolog data type, there are
three main ways to represent text: using strings, atoms or code lists.
This section explains what to choose for what purpose. Both strings and
atoms are atomic objects: you can only look inside them using
dedicated predicates. Lists of character codes are compound
data structures.
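The atomic versus compound distinction can be sketched with a few type checks. This is a minimal illustration that assumes SWI-Prolog 7 or later with default flags, so that double-quoted text is read as a string; text_kind/2 is a hypothetical helper introduced here, not a library predicate.

```prolog
:- use_module(library(apply)).      % maplist/2

% text_kind(+Text, -Kind): hypothetical helper that names the
% representation used for Text.
text_kind(Text, string) :- string(Text), !.
text_kind(Text, atom)   :- atom(Text), !.
text_kind(Text, codes)  :- is_list(Text), maplist(integer, Text).

% ?- text_kind("hello", K).                      % K = string
% ?- text_kind(hello, K).                        % K = atom
% ?- atom_codes(hello, Cs), text_kind(Cs, K).    % K = codes
%
% Strings and atoms are atomic; a non-empty code list is not:
% ?- atomic("hello").                            % true
% ?- atomic([0'h, 0'i]).                         % false
```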
- Lists of character codes
- are what you need if you want to parse text using Prolog grammar
rules (DCGs, see phrase/3).
Most of the text reading predicates (e.g., read_line_to_codes/2)
return a list of character codes because most applications need to parse
these lines before the data can be processed.
- Atoms
- are identifiers. They are typically used where identity comparison
is the main operation and for text that is rarely composed or
taken apart. Examples are RDF resources (URIs that identify
something), system identifiers (e.g., 'Boeing 747'), but
also individual words in a natural language processing system. They are
also used where other languages would use enumerated types,
such as the names of days in the week. Unlike enumerated types, Prolog
atoms do not form a fixed set and the same atom can represent
different things in different contexts.
- Strings
- typically represent text that is processed as a unit most of the time,
but which is not an identifier for something. Format specifications for
format/3 are a good example. Another example is a descriptive text provided in an
application. Strings may be composed and decomposed using, e.g., string_concat/3
and sub_string/5, converted to codes for parsing using string_codes/2,
or created from codes generated by a generative grammar rule, also using string_codes/2.
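The division of labour described above can be sketched as follows: a string is converted to codes for parsing with a DCG, the key comes back as an atom (an identifier) and the value as a string (descriptive text). This is a minimal sketch that assumes SWI-Prolog 7 or later; parse_line/3, key_value//2, codes_until//2 and rest//1 are hypothetical names chosen for this example.

```prolog
% parse_line(+Text, -Key, -Value): parse "key=value" held in a string.
parse_line(Text, Key, Value) :-
    string_codes(Text, Codes),              % string -> code list for the DCG
    once(phrase(key_value(Key, Value), Codes)).

% key_value(-Key, -Value)//: the grammar works on character codes.
key_value(Key, Value) -->
    codes_until(0'=, KeyCodes),
    [0'=],
    rest(ValueCodes),
    { atom_codes(Key, KeyCodes),            % identifier: use an atom
      string_codes(Value, ValueCodes) }.    % free text: use a string

% codes_until(+Stop, -Codes)//: codes up to, but excluding, Stop.
codes_until(Stop, [C|Cs]) --> [C], { C \== Stop }, codes_until(Stop, Cs).
codes_until(_, []) --> [].

% rest(-Codes)//: all remaining codes.
rest([C|Cs]) --> [C], rest(Cs).
rest([]) --> [].

% ?- parse_line("model=Boeing 747", K, V).
%    K = model, V = "Boeing 747"
% ?- string_concat("Aircraft: ", "Boeing 747", S).
%    S = "Aircraft: Boeing 747"
% ?- sub_string("Boeing 747", 0, 6, _, Sub).
%    Sub = "Boeing"
```

The key is returned as an atom because it will mostly be compared for identity, while the value stays a string because it is handled as a unit and may later be combined with other text.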