std.string
String handling functions. Objects of types string, wstring, and dstring
are value types and cannot be mutated element-by-element. To mutate a
string while building it, use char[], wchar[], or dchar[]. The *string types
are preferable because they don't exhibit undesired aliasing, thus
making code more robust.
License:Boost License 1.0.
Authors:Walter Bright,
Andrei Alexandrescu,
and Jonathan M Davis
Source:
std/string.d
- class StringException: object.Exception;
- Exception thrown on errors in std.string functions.
- this(string msg, string file = __FILE__, size_t line = __LINE__, Throwable next = null);
- Parameters:
string msg |
The message for the exception. |
string file |
The file where the exception occurred. |
size_t line |
The line number where the exception occurred. |
Throwable next |
The previous exception in the chain of exceptions, if any. |
- int icmp(alias pred = "a < b", S1, S2)(S1 s1, S2 s2);
- Compares two ranges of characters lexicographically. The comparison is
case insensitive. Use std.algorithm.cmp for a case sensitive
comparison. icmp works like std.algorithm.cmp except that it
converts characters to lowercase prior to applying pred. Technically,
icmp(r1, r2) is equivalent to
cmp!"std.uni.toLower(a) < std.uni.toLower(b)"(r1, r2).
< 0 | s1 < s2 |
= 0 | s1 == s2 |
> 0 | s1 > s2 |
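Example (an illustrative sketch, not from the original documentation; it assumes the default predicate and that std.algorithm.cmp is available for the case sensitive comparison):

```d
import std.algorithm : cmp;
import std.string : icmp;

void main()
{
    // Case differences are ignored, so these compare equal.
    assert(icmp("Hello", "hello") == 0);
    // The case sensitive cmp orders 'H' (0x48) before 'h' (0x68).
    assert(cmp("Hello", "hello") < 0);
    // Ordering is decided on the lowercased characters.
    assert(icmp("apple", "BANANA") < 0);
    assert(icmp("Zebra", "apple") > 0);
}
```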
- pure nothrow immutable(char)* toStringz(const(char)[] s);
pure nothrow immutable(char)* toStringz(string s);
- Returns a C-style zero-terminated string equivalent to s. s
must not contain embedded '\0' characters, as any C function will treat the first
'\0' that it sees as the end of the string. If s.empty is
true, then a string containing only '\0' is returned.
Important Note: When passing a char* to a C function, and the C
function keeps it around for any reason, make sure that you keep a reference
to it in your D code. Otherwise, it may go away during a garbage collection
cycle and cause a nasty bug when the C code tries to use it.
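Example (a minimal sketch, assuming the C function strlen from core.stdc.string as the consumer; strlen does not keep the pointer, so no extra reference is needed here):

```d
import core.stdc.string : strlen;
import std.string : toStringz;

void main()
{
    string s = "hello";
    // The returned pointer refers to zero-terminated character data.
    const(char)* p = toStringz(s);
    assert(strlen(p) == 5);
    // An empty string yields a pointer to a lone '\0'.
    assert(strlen(toStringz("")) == 0);
}
```

If the C side stored p for later use, the D code would have to keep its own reference to the same memory for as long as C might read it.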
- enum CaseSensitive;
- Flag indicating whether a search is case-sensitive.
- pure ptrdiff_t indexOf(Char)(in Char[] s, dchar c, CaseSensitive cs = CaseSensitive.yes);
- Returns the index of the first occurrence of c in s. If c
is not found, then -1 is returned.
cs indicates whether the comparisons are case sensitive.
- ptrdiff_t indexOf(Char1, Char2)(const(Char1)[] s, const(Char2)[] sub, CaseSensitive cs = CaseSensitive.yes);
- Returns the index of the first occurrence of sub in s. If sub
is not found, then -1 is returned.
cs indicates whether the comparisons are case sensitive.
- ptrdiff_t lastIndexOf(Char)(const(Char)[] s, dchar c, CaseSensitive cs = CaseSensitive.yes);
- Returns the index of the last occurrence of c in s. If c
is not found, then -1 is returned.
cs indicates whether the comparisons are case sensitive.
- ptrdiff_t lastIndexOf(Char1, Char2)(const(Char1)[] s, const(Char2)[] sub, CaseSensitive cs = CaseSensitive.yes);
- Returns the index of the last occurrence of sub in s. If sub
is not found, then -1 is returned.
cs indicates whether the comparisons are case sensitive.
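Example (an illustrative sketch covering the character and substring overloads of indexOf and lastIndexOf; the sample string is made up for the demonstration):

```d
import std.string : indexOf, lastIndexOf, CaseSensitive;

void main()
{
    string s = "Hello, hello";
    // Single character searches.
    assert(indexOf(s, 'l') == 2);
    assert(lastIndexOf(s, 'l') == 10);
    // Substring searches are case sensitive by default.
    assert(indexOf(s, "hello") == 7);
    assert(indexOf(s, "hello", CaseSensitive.no) == 0);
    assert(lastIndexOf(s, "HELLO", CaseSensitive.no) == 7);
    // A missing character is reported as -1.
    assert(indexOf(s, 'z') == -1);
}
```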
- pure nothrow auto representation(Char)(Char[] s);
- Returns the representation of a string, which has the same type
as the string except the character type is replaced by ubyte,
ushort, or uint depending on the character width.
Example:
string s = "hello";
static assert(is(typeof(representation(s)) == immutable(ubyte)[]));
assert(representation(s) is cast(immutable(ubyte)[]) s);
assert(representation(s) == [0x68, 0x65, 0x6c, 0x6c, 0x6f]);
- pure @trusted S toLower(S)(S s);
- Returns a string which is identical to s except that all of its
characters are lowercase (in unicode, not just ASCII). If s does not
have any uppercase characters, then s is returned.
- void toLowerInPlace(C)(ref C[] s);
- Converts s to lowercase (in unicode, not just ASCII) in place.
If s does not have any uppercase characters, then s is unaltered.
- pure @trusted S toUpper(S)(S s);
- Returns a string which is identical to s except that all of its
characters are uppercase (in unicode, not just ASCII). If s does not
have any lowercase characters, then s is returned.
- void toUpperInPlace(C)(ref C[] s);
- Converts s to uppercase (in unicode, not just ASCII) in place.
If s does not have any lowercase characters, then s is unaltered.
- pure @trusted S capitalize(S)(S s);
- Capitalizes the first character of s and converts the rest of s
to lowercase.
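Example (an illustrative sketch of the case conversion functions above; the source file is assumed to be UTF-8 encoded so that the non-ASCII literal is read correctly):

```d
import std.string : capitalize, toLower, toLowerInPlace, toUpper;

void main()
{
    assert(toLower("Hello World") == "hello world");
    assert(toUpper("Hello World") == "HELLO WORLD");
    // Conversion is Unicode aware, not limited to ASCII.
    assert(toUpper("é") == "É");
    // capitalize uppercases the first character and lowercases the rest.
    assert(capitalize("hELLO wORLD") == "Hello world");

    // The in-place variants need a mutable array.
    char[] buf = "ABC".dup;
    toLowerInPlace(buf);
    assert(buf == "abc");
}
```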
- enum KeepTerminator;
S[] splitLines(S)(S s, KeepTerminator keepTerm = KeepTerminator.no);
- Split s into an array of lines using '\r', '\n',
"\r\n", std.uni.lineSep, and std.uni.paraSep as delimiters.
If keepTerm is set to KeepTerminator.yes, then the delimiter
is included in the strings returned.
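Example (an illustrative sketch showing the effect of keepTerm on mixed line endings; the input text is made up):

```d
import std.string : splitLines, KeepTerminator;

void main()
{
    string text = "one\ntwo\r\nthree\rfour";
    // By default the delimiters are dropped.
    assert(splitLines(text) == ["one", "two", "three", "four"]);
    // With KeepTerminator.yes each line keeps its own delimiter.
    assert(splitLines(text, KeepTerminator.yes) ==
           ["one\n", "two\r\n", "three\r", "four"]);
}
```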
- pure @safe C[] stripLeft(C)(C[] str);
- Strips leading whitespace.
Examples:
assert(stripLeft(" hello world ") ==
"hello world ");
assert(stripLeft("\n\t\v\rhello world\n\t\v\r") ==
"hello world\n\t\v\r");
assert(stripLeft("hello world") ==
"hello world");
assert(stripLeft([lineSep] ~ "hello world" ~ lineSep) ==
"hello world" ~ [lineSep]);
assert(stripLeft([paraSep] ~ "hello world" ~ paraSep) ==
"hello world" ~ [paraSep]);
- C[] stripRight(C)(C[] str);
- Strips trailing whitespace.
Examples:
assert(stripRight(" hello world ") ==
" hello world");
assert(stripRight("\n\t\v\rhello world\n\t\v\r") ==
"\n\t\v\rhello world");
assert(stripRight("hello world") ==
"hello world");
assert(stripRight([lineSep] ~ "hello world" ~ lineSep) ==
[lineSep] ~ "hello world");
assert(stripRight([paraSep] ~ "hello world" ~ paraSep) ==
[paraSep] ~ "hello world");
- C[] strip(C)(C[] str);
- Strips both leading and trailing whitespace.
Examples:
assert(strip(" hello world ") ==
"hello world");
assert(strip("\n\t\v\rhello world\n\t\v\r") ==
"hello world");
assert(strip("hello world") ==
"hello world");
assert(strip([lineSep] ~ "hello world" ~ [lineSep]) ==
"hello world");
assert(strip([paraSep] ~ "hello world" ~ [paraSep]) ==
"hello world");
- C[] chomp(C)(C[] str);
C1[] chomp(C1, C2)(C1[] str, const(C2)[] delimiter);
- If str ends with delimiter, then str is returned without
delimiter on its end. If str does not end with
delimiter, then it is returned unchanged.
If no delimiter is given, then one trailing '\r', '\n',
"\r\n", std.uni.lineSep, or std.uni.paraSep is removed from
the end of str. If str does not end with any of those characters,
then it is returned unchanged.
Examples:
assert(chomp(" hello world \n\r") == " hello world \n");
assert(chomp(" hello world \r\n") == " hello world ");
assert(chomp(" hello world \n\n") == " hello world \n");
assert(chomp(" hello world \n\n ") == " hello world \n\n ");
assert(chomp(" hello world \n\n" ~ [lineSep]) == " hello world \n\n");
assert(chomp(" hello world \n\n" ~ [paraSep]) == " hello world \n\n");
assert(chomp(" hello world") == " hello world");
assert(chomp("") == "");
assert(chomp(" hello world", "orld") == " hello w");
assert(chomp(" hello world", " he") == " hello world");
assert(chomp("", "hello") == "");
- C1[] chompPrefix(C1, C2)(C1[] str, C2[] delimiter);
- If str starts with delimiter, then the part of str following
delimiter is returned. If str does not start with
delimiter, then it is returned unchanged.
Examples:
assert(chompPrefix("hello world", "he") == "llo world");
assert(chompPrefix("hello world", "hello w") == "orld");
assert(chompPrefix("hello world", " world") == "hello world");
assert(chompPrefix("", "hello") == "");
- S chop(S)(S str);
- Returns str without its last character, if there is one. If str
ends with "\r\n", then both are removed. If str is empty,
then it is returned unchanged.
Examples:
assert(chop("hello world") == "hello worl");
assert(chop("hello world\n") == "hello world");
assert(chop("hello world\r") == "hello world");
assert(chop("hello world\n\r") == "hello world\n");
assert(chop("hello world\r\n") == "hello world");
assert(chop("Walter Bright") == "Walter Brigh");
assert(chop("") == "");
- @trusted S leftJustify(S)(S s, size_t width, dchar fillChar = ' ');
- Left justify s in a field width characters wide. fillChar
is the character that will be used to fill up the space in the field that
s doesn't fill.
- @trusted S rightJustify(S)(S s, size_t width, dchar fillChar = ' ');
- Right justify s in a field width characters wide. fillChar
is the character that will be used to fill up the space in the field that
s doesn't fill.
- @trusted S center(S)(S s, size_t width, dchar fillChar = ' ');
- Center s in a field width characters wide. fillChar
is the character that will be used to fill up the space in the field that
s doesn't fill.
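Example (an illustrative sketch of leftJustify, rightJustify, and center; the widths and fill characters are arbitrary choices):

```d
import std.string : center, leftJustify, rightJustify;

void main()
{
    // The default fill character is a space.
    assert(leftJustify("abc", 6) == "abc   ");
    assert(rightJustify("abc", 6, '.') == "...abc");
    // Four fill characters are split evenly around the text.
    assert(center("abc", 7, '*') == "**abc**");
    // A string already at least width characters wide is not padded.
    assert(center("abcdef", 3) == "abcdef");
}
```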
- pure @trusted S detab(S)(S s, size_t tabSize = 8);
- Replace each tab character in s with the number of spaces necessary
to align the following character at the next tab stop where tabSize
is the distance between tab stops.
- pure @trusted S entab(S)(S s, size_t tabSize = 8);
- Replaces spaces in s with the optimal number of tabs.
All spaces and tabs at the end of a line are removed.
Parameters:
s |
String to convert. |
tabSize |
Tab columns are tabSize spaces apart. |
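Example (an illustrative sketch of detab and entab; the tab sizes are chosen only to keep the strings short):

```d
import std.string : detab, entab;

void main()
{
    // The tab after "ab" pads up to column 4, the next tab stop.
    assert(detab("ab\tc", 4) == "ab  c");
    // A tab in column 0 expands to a full tabSize spaces.
    assert(detab("\tx", 8) == "        x");
    // entab goes the other way: eight leading spaces become one tab.
    assert(entab("        x", 8) == "\tx");
}
```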
- @safe C1[] translate(C1, C2 = immutable(char))(C1[] str, dchar[dchar] transTable, const(C2)[] toRemove = null);
@safe C1[] translate(C1, S, C2 = immutable(char))(C1[] str, S[dchar] transTable, const(C2)[] toRemove = null);
- Replaces the characters in str which are keys in transTable with
their corresponding values in transTable. transTable is an AA
where its keys are dchar and its values are either dchar or some
type of string. Also, if toRemove is given, the characters in it are
removed from str prior to translation. str itself is unaltered.
A copy with the changes is returned.
See Also:
tr
std.array.replace
Parameters:str |
The original string. |
transTable |
The AA indicating which characters to replace and what to
replace them with. |
toRemove |
The characters to remove from the string. |
Examples:
dchar[dchar] transTable1 = ['e' : '5', 'o' : '7', '5': 'q'];
assert(translate("hello world", transTable1) == "h5ll7 w7rld");
assert(translate("hello world", transTable1, "low") == "h5 rd");
string[dchar] transTable2 = ['e' : "5", 'o' : "orange"];
assert(translate("hello world", transTable2) == "h5llorange worangerld");
- nothrow @trusted C[] translate(C = immutable(char))(in char[] str, in char[] transTable, in char[] toRemove = null);
pure nothrow @trusted string makeTrans(in char[] from, in char[] to);
- This is an ASCII-only overload of translate. It
will not work with Unicode. It exists as an optimization for the
cases where Unicode processing is not necessary.
Unlike the other overloads of translate, this one does not take
an AA. Rather, it takes a string generated by makeTrans.
The array generated by makeTrans is 256 elements long such that
the index is equal to the ASCII character being replaced and the value is
equal to the character that it's being replaced with. Note that translate
does not decode any of the characters, so you can actually pass it Extended
ASCII characters if you want to (ASCII only actually uses 128
characters), but be warned that Extended ASCII characters are not valid
Unicode and therefore will result in a UTFException being thrown from
most other Phobos functions.
Also, because no decoding occurs, it is possible to use this overload to
translate ASCII characters within a proper UTF-8 string without altering the
other, non-ASCII characters. UTF validation issues arise only when a code
unit greater than 127 is replaced with another code unit, or when any code
unit is replaced with a code unit greater than 127.
See Also:
tr
std.array.replace
Parameters:str |
The original string. |
transTable |
The string indicating which characters to replace and what
to replace them with. It is generated by makeTrans. |
toRemove |
The characters to remove from the string. |
Examples:
auto transTable1 = makeTrans("eo5", "57q");
assert(translate("hello world", transTable1) == "h5ll7 w7rld");
assert(translate("hello world", transTable1, "low") == "h5 rd");
- string format(Char, Args...)(in Char[] fmt, Args args);
- Format arguments into a string.
format's implementation was replaced with xformat's
implementation in November 2012.
This is seamless for most code, but it makes it so that the only
argument that can be a format string is the first one, so any
code which used multiple format strings has broken. Please change
your calls to format accordingly.
e.g.:
format("key = %s", key, ", value = %s", value)
needs to be rewritten as:
format("key = %s, value = %s", key, value)
- char[] sformat(Char, Args...)(char[] buf, in Char[] fmt, Args args);
- Format arguments into string buf which must be large
enough to hold the result. Throws RangeError if it is not.
Returns:
filled slice of buf
sformat's implementation was replaced with xsformat's
implementation in November 2012.
This is seamless for most code, but it makes it so that the only
argument that can be a format string is the first one, so any
code which used multiple format strings has broken. Please change
your calls to sformat accordingly.
e.g.:
sformat(buf, "key = %s", key, ", value = %s", value)
needs to be rewritten as:
sformat(buf, "key = %s, value = %s", key, value)
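Example (a minimal sketch of format and sformat with a single leading format string, as required since the change described above; the key and value are made up):

```d
import std.string : format, sformat;

void main()
{
    string key = "answer";
    int value = 42;

    // format allocates and returns a new string.
    assert(format("key = %s, value = %s", key, value) ==
           "key = answer, value = 42");

    // sformat writes into a caller-supplied buffer and returns the
    // filled part of it.
    char[32] buf;
    char[] written = sformat(buf[], "%s=%d", key, value);
    assert(written == "answer=42");
    assert(written.ptr is buf.ptr);
}
```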
- string xformat(Char, Args...)(in Char[] fmt, Args args);
- Format arguments into a string.
format was changed to use this implementation in November 2012,
and xformat was scheduled for deprecation at the same time.
It will be deprecated in May 2013.
- char[] xsformat(Char, Args...)(char[] buf, in Char[] fmt, Args args);
- Format arguments into string buf which must be large
enough to hold the result. Throws RangeError if it is not.
sformat was changed to use this implementation in November 2012,
and xsformat was scheduled for deprecation at the same time.
It will be deprecated in May 2013.
Returns:
filled slice of buf
- bool inPattern(S)(dchar c, in S pattern);
- See if character c is in the pattern.
Patterns:
A pattern is an array of characters much like a character
class in regular expressions. A sequence of characters
can be given, such as "abcde". The '-' can represent a range
of characters, as "a-e" represents the same pattern as "abcde".
"a-fA-F0-9" represents all the hex characters.
If the first character of a pattern is '^', then the pattern
is negated, i.e. "^0-9" means any character except a digit.
The functions inPattern, countchars, removechars,
and squeeze use patterns.
Note:
In the future, the pattern syntax may be improved
to be more like regular expression character classes.
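Example (a sketch of the pattern rules described above, written as a stand-alone function; matchesPattern is a hypothetical helper, not part of std.string, and it makes no attempt to reproduce the library's handling of edge cases such as a trailing '-'):

```d
import std.conv : to;

// Hypothetical helper illustrating the pattern rules; not a library function.
bool matchesPattern(dchar c, string pattern)
{
    dstring p = pattern.to!dstring; // index by code point, not code unit
    size_t i = 0;
    bool negate = false;
    // A leading '^' negates the whole pattern.
    if (p.length > 0 && p[0] == '^')
    {
        negate = true;
        i = 1;
    }
    bool found = false;
    while (i < p.length)
    {
        if (i + 2 < p.length && p[i + 1] == '-')
        {
            // A range such as "a-e": test lower <= c <= upper.
            if (p[i] <= c && c <= p[i + 2])
                found = true;
            i += 3;
        }
        else
        {
            // A plain character must match exactly.
            if (p[i] == c)
                found = true;
            i += 1;
        }
    }
    return found != negate;
}

void main()
{
    assert(matchesPattern('c', "a-e"));
    assert(!matchesPattern('f', "a-e"));
    assert(matchesPattern('B', "a-fA-F0-9"));
    assert(!matchesPattern('g', "a-fA-F0-9"));
    assert(!matchesPattern('7', "^0-9"));
    assert(matchesPattern('x', "^0-9"));
}
```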
- bool inPattern(S)(dchar c, S[] patterns);
- See if character c is in the intersection of the patterns.
- size_t countchars(S, S1)(S s, in S1 pattern);
- Count characters in s that match pattern.
- S removechars(S)(S s, in S pattern);
- Return string that is s with all characters removed that match pattern.
- S squeeze(S)(S s, in S pattern = null);
- Return string where sequences of a character in s[] from pattern[]
are replaced with a single instance of that character.
If pattern is null, it defaults to all characters.
- S1 munch(S1, S2)(ref S1 s, S2 pattern);
- Finds the position pos of the first character in s that does not match pattern (in the terminology used by
inPattern). Updates s =
s[pos..$]. Returns the slice from the beginning of the original
(before update) string up to, and excluding, pos.
Example:
string s = "123abc";
string t = munch(s, "0123456789");
assert(t == "123" && s == "abc");
t = munch(s, "0123456789");
assert(t == "" && s == "abc");
The munch function is mostly convenient for skipping
a certain category of characters (e.g. whitespace) when parsing
strings. (In such cases, the return value is not used.)
- S succ(S)(S s);
- Return string that is the 'successor' to s[].
If the rightmost character is a-zA-Z0-9, it is incremented within
its case or digits. If it generates a carry, the process is
repeated with the one to its immediate left.
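Example (an illustrative sketch of succ, including the carry behaviour; the inputs are arbitrary):

```d
import std.string : succ;

void main()
{
    // The rightmost character is incremented within its class.
    assert(succ("1") == "2");
    // 'z' wraps to 'a' and the carry increments the 'a' to its left.
    assert(succ("az") == "ba");
    // A carry out of the leftmost character lengthens the string.
    assert(succ("9") == "10");
    assert(succ("999") == "1000");
    // Carries propagate across digits and letters alike.
    assert(succ("zz99") == "aaa00");
}
```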
- C1[] tr(C1, C2, C3, C4 = immutable(char))(C1[] str, const(C2)[] from, const(C3)[] to, const(C4)[] modifiers = null);
- Replaces the characters in str which are in from with
the corresponding characters in to and returns the resulting string.
tr is based on
Posix's tr,
though it doesn't do everything that the Posix utility does.
Parameters:
str |
The original string. |
from |
The characters to replace. |
to |
The characters to replace with. |
modifiers |
String containing modifiers. |
Modifiers:
Modifier | Description |
'c' | Complement the list of characters in from |
'd' | Removes matching characters with no corresponding
replacement in to |
's' | Removes adjacent duplicates in the replaced
characters |
If the modifier 'd' is present, then the number of characters in
to may be only 0 or 1.
If the modifier 'd' is not present, and to is empty, then
to is taken to be the same as from.
If the modifier 'd' is not present, and to is shorter than
from, then to is extended by replicating the last character in
to.
Both from and to may contain ranges using the '-' character
(e.g. "a-d" is synonymous with "abcd".) Neither accepts a leading
'^' as meaning the complement of the string (use the 'c' modifier
for that).
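Example (an illustrative sketch of tr with plain characters, ranges, and each of the three modifiers; the input strings are arbitrary):

```d
import std.string : tr;

void main()
{
    // Character-by-character replacement.
    assert(tr("abcdef", "cd", "CD") == "abCDef");
    // Ranges may be used in both from and to.
    assert(tr("abcdef", "b-d", "B-D") == "aBCDef");
    // 'c': replace everything that is NOT in from.
    assert(tr("abcdef", "ef", "*", "c") == "****ef");
    // 'd': delete matching characters that have no replacement in to.
    assert(tr("abcdef", "ef", "", "d") == "abcd");
    // 's': squeeze adjacent duplicates among the replaced characters.
    assert(tr("hello goodbye", "lo", "x", "s") == "hex gxdbye");
}
```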
- bool isNumeric(const(char)[] s, in bool bAllowSep = false);
- [in] string s can be formatted in the following ways:
Integer Whole Number:
(for byte, ubyte, short, ushort, int, uint, long, and ulong)
['+'|'-']digit(s)[U|L|UL]
Examples:
123, 123UL, 123L, +123U, -123L
Floating-Point Number:
(for float, double, real, ifloat, idouble, and ireal)
['+'|'-']digit(s)[.][digit(s)][[e-|e+]digit(s)][i|f|L|Li|fi]]
or [nan|nani|inf|-inf]
Examples:
+123., -123.01, 123.3e-10f, 123.3e-10fi, 123.3e-10L
(for cfloat, cdouble, and creal)
['+'|'-']digit(s)[.][digit(s)][[e-|e+]digit(s)][+]
[digit(s)[.][digit(s)][[e-|e+]digit(s)][i|f|L|Li|fi]]
or [nan|nani|nan+nani|inf|-inf]
Examples:
nan, -123e-1+456.9e-10Li, +123e+10+456i, 123+456
[in] bool bAllowSep
False by default, but when set to true it will accept the
separator characters "," and "_" within the string. These
characters must be stripped from the string before using any
of the conversion functions such as toInt() and toFloat();
otherwise an error will occur.
Also note that no spaces are allowed anywhere within the string,
whether leading, trailing, or embedded, so they too must be stripped
from the string before using this function or any of the conversion functions.
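Example (an illustrative sketch of isNumeric with and without bAllowSep; the inputs are taken from or modelled on the formats listed above):

```d
import std.string : isNumeric;

void main()
{
    // Integer and floating-point forms with optional sign and suffix.
    assert(isNumeric("123"));
    assert(isNumeric("123UL"));
    assert(isNumeric("-123.01"));
    assert(isNumeric("123.3e-10f"));
    // Spaces are never accepted.
    assert(!isNumeric(" 123"));
    // Separators are rejected unless bAllowSep is true.
    assert(!isNumeric("1,234.5"));
    assert(isNumeric("1,234.5", true));
    assert(isNumeric("1_234.5", true));
}
```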
- char[] soundex(const(char)[] string, char[] buffer = null);
- Soundex algorithm.
The Soundex algorithm converts a word into 4 characters
based on how the word sounds phonetically. The idea is that
two spellings that sound alike will have the same Soundex
value, which means that Soundex can be used for fuzzy matching
of names.
Parameters:
const(char)[] string |
String to convert to Soundex representation. |
char[] buffer |
Optional 4 char array to put the resulting Soundex
characters into. If null, the return value
buffer will be allocated on the heap. |
Returns:
The four character array with the Soundex result in it.
Returns null if there is no Soundex representation for the string.
See Also:
Wikipedia,
The Soundex Indexing System
BUGS:
Only works well with English names.
There are other arguably better Soundex algorithms,
but this one is the standard one.
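Example (an illustrative sketch showing that similar-sounding names share a Soundex code; the names are the usual textbook examples, not taken from the text above):

```d
import std.string : soundex;

void main()
{
    // Different spellings that sound alike map to the same code.
    assert(soundex("Robert") == "R163");
    assert(soundex("Rupert") == "R163");
    assert(soundex("Gauss") == soundex("Ghosh"));

    // A caller-supplied 4 char buffer avoids the heap allocation.
    char[4] buf;
    char[] code = soundex("Robert", buf[]);
    assert(code == "R163");
}
```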
- string[string] abbrev(string[] values);
- Construct an associative array consisting of all
abbreviations that uniquely map to the strings in values.
This is useful in cases where the user is expected to type
in one of a known set of strings, and the program will helpfully
autocomplete the string once sufficient characters have been
entered that uniquely identify it.
Example:
import std.stdio;
import std.string;
void main()
{
    static string[] list = [ "food", "foxy" ];
    auto abbrevs = std.string.abbrev(list);
    foreach (key, value; abbrevs)
    {
        writefln("%s => %s", key, value);
    }
}
produces the output:
fox => foxy
food => food
foxy => foxy
foo => food
- size_t column(S)(S str, size_t tabsize = 8);
- Computes the column number reached at the end of str, assuming str starts in the
leftmost column, which is numbered starting from 0.
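Example (an illustrative sketch of column with the default tabsize of 8; the inputs are arbitrary):

```d
import std.string : column;

void main()
{
    assert(column("") == 0);
    // Ordinary characters advance the column by one each.
    assert(column("hello") == 5);
    // A tab advances to the next multiple of tabsize.
    assert(column("\t") == 8);
    assert(column("abc\t") == 8);
    assert(column("12345678\t") == 16);
}
```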
- S wrap(S)(S s, size_t columns = 80, S firstindent = null, S indent = null, size_t tabsize = 8);
- Wrap text into a paragraph.
The input text string s is formed into a paragraph
by breaking it up into a sequence of lines, delineated
by \n, such that the number of columns is not exceeded
on each line.
The last line is terminated with a \n.
Parameters:
s |
text string to be wrapped |
columns |
maximum number of columns in the paragraph |
firstindent |
string used to indent first line of the paragraph |
indent |
string to use to indent following lines of the paragraph |
tabsize |
column spacing of tabs |
Returns:
The resulting paragraph.
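Example (an illustrative sketch of wrap with a narrow column limit; the text and widths are arbitrary):

```d
import std.string : wrap;

void main()
{
    // Lines are broken so that none exceeds 7 columns; the last line
    // is terminated with '\n' as well.
    assert(wrap("a short string", 7) == "a short\nstring\n");
    // Text that already fits still gets the trailing newline.
    assert(wrap("x") == "x\n");
    // Runs of whitespace between words collapse to single spaces.
    assert(wrap(" a b   df ", 3) == "a b\ndf\n");
}
```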
- S outdent(S)(S str);
S[] outdent(S)(S[] lines);
- Removes indentation from a multi-line string or an array of single-line strings.
This uniformly outdents the text as much as possible.
Whitespace-only lines are always converted to blank lines.
A StringException will be thrown if inconsistent indentation prevents
the input from being outdented.
Works at compile-time.
Example:
writeln(q{
    import std.stdio;
    void main() {
        writeln("Hello");
    }
}.outdent());
Output:
import std.stdio;
void main() {
    writeln("Hello");
}