Python/Re: Difference between revisions
From charlesreid1
Latest revision as of 00:52, 13 April 2017
Regular expressions in Python.
Extracting substrings from strings using re
Suppose we have a string like "thing2_2017-04-09_05-04-67.csv" and we want to extract tokens from the filename (thing2, 2017, 04, 09, etc.).
To extract particular tokens using a regular expression, we can use re.findall(regular_expression, string), which returns a list of all non-overlapping matches of the pattern in the string. For example, the regular expression [0-9]{4} matches exactly four consecutive digits (0-9), which picks out the year.
>>> import re
>>> z = "thing2_2017-04-09_05-04-67.csv"
>>> re.findall(r'[0-9]{4}', z)
['2017']
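To pull out several fields at once instead of one token type at a time, capturing groups can help. The sketch below is a minimal illustration that assumes the filename always follows the layout <name>_<YYYY>-<MM>-<DD>_<hh>-<mm>-<ss>.<ext>. That layout, the FILENAME_PATTERN constant, and the parse_filename helper are hypothetical and chosen only for this example. It relies on the standard re.findall, re.compile, Pattern.fullmatch and Match.groupdict calls.

```python
import re

z = "thing2_2017-04-09_05-04-67.csv"

# When the pattern has capturing groups, re.findall returns one tuple of
# group values per match instead of the whole matched text.
date_parts = re.findall(r'([0-9]{4})-([0-9]{2})-([0-9]{2})', z)
print(date_parts)  # [('2017', '04', '09')]

# Hypothetical layout assumed for illustration:
#   <name>_<YYYY>-<MM>-<DD>_<hh>-<mm>-<ss>.<ext>
FILENAME_PATTERN = re.compile(
    r'(?P<name>[A-Za-z0-9]+)_'
    r'(?P<year>[0-9]{4})-(?P<month>[0-9]{2})-(?P<day>[0-9]{2})_'
    r'(?P<hour>[0-9]{2})-(?P<minute>[0-9]{2})-(?P<second>[0-9]{2})'
    r'\.(?P<ext>[A-Za-z0-9]+)'
)

def parse_filename(filename):
    """Return a dict of named tokens, or None if filename does not fit the assumed layout."""
    match = FILENAME_PATTERN.fullmatch(filename)
    return match.groupdict() if match else None

tokens = parse_filename(z)
print(tokens["name"], tokens["year"], tokens["ext"])  # thing2 2017 csv
```

The pattern only checks how many digits each field has, so an out-of-range value such as the 67 in the example filename is still accepted; validating the values themselves would be a separate step.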
Splitting a string at occurrences of a regular expression
To split a string at occurrences of a regular expression, use re.split(regular_expression, string). This applies the regular expression to the string and splits the string at every occurrence of the pattern. The matched text is thrown away unless the pattern contains capturing parentheses (), in which case the text of the captured groups is also included in the resulting list.
The regular expression [^a-zA-Z0-9]{1,} matches a run of one or more characters that are not ASCII letters or digits ({1,} is equivalent to +), so the string is split wherever such a run occurs. For example:
>>> import re
>>> z = "thing2_2017-04-09_05-04-67.csv"
>>> re.split(r'[^a-zA-Z0-9]{1,}', z)
['thing2', '2017', '04', '09', '05', '04', '67', 'csv']
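The behavior of capturing parentheses described above can be checked directly. This short sketch reuses the example filename and compares a plain pattern, the same pattern wrapped in a capturing group, and the maxsplit argument of re.split; the variable names are illustrative only.

```python
import re

z = "thing2_2017-04-09_05-04-67.csv"

# Without a group, the matched separators are discarded.
tokens = re.split(r'[^a-zA-Z0-9]+', z)
print(tokens)
# ['thing2', '2017', '04', '09', '05', '04', '67', 'csv']

# Wrapping the pattern in a capturing group keeps each separator in the
# result, interleaved with the tokens.
tokens_and_separators = re.split(r'([^a-zA-Z0-9]+)', z)
print(tokens_and_separators)
# ['thing2', '_', '2017', '-', '04', '-', '09', '_', '05', '-', '04', '-', '67', '.', 'csv']

# maxsplit limits the number of splits; the remainder is returned unsplit.
first_and_rest = re.split(r'[^a-zA-Z0-9]+', z, maxsplit=1)
print(first_and_rest)
# ['thing2', '2017-04-09_05-04-67.csv']
```

Because the captured separators are kept, joining tokens_and_separators back together with "".join(...) reproduces the original string, which can be handy when the tokens need to be edited and reassembled.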