Python/Re: Difference between revisions

Latest revision as of 00:52, 13 April 2017

Extracting substrings from strings using re

Suppose we have a filename such as "thing2_2017-04-09_05-04-67.csv" and we want to extract its tokens (thing2, 2017, 04, 09, etc.).

To extract particular tokens using a regular expression, we can use re.findall(regular_expression, string), which returns a list of all non-overlapping matches. For example, the regular expression [0-9]{4} matches four consecutive digits.

>>> import re
>>> z = "thing2_2017-04-09_05-04-67.csv"
>>> re.findall(r'[0-9]{4}', z)
['2017']
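
As a further sketch of the same call, the snippet below varies the pattern to pull out different tokens from the filename. The variable names (years, numbers, dates) and the two extra patterns are illustrative assumptions, not part of the original page.

```python
import re

z = "thing2_2017-04-09_05-04-67.csv"

# Runs of four consecutive digits (only the year qualifies here).
years = re.findall(r'[0-9]{4}', z)

# Runs of one or more digits, including the 2 in "thing2".
numbers = re.findall(r'[0-9]+', z)

# When the pattern contains capturing groups, findall returns
# one tuple of group contents per match.
dates = re.findall(r'([0-9]{4})-([0-9]{2})-([0-9]{2})', z)

print(years)    # ['2017']
print(numbers)  # ['2', '2017', '04', '09', '05', '04', '67']
print(dates)    # [('2017', '04', '09')]
```

Note that adding groups changes the shape of the result: a list of tuples rather than a list of strings.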

Splitting a string at occurrences of a regular expression

To split a string at occurrences of a regular expression, use re.split(regular_expression, string). This splits the string at every match of the pattern. The matched text is discarded unless the pattern is enclosed in capturing parentheses (), in which case the matched separators are also returned in the resulting list.

The regular expression [^a-zA-Z0-9]{1,} matches one or more consecutive non-alphanumeric characters ({1,} is equivalent to +), so the string is split wherever such a run occurs. For example:

>>> z = "thing2_2017-04-09_05-04-67.csv"
>>> re.split(r'[^a-zA-Z0-9]{1,}', z)
['thing2', '2017', '04', '09', '05', '04', '67', 'csv']
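
The effect of the parentheses can be seen in a short sketch. The variable names (discarded, kept, first_two) are illustrative assumptions; the maxsplit argument is a standard re.split parameter that the original page does not discuss.

```python
import re

z = "thing2_2017-04-09_05-04-67.csv"

# No capturing group: the separators are thrown away.
discarded = re.split(r'[^a-zA-Z0-9]+', z)

# Capturing group: the matched separators are kept in the result,
# interleaved with the tokens.
kept = re.split(r'([^a-zA-Z0-9]+)', z)

# maxsplit limits the number of splits performed.
first_two = re.split(r'[^a-zA-Z0-9]+', z, maxsplit=1)

print(discarded)
print(kept)
print(first_two)
```

Keeping the separators is useful when the original string has to be reassembled later, since ''.join(kept) gives back z.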


From charlesreid1