#WritersCoffeeClub Apr 24 Share a silly mistake you've made while writing.
-
@cstross Sure, regexps are great. If your editor supports them, and you know how to write them correctly, and the implementation doesn't have word boundary issues with utf-8. For any average writer stuck on an average text editor, I suggest the 3-step method.
@richcarl I work in Scrivener, which includes pcre regexps. But you know even Microsoft Word has regexps these days? They're well-hidden and their implementation is typically Microsoftish (i.e. non-standard and missing a few features) but it's there in the search/replace dialog box. And the publishing industry runs on Word files—so much so that if you go the trad route you *have to* submit your manuscripts in docx format.
So every non-amateur author uses Word or LibreOffice at some stage.
-
@WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb No, they need to pad their search terms with non-word atoms (regular expressions are your friend!), i.e. \W+(search_word)\W+ (in perl-compatible regexp syntax).
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb or [^\w-] instead of \W for a more careful approach, since the \W class will replace smarty-pants to smarty-trousers. hyphens are not included in \w, so the inverted class \W matches on them, which is unlikely to be what you want. [^\w-] works the same but doesn't treat hyphens as word boundaries to avoid the issue.
-
@SmartmanApps @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @cstross To be fair, the one at the top is a plantain
-
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb or [^\w-] instead of \W for a more careful approach, since the \W class will replace smarty-pants to smarty-trousers. hyphens are not included in \w, so the inverted class \W matches on them, which is unlikely to be what you want. [^\w-] works the same but doesn't treat hyphens as word boundaries to avoid the issue.
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb annoyingly there's no standard character class that matches word boundaries in Latin script prose with high confidence, e.g. something along the lines of [\s"“”„;:!?¡¿‽.,()\[\]…]
-
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb or [^\w-] instead of \W for a more careful approach, since the \W class will replace smarty-pants to smarty-trousers. hyphens are not included in \w, so the inverted class \W matches on them, which is unlikely to be what you want. [^\w-] works the same but doesn't treat hyphens as word boundaries to avoid the issue.
@gsuberland
If you don't care about hyphens, `\bword\b` might be the better choice as a zero-width assertion (i.e. no need for capture groups to retain other characters).If you do.. `(?<!-)\bword\b(?!-)` with some perl magic included will do the look backs/lookaheads.
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb
-
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb annoyingly there's no standard character class that matches word boundaries in Latin script prose with high confidence, e.g. something along the lines of [\s"“”„;:!?¡¿‽.,()\[\]…]
@gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb Unicode defines word boundaries, and Perl has
\b{wb}, which matches them. -
@gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb Unicode defines word boundaries, and Perl has
\b{wb}, which matches them.@ilmari @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb My perl experience mostly predates unicode

-
@gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb Unicode defines word boundaries, and Perl has
\b{wb}, which matches them.@ilmari @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb ooh good to know, thanks
-
@ilmari @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb My perl experience mostly predates unicode

@cstross @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb To be fair,
\b{…}was only added to Perl ten years ago
-
@cstross @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb To be fair,
\b{…}was only added to Perl ten years ago
@ilmari @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb Yeah, it's been most of 25 years for me ...
-
@cstross @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb To be fair,
\b{…}was only added to Perl ten years ago
@ilmari @cstross @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb \b has been in regexp far longer, only the Unicode additions are new.
-
@ilmari @cstross @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb \b has been in regexp far longer, only the Unicode additions are new.
@jernej__s @cstross @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb yes, that's why I wrote
\b{…}, not\b. -
@gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb Unicode defines word boundaries, and Perl has
\b{wb}, which matches them.@ilmari @gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb
and vim has \< and \> for “directed” word boundary zero-width expression -
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb or [^\w-] instead of \W for a more careful approach, since the \W class will replace smarty-pants to smarty-trousers. hyphens are not included in \w, so the inverted class \W matches on them, which is unlikely to be what you want. [^\w-] works the same but doesn't treat hyphens as word boundaries to avoid the issue.
@gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb Wait, you’re telling me a word character is not the same as a not-not word character?
-
@gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb Wait, you’re telling me a word character is not the same as a not-not word character?
@adamrice @gsuberland @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb (Obligatory Bill Clinton joke): It depends what you mean by "word".
Less flippantly: is 467130356 a word? Is 17/4/2012 a word? Is !true a word?
-
@cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb or [^\w-] instead of \W for a more careful approach, since the \W class will replace smarty-pants to smarty-trousers. hyphens are not included in \w, so the inverted class \W matches on them, which is unlikely to be what you want. [^\w-] works the same but doesn't treat hyphens as word boundaries to avoid the issue.
@gsuberland @cstross @WellsiteGeo @quixoticgeek @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord @edwinb gonna be blunt: you want to eyeball and confirm every substitution if possible
these days you can be told how many potential ones up front for a lot of text pretty fast
-
#WritersCoffeeClub Apr 24 Share a silly mistake you've made while writing.
Character name changes. If for some reason you change the name of a character you *really* need to double-check that it's changed *everywhere*. Hint: regular expressions and global *conditional* search/replace are your tools. Also how to manage word stemming with regexps. Then triple-check *everything*. Otherwise—guaranteed—you'll flip a character's name in one paragraph and the internet will never let you forget it!
@cstross
I would also recommend doing it interactively.
Yes you need to confirm every change but you learn where your regex goes wrong
Sadly this doesn't help with missed occurrence -
@DJRNDM @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord
Groan.
s/(\W+?)(pants)(\W+?)/\1trousers\3/ig
You could use \b — match a word boundary — instead of \W+? (smallest count of non-word characters preceding the next regexp group) but that'd miss run-on strings ending in pants (eg. InterCappedpants).
The pcre search modifiers s///ig are for case-insensitive and global.
@cstross @DJRNDM @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord
This is still not perfect. You would need to make sure every substitution is the correct meaning of ‘pants’. Otherwise you risk sentences like:
“Whew! I'm all out of breath after that steep hill,” he trousers.
-
I once changed a character's name from Allan to Ben, and later changed it back.
Reading through the manuscript, I found I had thus invented the Allanch seat.
@davidtheeviloverlord @cstross I recall a story where one of the characters was pulling up his Brendas. I guess Jean got renamed...
-
@cstross @DJRNDM @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord
This is still not perfect. You would need to make sure every substitution is the correct meaning of ‘pants’. Otherwise you risk sentences like:
“Whew! I'm all out of breath after that steep hill,” he trousers.
@headword @cstross @owent @alicemcalicepants @nullcolaship @davidtheeviloverlord Hot damn! Totally forgot for a moment there that verb existed.
-
F folfdk@helvede.net shared this topic