Using SQL to identify embedded resources for your CMS site's CSP
It's good practice these days to add a Content Security Policy and couple it with the consent dialog. Both should include all third parties a site might have embedded.
But especially if being too relaxed in the past and allowing for example a generic HTML block or similar; it's hard to overview and easy to miss external embedded resources that might reside in the content database.
Putting together a CSP and activating it in content-security-policy-report-only
mode is a good way to start but I thought it also valuable to try to extract possible embeddings from CMS content.
To work in a more direct fashion I figured that using SQL queries would be nice.
This probably might have some holes but I thought that looking for src="//
and src="http
in LongString-based properties should give decent coverage.
A Stack Overflow answer from Darka had a smart function to involve so after downloading the production database from the self-service portal I added that function to my local copy.
I only needed to do a small modification to the extraction pattern to look for // only.
CREATE FUNCTION dbo.getLinksFromText (@Tekstas NVARCHAR(MAX))
RETURNS @Data TABLE(TheLink NVARCHAR(500))
AS
BEGIN
DECLARE @FirstIndexOfChar INT,
@LastIndexOfChar INT,
@LengthOfStringBetweenChars INT,
@String VARCHAR(MAX)
SET @FirstIndexOfChar = CHARINDEX('//', @Tekstas, 0)
WHILE @FirstIndexOfChar > 0
BEGIN
SET @String = ''
SET @LastIndexOfChar = CHARINDEX('/', @Tekstas,@FirstIndexOfChar+7)
SET @LengthOfStringBetweenChars = @LastIndexOfChar - @FirstIndexOfChar + 1
SET @String = SUBSTRING(@Tekstas, @FirstIndexOfChar, @LengthOfStringBetweenChars)
INSERT INTO @Data (TheLink) VALUES (@String);
SET @Tekstas = SUBSTRING(@Tekstas, @LastIndexOfChar, LEN(@Tekstas))
SET @FirstIndexOfChar = CHARINDEX('//', @Tekstas, 0)
END
RETURN
END
All results query
SELECT fkContentID, LongString, Extracted.*
FROM tblContentProperty
OUTER APPLY dbo.getLinksFromText(LongString) as Extracted
WHERE LongString LIKE '%src="http%'
OR LongString LIKE '%src="//%'
OR LongString LIKE '%src=http%'
OR LongString LIKE '%src=//%'
ORDER BY fkContentID
This gave me a good amount of rows with the property value included.
All results example
fkContentID | LongString | TheLink |
---|---|---|
202732 | <iframe src="https://www.youtube-nocookie.com/embed/qda .. |
//www.youtube-nocookie.com/ |
Distinct results query
I also complemented the full report with a listing of the distinct host names.
SELECT DISTINCT TheLink FROM (
SELECT fkContentID, Extracted.*
FROM tblContentProperty
OUTER APPLY dbo.getLinksFromText(LongString) as Extracted
WHERE LongString LIKE '%src="http%'
OR LongString LIKE '%src="//%'
OR LongString LIKE '%src=http%'
OR LongString LIKE '%src=//%'
) AS tbl
ORDER BY TheLink
Distinct results example
TheLink |
---|
//www.facebook.com/ |
//www.youtube-nocookie.com/ |
Comments?
Published and tagged with these categories: Optimizely, CMS, Microsoft SQL Server