TXTing 101: Finding Security Issues in the Long Tail of DNS TXT Records

Olivier van der Toorn

University of Twente

[email protected]

Roland van Rijswijk-Deij

University of Twente

r[email protected]

Tobias Fiebig

TU Delft

t.ﬁ[email protected]

Martina Lindorfer

TU Wien

martina@iseclab.org

Anna Sperotto

University of Twente

a.sperotto@utwente.nl

Abstract—The DNS TXT resource record is the one with the

most flexibility for its contents, as it is largely unstructured.

Although it might be the ideal basis for storing any form

of text-based information, it also poses a security threat, as

TXT records can also be used for malicious and unintended

practices. Yet, TXT records are often overlooked in security

research. In this paper, we present the ﬁrst structured

study of the uses of TXT records, with a speciﬁc focus on

security implications. We are able to classify over 99.54%

of all TXT records in our dataset, ﬁnding security issues

including accidentally published private keys and exploit

delivery attempts. We also report on our lessons learned

during our large-scale, systematic analysis of TXT records.

Index Terms—DNS, Security, Measurement, Classiﬁcation

1. Introduction

The Domain Name System (DNS) is critical to the In-

ternet’s infrastructure but has long outgrown its original

purpose of resolving names to IP addresses. Applications

nowadays rely on DNS to prevent email spooﬁng (SPF)

and to verify SSH (Secure SHell) and TLS (Transport

Layer Security) key ﬁngerprints (SSHFP and TLSA). With

DNS-over-HTTPS (DoH), it has now even become pos-

sible for browsers and browser-based malware to retrieve

DNS information directly [1].

Of the existing resource record (RR) types, the TXT

RR is the one that provides the most ﬂexibility in terms of

content. Its use and format have been subject to changes

and debates over the years. While DNS TXT records

initially were supposed to only hold descriptive text (RFC

1035), RFC 1464 tried to structure the record type by

introducing a key–value store format, which, e.g., SPF

and DKIM use. In practice, the freedom of an unstructured

resource record remains, which allows anyone to publish

any text-based information. This makes TXT records ideal

We would like to thank Daniël Meinsma for starting this research

by ﬁnding (malicious) PowerShell code samples in TXT records. This

work has been partially funded by the EU H2020 projects CON-

CORDIA (#830927), CyberSecurity4Europe (#830929), and Safe-DEED

(#825225). Furthermore, it has been funded by SIDN-fonds, an indepen-

dent fund on the initiative of SIDN, the registrar for ‘.nl’ domains. The

research leading to these results has also received funding from SBA

Research (SBA-K1), which is funded within the framework of COMET

Competence Centers for Excellent Technologies by BMVIT, BMDW, and

the federal state of Vienna, managed by the FFG.

TABLE 1: Overview of TXT records in our dataset.

    Class                             Percent    # of Records
A   Standardized                       68.95%      50,304,343
B   Non-standardized                   31.05%      22,655,424
C   - Legitimate, well defined         14.40%      10,504,491
D   - Legitimate, not well defined     15.48%      11,292,795
E   - Unclassified                      1.17%         858,138

candidates for malicious and unintended practices, yet, the

TXT records are generally overlooked in security research.

This paper aims at ﬁlling this void, by providing a

longitudinal view of how DNS TXT records are used in

practice, focusing on unconventional use cases and their

security implications. Our dataset consists of all TXT

records from OpenINTEL [2], [3], amounting to roughly

75 billion TXT records collected between March 2015 and

December 2018. The breakdown of the TXT records in Ta-

ble 1 shows that the majority has a well-deﬁned purpose,

being either standardized or non-standardized (83.35%).

This includes email veriﬁcation and the veriﬁcation of

domain ownership. For 15.48% of TXT records, the un-

derlying use case is not well-deﬁned, but matching them

against regular expressions suggests legitimate use cases

(e.g., references to DNS services, dates). Finally, 1.17%

of the TXT records fall outside the previously mentioned

categories. This leads to the question of what type of

information is contained in this tail of TXT records.

Our main contributions are: (1) A structured and his-

torical analysis of TXT records spanning more than three

years. We highlight changes in how, and how often, TXT

records are used. (2) An in-depth analysis of the so-

far neglected tail of TXT records, focusing on security

implications. (3) Our lessons learned, especially in terms

of the amount of manual labor involved, in systematically

analyzing security issues in TXT records.

2. Background and Related Work

DNS and TXT Record Use. DNS originally only tied

domain names and IP addresses together, but has been

continuously extended to keep pace with the technical

requirements of the ever changing Internet. Major changes

include the introduction of new record types, e.g., SRV,

DNSSEC, DNS-over-TCP, and the introduction of DNS-

over-HTTPS.


2020 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW)

DOI 10.1109/EuroS&PW51379.2020.00079

A more subtle way to add new functionality to DNS is

overloading existing resource record types. TXT records

have been commonly used for this, as they were initially

built to hold descriptive (free) text. While some attempts

were made to structure (RFC 1464) or discourage (RFC

5507) using TXT records this way, several common appli-

cations leverage them. For example, TXT records are used

for various forms of email validation and spam preven-

tion, including SPF, DKIM, and DMARC, but DNS TXT

records can also be used as a way of ﬁnding contacts [4],

or to monitor IoT devices [5]. Besides these legitimate

use cases, malicious uses include adding large records to

create more efﬁcient DNS ampliﬁcation attacks [6], or

creating a command and control channel for malware [7],

[8], [9], [10], [11], [12]. Most recently, spam campaigns

have started to query DNS TXT records from JavaScript

embedded in their HTML payload to dynamically redirect

to target URLs [1].
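To make the key–value convention mentioned above concrete, the following is a minimal sketch of an RFC 1464-style parser. It assumes, as we read the RFC, that the first unquoted equals sign separates attribute name and value and that a backquote quotes the next character in the name. The function name `parse_attribute` is hypothetical, lower-casing the name is a normalization choice made here, and the sketch simplifies whitespace handling (quoted whitespace at the edges of the name is also stripped).

```python
def parse_attribute(txt: str):
    """Split a TXT string into (name, value) at the first unquoted '='.

    Returns None when there is no unquoted '=' or the name is empty.
    Simplified: whitespace around the name is always stripped.
    """
    name_chars = []
    i = 0
    while i < len(txt):
        ch = txt[i]
        if ch == "`" and i + 1 < len(txt):
            # A backquote quotes the following character ("`=" or "``").
            name_chars.append(txt[i + 1])
            i += 2
            continue
        if ch == "=":
            name = "".join(name_chars).strip().lower()
            if not name:
                return None
            # Everything after the delimiter is the value, taken verbatim.
            return name, txt[i + 1:]
        name_chars.append(ch)
        i += 1
    return None
```

For example, `parse_attribute("v=spf1 -all")` yields the name `v` and the value `spf1 -all`, which is how SPF fits into this format, while free text without an equals sign yields `None`.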

DNS Measurement Studies. Initially, DNS-related mea-

surement studies focused on passive measurements that

investigate clients’ use of DNS [13], [14], while active

measurements provide a better understanding of the op-

erational side of Internet infrastructure. To make such

measurements reliable, researchers have to account for: (1)

DNS not necessarily being consistent across several van-

tage points, (2) the large amount of involved data, and

(3) measurements being temporally consistent. To address

these issues, van Rijswijk et al. built OpenINTEL, a

platform for longitudinal DNS scans [2], [3].

Streibelt et al. [15] actually use inconsistencies in DNS

replies to measure DNS. In their speciﬁc case they use

reply differences based on EDNS0 to measure DNS load

balancing (RFC 1794). Studies into the use of speciﬁc RRs

include Fiebig et al. looking into PTR records [16], [17],

and Portier et al. taking a ﬁrst look at TXT records [18].

The latter however focused mostly on quantifying the well

known parts of TXT record use — mostly email and veri-

ﬁcation related tokens — and did not explore the unstruc-

tured tail of TXT records, and what security implications

are tied to these records. Portier et al. do mention TXT

records may leak information but do not explore what kind

of information or in what quantity information is leaked.

Even though our analysis is based on a larger volume

of TXT records — 75 billion TXT records collected

over a three-year period instead of 1.4 billion records

collected over a two-year period — we see comparable

results in the high-level classiﬁcation of records. Portier’s

‘Protocol enhancement’ accounts for 76.60% compared to

our ‘Standardized’ category of 68.95% of TXT records.

They reported 15.61% as ‘Domain veriﬁcation’, whereas

we observed 14.40% in this category. Finally, Portier

et al. classiﬁed 7.78% of records as unknown, which

only amounts to 1.17% in our classiﬁcation. An in-depth

comparison of results is out of scope for this paper.

3. Methodology

Dataset Description. We use data from OpenINTEL for

our research. OpenINTEL is an active DNS measurement

platform developed at the University of Twente in collabo-

ration with SURFnet, SIDN Labs, and NLnet Labs. Since

March 2015, OpenINTEL collects daily DNS snapshots

Figure 1: Growth of number of domains and TXT records.

TABLE 2: Dataset statistics.

        Domains/day    TXT records/day    TXT records/domain
min            194M                34M                     0
max            325M                73M                   848
mean           247M                53M                  0.22
std             42M                11M                  0.66

and currently queries around 65% of the global DNS name

space, covering the zones .com, .net, .info, .mobi, the new

ICANN gTLDs, and a set of ccTLDs such as .nl, .se,

.ca, .ﬁ, .at, .dk and .nu. We extracted all TXT records

gathered between March 2015 and December 2018. As

summarized in Table 2, on average we observed 53 million

TXT records and 247 million domain names per day.

Note that even though every domain only has 0.2 TXT

records on average, we see a high variance, with individual

domains containing hundreds of records at the apex.

Figure 1 shows the number of TXT records collected

over time. The number of TXT records grows by a factor

of 2 (from 35 million records in 2015 to 73 million

in 2018). By comparison, the overall DNS name space

expands only by a factor of 1.5 (from 128 million domains

in 2015 to 191 million in 2018). A possible explanation

for the increasing use of TXT records is the adoption of email defenses

like SPF and DKIM: the majority of TXT records are

related to email, and stricter email policies from large

providers (e.g., Google, Microsoft and Yahoo) [19], [20]

may explain this rise.

Note that these statistics are subject to the set of

zones measured by OpenINTEL, which has grown over

the years [21]. Furthermore, the drop in domains be-

tween 2017-03-01 and 2017-05-01 was caused by a major

webhoster blocking (and later unblocking) OpenINTEL.

Table 3 summarizes events impacting our measurements.

Categorization of TXT Records. To get a structured

view on the TXT record ecosystem, we partition the set

of TXT records based on regular expressions, and group

similar classes into broader categories. We were careful

to match a TXT record to a single regular expression,

in order to prevent the record from being counted twice.

Our process was iterative: we performed a heavy-hitter

analysis to identify the current majority classes and built

our regular expressions iteratively until the Other category

did not contain unidentiﬁed items anymore. TXT records

classiﬁed as ‘SPF’, ‘SenderID’ or ‘DKIM’ are part of the

E-mail category. In addition, we identify the categories

Veriﬁcation (e.g., Google site veriﬁcation, Facebook do-

main veriﬁcation), Patterns (IP addresses, dates), Encoded

(usually Base64), Crypto Coins and Miscellaneous (in-

cluding service-speciﬁc tokens and hosting advertisement

slogans). We group the remaining TXT records in the

Other category, which includes unclassiﬁed records.
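The first-match partitioning described above can be sketched as follows. The patterns and labels are hypothetical and far simpler than the 101 regular expressions used in the study. They only illustrate how evaluating an ordered rule list and stopping at the first hit ensures that each record is counted exactly once.

```python
import re

# Hypothetical, simplified rules; order matters because the first match wins.
RULES = [
    ("E-mail/SPF", re.compile(r"^v=spf1(\s|$)", re.IGNORECASE)),
    ("E-mail/DKIM", re.compile(r"^v=DKIM1\s*;", re.IGNORECASE)),
    ("E-mail/DMARC", re.compile(r"^v=DMARC1\s*;", re.IGNORECASE)),
    ("Verification", re.compile(r"^google-site-verification=", re.IGNORECASE)),
    ("Patterns/IPv4", re.compile(r"\A\d{1,3}(\.\d{1,3}){3}\Z")),
    ("Other/Empty", re.compile(r"\A\Z")),
    ("Other/Single Char", re.compile(r"\A.\Z", re.DOTALL)),
]


def classify(record: str) -> str:
    """Return the label of the first matching rule, or the catch-all label."""
    for label, pattern in RULES:
        if pattern.search(record):
            return label
    return "Other/Unclassified"


def tally(records):
    """Count records per label; each record contributes to exactly one label."""
    counts = {}
    for record in records:
        label = classify(record)
        counts[label] = counts.get(label, 0) + 1
    return counts
```

In such a setup, a heavy-hitter pass over the records that still end up in `Other/Unclassified` suggests which rule to add next, which mirrors the iterative process described above.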


TABLE 3: Major events impacting our measurements.

*     Date          Event
(A)   2015-03-26    Major cloud provider cleans up SPF records
(B)   2015-07-03    DDoS protection service account hashes peak
(D)   2016-04-06    Start measurement .info, .mobi + new gTLDs
(E)   2016-07-07    Start measurement .ca
(F)   2017-03-01    Major webhoster blocks OpenINTEL
(G)   2017-05-05    Major webhoster unblocks OpenINTEL
(H)   2017-06-18    Start measurement .ru + .рф

Figure 2: Evolution of TXT records over the span of three

years (* annotated with events listed in Table 3).

Table 4 shows the distribution of records over our

categories for a single day (2018-12-31), for which 0.46%

of the records are left unclassiﬁed. Figure 2 shows how our

categories evolve over time. We marked events (labeled

A–H) that inﬂuenced the evolution of categories. Most

events relate to the expansion of the OpenINTEL

measurements (events C, D, E and H) or the major webhoster

temporarily blocking the measurements (events F and G).

Other events are large CDNs removing 1.80M SPF records

of the form “v=spf1 -all” at once (A) and 2.64M

account hashes suddenly appearing in TXT records (B).

Reproducibility. We used 101 regular expressions

for our classiﬁcation, which are available on our

website: https://www.tide-project.nl/blog/wtmc2020.

4. Analysis of the Long Tail

As the structured part of TXT records has already been

explored in the past [18], we focus on the Other category,

i.e., the unstructured tail of TXT records. Figure 3 shows

the evolution over time of the number of TXT records in

the Other category, broken down by the classes identiﬁed

in Table 4. Although the Other category represents on

average only 1.28% of TXT records, it has grown signif-

icantly over the measurement period from 174K to 858K

records (4.9x growth). We identiﬁed several events that

contribute the most to this trend: In September 2015 (A)

and later in August 2018 (D), we witness a rise of TXT

records of a single character. We discuss these events in

Section 4.1.3. In November 2015 (B) and later in July

2016 (C), there is a sudden rise in the Base64 Encoded

MX records, which we discuss in Section 4.1.2.

4.1. Undeﬁned Purpose

4.1.1. BaseN Encoded Records. We observe that 8.17%

of all records are encoded with some form of BaseN, e.g.,

Base64. Portier et al. [18] suggested that one source

of these records is a federation mechanism of Microsoft

Exchange Servers [22], which we indeed ﬁnd for 0.20%

of records in our dataset. Furthermore, we found a major

TABLE 4: DNS TXT record categories on 2018-12-31.

Label                  # of Records    % of Total
All Records              72,959,767       100.00%
E-mail                   50,304,343        68.95%
  SPF                    49,656,480        68.06%
  DKIM                      310,823         0.43%
  SenderID                  200,991         0.28%
  DMARC                     118,928         0.16%
  Mail Keywords              17,121         0.02%
Verification             10,504,491        14.40%
  Verif. Keywords        10,504,491        14.40%
Patterns                  3,770,930         5.17%
  Pat. Keywords           3,766,532         5.16%
  Pat. Kwd. Begin             4,396         0.01%
  JWT                                      <0.01%
Encoded                   7,215,892         9.89%
  BaseN                   5,957,428         8.17%
  Hash                    1,256,263         1.72%
  Account-hash                2,201        <0.01%
Crypto Coins                     89        <0.01%
  AL                             89        <0.01%
Miscellaneous               305,884         0.42%
  Misc Keywords             200,402         0.27%
  HTTP                       47,089         0.06%
  Hosting                    19,755         0.03%
  Advertising                15,074         0.02%
  Google                     13,899         0.02%
  Domainstatus                9,665         0.01%
Other                       858,138         1.18%
  Unclassified              335,920         0.46%
  Single Char               278,540         0.38%
  Base64 mail               228,672         0.31%
  Empty                      14,425         0.02%
  mail                          221        <0.01%
  Javascript                    178        <0.01%
  BEGIN                                    <0.01%
  Privkey                        63        <0.01%
  Executables                    22        <0.01%
  Cmd                                      <0.01%

Figure 3: Evolution of selected categories of TXT records

over the span of three years.

Figure 4: AS numbers with Base64 encoded MX records.

CDN adding Base64 encoded records to zones pointing to

them (<0.01% of records in our dataset). After contacting

the CDN, they conﬁrmed that they added these records

to the zones, but did not disclose the purpose of these

records. We did not perform an in-depth investigation of

the remaining 7.96% of other BaseN encoded records.

4.1.2. Base64 Encoded MX Records. Of all TXT records

0.31% (228,672) fall in this category. This type of TXT

record has seen two sharp increases in use: first on

2015-11-27 (B), when 14,039 records of this type were added,

and then between 2016-06-17 and 2016-07-03 (C) when

122,573 records were added.

When decoding these records we observe MX-record-

like patterns (priority, host) in 228,631 domains with such

a record, 99.86% (228,321) of which have an MX record,

yet none of these domains’ MX records matches the

decoded TXT record. Figure 4 shows the top ﬁve AS

numbers from which these records originate, with 93.33%

(213,088) coming from ‘NAMECHEAP’ (AS 22612).

These records may be used in an email system where

the MX address of a domain is obfuscated, i.e., through a

public MX record (the regular MX record), and a ‘private’

MX record Base64 encoded into the TXT record.
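A minimal sketch of how such records could be recognized is given below. It assumes that the decoded payload has the shape `<priority> <hostname>` (for example `10 mail.example.com`). The exact layout of the decoded records is not specified above, so the regular expression `MX_LIKE` and the function name `decode_mx_like` are hypothetical.

```python
import base64
import binascii
import re

# Assumed shape of a decoded payload: "<priority> <hostname>".
MX_LIKE = re.compile(
    r"^\s*(\d{1,5})\s+([A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+)\.?\s*$"
)


def decode_mx_like(txt: str):
    """Return (priority, host) if txt is Base64 for an MX-like string, else None."""
    try:
        # validate=True rejects characters outside the Base64 alphabet.
        decoded = base64.b64decode(txt, validate=True).decode("ascii")
    except (binascii.Error, UnicodeDecodeError, ValueError):
        return None
    match = MX_LIKE.match(decoded)
    if match is None:
        return None
    return int(match.group(1)), match.group(2).lower()
```

The returned host can then be compared with the hosts in the domain's regular MX record set to check whether the two agree.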


Figure 5: Single character TXT records over time.

Figure 6: AS numbers hosting ‘~’ TXT records.

4.1.3. Single Character TXT Records. From the start

of the measurement until 2015-09-03 (A), the number

of records containing a single character was stable

at around 61,000 records. On 2015-09-03 this number

increased to 239,302 records, and a day later

to 526,561 records. We note that two metrics change

significantly due to the massive addition of these records:

Firstly, the composition of characters changes. On 2015-

03-01 the most used character is ‘0’. On 2015-09-03

this changes to ‘~’. The records added on 2015-09-04

did not change the distribution, as mainly ‘~’ records

were added. Additionally, the distribution of characters

does not change signiﬁcantly afterwards, as shown in

Figure 5, which presents a snapshot of the distribution

of characters on the first of every month. Between

2017-08-02 and 2017-08-04 (D), 66,826 records consisting of a

single character were added. These records mainly

consisted of ‘~’ (99.98%).

Due to the major rise in ‘~’ records, we investigated

where these TXT records originated from. Figure 6 shows

that the majority (99.99%) of these ‘~’ records origi-

nates from the network of ‘CONFLUENCE-NETWORK-

INC’ (AS 40034), notorious for spreading malware and

having many of their IP addresses listed in numerous

blacklists [23]. Note that on 2016-02-01, 2016-10-01, and

2016-12-01 these records also come from ‘NEUSTAR-AS6’

(AS 19905), a DDoS protection service. Disregarding

the ‘~’ records, it is likely that single-character

records are the result of typos when creating TXT records.

However, the ‘~’ records seem to be a form of domain

identification for a major network, as a large part (94.05%)

shares the same AS number and the domains point to

the same name servers. In both cases, ‘Base64

encoded MX record’ and ‘Single character TXT record’,

we speculate that this particular use of TXT records is

heavily guided by management and conﬁguration choices,

in which speciﬁc service providers or ASes use TXT

records as a way to tag their domains.
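The character composition discussed in this section can be computed with a simple frequency count over one-character records. The sketch below uses made-up sample data and a hypothetical helper name. It is meant only to show the computation, not to reproduce the figures reported above.

```python
from collections import Counter


def single_char_distribution(records):
    """Count how often each character occurs among one-character TXT records."""
    return Counter(r for r in records if len(r) == 1)


# Made-up sample; longer and empty records are ignored by the helper.
sample = ["~", "~", "0", "v=spf1 -all", "", "~", "1"]
dist = single_char_distribution(sample)

# Most frequent character and its share among single-character records.
top_char, top_count = dist.most_common(1)[0]
share = top_count / sum(dist.values())
```

Running the same count on a snapshot per month would give the kind of time series needed to spot a shift in the dominant character.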

4.1.4. Other. Finally, we found empty TXT records and

records referencing executables. On 2018-12-31 we mea-

sured 14,425 empty TXT records, associated with 14,422

domains. These TXT records originate from various AS

numbers, with the top contender being ‘UNIFIEDLAYER-AS-1’

(AS 46606), hosting 12.01% (1,885) of related

domains. These observations suggest this phenomenon is

the result of management mistakes, potentially caused by

improper removal of TXT records, or a default record

set for new domains. Empty records are unlikely to have

security implications.

(a) Domains with a single key (b) Domains with two keys

Figure 7: Statistics of domains with keys in their records.

We found three TXT records referencing executables

(on 2018-12-31). Two of these point to a URL of a

downloadable (Windows) executable. However, when try-

ing to access these URLs we get either a not found or

a permission denied error. The third record consists of

‘calc.exe’. This is interesting, as researchers commonly

use the execution of ‘calc.exe’ in Proof-of-Concept ex-

ploits against Windows systems. However, we were unable

to identify the speciﬁc use of the identiﬁed record.

4.2. Mistakes with a Security Implication

4.2.1. Certiﬁcates. The last day of our dataset contains

43 certiﬁcates and 17 certiﬁcate requests. We processed

each of these with openssl to verify if the records contain

valid certiﬁcates or certiﬁcate requests and found 16.28%

(7) certiﬁcates to be valid. The others were truncated, and

openssl marked them as invalid. Of the certiﬁcate requests,

29.41% (5) are valid. Worryingly, one of these certiﬁcate

requests included the private key.

85.71% (6) of the valid certificates contained references

to the domain where the TXT record originated

from. As for the certiﬁcate requests, those were all issued

for the domain where the TXT record originates from.

This tells us that most of these certiﬁcates and requests

are linked to the functioning of the domains.

While domain ownership veriﬁcation through DNS

TXT records to obtain a certiﬁcate is common (RFC

8555), performing certiﬁcate requests via TXT records

is not common practice. Concerning the certificates

themselves, RFC 4398 proposed a specific CERT RR to store

certiﬁcates, but to our knowledge the effort has been aban-

doned, and the small number of certiﬁcates we observe

does not indicate that TXT is commonly used for this

purpose. While publishing certiﬁcates via TXT records

is not a security risk in itself, the fact that we found

private keys accompanying requests still suggests security-

relevant conﬁguration mistakes [24] are being made.
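As a rough illustration of how certificate and key material can be spotted in TXT records, the sketch below looks for PEM-style `BEGIN` markers. It is a header heuristic only and performs no validation comparable to the openssl checks described above. The function names are hypothetical, and tolerating two to five dashes is an assumption made to cope with truncated markers.

```python
import re

# PEM-style armor header; 2 to 5 dashes are tolerated because markers in
# TXT records may be truncated (assumption made for this sketch).
PEM_BEGIN = re.compile(r"-{2,5}BEGIN ([A-Z0-9 ]+?)-{2,5}")


def pem_labels(txt: str):
    """Return the labels of all BEGIN markers found in the record."""
    return [m.group(1).strip() for m in PEM_BEGIN.finditer(txt)]


def classify_pem(txt: str) -> set:
    """Map the BEGIN labels of a record to coarse kinds of material."""
    kinds = set()
    for label in pem_labels(txt):
        if "PRIVATE KEY" in label:
            kinds.add("private key")
        elif "PUBLIC KEY" in label:
            kinds.add("public key")
        elif "CERTIFICATE REQUEST" in label:
            kinds.add("certificate request")
        elif label == "CERTIFICATE":
            kinds.add("certificate")
    return kinds
```

A record for which `classify_pem` returns both `certificate request` and `private key` corresponds to the worrying case mentioned above, in which a request is published together with its key.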

4.2.2. Public and Private Keys. We observed a rise of

TXT records containing public and private keys at the

end of 2018 (not including the aforementioned certiﬁcate

containing a private key). This is due to 28 domains which

have added key-pairs on 2018-10-19. At the end of the

measurement period, the number of domains exposing

keys has grown to 89. On the last day of our dataset (2018-

12-31), 60.67% (54) of the domains have a single key

in their records, and 39.33% (35) of domains have two

keys in their records. We have analyzed these keys and

classiﬁed them into different types, as shown in Figure 7.

Figure 7a shows the key type distribution of do-

mains with a single key. 55.56% (30) of these do-

mains publish a private key in their TXT records, while

24.07% (13) publish a public key. In 16.67% (9) of

the cases, the TXT record is used for DKIM which

includes the “----BEGIN RSA PRIVATE KEY----”

part. This means that not only the wrong key is being

published, but it also renders the DKIM record unusable.

One domain publishes a GPG public key through its

TXT record. Furthermore, there is one domain with a

certiﬁcate issue request, with accompanying private key,

as mentioned in Section 4.2.1.

Figure 7b shows the key type distribution of domains

with two keys in their TXT records. In 94.29% (33) of

the cases we found a matching public and private key

pair. One domain published the same private key in two

separate records, one with, and the other without, a leading

dash. One domain published two different public GPG

keys in its TXT record.

The fact that we observe a number of private keys is

worrying not just in itself, but also because the disclosure

of public-private key pairs in practice invalidates security

measures such as forgery prevention using DKIM. For example,

if an adversary has access to a domain’s private key used

for DKIM signing, they can send emails in that domain’s

name with the receiving party assuming the origin of the

email is legitimate as it is signed with the correct private

key. Furthermore, the wrongly posted public keys at least

show a misunderstanding of the underlying security tech-

nology. We have notiﬁed the domain-holders who publish

private keys so they can mitigate this issue.
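A simple pre-publication check could catch the DKIM mistakes described above. The sketch below parses the `tag=value` list of a DKIM record in a simplified way and flags records whose `p=` tag is not plain Base64 or that contain a private key marker. The function names are hypothetical, and the parser ignores details of the DKIM tag-list grammar such as duplicate tags.

```python
import re

BASE64_BODY = re.compile(r"[A-Za-z0-9+/]+={0,2}")


def parse_tags(record: str) -> dict:
    """Parse a 'tag=value; tag=value' list into a dict (simplified)."""
    tags = {}
    for part in record.split(";"):
        if "=" in part:
            key, value = part.split("=", 1)
            tags[key.strip().lower()] = value.strip()
    return tags


def dkim_key_problems(record: str) -> list:
    """Return a list of human-readable problems found in a DKIM TXT record."""
    problems = []
    if "PRIVATE KEY" in record:
        problems.append("contains a private key header")
    # Remove folding whitespace inside the key before checking its alphabet.
    p = "".join(parse_tags(record).get("p", "").split())
    if p and not BASE64_BODY.fullmatch(p):
        problems.append("p= tag is not plain Base64")
    return problems
```

An empty `p=` tag is deliberately not flagged here, since DKIM uses an empty value to signal a revoked key.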

4.3. Malicious Use Cases

4.3.1. Commands. We investigated if the TXT records

in our dataset contain commands, speciﬁcally commands

with malicious intent. In our dataset there are six records

containing Command Line Interface (CLI) commands.

One record targets Windows with a command to kill

Internet Explorer, while the other commands target Linux.

Two of the Linux commands aim to test for the Shellshock

vulnerability [25], for example:

() { :;}; echo "shellshock.fail"

Two further records contain curl commands. The remaining

command forces apt (the Linux package manager)

to retrieve packages over IPv4, after which it makes

sure curl is installed, proceeds to download a script from

runclound.io (with curl) and ﬁnally runs it in bash.

These kinds of records may be used for (reliable) shell

script distribution, since DNS trafﬁc is rarely blocked,

while HTTP trafﬁc to speciﬁc websites may be dropped.
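Code-specific patterns of the kind discussed here can be sketched as a small set of indicators. The two regular expressions below are hypothetical and only loosely modelled on the commands described above: one matches the Shellshock function-definition prefix, the other a download that is piped directly into a shell. A download that is first saved and then executed would not be caught by the second pattern.

```python
import re

# Hypothetical indicators; real-world matching would need many more variants.
INDICATORS = {
    "shellshock-probe": re.compile(r"\(\)\s*\{\s*:;\s*\}\s*;"),
    "download-and-run": re.compile(
        r"\b(curl|wget)\b[^|]*\|\s*(sudo\s+)?(ba)?sh\b"
    ),
}


def flag_commands(txt: str) -> list:
    """Return the sorted names of all indicators that match the record."""
    return sorted(name for name, rx in INDICATORS.items() if rx.search(txt))
```

Matches from such indicators are candidates for manual review rather than verdicts, since the intent behind a record cannot be derived from its content alone.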

4.3.2. JavaScript. On the last day of our dataset 172

records contain JavaScript. The ten most common cat-

egories in these records are shown in Figure 8: 35.71%

(40) of these records are used to load additional JavaScript

ﬁles; 16.07% (18) contain JavaScript code for analyt-

ics purposes, and 16.07% (18) of these records refer-

ence Google Ads. Furthermore, these records are used in

15.18% (17) of cases to test if sites are vulnerable to

cross-site scripting, typically through calling ‘alert()’.

These types of records may be used to stealthily

insert JavaScript code into browsers [26]. As these are

dynamically inserted it is unlikely that they leave a long-

lasting trace.

Figure 8: Distribution of JavaScript on 2018-12-31.

$a=(new-object net.webclient);

$b=$Env:APPDATA;

$w=$Env:WINDIR;

$c=$b+\'//t.txt\';

$g=$b+\'//t.exe\';

$p=$w+\'//Microsoft.NET//Framework\';

if (gci -Path $p | where {$_.Name -like \'v4

\'}) {

try {$a.DownloadFile(\'https://filebin.ca/<CODE A>\', $c);

ren $c t.exe;

start $g }

catch {$a.DownloadFile(\'https://files.fm/down.php?i=<CODE B>\', $c);

ren $c t.exe; start $g }

}

else {

try {$a.DownloadFile(\'https://filebin.ca/<CODE C>\', $c);

ren $c t.exe;

start $g }

catch {$a.DownloadFile(\'https://files.fm/down.php?i=<CODE D>\', $c);

ren $c t.exe;

start $g }

};

sleep 180;

rm $g

Figure 9: Malicious PowerShell code.

4.3.3. PowerShell. Finally, we found one case of Power-

Shell code, see Figure 9 (the actual download URLs have

been replaced). This code was hosted by two domains

between 2017-06-20 and 2018-06-22. VirusTotal marked

the file as malicious. Notably, the downloaded

executable installs a scheduled task to perform

additional DNS lookups of the same TXT record, in

essence auto-updating itself via DNS. This behavior is

comparable with the DNSMessenger malware [7], which

gathers PowerShell payloads via DNS TXT records.

5. Ethical Considerations

While the TXT records used in this research are, techni-

cally, publicly available, we have taken care not to expose

information about the individuals, or companies, behind

the domains that might expose security vulnerabilities

caused by improper use of TXT records. This paper is

meant as a learning experience, showing the security

pitfalls related to TXT records, rather than blaming parties

for their “misconﬁguration.” We have notiﬁed the domain

owners with private keys in their TXT records and hope

these keys will be revoked and removed from the records.

6. Discussion and Conclusions

In this paper, we explore the unstructured tail of TXT

records to uncover uses of TXT records which might

have security implications. While analyzing the dataset

we became progressively aware of the pitfalls one will

encounter when attempting such a task, which we will

discuss in this section.

Our analysis, as well as work by Portier et al. [18],

shows that the majority of TXT records belong to

well-defined use cases. Our work builds on this observation

and we progressively remove clearly deﬁned categories,

an approach that allows us to classify 99.54% of the TXT

records in our dataset. For the remaining TXT records,

we have not been able to deﬁne clear categories, as these


remaining records are highly diverse, both semantically

and syntactically. Any further analysis of this category

would imply slow manual labor coupled with deep domain

knowledge with a likely low “return on investment” in

terms of identifying security-relevant records. Analyzing

the tail of the TXT records is therefore not only a

needle-in-a-haystack problem, but also a human

intelligence problem.

Still, our quest has not been fruitless. The use of

public-private key pairs clearly points to ﬂaws in using

and understanding more sophisticated mechanisms such as

DKIM. Also, looking for code-speciﬁc regular expressions

brought to light examples of JavaScript injection and mal-

ware auto-updates, which indicate that the DNS is used as

a form of malicious code delivery. Albeit rare, ﬁnding such

samples enhances our understanding of malware behavior

and DNS misuse. With DoH we expect this behavior to

only increase: for instance, instead of relying on TXT

records to deliver miner executables [12], attackers could

also distribute cryptomining scripts in the browser [27].

Another major challenge is the lack of context. Our

active measurement dataset allows us to perform a wide

search of the tail of the TXT records. However, we are not

able to see how these records are used in practice, or if

they are associated with domain names used for malicious

activities. We believe that context information could be

provided, for example, by passive DNS data. Additionally,

active DNS measurements cannot look beyond known

labels (second level, in our case). The malicious activity

might be ‘hiding’ at lower levels.

Finally, we note that the classified 99.54% of the

TXT records might not necessarily be secure. Our regular

expressions explicitly accommodate typos that we

commonly see in the data (‘sfp’ instead of ‘spf’, for

example), a common issue in IT operations [24]. The

consequence of these human errors might be severe, since

they might lead to a false sense of security or, e.g., broken

email delivery. A large-scale quantiﬁcation of these types

of errors remains currently open for further analysis.
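The difference between classifying a record as SPF-like and the record actually being usable can be sketched as follows. The helper below is hypothetical and is not one of the regular expressions used for the classification. It treats any permutation of the letters in `spf`, as well as stray whitespace around the equals sign, as a typo of the `v=spf1` version tag.

```python
import re
from itertools import permutations

# All orderings of the three letters: spf, sfp, psf, pfs, fsp, fps.
VARIANTS = {"".join(p) for p in permutations("spf")}
VERSION_TAG = re.compile(r"^v\s*=\s*([a-z]{3})1(?:\s|$)", re.IGNORECASE)


def spf_status(record: str) -> str:
    """Return 'valid-prefix', 'typo', or 'not-spf' for a TXT record."""
    match = VERSION_TAG.match(record)
    if match is None:
        return "not-spf"
    tag = match.group(1).lower()
    if tag == "spf":
        # A usable record starts with exactly "v=spf1", without extra spaces.
        return "valid-prefix" if record.lower().startswith("v=spf1") else "typo"
    return "typo" if tag in VARIANTS else "not-spf"
```

A record reported as `typo` would be counted in the E-mail category by a tolerant classifier, yet a receiving mail server that only accepts the exact version tag would ignore it.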

Mitigation can be viewed from two angles. The

first is how to prevent leakage of information through

errors. The most common practice here is consistently

monitoring the correctness of deployed records. This in-

cludes monitoring deployed records for potential informa-

tion leakage. The second part is preventing, e.g., payload

delivery via DNS. Especially with the rise of DoH, this be-

comes a major challenge for defenders in networks. While,

e.g., blocking TXT records in DoH implementations for

browsers might sound like a promising prospect at ﬁrst,

this only mitigates a part of the problem, as ultimately all

RRs might be abused for malware delivery via DoH.

To support further research in this direction, we provide

the 101 regular expressions we used for the classiﬁcation

of DNS TXT records on our website:

https://www.tide-project.nl/blog/wtmc2020.

References

[1] D. Lopera, “Necurs Spam uses DNS TXT Records for

Redirection,” https://www.trustwave.com/en-us/resources/

blogs/spiderlabs-blog/necurs-spam-uses-dns-txt-records-for-

redirection/, 2019.

[2] OpenINTEL, https://www.openintel.nl/.

[3] R. van Rijswijk-Deij, M. Jonker, A. Sperotto, and A. Pras, “A

High-Performance, Scalable Infrastructure for Large-Scale Active

DNS Measurements,” IEEE Journal on Selected Areas in Commu-

nications (JSAC), 2016.

[4] P. Papadopoulos, E. Athanasopoulos, A. A. Chariton, and E. P.

Markatos, “Where’s Wally? How to Privately Discover your

Friends on the Internet,” Proc. of the ACM Asia Conference on

Computer and Communications Security (ASIACCS), 2018.

[5] Y. Jin, M. Tomoishi, K. Fujikawa, and V. P. Kaﬂe, “A Lightweight

and Secure IoT Remote Monitoring Mechanism Using DNS with

Privacy Preservation,” in Proc. of the IEEE Consumer Communi-

cations & Networking Conference (CCNC), 2019.

[6] M. Anagnostopoulos, G. Kambourakis, P. Kopanos,

G. Louloudakis, and S. Gritzalis, “DNS Ampliﬁcation Attack

Revisited,” Computers & Security, 2013.

[7] E. Brumaghin and C. Grady, “Covert Channels and Poor Deci-

sions: The Tale of DNSMessenger,” https://blog.talosintelligence.

com/2017/03/dnsmessenger.html, 2017.

[8] C. J. Dietrich, C. Rossow, F. C. Freiling, H. Bos, M. v. Steen,

and N. Pohlmann, “On Botnets That Use DNS for Command

and Control,” in Proc. of the European Conference on Computer

Network Defense (EC2ND), 2011.

[9] J. White, “Pulling Back the Curtains on EncodedCommand

PowerShell Attacks,” https://unit42.paloaltonetworks.com/unit42-

pulling-back-the-curtains-on-encodedcommand-powershell-

attacks/, 2017.

[10] H. Ichise, Y. Jin, and K. Iida, “Analysis of DNS TXT Record Usage

and Consideration of Botnet Communication Detection,” IEICE

Transactions on Communications, 2018.

[11] C. Mullaney, “Morto worm sets a (DNS) record,”

https://www.symantec.com/connect/blogs/morto-worm-sets-

dns-record, 2011.

[12] Anomali Threat Research Team, “Illicit Cryptomining Threat

Actor Rocke Changes Tactics, Now More Difﬁcult to Detect,”

https://www.anomali.com/blog/illicit-cryptomining-threat-actor-

rocke-changes-tactics-now-more-difﬁcult-to-detect, 2019.

[13] H. Gao, V. Yegneswaran, J. Jiang, Y. Chen, P. Porras, S. Ghosh, and

H. Duan, “Reexamining DNS From a Global Recursive Resolver

Perspective,” IEEE/ACM Transactions on Networking (TON), 2016.

[14] D. Tatang, F. Quinkert, N. Dolecki, and T. Holz, “A Study of

Newly Observed Hostnames and DNS Tunneling in the Wild,”

https://arxiv.org/abs/1902.08454, 2019.

[15] F. Streibelt, J. Böttger, N. Chatzis, G. Smaragdakis, and A. Feld-

mann, “Exploring EDNS-Client-Subnet Adopters in your Free

Time,” in Proc. of the ACM Internet Measurement Conference

(IMC), 2013.

[16] T. Fiebig, K. Borgolte, S. Hao, C. Kruegel, and G. Vigna, “Some-

thing From Nothing (There): Collecting Global IPv6 Datasets From

DNS,” in Proc. of the International Conference on Passive and

Active Network Measurement (PAM), 2017.

[17] T. Fiebig, K. Borgolte, S. Hao, C. Kruegel, G. Vigna, and A. Feld-

mann, “In rDNS We Trust: Revisiting a Common Data-Source’s

Reliability,” in Proc. of the International Conference on Passive

and Active Network Measurement (PAM), 2018.

[18] A. Portier, H. Carter, and C. Lever, “Security In Plain TXT: Ob-

serving the Use of DNS TXT Records in the Wild,” in Proc. of the

International Conference on Detection of Intrusions & Malware,

and Vulnerability Assessment (DIMVA), 2019.

[19] DirectAdmin, “Sending mail to hotmail, yahoo and gmail,”

https://help.directadmin.com/item.php?id=207, 2015.

[20] J. van Veen, “How to stop your webshop emails going to the

spam or junk folder,” https://joostvanveen.com/a-20/how-to-stop-

your-webshop-emails-going-to-the-spam-or-junk-folder.

[21] OpenINTEL, “Current Coverage,” https://www.openintel.nl/

coverage/.

[22] Microsoft, “Exchange Server 2013: Conﬁgure a federation trust,”

https://docs.microsoft.com/en-us/exchange/conﬁgure-a-federation-

trust-exchange-2013-help.

[23] HostExploit, “World Host Report,” http://hostexploit.com/

downloads/world_hosts_report_201403.pdf, 2014.

[24] C. Dietrich, K. Krombholz, K. Borgolte, and T. Fiebig, “Investigat-

ing System Operators’ Perspective on Security Misconﬁgurations,”

in Proc. of the ACM Conference on Computer and Communications

Security (CCS), 2018.

[25] J. T. Bennett, “Shellshock in the Wild,” https://www.ﬁreeye.com/

blog/threat-research/2014/09/shellshock-in-the-wild.html, 2014.

[26] SkullSecurity, “Stufﬁng Javascript into DNS names,”

https://blog.skullsecurity.org/2010/stufﬁng-javascript-into-dns-

names, 2010.

[27] R. K. Konoth, E. Vineti, V. Moonsamy, M. Lindorfer, C. Kruegel,

H. Bos, and G. Vigna, “Minesweeper: An in-depth look into drive-

by cryptocurrency mining and its defense,” in Proc. of the ACM

Conference on Computer and Communications Security (CCS),

2018.
