Sqawk:SQL 与 Awk 的融合:将 SQL 应用于文本型数据文件
Sqawk: A fusion of SQL and Awk: Applying SQL to text-based data files

原始链接: https://github.com/jgarzik/sqawk

Sqawk是一个命令行SQL工具,用于处理分隔符分隔的文件,例如CSV和TSV,类似于`awk`。它将数据加载到内存表中,允许你直接对文件内容执行SQL查询。你可以使用`sqawk -s "SELECT * FROM data" data.csv`来查询data.csv。 Sqawk支持标准的SQL操作,例如SELECT、WHERE过滤、UPDATE和DELETE。使用`--write`将修改保存回原始文件。它还可以顺序处理多个SQL语句。 主要功能包括使用`DISTINCT`查找唯一值,使用`INNER JOIN`连接表,以及使用`-F`指定不同的分隔符。例如,`sqawk -F '\t' -s "SELECT * FROM employees" employees.tsv`处理制表符分隔的文件。可以在命令中添加`-v`选项以启用详细日志记录。Sqawk采用MIT许可证授权。

Hacker News users are discussing "Sqawk," a new tool that applies SQL to text-based data files. The initial reaction is mixed, with some expressing disappointment that it's not an Awk script as the name might imply, but rather a Rust-based implementation similar to SQLite or DuckDB, loading the entire dataset into memory. Users point out existing alternatives like csvkit, SQLite's virtual table loading for CSV files, DuckDB's CSV overview, and PostgreSQL's \COPY command. Some jokingly suggest extending Sqawk to support hieroglyphics or ascii animations. The overall sentiment suggests Sqawk may not offer significant advantages over existing solutions for querying text-based data.

原文

Crates.io Docs.rs MIT licensed

Sqawk is an SQL-based command-line tool for processing delimiter-separated files (CSV, TSV, etc.), inspired by the classic awk command. It loads data into in-memory tables, executes SQL queries against these tables, and writes the results back to the console or files.

sqawk -s "SELECT * FROM data" data.csv

This loads data.csv into an in-memory table called "data" and performs a SELECT query.

Filtering data with WHERE clause

sqawk -s "SELECT * FROM employees WHERE salary > 50000" employees.csv
sqawk -s "UPDATE data SET status = 'active' WHERE id = 5" data.csv --write

This updates the status field to 'active' for rows with id = 5 and saves the changes back to data.csv.

sqawk -s "DELETE FROM data WHERE id = 5" data.csv --write

This removes rows with id = 5 and saves the changes back to data.csv.

sqawk -s "UPDATE data SET status = 'inactive' WHERE last_login < '2023-01-01'" -s "DELETE FROM data WHERE status = 'inactive' AND last_login < '2022-01-01'" -s "SELECT * FROM data" data.csv --write

This executes multiple SQL statements in sequence: first marking recent inactive accounts, then removing very old inactive accounts, and finally showing the results.

sqawk -s "SELECT * FROM users" -s "SELECT * FROM orders" users.csv orders.csv

Finding unique values with DISTINCT

# Get unique values from a single column
sqawk -s "SELECT DISTINCT category FROM products" products.csv

# Get unique combinations of columns
sqawk -s "SELECT DISTINCT department, role FROM employees" employees.csv

# Use DISTINCT with ORDER BY for sorted unique values
sqawk -s "SELECT DISTINCT region FROM customers ORDER BY region" customers.csv

Join tables with INNER JOIN

# Join users and orders using INNER JOIN with ON condition
sqawk -s "SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id" users.csv orders.csv

# Join with additional WHERE filtering
sqawk -s "SELECT users.name, orders.product_id, orders.date FROM users INNER JOIN orders ON users.id = orders.user_id WHERE orders.product_id > 100" users.csv orders.csv

# Using DISTINCT with JOINs to find unique customer-product pairs
sqawk -s "SELECT DISTINCT users.name, products.name FROM users INNER JOIN orders ON users.id = orders.user_id INNER JOIN products ON orders.product_id = products.product_id" users.csv orders.csv products.csv
# Process a tab-delimited file (TSV)
sqawk -F '\t' -s "SELECT * FROM employees WHERE salary > 70000" employees.tsv

# Process a colon-delimited file
sqawk -F ':' -s "SELECT id, name, email FROM contacts" contacts.txt
sqawk -s "SELECT * FROM data WHERE value > 100" data.csv -v
sqawk -s "DELETE FROM data WHERE status = 'expired'" data.csv --write

By default, sqawk doesn't modify input files. Use the --write flag to save changes back to the original files.

For more detailed information, see:

Licensed under the MIT License (LICENSE or http://opensource.org/licenses/MIT).

Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in the work by you shall be licensed as MIT, without any additional terms or conditions.

联系我们 contact @ memedata.com